AI Research Intern
Hanwha Vision America
Santa Clara, CA
Hanwha Vision America
Santa Clara, CA
University of Texas at Arlington
Arlington, TX
TwinBit
Dhaka, Bangladesh
Ph.D. in Computer Science
University of Texas at Arlington
Bachelor of Science in Computer Science and Engineering
Shahjalal University of Science and Technology
Reconstruction of 3D open surfaces (e.g., non-watertight meshes) is an underexplored area of computer vision. Recent learning-based implicit techniques have removed previous barriers by enabling reconstruction in arbitrary resolutions. Yet, such approaches often rely on distinguishing between the inside and outside of a surface in order to extract a zero level set when reconstructing the target. In the case of open surfaces, this distinction often leads to artifacts such as the artificial closing of surface gaps. However, real-world data may contain intricate details defined by salient surface gaps. Implicit functions that regress an unsigned distance field have shown promise in reconstructing such open surfaces. Nonetheless, current unsigned implicit methods rely on a discretized representation of the raw data. This not only bounds the learning process to the representation’s resolution, but it also introduces outliers in the reconstruction. To enable accurate reconstruction of open surfaces without introducing outliers, we propose a learning-based implicit point-voxel model (IPVNet). IPVNet predicts the unsigned distance between a surface and a query point in 3D space by leveraging both raw point cloud data and its discretized voxel counterpart. Experiments on synthetic and real-world public datasets demonstrate that IPVNet outperforms the state of the art while producing far fewer outliers in the resulting reconstruction.
Accurate reconstruction of both the geometric and topological details of a 3D object from a single 2D image is a fundamental challenge in computer vision. Existing explicit/implicit solutions to this problem struggle to recover self-occluded geometry and/or faithfully reconstruct topological shape structures. To resolve this dilemma, we introduce LIST, a novel neural architecture that leverages local and global image features to accurately reconstruct the geometric and topological structure of a 3D object from a single image. We utilize global 2D features to predict a coarse shape of the target object and then use it as a base for higher-resolution reconstruction. By leveraging both local 2D features from the image and 3D features from the coarse prediction, we can predict the signed distance between an arbitrary point and the target surface via an implicit predictor with great accuracy. Furthermore, our model does not require camera estimation or pixel alignment, and its reconstruction is not influenced by the input-view direction. Through qualitative and quantitative analysis, we show the superiority of our model over the state of the art in reconstructing 3D objects from both synthetic and real-world images.
Real-world 3D data may contain intricate details defined by salient surface gaps. Automated reconstruction of these open surfaces (e.g., non-watertight meshes) is a challenging problem for environment synthesis in mixed reality applications. Current learning-based implicit techniques can achieve high fidelity on closed-surface reconstruction. However, their dependence on the distinction between the inside and outside of a surface makes them incapable of reconstructing open surfaces. Recently, a new class of implicit functions has shown promise in reconstructing open surfaces by regressing an unsigned distance field. Yet, these methods rely on a discretized representation of the raw data, which loses important surface details and can lead to outliers in the reconstruction. We propose IPVNet, a learning-based implicit model that predicts the unsigned distance between a surface and a query point in 3D space by leveraging both raw point cloud data and its discretized voxel counterpart. Experiments on synthetic and real-world public datasets demonstrate that IPVNet outperforms the state of the art while producing far fewer outliers in the reconstruction.
In this paper, we introduce a novel conditional generative adversarial network that creates dense 3D point clouds, with color, for assorted classes of objects in an unsupervised manner. To overcome the difficulty of capturing intricate details at high resolutions, we propose a point transformer that progressively grows the network through the use of graph convolutions. The network is composed of a leaf output layer and an initial set of branches. Every training iteration evolves a point vector into a point cloud of increasing resolution. After a fixed number of iterations, the number of branches is increased by replicating the last branch. Experimental results show that our network is capable of learning and mimicking a 3D data distribution, and produces colored point clouds with fine details at multiple resolutions.
This paper presents a new algorithm for recognizing 46 hand gestures of Bengali Sign Language (BdSL), including 9 gestures for 11 vowels, 28 gestures for 39 consonants, and 9 gestures for 9 numerals, grouped according to similarity of pronunciation. Each image was first resized and then converted to a binary format, and the region of interest was cropped using only the top-most, left-most, and right-most white pixels. The positions of the fingertips were found by applying a fingertip finder algorithm. Eleven features were extracted from each image to train a multi-layered feed-forward neural network with a back-propagation training algorithm. The distance between the centroid of the hand region and each fingertip was calculated, along with the angle between each fingertip and the horizontal x-axis passing through the centroid. A database of 2300 images of Bengali signs was constructed to evaluate the effectiveness of the proposed system, where 70%, 15%, and 15% of the images were used for training, testing, and validation, respectively. The experimental results showed an average accuracy of 88.69% in recognizing BdSL, which is promising compared with other existing methods.
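The centroid-to-fingertip distances and angles described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `hand_features` is hypothetical, the coordinates are assumed to use a y-axis that points up (image rows would need to be flipped first), and the sketch yields ten values for five fingertips, since the abstract does not say what the eleventh feature is.

```python
import math

def hand_features(centroid, fingertips):
    """Return distances, then angles (in degrees), from the centroid to each fingertip.

    Hypothetical helper for illustration only. Assumes (x, y) points with
    the y-axis pointing up; angles are measured from the horizontal x-axis
    that passes through the centroid.
    """
    cx, cy = centroid
    distances = []
    angles = []
    for x, y in fingertips:
        dx, dy = x - cx, y - cy
        distances.append(math.hypot(dx, dy))             # Euclidean distance
        angles.append(math.degrees(math.atan2(dy, dx)))  # angle to the x-axis
    return distances + angles

# Two example fingertips around a centroid at the origin.
features = hand_features((0.0, 0.0), [(3.0, 4.0), (0.0, 2.0)])
```

A feature vector of this kind would then be the input to the feed-forward network mentioned in the abstract.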
Trained two agents to play Tic-Tac-Toe using reinforcement learning.
Assisted Dr. Bob Weems with grading assignments and quizzes. This course presents an overview of classic approaches to algorithm design (decomposition, dynamic programming, and the greedy method) and builds an understanding of particular algorithms and data structures that have wide applicability. It also covers basic algorithm analysis concepts, applying math skills to worst-case and expected time using recurrences and asymptotic notation, and improves programming skills, especially in data structures, recursion, and graphs.
Assisted Dr. Chance Eary. This course covers multithreading, distributed systems, device drivers, object-oriented operating systems, advanced file systems, parallel virtual machines, and load balancing. Examples from current popular systems and research operating systems are also analyzed.
Assisted Dr. Chance Eary. This course presents an overview of applications of mobile systems in health, entertainment, security, and other areas.
This course introduces students to computers, to the algorithmic process and to programming using basic control and data structures.
Assisted Dr. Ramez Elmasri. This course presents the history of programming languages, an overview of the scripting/mixed language Python, the functional programming paradigm through Haskell, the logic programming language Prolog, and an overview of the syntax and semantics of programming languages.
Assisted several professors with grading assignments and quizzes. This course presents programming concepts beyond basic control and data structures. Emphasis is given to data structures, including linked lists and trees, as well as modular design consistent with software engineering principles.
701, S Nedderman Dr
Arlington, TX 76019.
mohammadsamiul.arshad [at] mavs.uta.edu
samiularshad [at] gmail.com