Research

Advancing the frontiers of Computer Vision and Deep Learning

Research Focus

My research focuses on developing robust and efficient deep learning models for computer vision applications. I am particularly interested in self-supervised learning approaches that learn meaningful representations from unlabeled data, multimodal learning that combines visual and textual information, and adversarial robustness, with the goal of building more reliable AI systems.

Working under the guidance of Prof. Arijit Sur and Dr. Pinaki Mitra at the MultiMedia Lab, IIT Guwahati, I explore novel approaches to address fundamental challenges in machine learning and computer vision.

Research Areas

Computer Vision
Developing algorithms for image analysis, object detection, visual understanding, and scene interpretation. Focus on robust and efficient vision systems that can work in real-world scenarios.
Object Detection · Image Analysis · Visual Understanding · Scene Recognition
Deep Learning
Advancing neural network architectures, optimization techniques, and representation learning methods. Developing efficient and robust deep learning models for various applications.
Neural Networks · Representation Learning · Model Optimization · Architecture Design
Multimodal Learning
Developing models that can understand and process information from multiple modalities such as vision and language. Focus on cross-modal understanding and joint representation learning.
Vision-Language · Cross-modal Learning · Multimodal Fusion · Joint Embeddings
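The idea behind joint embeddings can be sketched in a few lines: image and text encoders map into a shared space, and cross-modal retrieval reduces to cosine similarity in that space. The embeddings below are random stand-ins for the outputs of trained encoders, used only to illustrate the retrieval step.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Hypothetical joint embeddings: in practice these would come from
# trained image and text encoders projecting into a shared space.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 64))                     # 4 images, 64-d space
text_emb = image_emb + 0.05 * rng.normal(size=(4, 64))   # matching captions

sim = cosine_sim(text_emb, image_emb)   # (4 captions) x (4 images)
retrieved = sim.argmax(axis=1)          # best image for each caption
print(retrieved)
```

Because each caption embedding here is a lightly perturbed copy of its paired image embedding, each caption retrieves its own image; real systems learn that alignment from paired data.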
Self-Supervised Learning
Exploring methods to learn meaningful representations without labeled data. Investigating contrastive and predictive approaches for learning from unlabeled datasets.
Contrastive Learning · Unsupervised Learning · Representation Learning · Pretext Tasks
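A core objective in contrastive self-supervised learning is InfoNCE: embeddings of two augmented views of the same sample are pulled together while other samples in the batch act as negatives. The NumPy sketch below uses random vectors in place of encoder outputs and is a simplified, illustrative version of the loss, not any specific published implementation.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss: z1[i] and z2[i] are embeddings of two augmented
    views of the same sample; all other pairs act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature            # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal: view i of sample i matches view i
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=(8, 32)))
random_pairs = info_nce_loss(z, rng.normal(size=(8, 32)))
print(aligned < random_pairs)
```

Well-aligned view pairs yield a much lower loss than random pairings, which is exactly the signal that drives representation learning without labels.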

Research Tools & Technologies

Deep Learning Frameworks

PyTorch · TensorFlow · Keras · JAX

Programming Languages

Python · C/C++ · MATLAB · R

Computer Vision Libraries

OpenCV · PIL · scikit-image · Albumentations

Data Analysis & Visualization

NumPy · Pandas · Matplotlib · Seaborn · Plotly

High-Performance Computing

CUDA · Docker · Slurm · Git

Experiment Management

Weights & Biases · MLflow · TensorBoard · Hydra

Future Research Directions

Emerging Research Areas

Multimodal Deep Learning

Advancing multimodal learning techniques that effectively combine and understand information from multiple modalities including vision, language, and audio.

Agentic AI

Developing autonomous AI agents capable of reasoning, planning, and taking actions in complex environments with minimal human intervention.

RAG (Retrieval-Augmented Generation)

Exploring retrieval-augmented generation systems that combine knowledge retrieval with generative models for more accurate and contextual responses.
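The retrieve-then-generate pattern can be sketched with toy components: a word-overlap ranker standing in for a vector index, and a prompt template standing in for a large language model. The corpus, scoring rule, and "generator" below are illustrative assumptions, not part of any real system.

```python
# Toy corpus; a real RAG system would index a large document store.
corpus = [
    "The MultiMedia Lab at IIT Guwahati works on computer vision.",
    "Contrastive learning trains encoders on unlabeled data.",
    "Retrieval-augmented generation grounds answers in retrieved text.",
]

def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query, context):
    """Stand-in generator: a real system would send this prompt to an LLM."""
    return f"Context: {' '.join(context)}\nQuestion: {query}"

query = "What does retrieval-augmented generation do?"
context = retrieve(query, corpus)
prompt = generate(query, context)
print(prompt)
```

Grounding the generator in retrieved context is what makes the final answer verifiable against a knowledge source rather than dependent on the model's parameters alone.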

VQA (Visual Question Answering)

Advancing visual question answering systems that can understand and reason about visual content to provide accurate answers to natural language questions.

Zero-shot Learning

Developing models that can recognize and understand new concepts without explicit training examples, leveraging semantic knowledge and transfer learning.
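One classic route to zero-shot recognition is attribute-based classification: each class, including unseen ones, is described by a semantic attribute vector, and a predictor trained on seen classes maps an input to attribute space, where the nearest class description wins. The attribute table below is a made-up toy example, used only to illustrate the matching step.

```python
import numpy as np

# Illustrative attribute vectors: [striped, four-legged, can-fly].
# "zebra" can be classified without training images of zebras,
# as long as its attribute description is known.
class_attributes = {
    "zebra": np.array([1.0, 1.0, 0.0]),
    "horse": np.array([0.0, 1.0, 0.0]),
    "eagle": np.array([0.0, 0.0, 1.0]),
}

def zero_shot_classify(attr_prediction, attributes):
    """Assign the class whose attribute vector is closest (cosine)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(attributes, key=lambda c: cos(attr_prediction, attributes[c]))

# A hypothetical attribute predictor (trained only on seen classes)
# outputs "striped and four-legged" for an image of an unseen animal:
predicted = np.array([0.9, 0.8, 0.1])
print(zero_shot_classify(predicted, class_attributes))
```

The same nearest-description idea underlies embedding-based zero-shot methods, where learned text embeddings of class names replace hand-specified attributes.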
