Research

Advancing the frontiers of Computer Vision and Deep Learning

Research Focus

My research focuses on developing robust and efficient deep learning models for computer vision applications. I am particularly interested in self-supervised learning approaches that learn meaningful representations from unlabeled data, multimodal learning that combines visual and textual information, and adversarial robustness for building more reliable AI systems.

Working under the guidance of Prof. Arijit Sur and Dr. Pinaki Mitra at the MultiMedia Lab, IIT Guwahati, I explore novel approaches to address fundamental challenges in machine learning and computer vision.

Research Areas

Computer Vision
Developing algorithms for image analysis, object detection, visual understanding, and scene interpretation. Focus on robust and efficient vision systems that can work in real-world scenarios.
Object Detection, Image Analysis, Visual Understanding, Scene Recognition
Deep Learning
Advancing neural network architectures, optimization techniques, and representation learning methods. Developing efficient and robust deep learning models for various applications.
Neural Networks, Representation Learning, Model Optimization, Architecture Design
Multimodal Learning
Developing models that can understand and process information from multiple modalities such as vision and language. Focus on cross-modal understanding and joint representation learning.
Vision-Language, Cross-modal Learning, Multimodal Fusion, Joint Embeddings
Self-Supervised Learning
Exploring methods to learn meaningful representations without labeled data. Investigating contrastive and predictive approaches for learning from unlabeled datasets.
Contrastive Learning, Unsupervised Learning, Representation Learning, Pretext Tasks
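The contrastive approach mentioned above is often instantiated with an InfoNCE-style objective: two augmented views of the same image are pulled together in embedding space while other images in the batch are pushed apart. Below is a minimal NumPy sketch of that objective; the function name, batch layout, and temperature value are illustrative, not taken from any specific paper.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two batches of embeddings.

    z1, z2: (N, D) arrays where z1[i] and z2[i] are embeddings of two
    augmented views of the same image (the positive pair); all other
    rows act as negatives. Returns a scalar loss (illustrative sketch).
    """
    # L2-normalize so similarities are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    # (N, N) similarity matrix; positives sit on the diagonal
    logits = (z1 @ z2.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Usage: matched views should score a much lower loss than random pairings
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(z, z)
loss_random = info_nce_loss(z, rng.normal(size=(8, 16)))
```

The temperature controls how sharply the softmax concentrates on the hardest negatives; small values (around 0.1) are a common choice in contrastive setups.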

Current Research Projects

C-LEAD: Contrastive Learning for Enhanced Adversarial Defense
Preprint (Under Review)
This work introduces a novel contrastive learning framework for enhancing adversarial robustness in deep neural networks. The approach leverages contrastive learning principles to learn robust feature representations that are less susceptible to adversarial perturbations while maintaining competitive performance on clean data.

Key Contributions:

  • Novel contrastive learning framework for adversarial defense
  • Theoretical analysis of robustness properties
  • Comprehensive evaluation on multiple benchmark datasets
  • Improved trade-off between clean accuracy and robustness
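To make the threat model concrete: the adversarial perturbations that such a defense must withstand are typically generated by perturbing an input along the sign of the loss gradient (one-step FGSM). The sketch below illustrates that attack on a simple logistic classifier with an analytic gradient; it is a generic illustration of the attack setting, not the C-LEAD method itself, and all names here are hypothetical.

```python
import numpy as np

def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic classifier p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturb(x, y, w, b, eps=0.1):
    """One-step FGSM attack: shift x by eps along the sign of dLoss/dx.

    For logistic regression the input gradient has the closed form
    (p - y) * w, so no autodiff framework is needed in this sketch.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Usage: the perturbed input should incur a strictly higher loss
rng = np.random.default_rng(1)
w = rng.normal(size=4)
b = 0.0
x = rng.normal(size=4)
y = 1.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.2)
```

A defense is then judged by how little the loss (and accuracy) degrades under such perturbations while clean-data performance is preserved.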
Multi-source Transfer Learning with Self-Supervised Learning
In Progress
Investigating novel approaches to combine multiple source domains for transfer learning using self-supervised learning techniques. The goal is to develop methods that can effectively leverage diverse source domains to improve performance on target domains with limited labeled data.

Research Objectives:

  • Develop multi-source domain adaptation algorithms
  • Integrate self-supervised learning for better representations
  • Handle domain shift and distribution mismatch
  • Evaluate on computer vision benchmarks
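One simple way to leverage diverse source domains, as described above, is to weight each source by how similar its feature distribution is to the target's. The sketch below scores each source by the cosine similarity between domain-mean feature vectors and turns the scores into softmax weights; this is a hypothetical illustration of the idea, not the method under development.

```python
import numpy as np

def source_weights(source_feats, target_feats):
    """Softmax weights over source domains (hypothetical weighting scheme).

    source_feats: list of (N_i, D) feature arrays, one per source domain.
    target_feats: (M, D) feature array from the target domain.
    Each source is scored by the cosine similarity of its mean feature
    vector to the target's mean feature vector.
    """
    t = target_feats.mean(axis=0)
    t = t / (np.linalg.norm(t) + 1e-8)
    sims = []
    for s in source_feats:
        m = s.mean(axis=0)
        sims.append((m / (np.linalg.norm(m) + 1e-8)) @ t)
    sims = np.array(sims)
    w = np.exp(sims - sims.max())  # stable softmax over domains
    return w / w.sum()

# Usage: a source distributed like the target should get the larger weight
rng = np.random.default_rng(2)
target = rng.normal(loc=1.0, size=(32, 8))
near_source = rng.normal(loc=1.0, size=(32, 8))
far_source = rng.normal(loc=-1.0, size=(32, 8))
w = source_weights([near_source, far_source], target)
```

In a full pipeline these weights would scale each source's loss term during training, so dissimilar domains contribute less and domain shift is dampened.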

Research Methodology

Problem Identification

Systematic literature review and gap analysis to identify challenging problems in computer vision and machine learning.

Algorithm Development

Design and implementation of novel algorithms with theoretical foundations and practical considerations.

Experimental Validation

Comprehensive experiments on standard benchmarks with rigorous statistical analysis and comparisons.

Publication & Sharing

Dissemination of research findings through peer-reviewed publications and open-source implementations.

Research Tools & Technologies

Deep Learning Frameworks

PyTorch, TensorFlow, Keras, JAX

Programming Languages

Python, C/C++, MATLAB, R

Computer Vision Libraries

OpenCV, PIL, scikit-image, Albumentations

Data Analysis & Visualization

NumPy, Pandas, Matplotlib, Seaborn, Plotly

High-Performance Computing

CUDA, Docker, Slurm, Git

Experiment Management

Weights & Biases, MLflow, TensorBoard, Hydra

Future Research Directions

Emerging Research Areas

Multimodal Deep Learning

Advancing multimodal learning techniques that effectively combine and understand information from multiple modalities including vision, language, and audio.

Agentic AI

Developing autonomous AI agents capable of reasoning, planning, and taking actions in complex environments with minimal human intervention.

RAG (Retrieval-Augmented Generation)

Exploring retrieval-augmented generation systems that combine knowledge retrieval with generative models for more accurate and contextual responses.

VQA (Visual Question Answering)

Advancing visual question answering systems that can understand and reason about visual content to provide accurate answers to natural language questions.

Zero-shot Learning

Developing models that can recognize and understand new concepts without explicit training examples, leveraging semantic knowledge and transfer learning.

Discuss Research Collaboration
