About

I will be starting my Master's in CSE at the University of Michigan, Ann Arbor.

My research interests revolve around multimodal AI and natural language supervision in vision tasks.

I am currently working as a Research Intern at Stanford's Personalized and Translational Neuroscience Lab (PanLab) on multimodal AI in neuroimaging. I am also interning at DREAM:Lab, IISc Bangalore, researching deep learning on edge accelerators.

I have previously interned at Samsung, Upthrust and AarogyaAI on research and data science projects.

Publications

ViDAS: Vision-based Danger Assessment and Scoring

Pranav Gupta, Advith Krishnan, Naman Nanda, Ananth Eswar, Deeksha Agarwal, Pratham Gohil, Pratyush Goel

ICVGIP 2024

[ACM DL]

ECHO: Environmental Sound Classification With Hierarchical Ontology-Guided Semi-Supervised Learning

Pranav Gupta, Raunak Sharma, Rashmi Kumari, Sri Krishna Aditya, Shwetank Choudhary, Sumit Kumar, Kanchana M, R Thilagavathy

IEEE CONECCT 2024

[IEEE Xplore]

ISAApp – Image Based Smart Attendance Application

Aritra Dutta, G Suseela, G Niranjana, Pushpita Boral, Pranav Gupta, Subha Bal Pal

AAIMB 2023

[Springer Link] [code]

Managing Congregations of People by Predicting Likelihood of a Person being Infected by a Contagious Disease like the COVID Virus

Pranav Gupta, Manish Gupta

IEEE CCEM 2020

[IEEE Xplore] [code]

Organizations

Odyssey Lab

Co-Founder

Conducting research with 15 students and mentors in India from IISc, IIT, IBM, IIIT-Hyderabad

[Website Link]

Next Tech Lab

Syndicate - Head AI Researcher

Guiding a large team of researchers, overseeing a portfolio of over 40 projects in machine learning and other areas, and organizing more than 20 events and workshops for groups of more than 50 students.

[Website Link]

Paper Implementations

CLIP-ViL-GradCam

A PyTorch implementation of CLIP-ViL from the paper "How Much Can CLIP Benefit Vision-and-Language Tasks?" by Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, and Kurt Keutzer.

SimCLR-UrbanSound8K

A PyTorch implementation of SimCLR, from the paper "A Simple Framework for Contrastive Learning of Visual Representations" by Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, applied to the UrbanSound8K dataset. Trained only on the first fold because of limited compute, reaching 81% accuracy on mel-spectrogram images.

CLIP

A PyTorch implementation of CLIP from the paper "Learning Transferable Visual Models From Natural Language Supervision" by Alec Radford et al. Implemented the main architecture of the model; currently extending it to VQA tasks.

MusicLM/AudioLDM

A PyTorch implementation of MusicLM and AudioLDM from the paper "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" by Liu et al., using the MusicCaps dataset. Training the MusicLM model currently fails with errors that I am working to resolve. Trained the AudioLDM model by fine-tuning the model from the Hugging Face library on mel-spectrogram images of the MusicCaps dataset.

Siamese Network with Triplet Loss

A TensorFlow implementation of the Siamese Network architecture with Triplet Loss from the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering" by Florian Schroff, Dmitry Kalenichenko, and James Philbin. Trained the model on the Ship Classification dataset, which I scraped from Ship Spotting to build an Indian Ships Dataset and uploaded to Kaggle.
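
For readers unfamiliar with triplet loss, here is a minimal sketch in plain Python. It is not the code from the repository: the function names and the margin value are illustrative assumptions, and a real implementation would operate on batches of learned embeddings in TensorFlow rather than on Python lists.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows with the violation.
    The margin of 0.2 is an illustrative default, not a tuned value.
    """
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)


# Hypothetical 2-D embeddings, for illustration only.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0])  # negative sits on the anchor
```

In this toy case `easy` is zero because the negative is already well separated, while `hard` is positive because the positive is farther from the anchor than the negative.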

AI Algorithms

From-scratch implementations of basic AI algorithms such as gradient descent and K-means, with real-time visualization, using NumPy and Matplotlib.
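
As a rough idea of what such an implementation involves, here is a minimal gradient descent sketch. It is an assumption-laden illustration, not the repository's code: it uses plain Python instead of NumPy, skips the Matplotlib visualization, and the function names, learning rate, and example objective f(x) = (x - 3)^2 are all hypothetical choices.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Run fixed-step gradient descent on a one-dimensional function.

    `grad` returns the derivative at a point. The visited points are
    returned as well, since a visualization would plot them step by step.
    """
    x = start
    history = [x]
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step against the gradient
        history.append(x)
    return x, history


def grad_f(x):
    """Derivative of the example objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


minimum, path = gradient_descent(grad_f, start=0.0)
```

With these settings the iterate moves from 0 toward the minimizer at x = 3, and `path` holds the starting point plus one entry per step.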

Side Projects

AI Wordle Solver

Search Browser History GPT

Splitwise GPT Vision

Genetic Handwritten Digits

Face Recognition LFW