About

I am a junior at SRM Institute of Science and Technology, Chennai, India.
My research interests revolve around multimodal AI and natural language supervision in vision tasks.
I am currently a Research Intern at Stanford's Personalized and Translational Neuroscience Lab (PanLab), working on multimodal AI in neuroimaging. I am also interning at DREAM:Lab, IISc Bangalore, researching deep learning on edge accelerators.
I have previously interned at Samsung, Upthrust and AarogyaAI on research and data science projects.

Publications

ViDAS: Vision-based Danger Assessment and Scoring
Pranav Gupta, Advith Krishnan, Naman Nanda, Ananth Eswar, Deeksha Agarwal, Pratham Gohil, Pratyush Goel
ICVGIP 2024
ECHO: Environmental Sound Classification With Hierarchical Ontology-Guided Semi-Supervised Learning
Pranav Gupta, Raunak Sharma, Rashmi Kumari, Sri Krishna Aditya, Shwetank Choudhary, Sumit Kumar, Kanchana M, R Thilagavathy
IEEE CONECCT 2024
ISAApp – Image Based Smart Attendance Application
Aritra Dutta, G Suseela, G Niranjana, Pushpita Boral, Pranav Gupta, Subha Bal Pal
AAIMB 2023
Managing Congregations of People by Predicting Likelihood of a Person being Infected by a Contagious Disease like the COVID Virus
Pranav Gupta, Manish Gupta
IEEE CCEM 2020

Organizations

Odyssey Lab
Co-Founder
Conducting research with 15 students and mentors in India from IISc, IIT, IBM, and IIIT-Hyderabad.
Next Tech Lab
Syndicate - Head AI Researcher
Guiding a large team of researchers, overseeing a portfolio of more than 40 projects in machine learning and other areas, and organizing more than 20 events and workshops for groups of more than 50 students.

Paper Implementations

CLIP-ViL-GradCam

A PyTorch implementation of CLIP-ViL from the paper "How Much Can CLIP Benefit Vision-and-Language Tasks?" by Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, and Kurt Keutzer.

SimCLR-UrbanSound8K

A PyTorch implementation of SimCLR from the paper "A Simple Framework for Contrastive Learning of Visual Representations" by Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, applied to the UrbanSound8K dataset. Trained only on the first fold because of limited compute, reaching 81% accuracy on mel-spectrogram images.

CLIP

A PyTorch implementation of CLIP from the paper "Learning Transferable Visual Models From Natural Language Supervision" by Alec Radford et al. Implemented the main architecture of the model; currently working on extending it to VQA tasks.

MusicLM/AudioLDM

A PyTorch implementation of MusicLM and AudioLDM from the paper "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" by Liu et al., using the MusicCaps dataset. Training the MusicLM model currently fails with errors that I am working to resolve. Fine-tuned the AudioLDM model from the Hugging Face library on mel-spectrogram images of the MusicCaps dataset.

Siamese Network with Triplet Loss

A TensorFlow implementation of a Siamese Network architecture with Triplet Loss from the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering" by Florian Schroff, Dmitry Kalenichenko, and James Philbin. Trained the model on a ship classification dataset that I scraped from Ship Spotting and uploaded to Kaggle as the Indian Ships Dataset.
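
For readers unfamiliar with the loss, the following is a minimal plain-Python sketch of the triplet loss with squared Euclidean distance, as formulated in the FaceNet paper. It is an illustration only, not the TensorFlow code from this project: the function names and the default margin value are assumptions chosen for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an illustrative default).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# An easy triplet (negative far away) contributes no loss,
# while a hard triplet (negative as close as the positive) does.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

In a Siamese setup, the three inputs would be embeddings produced by the same shared-weight network.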

AI Algorithms

From-scratch implementations of basic AI algorithms such as gradient descent and K-means, with real-time visualization, using NumPy and Matplotlib.
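
As a rough illustration of the kind of algorithm covered, here is a minimal gradient descent sketch on a one-dimensional quadratic. It uses plain Python rather than NumPy and Matplotlib to stay dependency-free, and the objective, function names, and hyperparameters are hypothetical choices for this example, not taken from the repository. The recorded path of iterates is what a real-time visualization would plot.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Run fixed-step gradient descent and return every iterate visited."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step against the gradient
        path.append(x)
    return path


def quadratic_grad(x):
    """Gradient of the example objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


path = gradient_descent(quadratic_grad, start=0.0)
print(f"final x = {path[-1]:.6f}")
```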

Side Projects

AI Wordle Solver

Predicts the next best word to play on Wordle from a screenshot of a partially filled Wordle board.
[Medium Article]

Search Browser History GPT

Query the contents of your browser history to find and navigate to the webpage you want.

Splitwise GPT Vision

Provide an image of a bill to automatically add Splitwise entries directly in the app.

Genetic Handwritten Digits

Uses genetic algorithms to evolve the CNN architecture, convolution kernel sizes, and pooling to classify handwritten digits.
[Medium Article]

Face Recognition LFW

Used FaceNet embeddings to train SVMs with both one-vs-all and one-vs-one approaches to classify the LFW dataset across 86 faces.
[Medium Article]
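
To make the difference between the two multiclass strategies concrete, the sketch below counts how many binary classifiers each one needs. It assumes "86 faces" means 86 distinct classes, and the helper names are hypothetical; no SVM is trained here.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = list(range(86))  # assumption: 86 identities, labeled 0..85
n_ova = len(one_vs_all_tasks(classes))
n_ovo = len(one_vs_one_tasks(classes))
print(n_ova, n_ovo)  # 86 and 86 * 85 / 2 = 3655
```

One-vs-all trains fewer classifiers, each on the full dataset; one-vs-one trains many more, each on only the samples of two classes.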