Soumyajit Chakraborty
About
I've always been the kind of person who enjoyed getting stuck on problems—whether it was math puzzles as a kid or models that refuse to behave now.
I started in mechanical engineering, but somewhere between systems thinking and discovering transformers, I realised I was more interested in how machines learn than how they move.
Since then, I've been exploring deep learning, building things that sometimes work, often break, and always teach me something. Still figuring things out, still asking questions.
When I'm not looking at a screen, I love trekking in the Himalayas and photographing nature. I also play football.
Research Interests
Passions
Experience
Research Intern
IIM Udaipur
- Led end-to-end data analysis on an e-commerce dataset (n = 3,600+ sessions) using Python
- Identified drivers of customer conversion via correlation analysis and logistic regression
- Performed time-based analysis to identify peak engagement periods and correlate them with conversion rates
- Synthesized academic literature on online customer sentiment to contextualize findings
Skills
LANGUAGES
Python • SQL • Bash
ML/DL
PyTorch • TensorFlow/Keras • scikit-learn
DATA
pandas • NumPy • matplotlib • seaborn • plotly
DEEP LEARNING
ANN • CNN • LSTM • BiLSTM • Transformers
COMPUTER VISION
YOLOv8 • OpenCV • ViT
NLP
HuggingFace • spaCy • LangChain
TOOLS
Git • FastAPI • Pinecone • Jupyter • Colab
MATH
Linear Algebra • Probability • Statistics • Optimization
Portfolio
Featured Projects
A showcase of my explorations in Computer Vision, NLP, and Predictive Analytics.
Developed a real-time system using YOLOv8, achieving 91.8% mAP@50. Engineered end-to-end data pipelines and optimized model performance through comprehensive EDA.
- 91.8% mAP@50 on a custom dataset
- Automated Pascal VOC to YOLO format conversion
- Real-time visualization with OpenCV and Matplotlib
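For illustration, a Pascal VOC to YOLO conversion of the kind listed above can be sketched as follows. This is a minimal sketch and not the project's actual code: the function names `voc_to_yolo` and `yolo_label_line` are hypothetical, and it assumes VOC boxes given as pixel corners `(xmin, ymin, xmax, ymax)` and YOLO labels given as a normalized center, width, and height.

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert a Pascal VOC box (xmin, ymin, xmax, ymax) in pixels
    to a YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one object as a line of a YOLO label file (hypothetical helper)."""
    x, y, w, h = voc_to_yolo(box, img_w, img_h)
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 200x200 pixel box inside a 640x480 image, class index 0.
    print(yolo_label_line(0, (100, 200, 300, 400), 640, 480))
```

Normalizing by image size keeps the labels valid when images are resized for training.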
Engineered a RAG system for medical QA using FastAPI and LangChain. Integrated Llama-3.3-70B with Pinecone for accurate semantic search and source attribution.
- Groq's Llama-3.3-70B + Gemini Embeddings
- Context-only responses to prevent hallucinations
- Asynchronous FastAPI with modular architecture
End-to-end NLP pipeline analyzing narrative patterns across 220+ episodes. Built character network generators and zero-shot theme classifiers.
- Zero-shot classification with BART-large-MNLI
- Co-occurrence graphs using NetworkX and PyVis
- Fine-tuned DistilBERT for technique analysis
Built an LSTM neural network in PyTorch to forecast stock prices on 14+ years of data. Focused on time series feature engineering and normalization.
- 3,600+ daily records processed
- Engineered volatility and moving average indicators
- Interactive comparative visualizations
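As an illustration of the indicators listed above, here is a minimal pure-Python sketch. The exact definitions used in the project are not stated, so this assumes a trailing simple moving average and a rolling sample standard deviation of daily simple returns; the function names and window sizes are hypothetical.

```python
import math


def simple_moving_average(prices, window):
    """Trailing mean of the last `window` prices; None until enough history exists."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = prices[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def rolling_volatility(prices, window):
    """Sample standard deviation of daily simple returns over a trailing window.

    Assumes window >= 2. The output is aligned with `prices`, so the first
    `window` entries are None (the first day has no return).
    """
    returns = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    out = [None]
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = returns[i + 1 - window : i + 1]
            mean = sum(chunk) / window
            var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
            out.append(math.sqrt(var))
    return out


if __name__ == "__main__":
    closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up prices for illustration
    print(simple_moving_average(closes, 3))
    print(rolling_volatility(closes, 2))
```

Both functions look only at past values, which avoids leaking future prices into the features fed to a forecasting model.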
Implemented multiple CNN architectures for real-time emotion detection. Used transfer learning with VGG16 and ResNet50V2 to handle imbalanced data.
- Custom CNNs and Transfer Learning
- Applied Early Stopping and LR Scheduling
- Gradio interface for real-time testing
Learning Path
Documented Learnings
"Education is not the learning of facts, but the training of the mind to think."