Available for Summer 2026

Arjun Varma

Data Scientist & ML Engineer

MS Data Science @ Columbia University

3+ Years Experience

Get to know me

About Me

Passionate about transforming data into actionable insights

My Story

Currently pursuing my Master's in Data Science at Columbia University, with 3+ years of prior experience at ZS Associates building ML platforms and analytics solutions for Fortune 500 healthcare clients. I also TA at Columbia Business School, teaching AI foundations and data-driven decision-making.

My coursework spans Applied Machine Learning, Agentic AI for Analytics, Statistical Inference and Modeling, and Probability and Statistics. I'm drawn to the areas where engineering meets real-world problem solving.

I'm looking for roles where I can make a genuine impact: environments that challenge me, push my thinking, and let me build things that matter.

Quick Facts

3+ Years of Industry Experience
1000+ Users Impacted

Education

Columbia University

New York, NY

Master of Science in Data Science

Aug 2025 - Dec 2026

  • Coursework: Applied Machine Learning, Agentic AI for Analytics, Statistical Inference and Modeling, Probability and Statistics
  • Teaching Assistant, Columbia Business School: Business Analytics II (Foundations of AI) and Hollywood and Big Data

Vellore Institute of Technology

Vellore, India

B.Tech in Electronics & Communication Engineering

Jul 2018 - May 2022

  • Special Achiever Award | Merit Scholarship
Career Journey

Work Experience

3+ years of building data-driven solutions at scale

Advanced Data Science Associate Consultant

ZS Associates

Pune, India | Feb 2025 - Jun 2025

  • Built and deployed an org-wide analytics and ML platform (Spark/SQL, dashboards) that unified 5+ data sources into territory and product KPIs used by 100+ stakeholders supporting a $10B oncology portfolio
  • Cut weekly reporting time from days to minutes, replacing Excel workflows with automated pipelines and self-serve dashboards

Decision Analytics Associate Consultant

ZS Associates

Pune, India | Jul 2024 - Jan 2025

  • Led a 5-member team to modernize legacy business rules; saved ~50 hrs/mo and improved first-pass quality to >99%
  • Built and deployed Positive-Unlabeled learning models to infer missing categorical labels in medical transaction data, increasing customer-journey analytics coverage from ~40% to ~95% with consistent performance across tumor types and territories
  • Implemented drift monitoring (feature + prediction drift) and CI unit tests for production pipelines, reducing silent failures
  • Placed in the top ~10% in a company-wide hackathon and earned selection for a lateral transfer into the Data Science vertical

Decision Analytics Associate

Fast Track

ZS Associates

Pune, India | Feb 2022 - Jun 2024

  • Engineered PySpark/SQL ETL pipelines integrating multiple healthcare data sources covering millions of patients for $4B+ oncology drug performance analytics
  • Defined patient cohort inclusion and exclusion logic robust to missing and miscoded fields, enabling audit-ready reporting
  • Delivered ad hoc analyses identifying care gaps and market opportunities to inform brand strategy across multiple launches
  • Promoted to Associate Consultant in 4 cycles (typical: 5) and received Expert Associate and Insight Illuminator awards
My Work

Featured Projects

From ML models predicting cancer to LLM-powered chatbots

Featured Case Study

Biliary Tract Cancer (BTC) Early Detection

Predictive Analytics & NLP

ZS Associates | Jan 2025 - May 2025

Developed an early detection model across 250M patient claims, enabling identification ~45 days earlier relative to the standard diagnosis lag. Engineered a hybrid feature pipeline combining clinical risk factors, K-means and GMM segmentation, and Transformer-based NLP clustering on diagnosis narratives.

  • 250M patient claims analyzed
  • Hybrid clinical + NLP feature pipeline
  • Presented at PMSA 2025; adopted for territory planning
Python | scikit-learn | PySpark | K-means | GMM | NLP Clustering

Financial RAG Chatbot

LLM & Information Retrieval

Columbia University | Nov 2025 - Dec 2025

Built an LLM-powered RAG chatbot answering company financial questions from SEC filings with line-level citations. Implemented semantic retrieval with ChromaDB and text-embedding-3-large plus automatic ticker and period parsing.

  • Line-level source citations
  • Claude Opus evaluation pipeline
  • Live demo on Streamlit Cloud
Python | FastAPI | ChromaDB | Streamlit | GCP
Live Demo

SeanceAI

Conversational AI & Multi-Model LLM

Personal Project | 2025

Built an AI chatbot enabling conversations with 60+ historical figures using multi-model LLM support and streaming responses. Implemented era-appropriate prompt engineering and "Dinner Party" mode for multi-figure conversations; deployed on Railway.

  • 60+ historical figures
  • Multi-model LLM support & streaming
  • Deployed on Railway
Python | Flask | OpenRouter API
Live Demo

Agricultural Product Standardization and Risk Detection

RAG and Classification System

SunCulture (Internship/Co-op) | Aug 2025 - Oct 2025

Built a RAG-augmented classification system at SunCulture (Series B Agtech) categorizing 7M+ farmer transactions across 500+ product categories to support creditworthiness assessment for microloans in East Africa. Achieved 99% accuracy on a 10,000-item holdout set using hybrid rule-based and LLM-assisted classification, reducing manual review volume by 95% and accelerating loan decisioning.

  • 7M+ farmer transactions classified
  • 99% accuracy on 10K holdout set
  • 95% reduction in manual review
Python | RAG | REST API

Video Speed Controller

Chrome Extension

Personal Project | 2025

Built a Chrome extension for fine-grained video playback speed control across all websites. It features persistent speed memory and keyboard shortcuts, and works with YouTube, Netflix, Udemy, and more.

  • Works on all major platforms
  • 0.1x to 16x speed range
  • Persistent speed memory
JavaScript | Chrome APIs | HTML/CSS | MutationObserver
Live Demo

Tweet Bot

AI Chrome Extension

Personal Project | 2026

Built an AI-powered Chrome extension that generates tweet replies, quote tweets, and threads using Claude via OpenRouter. Features tone control, image understanding, voice learning that adapts to your style, and real-time streaming responses.

  • 3 distinct suggestions with rhetorical strategy tags
  • Voice learning adapts to your style
  • Multi-model support (Opus, Sonnet, Haiku)
JavaScript | Chrome APIs | CSS | OpenRouter API

Citation Format Checker

Domain Q&A Chatbot

Columbia University | 2026

Built an academic citation format checker chatbot supporting APA 7th, MLA 9th, and Chicago 17th editions. Powered by Vertex AI (Gemini 2.0 Flash Lite) and FastAPI, it identifies specific formatting violations with rule IDs and quoted evidence. Deployed on GCP Cloud Run.

  • Supports APA 7th, MLA 9th, Chicago 17th
  • Rule-ID based violation detection
  • 30+ eval test cases across 3 methods
Python | FastAPI | Vertex AI | Google Cloud Run | Docker
Live Demo

ClassPulse

Live Classroom Theme Extraction

Personal Project | 2026

Built a real-time classroom feedback tool where professors post a question, students submit answers via QR code or link, and an LLM automatically summarizes responses into 4-6 themed cards with student attribution. Uses FastAPI with SSE for live updates and OpenRouter with a 5-model fallback chain; deployed on Railway as a single service.

  • Real-time theme extraction every 10 seconds
  • 5-model LLM fallback chain via OpenRouter
  • Single service: FastAPI + React on Railway
Python | FastAPI | React | TypeScript | OpenRouter API | Railway
Live Demo
Tech Stack

Technical Skills

Technologies and tools I use to bring ideas to life

Programming

Core languages I work with daily

Python
SQL
C++
R

Analytics & ML

ML frameworks and data tools

PyTorch
Scikit-learn
Pandas
NumPy

Big Data & MLOps

Scalable infrastructure tools

PySpark
Databricks
MLflow
AWS

Tools & Platforms

Development environment

Git
Jupyter
Streamlit
Docker

Also experienced with

Classification | Regression | NLP | Clustering | Model Evaluation | LLMs/RAG | Prompt Engineering | ETL/ELT | SHAP | matplotlib | BeautifulSoup | S3 | EMR | Athena | SageMaker | Linux | CI/CD | Jira | Confluence | Claude Code | Cursor IDE
Let's Connect

Get in Touch

Interested in collaborating or have a question? Feel free to reach out!

Contact Information

Email

av3342@columbia.edu

Phone

(347) 987 9427

Location

New York, NY
