Notebook · VOL 04 · p. 01
Field Notes
~ hi, I’m —

Arjun Varma

data scientist, ML engineer, obsessive about the loop.
[ available · jan ’27 ] full-time DS / MLE roles; let’s talk.
[ currently ]
  • Incoming Data Science Intern, Novo Nordisk (Summer ’26)
  • M.S. Data Science, Columbia — TA-ing AI Foundations & Big Data courses
  • Building agents. Shipping Chrome extensions on weekends.
resume · email · linkedin · github
Morningside — Spring ’26
[ Obs. 01 ] — the thesis

Three-plus years working with data taught me one thing: the model is easy; the loop is the product. Pipelines, evals, drift, citations — the stuff that isn’t pretty on a slide is what keeps systems alive in production. Now I’m building agents that plan, query, and cite their own homework.

[ Exp. 01 — featured ]

A drop-in data analyst for any database.

Agentic AI · Columbia · 2026

Ask a question in English. A multi-agent system plans the work, writes SQL, validates the result, renders a chart, and narrates the answer with citations back to the source rows. Built on the NYC Airbnb corpus, designed to plug into any warehouse.

fig. A — multi-agent architecture
user question (natural language)
message_bus · {plan, sql, df, chart, answer, error}
  • planner: plans steps
  • sql: writes SQL
  • validator: audits result
  • chart: renders viz
  • narrator: narrates + cites
warehouse · DuckDB · Postgres · Snowflake
cited answer + chart + replayable trace
five agents, typed message bus, every step auditable.
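A minimal sketch of how a typed message bus like the one in fig. A could be wired. The message kinds come from the figure; the class names (`Message`, `MessageBus`), the `on`/`run` methods, the extra `question` kind, and the canned agent outputs are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Message kinds from fig. A, plus "question" for the user's input (an assumption).
KINDS = {"question", "plan", "sql", "df", "chart", "answer", "error"}


@dataclass(frozen=True)
class Message:
    kind: str
    payload: Any
    sender: str

    def __post_init__(self) -> None:
        # "Typed" here just means every message must carry a known kind.
        if self.kind not in KINDS:
            raise ValueError(f"unknown message kind: {self.kind}")


Handler = Callable[[Message], Message]


class MessageBus:
    """Routes each message to the agent registered for its kind and logs every hop."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}
        self.trace: list[Message] = []  # append-only log, usable as a replayable trace

    def on(self, kind: str, handler: Handler) -> None:
        self.handlers[kind] = handler

    def run(self, first: Message) -> Message:
        msg = first
        self.trace.append(msg)
        while msg.kind in self.handlers:
            try:
                msg = self.handlers[msg.kind](msg)
            except Exception as exc:  # any agent failure becomes an "error" message
                msg = Message("error", str(exc), "bus")
            self.trace.append(msg)
        return msg


# Stub agents with canned outputs (hypothetical; real agents would call an LLM or a database).
def planner(m: Message) -> Message:
    return Message("plan", ["compare avg rating by is_superhost"], "planner")

def sql_agent(m: Message) -> Message:
    return Message("sql", "SELECT is_superhost, AVG(review_scores_rating) FROM listings GROUP BY is_superhost", "sql")

def validator(m: Message) -> Message:
    return Message("df", [(1, 4.89), (0, 4.61)], "validator")

def chart_agent(m: Message) -> Message:
    return Message("chart", {"type": "bar", "rows": m.payload}, "chart")

def narrator(m: Message) -> Message:
    return Message("answer", f"{len(m.payload['rows'])} groups compared", "narrator")


def build_bus() -> MessageBus:
    bus = MessageBus()
    for kind, agent in [("question", planner), ("plan", sql_agent), ("sql", validator),
                        ("df", chart_agent), ("chart", narrator)]:
        bus.on(kind, agent)
    return bus
```

Calling `build_bus().run(Message("question", "...", "user"))` walks the five agents in order, and the bus's `trace` list keeps one entry per step, which is what makes each hop auditable in this sketch.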
tools & contracts
  • db.schema() → Table[]
  • db.query(sql) → DataFrame
  • df.describe(df) → Stats
  • plot.auto(df, intent) → PNG
  • web.search(q) → Link[]
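The first two contracts above could be expressed as a small adapter interface. This is a sketch under stated assumptions: `Warehouse`, `SQLiteWarehouse`, and the `Table` fields are hypothetical names, Python's built-in `sqlite3` stands in for DuckDB / Postgres / Snowflake, and plain lists of dicts stand in for a DataFrame.

```python
import sqlite3
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Table:
    name: str
    columns: list[str]


class Warehouse(Protocol):
    """The contract an adapter would satisfy (hypothetical shape)."""

    def schema(self) -> list[Table]: ...
    def query(self, sql: str) -> list[dict]: ...


class SQLiteWarehouse:
    """Stand-in adapter; a DuckDB or Postgres adapter would expose the same two methods."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # rows become addressable by column name

    def schema(self) -> list[Table]:
        names = [
            row["name"]
            for row in self.conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return [
            Table(n, [col["name"] for col in self.conn.execute(f'PRAGMA table_info("{n}")')])
            for n in names
        ]

    def query(self, sql: str) -> list[dict]:
        return [dict(row) for row in self.conn.execute(sql)]
```

Because agents would depend only on `schema()` and `query()`, swapping the backing database means writing one new adapter class rather than touching the agents.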
guards & evals
  • SQL dry-run + row-count sanity
  • null / type audit pre-plot
  • self-critique on mismatched intents
  • retry on tool error (×3, backoff)
  • golden Q/A regression suite
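Two of the guards above, sketched with the standard library only. The function names, the delay values, and the row-count bounds are assumptions, "×3" is read here as three attempts in total, and SQLite's `EXPLAIN QUERY PLAN` stands in for whatever dry-run mechanism the target warehouse offers.

```python
import sqlite3
import time


def with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, then double it, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def dry_run(conn: sqlite3.Connection, sql: str) -> None:
    """Ask the engine to plan the query without running it; raises on invalid SQL."""
    conn.execute(f"EXPLAIN QUERY PLAN {sql}")


def row_count_ok(rows, min_rows=1, max_rows=100_000) -> bool:
    """Cheap sanity check: an empty or implausibly large result is suspicious."""
    return min_rows <= len(rows) <= max_rows
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.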
fig. B — sample trace
agent.trace · 1/4
Do superhosts get better review scores than other hosts?
planner: need avg review scores for superhost=true vs false; check stat significance.
sql
SELECT is_superhost, AVG(review_scores_rating) AS avg_rating,
       COUNT(*) AS n FROM listings
WHERE review_scores_rating IS NOT NULL
GROUP BY is_superhost;
db: returned 2 rows · n(true)=18,402 · n(false)=26,117
validator: null audit OK · sample size healthy · running Welch’s t-test …
python
# significance check via scipy.stats (Welch's t-test: unequal variances)
from scipy.stats import ttest_ind
t, p = ttest_ind(super_scores, other_scores,
                  equal_var=False)
chart: rendered: bar chart, superhost=4.89 vs other=4.61 (0–5)
narrator: Yes. Superhosts average 4.89 vs 4.61 (Δ=0.28, p<0.001). Source: listings.review_scores_rating × host.is_superhost.
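For reference, the statistic behind the Welch's t-test step in the trace can be computed by hand. This is a standard-library sketch; the function name and the toy scores below are made up for illustration and are not the Airbnb data. Turning `t` and `df` into a p-value needs a t-distribution CDF, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy ratings, invented for the example.
t, df = welch_t([4.8, 4.9, 5.0], [4.5, 4.6, 4.7, 4.6])
```

Unlike the pooled-variance test, this version does not assume the two groups share a variance, which matters when group sizes differ as much as they do in the trace.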
at a glance
  • five specialized agents on a typed message bus
  • planner decomposes → SQL agent writes → validator audits
  • chart agent renders, narrator cites every number
  • pluggable warehouse adapters: DuckDB · Postgres · Snowflake
  • regression evals + per-query latency / cost breakdown
[ Log — experiments 02—06 ]
8 more in the book
exp. 02
ZS · Oncology analytics · 2025

Biliary Tract Cancer Early Detection

Predictive model over 250M patient-claims identifying BTC patients ~45 days earlier than the standard diagnosis lag. Hybrid clinical rules + K-means/GMM + Transformer NLP clustering on diagnosis narratives. Presented at PMSA 2025.

~45d
earlier than diagnosis lag
250M
patient-claims scored / mo
PMSA ’25
presented · funded across tumors
pyspark · xgboost · shap · nlp clustering · mlflow
exp. 03
Columbia · LLM + retrieval · 2025

Financial RAG Chatbot

RAG chatbot answering company financial questions from SEC filings with line-level citations. Semantic retrieval with ChromaDB + text-embedding-3-large; automatic ticker and period parsing; Claude Opus as eval judge.

fastapi · chromadb · langchain · streamlit · gcp
live demo · github →
exp. 04
Series-B Agtech · East Africa · 2025

SunCulture — Farmer Transaction Standardization

RAG-augmented classifier categorizing 7M+ farmer transactions across 500+ categories to drive creditworthiness for microloans. Hybrid rule + LLM pipeline reached 99% accuracy on a 10K holdout and cut manual review by 95%.

99%
accuracy · 10K holdout
−95%
manual-review volume
7M+
txns · 500+ categories
python · rag · rest
sunculture.io
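One way a hybrid rule + LLM pipeline like the SunCulture classifier above can be arranged: deterministic rules answer first, a model handles the remainder, and low-confidence cases go to manual review. The rules, category names, confidence threshold, and the `llm_classify` callable are all hypothetical, and the retrieval step is left out of this sketch.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rules: (pattern, category). Real rules would come from domain experts.
RULES = [
    (re.compile(r"\bm-?pesa\b", re.IGNORECASE), "mobile_money"),
    (re.compile(r"\b(seed|fertili[sz]er)\b", re.IGNORECASE), "farm_inputs"),
]


def classify(
    text: str,
    llm_classify: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,  # assumed threshold, not from the project
) -> Tuple[Optional[str], str]:
    """Return (category, route), where route is 'rule', 'llm', or 'manual_review'."""
    for pattern, category in RULES:
        if pattern.search(text):
            return category, "rule"  # cheap and deterministic, no model call
    category, confidence = llm_classify(text)
    if confidence >= min_confidence:
        return category, "llm"
    return None, "manual_review"  # uncertain cases stay with a human
```

Routing only the uncertain remainder to people is the kind of arrangement that can shrink manual-review volume while keeping a human check on the hardest cases.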
exp. 05
Live classroom theme extraction · 2026

ClassPulse

Professors post a question, students answer via QR, and an LLM summarizes responses into 4–6 themed cards in real time. FastAPI + SSE with a 5-model OpenRouter fallback chain. Single service on Railway.

fastapi · react · sse · openrouter
live demo · github →
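A fallback chain like the one ClassPulse describes can be sketched as a loop over models in priority order. `complete_with_fallback`, `AllModelsFailed`, and the `call_model` callable are hypothetical names; the actual HTTP call to a provider is deliberately left to the caller, so no API details are assumed here.

```python
from typing import Callable, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain errors out."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],  # (model, prompt) -> completion text
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, completion) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # timeouts, rate limits, provider outages
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the text makes it easy to log which link in the chain actually answered.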
exp. 06
Conversational AI · 2025

SeanceAI

Chatbot enabling conversations with 60+ historical figures with era-appropriate knowledge boundaries. Dinner-Party mode for 2–5 figure multi-agent dialogue. Flask + SSE streaming, OpenRouter multi-model with fallbacks.

flask · openrouter · sse
live demo · github →
+ More experiments (3) — shipped tools & applied systems
exp. 07
Chrome extension · AI assistant · 2026

Tweet Bot

DOM-injected reply generator for X. Three rhetorical angles per request, image-aware context extraction, voice-learning from user selections, streaming responses.

chrome mv3 · openrouter · claude opus/sonnet/haiku
github →
exp. 08
Columbia · Domain Q&A · 2026

Citation Format Checker

Narrow-scope chatbot that identifies APA 7 / MLA 9 / Chicago 17 violations with rule-IDs and quoted evidence. Vertex AI (Gemini 2.0 Flash Lite) + FastAPI. Three-method eval suite with 30+ test cases.

vertex ai · fastapi · cloud run
live demo · github →
exp. 09
Chrome extension · 2025

Video Speed Controller

Fine-grained playback-rate controller that persists per-site preferences and survives segment changes. Works across YouTube, Netflix, Coursera, generic HTML5.

chrome mv3 · mutationobserver
live demo · github →
[ Timeline — trajectory ]

career, one margin note at a time

Jun 2026 — Aug 2026
New York

Data Science Intern

@ Novo Nordisk · incoming
  • Joining the data-science group for the summer — applied ML on real healthcare/pharma problems.
  • Focus areas: predictive modeling, feature engineering, evaluation — more to come.
Feb 2025 — Jun 2025
Pune

Advanced Data Science Associate Consultant

@ ZS Associates
  • Built an org-wide analytics + ML platform (Spark/SQL, dashboards) unifying 5+ data sources into territory and product KPIs used by 100+ stakeholders supporting a $10B oncology portfolio.
  • Cut weekly reporting time from days to minutes, replacing Excel workflows with automated pipelines and self-serve dashboards.
Jul 2024 — Jan 2025
Pune

Decision Analytics Associate Consultant

@ ZS Associates
  • Led a 5-member team to modernize legacy business rules; saved ~50 hrs/mo and improved first-pass quality to >99%.
  • Built and deployed Positive-Unlabeled learning models to infer missing categorical labels in medical transaction data, lifting customer-journey coverage from ~40% to ~95%.
  • Implemented feature + prediction drift monitoring and CI unit tests for production pipelines, reducing silent failures.
  • Top ~10% in a company-wide hackathon; earned lateral transfer into the Data Science vertical.
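The feature and prediction drift monitoring mentioned above is not specified further. One common metric for that job is the population stability index (PSI), sketched here as a swapped-in example rather than the method actually used; the bin count, the equal-width binning, and the `eps` floor are assumptions.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index between a reference sample and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        # floor each share at eps so the log below is always defined
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats PSI above roughly 0.2 as a shift worth investigating, though a sensible threshold depends on the feature.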
Feb 2022 — Jun 2024
Pune

Decision Analytics Associate

@ ZS Associates · Fast-track promotion
  • Engineered PySpark/SQL ETL pipelines across multiple healthcare data sources covering millions of patients for $4B+ oncology drug performance analytics.
  • Defined audit-ready patient cohort inclusion/exclusion logic robust to missing and miscoded fields.
  • Promoted to Associate Consultant in 4 cycles (typical: 5). Expert Associate and Insight Illuminator awards.
[ Schooling — margin notes ]
where I read the fine print
Aug 2025 — Dec 2026 · New York, NY

Columbia University

M.S. Data Science
  • TA: Business Analytics II, Hollywood & Big Data
  • Coursework: Applied ML, Agentic AI for Analytics, Statistical Inference, Probability & Stats
Jul 2018 — May 2022 · Vellore, India

Vellore Institute of Technology

B.Tech — Electronics & Communication Engineering
  • Special Achiever Award · Merit Scholarship
[ Tools on the bench ]
— what I reach for
Languages
  • Python
  • SQL
  • C++
  • R
ML / DS
  • PyTorch
  • scikit-learn
  • XGBoost
  • pandas
  • NumPy
  • SHAP
  • MLflow
LLM / Agents
  • RAG
  • LangChain
  • ChromaDB
  • OpenRouter
  • Vertex AI
  • Evals
Data / Cloud
  • PySpark
  • Databricks
  • AWS (S3, EMR, Athena, SageMaker)
  • GCP Cloud Run
  • Docker
Workflow
  • Git
  • CI/CD
  • Jupyter
  • Streamlit
  • FastAPI
  • Cursor
  • Claude Code