Healthcare data scientist — causal inference, quasi-experimental program evaluation, applied ML. M.S. Data Science at UC San Diego starting Fall 2026; B.S. Mathematics, UC Irvine.
Samir2000VIP@gmail.com · LinkedIn · Irvine, CA
Four years evaluating pharmacist-led clinical programs at a 60,000-patient managed care system:
- COPD program (n = 997, PSM + DiD): $83.50 PMPM cost reduction (p = 0.0027), driven by lower ED (p = 0.002), inpatient (p = 0.04), and readmission (p = 0.002) utilization.
- Post-discharge pharmacist intervention (n = 878, negative binomial): 22% reduction in 30-day readmissions (IRR = 0.78, p = 0.02).
- Heart failure outcomes manuscript under peer review; sole data analyst at DOHC (11 clinics).
- ASHP National Conference poster (AFib anticoagulation care-gap analysis).
Open to full-time and contract data scientist roles. Remote-friendly.
CausalCare — Causal inference on ICU mortality (eICU). Five-method stack (PSM, IPW, AIPW, Double ML, Causal Forest) with DoWhy's identify–estimate–refute workflow and placebo and random-common-cause refuters. Agreement across methods serves as a robustness check.
GenomicsGPT — Variant interpretation on 1.69M ClinVar variants. XGBoost/LightGBM ensemble with leakage-corrected AUC 0.985 and macro-F1 0.948. Feature ablation (consequence + LoF features alone: AUC 0.97, vs. 0.78 gene-only) indicates the model is not relying on gene-name memorization. Per-variant SHAP audit; Llama 3 / Claude narrative engine for ACMG/AMP reports.
ClinicalRAG — RAG over 220 clinical documents, with retrieval and refusal treated as first-class metrics: 97.6% condition recall, 85.7% citation rate, 95.2% abstention accuracy.
Diabetic Retinopathy — Custom CNN for 5-class DR grading (weighted F1 = 0.94), outperforming ResNet-50 and VGG-16 on the same split. Grad-CAM maps showed attention on clinically meaningful pathology. Paper.
REIGN — Cross-era NBA impact models over 29,969 player-seasons with era-specific z-score normalization.
Languages: Python, SQL, R. Statistics: causal inference (PSM, DiD, IPW, AIPW, Double ML, Causal Forest), GLMs (including negative binomial), survival analysis. ML: XGBoost, LightGBM, scikit-learn, TensorFlow/Keras, SHAP. LLM/NLP: RAG, LangChain, ChromaDB, HuggingFace Transformers. Healthcare: EHR, claims, pharmacy, ICD-10, HCC, PMPM. Delivery: Power BI, FastAPI, Git.
2500+ rated chess · basketball · piano