skerk001/README.md

Samir Kerkar — Data Scientist

Healthcare data scientist — causal inference, quasi-experimental program evaluation, applied ML. M.S. Data Science at UC San Diego starting Fall 2026; B.S. Mathematics, UC Irvine.

Samir2000VIP@gmail.com · LinkedIn · Irvine, CA

Four years evaluating pharmacist-led clinical programs at a 60,000-patient managed care system:

  • COPD program (n = 997, PSM + DiD): $83.50 PMPM cost reduction (p = 0.0027), driven by lower ED (p = 0.002), inpatient (p = 0.04), and readmission (p = 0.002) utilization.
  • Post-discharge pharmacist intervention (n = 878, negative binomial): 22% reduction in 30-day readmissions (IRR = 0.78, p = 0.02).
  • Heart failure outcomes manuscript under peer review; sole DOHC data analyst (11 clinics).
  • ASHP National Conference poster (AFib anticoagulation care-gap analysis).

Open to full-time and contract data scientist roles. Remote-friendly.

Projects

CausalCare — Causal inference on ICU mortality (eICU). Five-method stack (PSM, IPW, AIPW, Double ML, Causal Forest) with DoWhy's identify–estimate–refute workflow and placebo / random-common-cause refuters. Method agreement as a robustness check.

GenomicsGPT — Variant interpretation on 1.69M ClinVar variants. XGBoost/LightGBM ensemble, leakage-corrected AUC 0.985, macro-F1 0.948. Feature ablation (consequence + LoF alone: AUC 0.97 vs. 0.78 for gene-only) rules out gene-name memorization. SHAP per-variant audit; Llama 3 / Claude narrative engine for ACMG/AMP reports.

ClinicalRAG — RAG over 220 clinical documents treating retrieval and refusal as first-class metrics: 97.6% condition recall, 85.7% citation rate, 95.2% abstention accuracy.

Diabetic Retinopathy — Custom CNN for 5-class DR grading, weighted F1 = 0.94, outperforming ResNet-50 and VGG-16 on the same split. Grad-CAM confirmed attention to clinically meaningful pathology. Paper.

REIGN — Cross-era NBA impact models over 29,969 player-seasons with era-specific z-score normalization.
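
The era-specific z-score normalization mentioned for REIGN can be illustrated with a minimal sketch. This is not the project's actual code: the function name `era_z_scores`, the `(era, player, value)` row layout, and the use of the population standard deviation are all assumptions made for illustration. The idea is that each value is standardized against the mean and spread of its own era, so players from different eras land on a comparable scale.

```python
from collections import defaultdict
from statistics import mean, pstdev


def era_z_scores(rows):
    """Standardize each value within its own era (hypothetical helper).

    rows: iterable of (era, player, value) tuples.
    Returns a dict mapping (era, player) to the z-score of value
    relative to all values that share the same era.
    """
    rows = list(rows)

    # Collect the raw values observed in each era.
    by_era = defaultdict(list)
    for era, _player, value in rows:
        by_era[era].append(value)

    # Per-era mean and population standard deviation (an assumption;
    # a sample standard deviation would work the same way).
    stats = {era: (mean(vals), pstdev(vals)) for era, vals in by_era.items()}

    scores = {}
    for era, player, value in rows:
        mu, sigma = stats[era]
        # Guard against eras where every value is identical.
        scores[(era, player)] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores


# Made-up numbers, purely to show the call shape.
example = [
    ("1960s", "A", 10.0),
    ("1960s", "B", 20.0),
    ("2010s", "C", 30.0),
    ("2010s", "D", 50.0),
]
z = era_z_scores(example)
```

With these made-up inputs, players B and D receive the same z-score even though their raw values differ, because each sits one standard deviation above the mean of its own era.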

Stack

Python, SQL, R. Causal inference (PSM, DiD, IPW, AIPW, Double ML, Causal Forest), GLMs including negative binomial, survival analysis. ML: XGBoost, LightGBM, scikit-learn, TensorFlow/Keras, SHAP. LLM/NLP: RAG, LangChain, ChromaDB, HuggingFace Transformers. Healthcare: EHR, claims, pharmacy, ICD-10, HCC, PMPM. Delivery: Power BI, FastAPI, Git.

2500+ rated chess · basketball · piano

Pinned

  1. diabetic-retinopathy-classification Public

    CNN-based 5-class diabetic retinopathy severity classification from retinal fundus images (F1 = 0.94)

  2. gene-cancer-prediction Public

    ML classification of AML vs. ALL leukemia subtypes from gene expression data (F1 = 0.95)

    Jupyter Notebook

  3. clinical-rag Public

    RAG system for clinical question answering over 220 discharge summaries with hallucination guardrails, citation tracking, and chunking strategy evaluation (97.6% condition recall)

    Python

  4. genomicsgpt Public

    ML + LLM pipeline for genetic variant pathogenicity prediction (AUC 0.9949, 1.69M ClinVar variants) with SHAP explainability and clinical report generation via Llama 3 / Claude

    Jupyter Notebook

  5. CausalCare Public

    Causal inference analysis of ICU beta-blocker treatment effects using propensity matching, IPW, doubly robust estimation, Double ML, and Causal Forest on eICU data

    Python

  6. reign-web Public

    NBA player impact analytics across 80 years. Era-specific composite models, playoff opponent adjustments, and interactive visualizations for 3,484 players (1946–2025).

    JavaScript