
Add data science practitioner strategy and brand awareness tutorial#286

Merged
igerber merged 3 commits into main from ds-analysis
Apr 9, 2026

Conversation


@igerber igerber commented Apr 9, 2026

Summary

  • Add strategic analysis (docs/business-strategy.md) assessing the opportunity to make diff-diff appealing to data science practitioners — competitive landscape, personas, gap analysis, and phased roadmap
  • Add parallel B1-B4 roadmap track in ROADMAP.md targeting practitioners in marketing, product, and operations
  • Deliver B1a: Tutorial 17 — measuring campaign impact on brand awareness with survey data
  • Tutorial showcases unique SurveyDesign support in a CPG brand tracking scenario with naive-vs-corrected comparison (2.14x SE ratio), brand funnel analysis (awareness/consideration/purchase intent), staggered rollout extension with CallawaySantAnna, HonestDiD sensitivity, and stakeholder communication guidance
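The naive-vs-corrected SE comparison at the heart of the tutorial can be illustrated with a self-contained numpy sketch (this does not use diff-diff or its SurveyDesign API; the clustering structure and all numbers are made up for illustration):

```python
import numpy as np

# Simulate clustered survey responses: respondents within a PSU share a
# common shock, so observations are not independent.
rng = np.random.default_rng(0)
n_psu, n_per_psu = 40, 50
psu_effect = rng.normal(0, 1.0, n_psu)           # shared within-PSU shock
y = (psu_effect.repeat(n_per_psu)
     + rng.normal(0, 1.0, n_psu * n_per_psu))    # respondent-level noise

# Naive SE treats all 2000 observations as i.i.d.
naive_se = y.std(ddof=1) / np.sqrt(len(y))

# Design-aware SE uses the variation of PSU means (effective n = n_psu).
psu_means = y.reshape(n_psu, n_per_psu).mean(axis=1)
cluster_se = psu_means.std(ddof=1) / np.sqrt(n_psu)

print(f"SE inflation ratio: {cluster_se / naive_se:.2f}")
```

With strong within-PSU correlation, the design-aware SE is several times the naive one, which is the qualitative effect the tutorial's 2.14x ratio reports.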

Methodology references (required if estimator / math changes)

  • Method name(s): N/A — no methodology changes
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: No test changes (tutorial-only PR)
  • Backtest / simulation / notebook evidence: Tutorial 17 executes end-to-end via jupyter nbconvert --execute with seeded DGPs producing deterministic output

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Strategic analysis of the opportunity to make diff-diff appealing to data
science practitioners (marketing, product, operations). Adds a parallel
B1-B4 roadmap track targeting this audience, and delivers the first item
(B1a): Tutorial 17 — measuring campaign impact on brand awareness with
survey data. The tutorial showcases the unique survey design support
(SurveyDesign with strata, PSU, FPC, weights) in a CPG brand tracking
scenario, with naive-vs-corrected comparison, brand funnel analysis,
staggered rollout extension, and stakeholder communication guidance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 9, 2026

Overall Assessment

⚠️ Needs changes

Executive Summary

  • P1 methodology issue: the new HonestDiD example fits CallawaySantAnna() with the default base_period="varying" and suppresses the warning that the library emits for this exact case, even though the implementation and registry say HonestDiD on CS results requires base_period="universal" for valid interpretation.
  • P2 methodology/docs issue: the tutorial story is “8 of 20 DMAs,” but the synthetic DGP is documented as 200 respondents per period, and the notebook renames the respondent unit to market_id. That mismatch muddles the unit of analysis for the example.
  • P2 code-quality issue: the no-matplotlib fallback is broken; it prints trends in the else branch before trends is defined.
  • P3 methodology/docs issue: the notebook labels the naive-vs-survey SE ratio as a “design effect,” but the existing survey tutorial and registry define DEFF differently.
  • No security findings. I could not execute the notebook locally because this review environment is missing Python deps (numpy import fails), so notebook validation here is static.

Methodology

Code Quality

  • Severity: P2. Impact: the notebook claims plots will be skipped when matplotlib is unavailable, but the first plotting cell calls print(trends.to_string()) in the else branch even though trends is only defined inside the if HAS_MATPLOTLIB branch. That path raises NameError instead of degrading gracefully. docs/tutorials/17_brand_awareness_survey.ipynb:72 docs/tutorials/17_brand_awareness_survey.ipynb:189 Concrete fix: compute trends before the conditional, or recompute it inside the else branch.
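The fix suggested above, compute `trends` before the conditional, follows a standard optional-dependency pattern. A minimal self-contained sketch (the data and the shape of `trends` are illustrative, not the notebook's actual code):

```python
try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False

# Illustrative period-level awareness means (made up for this sketch).
raw = {
    "treated": {1: 0.30, 2: 0.41},
    "control": {1: 0.28, 2: 0.31},
}

# Compute `trends` BEFORE the conditional so both branches can use it;
# the original bug defined it only inside the HAS_MATPLOTLIB branch,
# so the else branch raised NameError.
trends = {group: sorted(vals.items()) for group, vals in raw.items()}

if HAS_MATPLOTLIB:
    for group, series in trends.items():
        periods, means = zip(*series)
        plt.plot(periods, means, marker="o", label=group)
    plt.legend()
    plt.ylabel("awareness")
    plt.show()
else:
    # Graceful text fallback: `trends` exists on this path too.
    for group, series in trends.items():
        print(group, series)
```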

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new tech-debt-only findings. The blocking methodology issue above is not mitigated by TODO.md.

Security

  • No findings. I did not see secrets, PII, or suspicious notebook metadata/outputs in the new files.

Documentation/Tests

  • No additional diff-local findings beyond the issues above. I could not run the notebook in this environment because the workspace Python is missing required packages, so this review is static for notebook behavior.

Path to Approval

  1. Refit the HonestDiD section with CallawaySantAnna(base_period="universal") (or fit a second CS model solely for the sensitivity section) so the example matches the library’s documented methodology.
  2. Remove the blanket warning suppression around compute_honest_did(...) and revise the surrounding markdown so pre-period checks are described as supportive evidence, not confirmation that parallel trends “holds.”

- P1: Use base_period="universal" for CallawaySantAnna in staggered
  section so HonestDiD sensitivity analysis is methodologically valid
- P2: Fix unit of analysis — rename to respondent_id, reframe narrative
  as respondent-level survey data (not market-level DMAs)
- P2: Fix matplotlib fallback — compute trends before conditional
- P2: Replace blanket warnings.simplefilter("ignore") with targeted
  RuntimeWarning filter for survey module matmul artifacts only;
  UserWarnings and methodology warnings now come through
- P3: Rename "Design effect (SE ratio)" to "SE inflation ratio" to
  avoid terminology drift from formal DEFF definition
- Soften parallel trends language from "confirming" to "consistent with"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
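The targeted warning filter described in the commit message above can be sketched with the stdlib `warnings` module alone. This is a generic illustration of scoping the filter by category, not the notebook's actual cell; `noisy_computation` is a stand-in for the survey-module call:

```python
import warnings

def noisy_computation():
    # Stand-in for a survey-module call that emits a numeric
    # RuntimeWarning artifact plus a methodology UserWarning.
    warnings.warn("matmul artifact", RuntimeWarning)
    warnings.warn("methodology caveat", UserWarning)
    return 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Targeted filter: drop only RuntimeWarnings instead of a blanket
    # warnings.simplefilter("ignore"). A module= regex (e.g. matching
    # the survey module's name) can scope it further.
    warnings.filterwarnings("ignore", category=RuntimeWarning)
    result = noisy_computation()

kinds = [w.category for w in caught]
print(kinds)  # UserWarning passes through; RuntimeWarning is suppressed
```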

igerber commented Apr 9, 2026

/ai-review


github-actions bot commented Apr 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 5873d1f660a4700888825b57361c3a4e09a6abd1


Overall Assessment

⚠️ Needs changes

Executive Summary

  • All prior P1+ findings appear resolved. In particular, the HonestDiD example now fits CallawaySantAnna(base_period="universal") before calling compute_honest_did(...), which matches the Methodology Registry note for CS-to-HonestDiD usage.
  • The earlier no-matplotlib fallback bug and the misleading “design effect” label also appear resolved in the new notebook.
  • Severity P1 [Newly identified]: the tutorial’s “trustworthiness” section validates a survey-aware DiD result with non-survey-aware diagnostics. The placebo DiD omits survey_design=sd, and check_parallel_trends() is only a simple slope comparison with no survey-design or panel-dependence adjustment.
  • No security findings.
  • Static review only: the notebook is valid JSON, but I could not execute it here because the local environment cannot import diff_diff (numpy is missing).
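The review calls `check_parallel_trends()` “only a simple slope comparison.” A hedged sketch of what such an informal check might look like — this is an assumption about its behavior for illustration, not diff-diff's actual implementation — makes clear why it cannot validate a survey design:

```python
import numpy as np

def informal_pre_trend_check(periods, y_treated, y_control, tol=0.01):
    """Compare pre-period OLS slopes of the two groups' means.

    Hypothetical helper: no survey-design or panel-dependence
    adjustment, so at best supportive evidence, never validation.
    """
    slope_t = np.polyfit(periods, y_treated, 1)[0]
    slope_c = np.polyfit(periods, y_control, 1)[0]
    gap = slope_t - slope_c
    return abs(gap) <= tol, gap

pre = np.array([1, 2, 3, 4])
ok, gap = informal_pre_trend_check(
    pre,
    y_treated=np.array([0.30, 0.31, 0.32, 0.33]),  # illustrative means
    y_control=np.array([0.28, 0.29, 0.30, 0.31]),
)
print(ok)
```

Because the comparison ignores sampling weights, clustering, and serial dependence, a “pass” says nothing about design-based uncertainty, which is exactly the P1 the review raises.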

Methodology

Re-review note: the previous HonestDiD/base-period blocker is resolved. The staggered section now uses CallawaySantAnna(base_period="universal") and then compute_honest_did(...), consistent with the registry’s CS/HonestDiD note. References: docs/tutorials/17_brand_awareness_survey.ipynb#L322, docs/tutorials/17_brand_awareness_survey.ipynb#L368, docs/methodology/REGISTRY.md#L1991

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The P1 above is not mitigated by TODO.md tracking. Reference: TODO.md#L51

Security

  • No findings. I did not see secrets, PII, or committed notebook outputs.

Documentation/Tests

  • No additional findings beyond the methodology issue above.
  • Static review only: docs/tutorials/17_brand_awareness_survey.ipynb is valid JSON, but full execution could not be verified because the local environment is missing numpy.

Path to Approval

  1. Update the placebo cell to run DifferenceInDifferences.fit(..., survey_design=sd) so the falsification result uses the same design-based inference the tutorial advocates for the main estimate.
  2. Change the parallel-trends subsection so it no longer presents check_parallel_trends() as formal validation for the survey design. Either replace it with a survey-aware pre-period diagnostic, or clearly label it as informal/non-survey-aware and soften the later summary/stakeholder text that says the result is “supported” by those checks.
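The placebo logic behind item 1 (assign a fake treatment date inside the pre-period and expect a null estimate) can be sketched self-contained with numpy. This is the generic falsification pattern, not diff-diff's `DifferenceInDifferences.fit(..., survey_design=sd)` call, and all data here are simulated:

```python
import numpy as np

rng = np.random.default_rng(42)

def did_estimate(y, treated, post):
    """Simple 2x2 difference-in-differences on group/period means."""
    return ((y[treated & post].mean() - y[treated & ~post].mean())
            - (y[~treated & post].mean() - y[~treated & ~post].mean()))

# Pre-period-only data with parallel trends and no true effect.
n = 4000
treated = rng.random(n) < 0.5
period = rng.integers(1, 5, n)          # four pre-treatment periods
y = 0.3 + 0.01 * period + 0.02 * treated + rng.normal(0, 0.05, n)

# Placebo: pretend treatment started at period 3 within the pre-period.
fake_post = period >= 3
placebo = did_estimate(y, treated, fake_post)
print(f"placebo effect: {placebo:.4f}")  # should be near zero
```

In the tutorial's setting the same placebo must also be fit with the survey design attached; otherwise the falsification test uses a different (and anti-conservative) inference model than the headline estimate.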

- Pass survey_design=sd to placebo DiD fit so falsification uses the
  same design-based inference as the main estimate
- Label check_parallel_trends() as informal/non-survey-aware and
  direct readers to HonestDiD for formal robustness assessment
- Soften diagnostic summary prose: "supportive evidence" not validation
- Update tutorial summary to distinguish informal checks from formal
  sensitivity analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber commented Apr 9, 2026

/ai-review


github-actions bot commented Apr 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: bc7e55712e63323f6bf86b5ab93c9433fabdd1e6


Overall Assessment

✅ Looks good

Executive Summary

Methodology

No findings.

Affected methods reviewed: DifferenceInDifferences with SurveyDesign, CallawaySantAnna event-study aggregation, and HonestDiD. The changed notebook now uses survey-aware inference consistently in the placebo refit, clearly discloses that check_parallel_trends() is only an informal slope comparison, and uses base_period="universal" for the CS event-study path before HonestDiD. That is consistent with the in-code helper behavior and the Methodology Registry notes for CS survey support and HonestDiD base-period handling. docs/tutorials/17_brand_awareness_survey.ipynb:L121-L172 docs/tutorials/17_brand_awareness_survey.ipynb:L228-L261 docs/tutorials/17_brand_awareness_survey.ipynb:L317-L363 diff_diff/utils.py:L618-L710 docs/methodology/REGISTRY.md:L429-L439 docs/methodology/REGISTRY.md:L1989-L1991

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

I did not see a new silent-correctness issue that would need TODO-based mitigation, and nothing in the changed files appears to depend on an already-deferred limitation in TODO.md:L51-L75.

Security

No findings.

I did not find secrets or PII in the changed files, and the new notebook metadata is minimal.

Documentation/Tests

  • Severity: P3 [Newly identified]. Impact: docs/tutorials/README.md was not updated to include the new brand-awareness notebook, so users browsing the tutorials directory directly will not see Tutorial 17 in the local index even though it was added and linked from the top-level README. Concrete fix: add a 17_brand_awareness_survey.ipynb entry to docs/tutorials/README.md and reconcile that local tutorial list while touching the file. References: docs/tutorials/README.md:L1-L73 README.md:L126-L145 docs/tutorials/17_brand_awareness_survey.ipynb:L1-L404

Static review only: notebook execution was not reproduced locally because the environment cannot import numpy/pandas.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 9, 2026
@igerber igerber merged commit e90908e into main Apr 9, 2026
3 of 4 checks passed
@igerber igerber deleted the ds-analysis branch April 9, 2026 17:27
