From 28973e1a91c0258f6a646c66ad7003fad0dbf4b8 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 9 Apr 2026 09:32:49 -0400 Subject: [PATCH 1/3] Add data science practitioner strategy and brand awareness tutorial MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Strategic analysis of the opportunity to make diff-diff appealing to data science practitioners (marketing, product, operations). Adds a parallel B1-B4 roadmap track targeting this audience, and delivers the first item (B1a): Tutorial 17 — measuring campaign impact on brand awareness with survey data. The tutorial showcases the unique survey design support (SurveyDesign with strata, PSU, FPC, weights) in a CPG brand tracking scenario, with naive-vs-corrected comparison, brand funnel analysis, staggered rollout extension, and stakeholder communication guidance. Co-Authored-By: Claude Opus 4.6 (1M context) --- .gitignore | 1 + README.md | 1 + ROADMAP.md | 57 +- docs/business-strategy.md | 420 +++++++++ .../tutorials/17_brand_awareness_survey.ipynb | 799 ++++++++++++++++++ 5 files changed, 1276 insertions(+), 2 deletions(-) create mode 100644 docs/business-strategy.md create mode 100644 docs/tutorials/17_brand_awareness_survey.ipynb diff --git a/.gitignore b/.gitignore index f0efe4c5..27fc846c 100644 --- a/.gitignore +++ b/.gitignore @@ -95,3 +95,4 @@ analysis/ # Replication data (local only, not for distribution) replication_data/ +_scratch/ diff --git a/README.md b/README.md index 63793bd3..1bc95209 100644 --- a/README.md +++ b/README.md @@ -142,6 +142,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`: | `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions | | `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 
2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference | | `16_survey_did.ipynb` | Survey-aware DiD with complex sampling designs (strata, PSU, FPC, weights), replicate weights, subpopulation analysis, DEFF diagnostics | +| `17_brand_awareness_survey.ipynb` | Measuring campaign impact on brand awareness with survey data — naive vs. survey-corrected comparison, brand funnel analysis, staggered rollouts, stakeholder communication | ## Data Preparation diff --git a/ROADMAP.md b/ROADMAP.md index 39f14e3a..5f9626a3 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -61,7 +61,58 @@ credibility. See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs. | **10d.** Tutorial: flat-weight vs design-based comparison | HIGH | ✅ Shipped (v2.9.1) | | **10e.** Position paper / arXiv preprint | MEDIUM | Not started — depends on 10b | | **10f.** WooldridgeDiD survey support (OLS + logit + Poisson) | MEDIUM | ✅ Shipped (v2.9.0) | -| **10g.** Practitioner guidance: when does survey design matter? | LOW | Not started | +| **10g.** Practitioner guidance: when does survey design matter? | LOW | Subsumed by B1d | + +--- + +## Data Science Practitioners (Phases B1–B4) + +Parallel track targeting data science practitioners — marketing, product, operations — who need DiD for real-world problems but are underserved by the current academic framing. See [business-strategy.md](docs/business-strategy.md) for competitive analysis, personas, and full rationale. + +### Phase B1: Foundation (Docs & Positioning) + +*Goal: Make diff-diff discoverable and approachable for data science practitioners. 
Zero code changes.* + +| Item | Priority | Status | +|------|----------|--------| +| **B1a.** Brand Awareness Survey DiD tutorial — lead use case showcasing unique survey support | HIGH | Done (Tutorial 17) | +| **B1b.** README "For Data Scientists" section alongside "For Academics" and "For AI Agents" | HIGH | Not started | +| **B1c.** Practitioner decision tree — "which method should I use?" framed for business contexts | HIGH | Not started | +| **B1d.** "Getting Started" guide for practitioners with business ↔ academic terminology bridge | MEDIUM | Not started | + +### Phase B2: Practitioner Content + +*Goal: End-to-end tutorials for each persona. Ship incrementally, each as its own PR.* + +| Item | Priority | Status | +|------|----------|--------| +| **B2a.** Marketing Campaign Lift tutorial (CallawaySantAnna, staggered geo rollout) | HIGH | Not started | +| **B2b.** Geo-Experiment tutorial (SyntheticDiD, comparison with GeoLift/CausalImpact) | HIGH | Not started | +| **B2c.** diff-diff vs GeoLift vs CausalImpact comparison page | MEDIUM | Not started | +| **B2d.** Product Launch Regional Rollout tutorial (staggered estimators) | MEDIUM | Not started | +| **B2e.** Pricing/Promotion Impact tutorial (ContinuousDiD, dose-response) | MEDIUM | Not started | +| **B2f.** Loyalty Program Evaluation tutorial (TripleDifference) | LOW | Not started | + +### Phase B3: Convenience Layer + +*Goal: Reduce time-to-insight and enable stakeholder communication. Core stays numpy/pandas/scipy only.* + +| Item | Priority | Status | +|------|----------|--------| +| **B3a.** `BusinessReport` class — plain-English summaries, markdown export; rich export via optional `[reporting]` extra | HIGH | Not started | +| **B3b.** `DiagnosticReport` — unified diagnostic runner with plain-English interpretation. Includes making `practitioner_next_steps()` context-aware (substitute actual column names from fitted results into code snippets instead of generic placeholders). 
| HIGH | Not started | +| **B3c.** Practitioner data generator wrappers (thin wrappers around existing generators with business-friendly names) | MEDIUM | Not started | +| **B3d.** `survey_aggregate()` helper (see [Survey Aggregation Helper](#future-survey-aggregation-helper)) | MEDIUM | Not started | + +### Phase B4: Platform (Longer-term) + +*Goal: Integrate into data science practitioner workflows.* + +| Item | Priority | Status | +|------|----------|--------| +| **B4a.** Integration guides (Databricks, Jupyter dashboards, survey platforms) | MEDIUM | Not started | +| **B4b.** Export templates (PowerPoint via optional extra, Confluence/Notion markdown, HTML widget) | MEDIUM | Not started | +| **B4c.** AI agent integration — position B3a/B3b as tools for AI agents assisting practitioners | LOW | Not started | --- @@ -69,13 +120,15 @@ credibility. See [survey-roadmap.md](docs/survey-roadmap.md) for detailed specs. **`survey_aggregate()` helper function** for the microdata-to-panel workflow. Bridges individual-level survey data (BRFSS, ACS, CPS) collected as repeated cross-sections to geographic-level (state, city) panel DiD. Computes design-based cell means and precision weights that estimators can consume directly. +Also cross-referenced as **B3d** — directly enables the practitioner survey tutorial workflow beyond the original academic framing. + --- ## Future Estimators ### de Chaisemartin-D'Haultfouille Estimator -Handles treatment that switches on and off (reversible treatments), unlike most other methods. +Handles treatment that switches on and off (reversible treatments), unlike most other methods. Reversible treatments are common in marketing (seasonal campaigns, promotions), giving this estimator higher priority for data science practitioners. 
- Allows units to move into and out of treatment - Time-varying, heterogeneous treatment effects diff --git a/docs/business-strategy.md b/docs/business-strategy.md new file mode 100644 index 00000000..cd548f73 --- /dev/null +++ b/docs/business-strategy.md @@ -0,0 +1,420 @@ +# Strategic Analysis: diff-diff for Business Data Science + +*April 2026* + +## Context + +diff-diff is the most comprehensive Difference-in-Differences library in Python — 16 estimators, unique survey design support, HonestDiD sensitivity analysis, and a practitioner workflow. But its entire framing speaks to academic econometricians. There's a large, underserved market of business data scientists who need DiD for real-world problems (campaign measurement, product launches, pricing changes) but are currently using fragmented tools or manual approaches. This analysis assesses the opportunity, competitive positioning, and what we need to do. + +--- + +## 1. The Market Opportunity + +### The Direction +- The causal inference market is growing rapidly — analyst estimates vary widely by scope, but all directionally agree on strong double-digit growth +- Enterprise adoption is accelerating: Microsoft (DoWhy, EconML), Meta (GeoLift, Robyn), Google (Meridian, CausalImpact), Uber (CausalML) have all invested heavily in open-source causal inference tooling in the past 2 years +- Privacy changes (cookie deprecation, tracking restrictions) are forcing marketing teams toward causal measurement methods — away from tracking-based attribution + +### The Shift Happening Now +Marketing measurement is undergoing a structural shift. The old model (track users, attribute conversions, optimize) is breaking due to privacy regulation and platform restrictions. The new model requires **causal inference**: geo-experiments, DiD, synthetic control, and MMM. This shift is why Google built Meridian (Jan 2025), Meta built GeoLift and Robyn, and Uber invested in CausalML. 
+ +Companies actively using causal inference in production: Uber, DoorDash, Airbnb, Netflix, Meta, Spotify, Booking.com, Mercado Libre (20+ geo-experiments), among many others. + +### Why This Matters for diff-diff +The demand for DiD in business is real and growing. But the supply side in Python is fragmented and academic. No one owns "DiD for business data scientists" in Python. This is our lane. + +--- + +## 2. Our Current Position + +### What We've Built (Strengths) + +| Capability | Competitive Position | +|---|---| +| 16 estimators (CS, SA, BJS, ETWFE, SDiD, TROP, etc.) | **Unmatched** — nearest competitor has 3-4 | +| Survey design support (strata, PSU, FPC, replicate weights) | **Unique in Python** — no competitor offers this | +| HonestDiD sensitivity analysis | **Unique in Python** — critical for credibility | +| Baker et al. (2025) practitioner workflow | **Unique** — no other library embeds methodological guardrails | +| Power analysis & pre-trends power | **Unique** — essential for study design | +| Bacon decomposition, parallel trends tests, placebo tests | **Most complete** diagnostic suite | +| Rust backend (5-50x speedup) | **Unique** performance advantage | +| 16 tutorials, real datasets, rich visualization | Strong, but academic framing | + +### Who We Serve Today +Applied econometricians and academic researchers who: +- Know what "ATT(g,t)" means +- Read Callaway & Sant'Anna (2021) and Rambachan & Roth (2023) +- Work in R-like workflows with Python +- Need publication-ready statistical output + +### What's Missing for Business +Our technical foundation is strong. The gap is not in methodology — it's in **packaging, language, workflows, and examples**. A marketing data scientist looking at our README sees Card-Krueger minimum wage studies and "forbidden comparisons." They need to see campaign lift measurement and "is this result trustworthy?" + +--- + +## 3. 
Target Personas + +### Persona A: Brand & Market Research +**Role**: Marketing analytics lead at CPG, retail, or agency +**Problem**: "We ran an awareness campaign in 5 markets. Did it actually move consideration?" +**Current tools**: Qualtrics/Dynata for survey data, Excel/manual for analysis, no formal causal framework +**What they need**: Survey data → DiD → stakeholder report. Plain-English validity assessment. Design effect handling. +**Our advantage**: Survey design support is unique and directly relevant. No competitor can do design-based variance with modern DiD estimators. + +### Persona B: Growth & Performance Marketing +**Role**: Marketing data scientist at tech company or e-commerce +**Problem**: "We launched a campaign in some geos. What was the incremental lift?" +**Current tools**: GeoLift (synthetic control only), CausalImpact (time-series only), manual DiD in pandas +**What they need**: Geo-experiment → DiD with staggered rollout → confidence intervals → ROI calculation +**Our advantage**: Staggered estimators handle the reality that campaigns roll out in waves, not all at once. GeoLift can't do this. + +### Persona C: Product & Operations DS +**Role**: Data scientist at tech/SaaS company +**Problem**: "We rolled out a new feature/pricing/process in some regions. What was the impact?" +**Current tools**: A/B testing platforms (Optimizely, Statsig), manual DiD when randomization isn't feasible +**What they need**: Quick setup → estimation → diagnostics → presentation to PM/VP +**Our advantage**: Comprehensive estimator suite handles any design pattern. Sensitivity analysis answers "how robust is this?" + +### Common Needs Across All Personas +1. **Business language**: "lift", "incremental impact", "confidence level" — not "ATT", "parallel trends assumption" +2. **Speed to insight**: Minutes from data to answer, not hours learning methodology +3. **Stakeholder communication**: Output a VP can read, not a statistics table +4. 
**Validity without PhD**: "Is this analysis trustworthy?" answered in plain English +5. **Real business examples**: Campaigns, launches, pricing — not minimum wage studies + +--- + +## 4. Competitive Landscape + +### Direct Competitors (Python DiD) + +| Package | Estimators | Business-Ready? | Weakness vs Us | +|---|---|---|---| +| **pyfixest** | TWFE, SA, did2s | No — academic framing | No CS, no HonestDiD, no survey. Has wildboottest integration for bootstrap inference. | +| **differences** | CS | No — maintenance issues | Removed TWFE + plotting in v0.2.0, limited scope | +| **CausalPy** | Basic DiD, SC | Partially — Bayesian framing | No staggered, no sensitivity, no survey | +| **linearmodels** | PanelOLS (manual) | No — building block | Requires manual DiD implementation | +| **statsmodels OLS** | Manual 2x2 | "Good enough" for many | Many business DS do DiD manually with OLS + interaction terms. No diagnostics, no staggered, no sensitivity — but low friction. | + +**The bilingual R/Python angle**: Many data science teams have R capability. A business DS who needs serious DiD might reach for R's `did` package rather than learn a Python library. Our pitch must be stronger than "it's in Python" — it needs to be "it's better than chaining 3 R packages together, and it has survey support no R package matches." + +**Assessment**: No Python DiD library targets business users. All are academic-oriented. We're the most complete, but we're also academic-oriented. The real competitor for many business DS is not another library — it's manual OLS in statsmodels or switching to R. 
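To make that baseline concrete, here is a minimal sketch of the manual-OLS DiD that many business data scientists run today: a 2x2 interaction regression in statsmodels, with made-up illustrative numbers. It produces a lift estimate in a few lines, which is exactly why it competes with dedicated libraries, while offering none of the diagnostics, staggered-adoption handling, or sensitivity analysis discussed above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative 2x2 cell means (control/treated x pre/post), repeated 5x
# so the regression has residual degrees of freedom. Numbers are made up.
cells = [
    (0, 0, 10.0),  # control, pre
    (0, 1, 12.0),  # control, post  (+2 background trend)
    (1, 0, 11.0),  # treated, pre
    (1, 1, 17.0),  # treated, post  (+6 raw change)
]
df = pd.DataFrame(
    [c for c in cells for _ in range(5)],
    columns=["treated", "post", "awareness"],
)

# The DiD estimate is the coefficient on the interaction term:
# (17 - 11) - (12 - 10) = 4.0 points of incremental lift.
fit = smf.ols("awareness ~ treated * post", data=df).fit()
lift = fit.params["treated:post"]
print(round(lift, 2))  # 4.0
```

The three-line core is the whole appeal; everything a credible analysis also needs (pre-trend checks, placebo tests, robustness to assumption violations) is left to the analyst.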
+ +### Adjacent Platforms (Causal ML) + +| Platform | Focus | DiD Support | Business Positioning | +|---|---|---|---| +| **DoWhy** (Microsoft) | DAG-based causal inference | Minimal — no modern DiD | Strong — "democratizing causal inference" | +| **CausalML** (Uber) | Uplift/CATE | None | Strong — "personalization + targeting" | +| **EconML** (Microsoft) | HTE estimation | None | Strong — "causal ML for decisions" | + +**Assessment**: These platforms are well-positioned for business but don't do DiD. They're not competitors for our core use case — they're adjacent. We could potentially integrate rather than compete. + +### Narrow Tools (Marketing-Specific) + +| Tool | Method | Scope | +|---|---|---| +| **GeoLift** (Meta) | Synthetic control | Geo-experiments only, no staggered, no panel | +| **CausalImpact** (Google) | Bayesian structural time-series | Single intervention, time-series only | +| **Robyn** (Meta) | MMM (ridge regression) | Marketing mix, not DiD | +| **Meridian** (Google) | Bayesian MMM | Marketing mix, not DiD | + +**Assessment**: These are point solutions for specific marketing problems. We're broader and more rigorous, but they own the "marketing causal inference" mindshare. We need to explicitly show how diff-diff handles the same problems — and more. + +### Competitive Positioning Map (DiD-Specific) + +``` + Academic ←————————————→ Business + | | + Narrow (1-2 methods) | pyfixest | GeoLift (synth control only) + | differences | CausalImpact (time-series only) + | linearmodels | statsmodels OLS (manual) + | | + | | + Broad DiD suite | diff-diff ←(today) | ← (opportunity) + | | +``` + +Note: DoWhy and CausalML are broad causal inference platforms but don't specialize in DiD — they occupy a different map entirely. The open quadrant is specifically **broad DiD + business framing** in Python. This is a real but narrower opportunity than a generic "causal AI" framing would suggest. + +--- + +## 5. 
Gap Analysis + +### Gap 1: Language & Terminology +**Current**: "ATT", "parallel trends assumption", "forbidden comparisons", "no-anticipation" +**Business needs**: "lift", "incremental impact", "is this result valid?", "how confident are we?" +**Impact**: Business DS bounces off the README. The methodology is powerful but the words are foreign. + +### Gap 2: Examples & Use Cases +**Current**: Card-Krueger (1994), Castle Doctrine, unilateral divorce laws +**Business needs**: Marketing campaign lift, product launch rollout, pricing experiment, brand tracking survey, loyalty program evaluation +**Impact**: No "I see myself in this" moment. Business DS can't map their problem to our examples. + +### Gap 3: Stakeholder Communication +**Current**: Statistical tables with t-stats, p-values, significance stars +**Business needs**: "The campaign increased awareness by 4.2 percentage points (95% CI: 1.8 to 6.6). This result is robust to violations of the parallel trends assumption up to 1.5x the pre-treatment variation." +**Impact**: Results can't be dropped into a deck or email to leadership without manual translation. + +### Gap 4: Automated Validity Assessment +**Current**: 8-step Baker et al. workflow requiring statistical knowledge at each step +**Business needs**: "Run diagnostics → get a traffic-light assessment (green/yellow/red) with plain-English explanation" +**Impact**: Diagnostics are skipped because they're hard to interpret, producing less credible analyses. + +### Gap 5: Business Workflow Integration +**Current**: Standalone analysis, academic notebook style +**Business needs**: Integration with common data patterns — survey exports from Qualtrics, geo-level marketing data, event logs from experimentation platforms +**Impact**: Significant data wrangling before analysis can begin. No guidance on common transformations. + +### Gap 6: Decision-Oriented Output +**Current**: Estimate → inference → done +**Business needs**: Estimate → "what does this mean?" 
→ "what should we do?" → "how confident should we be?" +**Impact**: Analysis produces a number but not a decision recommendation. + +--- + +## 6. Strategic Recommendations + +### Tier 1: Reframe & Reach (Documentation + Positioning) +*Effort: Low-Medium. Impact: High. No code changes required.* + +**1a. Business-oriented "Getting Started" guide** +A new entry point alongside the academic quickstart. Frame DiD in business terms: +- "Measuring the impact of interventions when A/B tests aren't possible" +- "Did the campaign/launch/change actually work?" +- Walk through a business scenario end-to-end +- Use business terminology with parenthetical academic equivalents: "lift (average treatment effect on the treated)" + +**1b. Terminology bridge** +A reference mapping business ↔ academic language: +| Business Term | Statistical Term | +|---|---| +| Lift / incremental impact | ATT (Average Treatment Effect on the Treated) | +| Test vs. control markets | Treated vs. untreated units | +| Pre-campaign / post-campaign | Pre-treatment / post-treatment | +| "Would the trend have continued?" | Parallel trends assumption | +| Confidence level | Confidence interval | +| "How robust is this?" | Sensitivity analysis | +| Staggered rollout | Staggered adoption | +| Campaign intensity / dose | Continuous treatment | + +**1c. README positioning update** +Add a "For Data Scientists" section alongside "For Academics" and "For AI Agents". Highlight business use cases, survey support, and the automated workflow. + +**1d. Comparison with business tools** +New docs page: "diff-diff vs GeoLift vs CausalImpact" — showing how we handle the same problems (and more) with greater rigor and flexibility. + +### Tier 2: Business Tutorials (Content) +*Effort: Medium. Impact: High.* + +Six new tutorial notebooks, each telling a complete business story: + +**2a. Marketing Campaign Lift Measurement** +Scenario: E-commerce company runs brand campaign in 8 of 20 DMAs. Measures sales lift. 
+Estimator: CallawaySantAnna (staggered rollout across DMAs) +Unique value: Shows why GeoLift's synthetic control is insufficient for staggered launches. + +**2b. Brand Awareness Survey DiD** *(primary use case)* +Scenario: CPG company runs awareness campaign. Surveys track aided awareness, consideration, purchase intent in test vs. control markets before and after. +Estimator: DifferenceInDifferences + SurveyDesign (strata, PSU, weights) +Unique value: Full survey methodology — design effects, replicate weights, subpopulation analysis. No other Python tool can do this. + +**2c. Product Launch Regional Rollout** +Scenario: SaaS company rolls out new pricing in waves across regions. Measures revenue impact. +Estimator: CallawaySantAnna or EfficientDiD (staggered by region) +Unique value: Handles the reality that launches aren't simultaneous. + +**2d. Pricing/Promotion Impact** +Scenario: Retailer changes pricing in some stores. Measures unit sales and revenue. +Estimator: ContinuousDiD (varying discount levels as dose) +Unique value: Dose-response curves for different discount levels. + +**2e. Loyalty Program Evaluation** +Scenario: Company launches loyalty program in some markets. Measures retention and LTV. +Estimator: TripleDifference (market × eligible × post) +Unique value: DDD handles the fact that only eligible customers can enroll. + +**2f. Geo-Experiment with Few Markets** +Scenario: Brand runs campaign in 3 test markets with 15 control markets. +Estimator: SyntheticDiD (few treated units) +Unique value: Direct comparison with GeoLift/CausalImpact, showing when each is appropriate. + +### Tier 3: Convenience Layer (API Additions) +*Effort: Medium-High. Impact: High for adoption.* + +**3a. `BusinessReport` class** +Generates stakeholder-ready output from any results object. Uses only existing dependencies (numpy/pandas/scipy for computation, string formatting for output). 
Rich export formats (PowerPoint, HTML) would be optional extras via `pip install diff-diff[reporting]` to preserve the core dependency policy. + +```python +from diff_diff import BusinessReport + +report = BusinessReport(results) +report.summary() +# "The campaign increased awareness by 4.2 pp (95% CI: 1.8-6.6, p=0.003). +# This is statistically significant at the 99% level. +# Robustness: The result holds under parallel trends violations up to 1.5x +# the observed pre-period variation." + +report.export_markdown() # Always available -- plain text/markdown for Notion/Confluence/email +report.export_slide() # Requires diff-diff[reporting] extra (python-pptx) +``` + +**3b. `DiagnosticReport` -- automated validity assessment** +Wraps existing diagnostic functions into a unified runner with plain-English interpretation. The check battery maps to existing capabilities: +- Parallel trends -> `check_parallel_trends()` existing function +- Sensitivity -> `HonestDiD` with default M grid +- Placebo -> `run_all_placebo_tests()` existing function +- Effect stability -> coefficient of variation across cohort effects + +Traffic-light thresholds (green/yellow/red) are a design decision that needs careful thought -- naive thresholds risk false confidence. The initial version should present results descriptively with plain-English interpretation rather than hard pass/fail gates. Example: + +```python +from diff_diff import DiagnosticReport + +diag = DiagnosticReport(results) +diag.run_all() +# Parallel trends: No significant pre-trends detected (joint p=0.42) +# Sensitivity: ATT sign stable through M=1.5; CI includes zero at M=2.0 +# Placebo: Pre-period placebo ATT = 0.003 (p=0.91), consistent with no effect +# Cohort heterogeneity: ATT ranges from 2.1 to 5.8 across cohorts (CV=0.38) +# +# Interpretation: Results appear credible. The main caveat is moderate +# heterogeneity across cohorts -- consider reporting group-specific effects. +``` + +**3c. 
Business data generators** +These would be thin wrappers around existing generators (`generate_did_data`, `generate_staggered_data`, `generate_survey_did_data`) with business-friendly parameter names and defaults -- not new DGPs. The value is discoverability and narrative framing, not new statistical machinery. + +```python +from diff_diff import generate_campaign_data # wraps generate_staggered_data + +data = generate_campaign_data( + n_markets=20, n_treated_markets=8, n_months=12, + lift=0.05, noise=0.02 +) +# Returns DataFrame with columns: market, month, sales, campaign_active, campaign_start_month +# (vs. unit, time, outcome, treatment, first_treat) +``` + +**3d. Deferred: `QuickDiD` simplified entry point** +Originally proposed as an auto-selecting estimator, but auto-selection risks encouraging methodologically unsound analysis -- the very problem Baker et al. (2025) warns against. Defer this until the business tutorial content validates whether users actually need it, or whether good documentation + `practitioner_next_steps()` is sufficient guidance. + +### Tier 4: Ecosystem & Integration (Longer-term) +*Effort: High. Impact: Medium-High (broadens reach).* + +**4a. Integration guides** +- "Using diff-diff with Databricks/Spark" -- handling large datasets +- "diff-diff in Jupyter dashboards" -- interactive analysis templates +- "Connecting survey platforms (Qualtrics, SurveyMonkey) to diff-diff" -- data pipeline guides + +**4b. Decision framework documentation** +"Which method should I use?" framed for business contexts: +- "I ran a campaign in some markets" -> CallawaySantAnna +- "I have only 3 test markets" -> SyntheticDiD +- "Campaign rolled out at different times" -> Staggered estimators +- "I varied the spending level" -> ContinuousDiD +- "I have survey data with complex sampling" -> Any estimator + SurveyDesign +Not the academic flowchart -- a business decision tree. + +**4c. 
Presentation/export templates** +- PowerPoint slide generator from results +- Markdown report for Notion/Confluence +- HTML dashboard widget + +--- + +## 7. Interaction with Existing Roadmap + +The project has an existing ROADMAP.md covering Phase 10 (survey academic credibility), future estimators, and research directions. This strategy supplements rather than replaces it: + +**Directly subsumed items:** +- **10g. "Practitioner guidance: when does survey design matter?"** -- this becomes part of the business tutorials and Getting Started guide. No longer a standalone item. +- **survey_aggregate() helper** -- the microdata-to-panel workflow helper is directly relevant for Persona A (survey data from BRFSS/ACS -> geographic panel). Should be prioritized alongside business tutorials. + +**Reprioritized by business use cases:** +- **de Chaisemartin-D'Haultfoeuille (reversible treatments)** -- marketing interventions frequently switch on/off (seasonal campaigns, promotions). This estimator becomes higher priority for business DS than for academics. Should move up in the roadmap. +- **10e. Position paper / arXiv preprint** -- still valuable for academic credibility but not on the critical path for business DS adoption. + +**Unchanged:** +- Future estimators (Local Projections DiD, Causal Duration, etc.) and long-term research directions remain academic-oriented and unaffected by this strategy. + +--- + +## 8. Prioritized Roadmap + +### Phase 1: Foundation +*Goal: Make diff-diff discoverable and approachable for business DS* + +1. Business "Getting Started" guide (1a) +2. Terminology bridge as supplement within business docs, not standalone (1b) +3. README "For Data Scientists" section (1c) +4. Business decision tree -- "which method should I use?" (4b) +5. Brand Awareness Survey DiD tutorial -- the lead use case (2b) + +**Why start here**: Zero code changes. Maximum positioning impact. 
The survey tutorial showcases our unique capability (survey design support) in the context that matters most to the user. + +**Validation gate before Phase 2**: After Phase 1 ships, look for adoption signals -- tutorial page views, GitHub issues from business users, PyPI download trajectory. These signals determine how aggressively to invest in Phases 2-3. + +### Phase 2: Business Content +*Goal: Provide end-to-end examples for each major persona* + +Tutorials in priority order (ship incrementally, not all at once): + +6. Marketing Campaign Lift tutorial (2a) -- **highest priority after survey** +7. Geo-Experiment tutorial (2f) -- captures GeoLift/CausalImpact search traffic +8. Comparison page: diff-diff vs GeoLift vs CausalImpact (1d) +9. Product Launch Rollout tutorial (2c) +10. Pricing/Promotion Impact tutorial (2d) +11. Loyalty Program tutorial using DDD (2e) + +### Phase 3: Convenience Layer +*Goal: Reduce time-to-insight and enable stakeholder communication* + +12. `BusinessReport` class (3a) -- core uses only numpy/pandas/scipy; rich export via optional `[reporting]` extra +13. `DiagnosticReport` descriptive assessment (3b) +14. Business data generator wrappers (3c) +15. `survey_aggregate()` helper from existing roadmap -- directly enables the survey tutorial workflow + +### Phase 4: Platform (Longer-term) +*Goal: Integrate into business DS workflows* + +16. Integration guides (4a) +17. Export templates (4c) +18. AI agent integration -- position DiagnosticReport and BusinessReport as tools AI agents can invoke on behalf of business DS (leveraging existing `practitioner_next_steps()` infrastructure) + +--- + +## 9. Key Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Oversimplifying may undermine credibility with academic users | Keep business layer additive -- don't change existing academic interface. Business tools translate, not replace. 
| +| Business tutorials may encourage methodologically unsound analysis | Embed guardrails: DiagnosticReport flags issues, tutorials emphasize assumption checking in business language | +| Scope creep | Phase 1 is documentation-only. Validate adoption signals before investing in code (Phase 3+). | +| Maintaining two audiences | Shared codebase, separate entry points. Like scikit-learn serving both ML engineers and researchers. | + +--- + +## 10. Success Metrics + +**Leading indicators (measurable after Phase 1):** +- Tutorial notebook page views / nbviewer hits for business tutorials +- GitHub issues or discussions mentioning business use cases (campaigns, surveys, geo-experiments) +- Search console impressions for business-oriented queries ("python campaign lift", "python geo experiment", "python survey did") + +**Lagging indicators (Phases 2-3):** +- PyPI download trajectory (month-over-month growth rate, not absolute) +- GitHub stars from non-academic profiles +- External blog posts or talks using diff-diff for business analysis + +**Phase 1 -> Phase 2 gate**: At least one of: (a) 3+ GitHub issues from business users, (b) measurable search impression growth for business queries, (c) qualitative signal that the business framing is resonating (social media, conference mentions). If none after 8 weeks, revisit the strategy before investing in code changes. + +--- + +## 11. Bottom Line + +We have the best DiD engine in Python. What we don't have is the business packaging. The methodology is sound, the survey support is unique, the diagnostic suite is unmatched. But a marketing data scientist looking at our docs sees academic econometrics, not their problem. + +The fix is mostly about **framing, examples, and a thin convenience layer** -- not rebuilding the core. Phase 1 requires zero code changes. Phases 2-3 add content and lightweight APIs. 
The competitive window is open because no one else is targeting this intersection: comprehensive DiD + business data science + Python. + +The survey use case is the sharpest wedge. No other tool in any language combines complex survey design with modern heterogeneity-robust DiD estimators. Lead with that, then broaden. diff --git a/docs/tutorials/17_brand_awareness_survey.ipynb b/docs/tutorials/17_brand_awareness_survey.ipynb new file mode 100644 index 00000000..170bea49 --- /dev/null +++ b/docs/tutorials/17_brand_awareness_survey.ipynb @@ -0,0 +1,799 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Measuring Campaign Impact on Brand Awareness with Survey Data\n", + "\n", + "Your company launched a brand awareness campaign in 8 of 20 designated market areas (DMAs).\n", + "The marketing team conducted brand tracking surveys across all 20 DMAs before and after the\n", + "campaign, using a stratified sampling design with demographic weighting.\n", + "\n", + "Marketing leadership wants to know:\n", + "\n", + "- Did aided awareness actually increase in the campaign markets?\n", + "- Did consideration move?\n", + "- How confident should we be in these numbers?\n", + "\n", + "This tutorial shows how to answer these questions using Difference-in-Differences (DiD) with\n", + "proper survey design corrections. DiD compares the change in campaign markets to the change in\n", + "control markets \u2014 if awareness went up 8 points in campaign markets but only 2 in control\n", + "markets, the incremental lift is 6 points.\n", + "\n", + "The complication: your survey data has a complex sampling design \u2014 stratified by region, with\n", + "unequal selection probabilities and geographic clustering. Ignoring this can make you\n", + "overconfident in your results.\n", + "\n", + "**What you'll learn:**\n", + "\n", + "1. Analyzing brand tracking survey data with DiD\n", + "2. 
Why survey design (weights, strata, clusters) changes your answer\n", + "3. Measuring multiple brand funnel metrics\n", + "4. Checking whether the result is trustworthy\n", + "5. Extending to staggered campaign rollouts\n", + "6. Communicating results to stakeholders" + ], + "id": "f8cd1807" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Setup" + ], + "id": "63c132f7" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "from diff_diff import (\n", + " DifferenceInDifferences,\n", + " SurveyDesign,\n", + " check_parallel_trends,\n", + ")\n", + "from diff_diff.prep import generate_survey_did_data\n", + "from diff_diff.practitioner import practitioner_next_steps\n", + "\n", + "try:\n", + " import matplotlib.pyplot as plt\n", + "\n", + " plt.style.use(\"seaborn-v0_8-whitegrid\")\n", + " HAS_MATPLOTLIB = True\n", + "except ImportError:\n", + " HAS_MATPLOTLIB = False\n", + " print(\"matplotlib not installed \u2014 plots will be skipped.\")" + ], + "id": "7c6c8ec1" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Data Preparation\n", + "\n", + "We'll generate synthetic brand tracking data that mirrors a real survey:\n", + "200 respondents across 8 waves, sampled from 5 geographic regions with\n", + "cluster sampling and demographic weighting. The campaign launches at wave 5\n", + "in a subset of markets." 
+ ], + "id": "69d0010f" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate survey data with known treatment effect (~5 percentage points)\n", + "raw = generate_survey_did_data(\n", + " n_units=200,\n", + " n_periods=8,\n", + " cohort_periods=[5], # Campaign launches at wave 5\n", + " never_treated_frac=0.6, # ~60% of markets are control\n", + " treatment_effect=5.0, # True lift: 5 percentage points\n", + " n_strata=5, # 5 geographic regions\n", + " psu_per_stratum=4, # 4 sampling clusters per region\n", + " weight_variation=\"high\", # Substantial demographic weighting\n", + " informative_sampling=True,\n", + " return_true_population_att=True,\n", + " seed=46,\n", + ")\n", + "\n", + "# Create the binary indicators that DiD needs\n", + "raw[\"campaign_market\"] = (raw[\"first_treat\"] > 0).astype(int)\n", + "raw[\"post_campaign\"] = (raw[\"period\"] >= 5).astype(int)\n", + "\n", + "# Rename columns to business terms\n", + "data = raw.rename(columns={\n", + " \"unit\": \"market_id\",\n", + " \"period\": \"wave\",\n", + " \"outcome\": \"awareness\",\n", + " \"stratum\": \"region\",\n", + " \"psu\": \"cluster\",\n", + " \"weight\": \"survey_weight\",\n", + " \"first_treat\": \"campaign_start_wave\",\n", + " \"treated\": \"campaign_active\",\n", + "})\n", + "\n", + "# Scale awareness to realistic brand metric percentages (~45% baseline)\n", + "data[\"awareness\"] = data[\"awareness\"] + 45\n", + "\n", + "# Create additional brand funnel metrics\n", + "# Effects attenuate down the funnel: awareness > consideration > purchase intent\n", + "rng = np.random.default_rng(seed=99)\n", + "data[\"consideration\"] = 25 + (data[\"awareness\"] - 45) * 0.6 + rng.normal(0, 1.0, len(data))\n", + "data[\"purchase_intent\"] = 12 + (data[\"awareness\"] - 45) * 0.3 + rng.normal(0, 0.8, len(data))\n", + "\n", + "print(f\"Dataset: {data.shape[0]} observations, {data['market_id'].nunique()} markets, 
{data['wave'].nunique()} waves\")\n", + "print(f\"Campaign markets: {data.groupby('market_id')['campaign_market'].first().sum()}\")\n", + "print(f\"Control markets: {(~data.groupby('market_id')['campaign_market'].first().astype(bool)).sum()}\")" + ], + "id": "c6960896" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Average brand metrics by group and period\n", + "summary = data.groupby([\"campaign_market\", \"post_campaign\"]).agg(\n", + " awareness=(\"awareness\", \"mean\"),\n", + " consideration=(\"consideration\", \"mean\"),\n", + " purchase_intent=(\"purchase_intent\", \"mean\"),\n", + ").round(1)\n", + "summary.index = summary.index.set_names([\"Campaign Market\", \"Post Campaign\"])\n", + "summary" + ], + "id": "53cf1176" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Visual Inspection\n", + "\n", + "Before running any analysis, plot awareness over time for campaign vs. control markets.\n", + "The key question: were the two groups trending similarly *before* the campaign launched?" 
+ ], + "id": "b16f8a20" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " trends = data.groupby([\"wave\", \"campaign_market\"])[\"awareness\"].mean().unstack()\n", + " trends.columns = [\"Control Markets\", \"Campaign Markets\"]\n", + "\n", + " fig, ax = plt.subplots(figsize=(10, 5))\n", + " trends.plot(ax=ax, marker=\"o\", linewidth=2)\n", + " ax.axvline(x=4.5, color=\"gray\", linestyle=\"--\", alpha=0.7, label=\"Campaign Launch\")\n", + " ax.set_xlabel(\"Wave\")\n", + " ax.set_ylabel(\"Aided Awareness (%)\")\n", + " ax.set_title(\"Brand Awareness Over Time\")\n", + " ax.legend()\n", + " plt.tight_layout()\n", + " plt.show()\n", + "else:\n", + " print(trends.to_string())" + ], + "id": "c03c6b19" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before the campaign launched (waves 1-4), awareness was trending similarly in both groups.\n", + "After launch (waves 5-8), campaign markets pulled ahead. This is exactly the pattern DiD\n", + "is designed to measure." + ], + "id": "8b4cef1e" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Naive DiD (Ignoring Survey Design)\n", + "\n", + "First, run a standard DiD analysis that treats every survey response equally." 
+ ], + "id": "7245c127" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "did_naive = DifferenceInDifferences()\n", + "results_naive = did_naive.fit(\n", + " data,\n", + " outcome=\"awareness\",\n", + " treatment=\"campaign_market\",\n", + " time=\"post_campaign\",\n", + ")\n", + "print(results_naive)\n", + "print(f\"\\nThe campaign increased awareness by {results_naive.att:.1f} percentage points\")\n", + "print(f\"95% CI: ({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\")" + ], + "id": "e5db9120" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This looks like a strong, precise result. But it treats every survey response as equally\n", + "informative and ignores the sampling structure. Let's see what happens when we account for\n", + "the survey design." + ], + "id": "81f52c46" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Survey-Aware DiD\n", + "\n", + "Brand tracking surveys rarely use simple random sampling. Respondents are sampled in\n", + "geographic clusters with demographic quotas and weighting. The `SurveyDesign` object\n", + "tells diff-diff how the survey was conducted." 
+ ], + "id": "0bb69515" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sd = SurveyDesign(\n", + " weights=\"survey_weight\", # Accounts for demographic oversampling\n", + " strata=\"region\", # Sample was drawn separately within each region\n", + " psu=\"cluster\", # Respondents sampled in geographic clusters\n", + " fpc=\"fpc\", # Finite population correction\n", + ")\n", + "\n", + "did_survey = DifferenceInDifferences()\n", + "with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\") # Suppress weight normalization notice\n", + " results_survey = did_survey.fit(\n", + " data,\n", + " outcome=\"awareness\",\n", + " treatment=\"campaign_market\",\n", + " time=\"post_campaign\",\n", + " survey_design=sd,\n", + " )\n", + "\n", + "print(results_survey)\n", + "print(f\"\\nThe campaign increased awareness by {results_survey.att:.1f} percentage points\")\n", + "print(f\"95% CI: ({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\")" + ], + "id": "efbf20d6" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### What Changed?\n", + "\n", + "Let's compare the naive and survey-aware results side by side." 
+ ], + "id": "dd92bbc9" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "se_ratio = results_survey.se / results_naive.se\n", + "\n", + "comparison = pd.DataFrame({\n", + " \"Naive\": [\n", + " f\"{results_naive.att:.2f}\",\n", + " f\"{results_naive.se:.3f}\",\n", + " f\"({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\",\n", + " f\"{results_naive.p_value:.4f}\",\n", + " ],\n", + " \"Survey-Aware\": [\n", + " f\"{results_survey.att:.2f}\",\n", + " f\"{results_survey.se:.3f}\",\n", + " f\"({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\",\n", + " f\"{results_survey.p_value:.4f}\",\n", + " ],\n", + "}, index=[\"Lift (pp)\", \"Std Error\", \"95% CI\", \"p-value\"])\n", + "\n", + "print(comparison.to_string())\n", + "print(f\"\\nDesign effect (SE ratio): {se_ratio:.2f}x\")\n", + "print(f\"Survey-aware standard errors are {(se_ratio - 1) * 100:.0f}% larger than naive.\")\n", + "print(f\"\\nThe lift estimate is similar, but the naive analysis makes you think\")\n", + "print(f\"you know it more precisely than you actually do.\")" + ], + "id": "387a083f" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The standard errors more than doubled. Respondents within the same geographic cluster\n", + "tend to answer similarly, so each response carries less independent information than the\n", + "raw sample size suggests. The naive analysis was overconfident.\n", + "\n", + "In this case, both analyses agree the campaign worked \u2014 but the survey-aware confidence\n", + "interval is much wider. In a closer call, ignoring the survey design could lead you to\n", + "claim a significant result when the evidence is actually inconclusive." + ], + "id": "e8a4065d" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. 
Multiple Brand Metrics\n", + "\n", + "Brand campaigns don't just move awareness \u2014 they should also move consideration and\n", + "purchase intent. Let's measure the lift across the full brand funnel." + ], + "id": "60fc326a" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "outcomes = [\"awareness\", \"consideration\", \"purchase_intent\"]\n", + "funnel_results = {}\n", + "\n", + "for outcome in outcomes:\n", + " did = DifferenceInDifferences()\n", + " with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\")\n", + " r = did.fit(\n", + " data,\n", + " outcome=outcome,\n", + " treatment=\"campaign_market\",\n", + " time=\"post_campaign\",\n", + " survey_design=sd,\n", + " )\n", + " funnel_results[outcome] = r\n", + "\n", + "# Results table\n", + "funnel_df = pd.DataFrame({\n", + " \"Metric\": [\"Awareness\", \"Consideration\", \"Purchase Intent\"],\n", + " \"Lift (pp)\": [funnel_results[o].att for o in outcomes],\n", + " \"SE\": [funnel_results[o].se for o in outcomes],\n", + " \"95% CI Lower\": [funnel_results[o].conf_int[0] for o in outcomes],\n", + " \"95% CI Upper\": [funnel_results[o].conf_int[1] for o in outcomes],\n", + " \"p-value\": [funnel_results[o].p_value for o in outcomes],\n", + "}).round(2)\n", + "\n", + "print(funnel_df.to_string(index=False))" + ], + "id": "f891e2f1" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " metrics = [\"Awareness\", \"Consideration\", \"Purchase\\nIntent\"]\n", + " lifts = [funnel_results[o].att for o in outcomes]\n", + " ci_low = [funnel_results[o].conf_int[0] for o in outcomes]\n", + " ci_high = [funnel_results[o].conf_int[1] for o in outcomes]\n", + " errors = [[l - lo for l, lo in zip(lifts, ci_low)],\n", + " [hi - l for l, hi in zip(lifts, ci_high)]]\n", + "\n", + " fig, ax = plt.subplots(figsize=(8, 5))\n", + " bars = ax.bar(metrics, lifts, 
color=[\"#2196F3\", \"#4CAF50\", \"#FF9800\"],\n", + " yerr=errors, capsize=8, edgecolor=\"black\", linewidth=0.5)\n", + " ax.axhline(y=0, color=\"black\", linewidth=0.5)\n", + " ax.set_ylabel(\"Incremental Lift (percentage points)\")\n", + " ax.set_title(\"Campaign Impact Across the Brand Funnel\")\n", + "\n", + " for bar, lift in zip(bars, lifts):\n", + " ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.3,\n", + " f\"+{lift:.1f}pp\", ha=\"center\", va=\"bottom\", fontweight=\"bold\")\n", + "\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "14ff1a06" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The campaign moved awareness the most, consideration less, and purchase intent the\n", + "least. This is typical funnel attenuation \u2014 the message reached people but didn't fully\n", + "convert to purchase consideration. All three effects are statistically significant." + ], + "id": "d3b6008d" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Is This Result Trustworthy?\n", + "\n", + "Two diagnostic checks help validate the result." + ], + "id": "d68bf901" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Parallel Trends Check\n", + "\n", + "DiD assumes campaign and control markets would have continued trending the same way if\n", + "the campaign hadn't run. We can check whether the pre-campaign trends were similar." 
+ ], + "id": "96bbef84" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pt = check_parallel_trends(\n", + " data,\n", + " outcome=\"awareness\",\n", + " time=\"wave\",\n", + " treatment_group=\"campaign_market\",\n", + ")\n", + "\n", + "print(f\"Pre-campaign trend difference: {pt['trend_difference']:.3f}\")\n", + "print(f\"p-value: {pt['p_value']:.3f}\")\n", + "print(f\"\\nParallel trends {'supported' if pt['parallel_trends_plausible'] else 'NOT supported'}\")\n", + "if pt[\"parallel_trends_plausible\"]:\n", + " print(\"Before the campaign, awareness was trending at the same rate in both groups.\")" + ], + "id": "3d718ede" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Placebo Test\n", + "\n", + "Run the same DiD analysis on the pre-campaign period only, where no campaign effect\n", + "should exist. If we find a \"significant\" effect here, something is wrong." + ], + "id": "aecefb2f" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use waves 1-4 only; split at wave 3 as a \"placebo\" campaign launch\n", + "pre_data = data[data[\"wave\"] <= 4].copy()\n", + "pre_data[\"placebo_post\"] = (pre_data[\"wave\"] >= 3).astype(int)\n", + "\n", + "did_placebo = DifferenceInDifferences()\n", + "r_placebo = did_placebo.fit(\n", + " pre_data,\n", + " outcome=\"awareness\",\n", + " treatment=\"campaign_market\",\n", + " time=\"placebo_post\",\n", + ")\n", + "\n", + "print(f\"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})\")\n", + "if r_placebo.p_value > 0.05:\n", + " print(\"No significant effect in the pre-campaign period \u2014 the method isn't picking up spurious patterns.\")\n", + "else:\n", + " print(\"WARNING: Significant placebo effect detected \u2014 investigate further.\")" + ], + "id": "ef7db9b1" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both checks pass: the 
pre-campaign trends were parallel, and the placebo test\n", + "finds no effect where none should exist. This gives us confidence the measured\n", + "lift is attributable to the campaign.\n", + "\n", + "For event study designs (Section 8 below), you can also run\n", + "[HonestDiD sensitivity analysis](../tutorials/05_honest_did.ipynb) to quantify\n", + "how robust the result is to violations of the parallel trends assumption." + ], + "id": "5a36cef4" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Practitioner Guidance\n", + "\n", + "diff-diff includes an automated checklist based on the\n", + "[Baker et al. (2025)](https://arxiv.org/pdf/2503.13323) practitioner workflow.\n", + "It suggests diagnostic steps based on your estimator and results.\n", + "\n", + "*Note: the code snippets in the output use placeholder column names \u2014 substitute\n", + "your own.*" + ], + "id": "ea3733ec" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "practitioner_next_steps(results_survey, verbose=True)" + ], + "id": "8e82a98a" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Extension: Staggered Campaign Rollout\n", + "\n", + "Many campaigns don't launch in all markets at once \u2014 they roll out in waves.\n", + "Some markets go live in month 2, others in month 4. When this happens, basic\n", + "DiD can give biased results. The `CallawaySantAnna` estimator handles this\n", + "correctly.\n", + "\n", + "Let's generate data where the campaign rolled out in two waves." 
+ ], + "id": "5dd70c53" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from diff_diff import CallawaySantAnna\n", + "from diff_diff.visualization import plot_event_study\n", + "\n", + "# Campaign rolls out in two waves: some markets at wave 3, others at wave 5\n", + "stag_raw = generate_survey_did_data(\n", + " n_units=200,\n", + " n_periods=8,\n", + " cohort_periods=[3, 5],\n", + " never_treated_frac=0.4,\n", + " treatment_effect=5.0,\n", + " dynamic_effects=True,\n", + " effect_growth=0.1, # Effect builds 10% per wave (repeated exposure)\n", + " n_strata=5,\n", + " psu_per_stratum=4,\n", + " weight_variation=\"high\",\n", + " informative_sampling=True,\n", + " return_true_population_att=True,\n", + " seed=42,\n", + ")\n", + "\n", + "stag_data = stag_raw.rename(columns={\n", + " \"unit\": \"market_id\", \"period\": \"wave\", \"outcome\": \"awareness\",\n", + " \"stratum\": \"region\", \"psu\": \"cluster\", \"weight\": \"survey_weight\",\n", + " \"first_treat\": \"campaign_start_wave\",\n", + "})\n", + "stag_data[\"awareness\"] = stag_data[\"awareness\"] + 45\n", + "\n", + "print(f\"Campaign cohorts: {sorted(stag_data['campaign_start_wave'].unique())}\")\n", + "print(f\" Wave 3 launch: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 3).sum()} markets\")\n", + "print(f\" Wave 5 launch: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 5).sum()} markets\")\n", + "print(f\" Control: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 0).sum()} markets\")" + ], + "id": "7d1c9510" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "stag_sd = SurveyDesign(\n", + " weights=\"survey_weight\", strata=\"region\", psu=\"cluster\", fpc=\"fpc\",\n", + ")\n", + "\n", + "cs = CallawaySantAnna()\n", + "with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\")\n", + " stag_results = cs.fit(\n", 
+ " stag_data,\n", + " outcome=\"awareness\",\n", + " unit=\"market_id\",\n", + " time=\"wave\",\n", + " first_treat=\"campaign_start_wave\",\n", + " aggregate=\"event_study\",\n", + " survey_design=stag_sd,\n", + " )\n", + "\n", + "print(stag_results)\n", + "print(f\"\\nOverall campaign lift: {stag_results.overall_att:.1f} pp\")\n", + "print(\"\\nEvent study effects (relative to campaign launch):\")\n", + "print(stag_results.to_dataframe(level=\"event_study\").round(2).to_string(index=False))" + ], + "id": "dd607a40" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " fig, ax = plt.subplots(figsize=(10, 6))\n", + " plot_event_study(\n", + " stag_results,\n", + " ax=ax,\n", + " title=\"Campaign Effect Over Time (Staggered Rollout)\",\n", + " xlabel=\"Waves Relative to Campaign Launch\",\n", + " ylabel=\"Awareness Lift (pp)\",\n", + " )\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "372755dd" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The event study shows the campaign effect building over time \u2014 starting around 5pp at\n", + "launch and growing to about 7pp with sustained exposure. Pre-campaign periods show no\n", + "significant effects, confirming the parallel trends assumption holds." + ], + "id": "f67896f2" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sensitivity Analysis\n", + "\n", + "HonestDiD ([Rambachan & Roth, 2023](https://academic.oup.com/restud/article/90/5/2555/7039335))\n", + "tells us how much the parallel trends assumption would need to be violated for the\n", + "result to disappear." 
+ ], + "id": "281d0958" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from diff_diff import compute_honest_did\n", + "\n", + "with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\")\n", + " honest = compute_honest_did(stag_results, method=\"relative_magnitude\", M=1.0)\n", + "\n", + "print(honest.summary())\n", + "print(\"\\nIn plain English: even if the pre-campaign trends were off by as much as\")\n", + "print(\"the largest observed pre-period fluctuation, the campaign effect remains positive.\")" + ], + "id": "0d7c096e" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. Communicating Results to Leadership\n", + "\n", + "Here's how to write up the finding for stakeholders:" + ], + "id": "751a8e53" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r = results_survey # The main 2x2 result\n", + "\n", + "print(\"=\"* 70)\n", + "print(\"EXECUTIVE SUMMARY\")\n", + "print(\"=\"* 70)\n", + "print(f\"\"\"\n", + "The brand awareness campaign increased aided awareness by {r.att:.1f}\n", + "percentage points (95% CI: {r.conf_int[0]:.1f} to {r.conf_int[1]:.1f})\n", + "in the {data.groupby('market_id')['campaign_market'].first().sum()} campaign\n", + "markets compared to the {(~data.groupby('market_id')['campaign_market'].first().astype(bool)).sum()} control markets.\n", + "\n", + "This result accounts for the complex survey sampling design and is\n", + "supported by pre-campaign trend analysis and placebo testing.\n", + "\n", + "Impact across the brand funnel:\n", + " - Awareness: +{funnel_results['awareness'].att:.1f} pp\n", + " - Consideration: +{funnel_results['consideration'].att:.1f} pp\n", + " - Purchase Intent: +{funnel_results['purchase_intent'].att:.1f} pp\n", + "\n", + "The effect attenuates down the funnel, suggesting the campaign\n", + "successfully raised awareness but further investment is 
needed to\n", + "convert awareness into purchase consideration.\n", + "\"\"\")" + ], + "id": "2fa7cc34" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Key points for your write-up:**\n", + "\n", + "- Report the **survey-aware** estimate, not the naive one \u2014 it reflects the true uncertainty\n", + "- Include confidence intervals, not just point estimates \u2014 leadership should understand the range\n", + "- Distinguish **statistical significance** (is the effect real?) from **practical significance**\n", + " (is it big enough to matter?)\n", + "- A 5pp lift in awareness from 46% to 51% may or may not justify the campaign spend \u2014\n", + " that's a business judgment, not a statistical one" + ], + "id": "d4884c2b" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "**What we covered:**\n", + "\n", + "- **Survey design matters**: Ignoring the complex sampling structure made standard errors\n", + " more than 2x too small, creating false precision\n", + "- **DiD with survey data**: `SurveyDesign` integrates directly with all diff-diff estimators \u2014\n", + " just pass `survey_design=sd` to `.fit()`\n", + "- **Brand funnel analysis**: Measuring awareness, consideration, and purchase intent together\n", + " reveals where the campaign effect attenuates\n", + "- **Diagnostics**: Parallel trends checks and placebo tests validate the result in plain terms\n", + "- **Staggered rollouts**: `CallawaySantAnna` handles campaigns that launch in waves, with\n", + " event study plots showing how the effect builds over time\n", + "- **Sensitivity**: HonestDiD quantifies how robust the result is to assumption violations\n", + "\n", + "**When to use this approach:**\n", + "\n", + "- You have survey data collected before and after a campaign or intervention\n", + "- The campaign ran in some markets/regions but not others\n", + "- Randomized A/B testing wasn't feasible\n", + "- Your survey uses stratified 
sampling, clustering, or weighting\n", + "\n", + "**Related tutorials:**\n", + "\n", + "- [Tutorial 16: Survey DiD](16_survey_did.ipynb) \u2014 deep dive into survey design theory,\n", + " replicate weights, and design effect diagnostics\n", + "- [Tutorial 02: Staggered DiD](02_staggered_did.ipynb) \u2014 more on Callaway-Sant'Anna and\n", + " staggered adoption designs\n", + "- [Tutorial 05: Honest DiD](05_honest_did.ipynb) \u2014 full sensitivity analysis guide" + ], + "id": "f3b24495" + } + ], + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From 5873d1f660a4700888825b57361c3a4e09a6abd1 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 9 Apr 2026 11:01:32 -0400 Subject: [PATCH 2/3] Address CI review: fix methodology, unit of analysis, and warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - P1: Use base_period="universal" for CallawaySantAnna in staggered section so HonestDiD sensitivity analysis is methodologically valid - P2: Fix unit of analysis — rename to respondent_id, reframe narrative as respondent-level survey data (not market-level DMAs) - P2: Fix matplotlib fallback — compute trends before conditional - P2: Replace blanket warnings.simplefilter("ignore") with targeted RuntimeWarning filter for survey module matmul artifacts only; UserWarnings and methodology warnings now come through - P3: Rename "Design effect (SE ratio)" to "SE inflation ratio" to avoid terminology drift from formal DEFF definition - Soften parallel trends language from "confirming" to "consistent with" Co-Authored-By: Claude Opus 4.6 (1M context) --- .../tutorials/17_brand_awareness_survey.ipynb | 402 ++---------------- 1 file changed, 30 
insertions(+), 372 deletions(-) diff --git a/docs/tutorials/17_brand_awareness_survey.ipynb b/docs/tutorials/17_brand_awareness_survey.ipynb index 170bea49..d0ddb74d 100644 --- a/docs/tutorials/17_brand_awareness_survey.ipynb +++ b/docs/tutorials/17_brand_awareness_survey.ipynb @@ -3,37 +3,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "# Measuring Campaign Impact on Brand Awareness with Survey Data\n", - "\n", - "Your company launched a brand awareness campaign in 8 of 20 designated market areas (DMAs).\n", - "The marketing team conducted brand tracking surveys across all 20 DMAs before and after the\n", - "campaign, using a stratified sampling design with demographic weighting.\n", - "\n", - "Marketing leadership wants to know:\n", - "\n", - "- Did aided awareness actually increase in the campaign markets?\n", - "- Did consideration move?\n", - "- How confident should we be in these numbers?\n", - "\n", - "This tutorial shows how to answer these questions using Difference-in-Differences (DiD) with\n", - "proper survey design corrections. DiD compares the change in campaign markets to the change in\n", - "control markets \u2014 if awareness went up 8 points in campaign markets but only 2 in control\n", - "markets, the incremental lift is 6 points.\n", - "\n", - "The complication: your survey data has a complex sampling design \u2014 stratified by region, with\n", - "unequal selection probabilities and geographic clustering. Ignoring this can make you\n", - "overconfident in your results.\n", - "\n", - "**What you'll learn:**\n", - "\n", - "1. Analyzing brand tracking survey data with DiD\n", - "2. Why survey design (weights, strata, clusters) changes your answer\n", - "3. Measuring multiple brand funnel metrics\n", - "4. Checking whether the result is trustworthy\n", - "5. Extending to staggered campaign rollouts\n", - "6. 
Communicating results to stakeholders" - ], + "source": "# Measuring Campaign Impact on Brand Awareness with Survey Data\n\nYour company launched a brand awareness campaign in certain markets.\nThe marketing team conducted brand tracking surveys across all markets before and after the\ncampaign, using a stratified sampling design with demographic weighting. Each wave surveyed\n200 respondents — some in campaign markets, some in control markets.\n\nMarketing leadership wants to know:\n\n- Did aided awareness actually increase among respondents in campaign markets?\n- Did consideration move?\n- How confident should we be in these numbers?\n\nThis tutorial shows how to answer these questions using Difference-in-Differences (DiD) with\nproper survey design corrections. DiD compares the change among campaign-exposed respondents\nto the change among control respondents — if awareness went up 8 points in campaign markets\nbut only 2 in control markets, the incremental lift is 6 points.\n\nThe complication: your survey data has a complex sampling design — stratified by region, with\nunequal selection probabilities and geographic clustering. Ignoring this can make you\noverconfident in your results.\n\n**What you'll learn:**\n\n1. Analyzing brand tracking survey data with DiD\n2. Why survey design (weights, strata, clusters) changes your answer\n3. Measuring multiple brand funnel metrics\n4. Checking whether the result is trustworthy\n5. Extending to staggered campaign rollouts\n6. 
Communicating results to stakeholders", "id": "f8cd1807" }, { @@ -49,41 +19,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "import warnings\n", - "\n", - "import numpy as np\n", - "import pandas as pd\n", - "from diff_diff import (\n", - " DifferenceInDifferences,\n", - " SurveyDesign,\n", - " check_parallel_trends,\n", - ")\n", - "from diff_diff.prep import generate_survey_did_data\n", - "from diff_diff.practitioner import practitioner_next_steps\n", - "\n", - "try:\n", - " import matplotlib.pyplot as plt\n", - "\n", - " plt.style.use(\"seaborn-v0_8-whitegrid\")\n", - " HAS_MATPLOTLIB = True\n", - "except ImportError:\n", - " HAS_MATPLOTLIB = False\n", - " print(\"matplotlib not installed \u2014 plots will be skipped.\")" - ], + "source": "import warnings\n\nimport numpy as np\nimport pandas as pd\nfrom diff_diff import (\n DifferenceInDifferences,\n SurveyDesign,\n check_parallel_trends,\n)\nfrom diff_diff.prep import generate_survey_did_data\nfrom diff_diff.practitioner import practitioner_next_steps\n\n# Suppress numerical artifacts from survey variance computation with\n# extreme weights. These are benign matmul edge cases, not methodology\n# issues — results are unaffected. All other warnings come through.\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning, module=\"diff_diff.survey\")\n\ntry:\n import matplotlib.pyplot as plt\n\n plt.style.use(\"seaborn-v0_8-whitegrid\")\n HAS_MATPLOTLIB = True\nexcept ImportError:\n HAS_MATPLOTLIB = False\n print(\"matplotlib not installed — plots will be skipped.\")", "id": "7c6c8ec1" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## 2. Data Preparation\n", - "\n", - "We'll generate synthetic brand tracking data that mirrors a real survey:\n", - "200 respondents across 8 waves, sampled from 5 geographic regions with\n", - "cluster sampling and demographic weighting. The campaign launches at wave 5\n", - "in a subset of markets." - ], + "source": "## 2. 
Data Preparation\n\nWe'll generate synthetic brand tracking data that mirrors a real survey:\n200 respondents across 8 waves, sampled from 5 geographic regions with\ncluster sampling and demographic weighting. The campaign launches at wave 5\nfor respondents in certain markets.", "id": "69d0010f" }, { @@ -91,51 +33,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Generate survey data with known treatment effect (~5 percentage points)\n", - "raw = generate_survey_did_data(\n", - " n_units=200,\n", - " n_periods=8,\n", - " cohort_periods=[5], # Campaign launches at wave 5\n", - " never_treated_frac=0.6, # ~60% of markets are control\n", - " treatment_effect=5.0, # True lift: 5 percentage points\n", - " n_strata=5, # 5 geographic regions\n", - " psu_per_stratum=4, # 4 sampling clusters per region\n", - " weight_variation=\"high\", # Substantial demographic weighting\n", - " informative_sampling=True,\n", - " return_true_population_att=True,\n", - " seed=46,\n", - ")\n", - "\n", - "# Create the binary indicators that DiD needs\n", - "raw[\"campaign_market\"] = (raw[\"first_treat\"] > 0).astype(int)\n", - "raw[\"post_campaign\"] = (raw[\"period\"] >= 5).astype(int)\n", - "\n", - "# Rename columns to business terms\n", - "data = raw.rename(columns={\n", - " \"unit\": \"market_id\",\n", - " \"period\": \"wave\",\n", - " \"outcome\": \"awareness\",\n", - " \"stratum\": \"region\",\n", - " \"psu\": \"cluster\",\n", - " \"weight\": \"survey_weight\",\n", - " \"first_treat\": \"campaign_start_wave\",\n", - " \"treated\": \"campaign_active\",\n", - "})\n", - "\n", - "# Scale awareness to realistic brand metric percentages (~45% baseline)\n", - "data[\"awareness\"] = data[\"awareness\"] + 45\n", - "\n", - "# Create additional brand funnel metrics\n", - "# Effects attenuate down the funnel: awareness > consideration > purchase intent\n", - "rng = np.random.default_rng(seed=99)\n", - "data[\"consideration\"] = 25 + (data[\"awareness\"] - 45) * 0.6 + 
rng.normal(0, 1.0, len(data))\n", - "data[\"purchase_intent\"] = 12 + (data[\"awareness\"] - 45) * 0.3 + rng.normal(0, 0.8, len(data))\n", - "\n", - "print(f\"Dataset: {data.shape[0]} observations, {data['market_id'].nunique()} markets, {data['wave'].nunique()} waves\")\n", - "print(f\"Campaign markets: {data.groupby('market_id')['campaign_market'].first().sum()}\")\n", - "print(f\"Control markets: {(~data.groupby('market_id')['campaign_market'].first().astype(bool)).sum()}\")" - ], + "source": "# Generate survey data with known treatment effect (~5 percentage points)\nraw = generate_survey_did_data(\n n_units=200,\n n_periods=8,\n cohort_periods=[5], # Campaign launches at wave 5\n never_treated_frac=0.6, # ~60% of respondents are in control markets\n treatment_effect=5.0, # True lift: 5 percentage points\n n_strata=5, # 5 geographic regions\n psu_per_stratum=4, # 4 sampling clusters per region\n weight_variation=\"high\", # Substantial demographic weighting\n informative_sampling=True,\n return_true_population_att=True,\n seed=46,\n)\n\n# Create the binary indicators that DiD needs\nraw[\"campaign_respondent\"] = (raw[\"first_treat\"] > 0).astype(int)\nraw[\"post_campaign\"] = (raw[\"period\"] >= 5).astype(int)\n\n# Rename columns to business terms\ndata = raw.rename(columns={\n \"unit\": \"respondent_id\",\n \"period\": \"wave\",\n \"outcome\": \"awareness\",\n \"stratum\": \"region\",\n \"psu\": \"cluster\",\n \"weight\": \"survey_weight\",\n \"first_treat\": \"campaign_start_wave\",\n \"treated\": \"campaign_active\",\n})\n\n# Scale awareness to realistic brand metric percentages (~45% baseline)\ndata[\"awareness\"] = data[\"awareness\"] + 45\n\n# Create additional brand funnel metrics\n# Effects attenuate down the funnel: awareness > consideration > purchase intent\nrng = np.random.default_rng(seed=99)\ndata[\"consideration\"] = 25 + (data[\"awareness\"] - 45) * 0.6 + rng.normal(0, 1.0, len(data))\ndata[\"purchase_intent\"] = 12 + (data[\"awareness\"] - 45) * 
0.3 + rng.normal(0, 0.8, len(data))\n\nprint(f\"Dataset: {data.shape[0]} observations, {data['respondent_id'].nunique()} respondents, {data['wave'].nunique()} waves\")\nprint(f\"Campaign respondents: {data.groupby('respondent_id')['campaign_respondent'].first().sum()}\")\nprint(f\"Control respondents: {(~data.groupby('respondent_id')['campaign_respondent'].first().astype(bool)).sum()}\")", "id": "c6960896" }, { @@ -143,16 +41,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Average brand metrics by group and period\n", - "summary = data.groupby([\"campaign_market\", \"post_campaign\"]).agg(\n", - " awareness=(\"awareness\", \"mean\"),\n", - " consideration=(\"consideration\", \"mean\"),\n", - " purchase_intent=(\"purchase_intent\", \"mean\"),\n", - ").round(1)\n", - "summary.index = summary.index.set_names([\"Campaign Market\", \"Post Campaign\"])\n", - "summary" - ], + "source": "# Average brand metrics by group and period\nsummary = data.groupby([\"campaign_respondent\", \"post_campaign\"]).agg(\n awareness=(\"awareness\", \"mean\"),\n consideration=(\"consideration\", \"mean\"),\n purchase_intent=(\"purchase_intent\", \"mean\"),\n).round(1)\nsummary.index = summary.index.set_names([\"Campaign Respondent\", \"Post Campaign\"])\nsummary", "id": "53cf1176" }, { @@ -171,23 +60,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "if HAS_MATPLOTLIB:\n", - " trends = data.groupby([\"wave\", \"campaign_market\"])[\"awareness\"].mean().unstack()\n", - " trends.columns = [\"Control Markets\", \"Campaign Markets\"]\n", - "\n", - " fig, ax = plt.subplots(figsize=(10, 5))\n", - " trends.plot(ax=ax, marker=\"o\", linewidth=2)\n", - " ax.axvline(x=4.5, color=\"gray\", linestyle=\"--\", alpha=0.7, label=\"Campaign Launch\")\n", - " ax.set_xlabel(\"Wave\")\n", - " ax.set_ylabel(\"Aided Awareness (%)\")\n", - " ax.set_title(\"Brand Awareness Over Time\")\n", - " ax.legend()\n", - " plt.tight_layout()\n", - " plt.show()\n", - 
"else:\n", - " print(trends.to_string())" - ], + "source": "trends = data.groupby([\"wave\", \"campaign_respondent\"])[\"awareness\"].mean().unstack()\ntrends.columns = [\"Control\", \"Campaign\"]\n\nif HAS_MATPLOTLIB:\n fig, ax = plt.subplots(figsize=(10, 5))\n trends.plot(ax=ax, marker=\"o\", linewidth=2)\n ax.axvline(x=4.5, color=\"gray\", linestyle=\"--\", alpha=0.7, label=\"Campaign Launch\")\n ax.set_xlabel(\"Wave\")\n ax.set_ylabel(\"Aided Awareness (%)\")\n ax.set_title(\"Brand Awareness Over Time\")\n ax.legend()\n plt.tight_layout()\n plt.show()\nelse:\n print(trends.to_string())", "id": "c03c6b19" }, { @@ -215,18 +88,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "did_naive = DifferenceInDifferences()\n", - "results_naive = did_naive.fit(\n", - " data,\n", - " outcome=\"awareness\",\n", - " treatment=\"campaign_market\",\n", - " time=\"post_campaign\",\n", - ")\n", - "print(results_naive)\n", - "print(f\"\\nThe campaign increased awareness by {results_naive.att:.1f} percentage points\")\n", - "print(f\"95% CI: ({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\")" - ], + "source": "did_naive = DifferenceInDifferences()\nresults_naive = did_naive.fit(\n data,\n outcome=\"awareness\",\n treatment=\"campaign_respondent\",\n time=\"post_campaign\",\n)\nprint(results_naive)\nprint(f\"\\nThe campaign increased awareness by {results_naive.att:.1f} percentage points\")\nprint(f\"95% CI: ({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\")", "id": "e5db9120" }, { @@ -256,29 +118,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "sd = SurveyDesign(\n", - " weights=\"survey_weight\", # Accounts for demographic oversampling\n", - " strata=\"region\", # Sample was drawn separately within each region\n", - " psu=\"cluster\", # Respondents sampled in geographic clusters\n", - " fpc=\"fpc\", # Finite population correction\n", - ")\n", - "\n", - "did_survey = 
DifferenceInDifferences()\n", - "with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\") # Suppress weight normalization notice\n", - " results_survey = did_survey.fit(\n", - " data,\n", - " outcome=\"awareness\",\n", - " treatment=\"campaign_market\",\n", - " time=\"post_campaign\",\n", - " survey_design=sd,\n", - " )\n", - "\n", - "print(results_survey)\n", - "print(f\"\\nThe campaign increased awareness by {results_survey.att:.1f} percentage points\")\n", - "print(f\"95% CI: ({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\")" - ], + "source": "sd = SurveyDesign(\n weights=\"survey_weight\", # Accounts for demographic oversampling\n strata=\"region\", # Sample was drawn separately within each region\n psu=\"cluster\", # Respondents sampled in geographic clusters\n fpc=\"fpc\", # Finite population correction\n)\n\ndid_survey = DifferenceInDifferences()\nresults_survey = did_survey.fit(\n data,\n outcome=\"awareness\",\n treatment=\"campaign_respondent\",\n time=\"post_campaign\",\n survey_design=sd,\n)\n\nprint(results_survey)\nprint(f\"\\nThe campaign increased awareness by {results_survey.att:.1f} percentage points\")\nprint(f\"95% CI: ({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\")", "id": "efbf20d6" }, { @@ -296,30 +136,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "se_ratio = results_survey.se / results_naive.se\n", - "\n", - "comparison = pd.DataFrame({\n", - " \"Naive\": [\n", - " f\"{results_naive.att:.2f}\",\n", - " f\"{results_naive.se:.3f}\",\n", - " f\"({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\",\n", - " f\"{results_naive.p_value:.4f}\",\n", - " ],\n", - " \"Survey-Aware\": [\n", - " f\"{results_survey.att:.2f}\",\n", - " f\"{results_survey.se:.3f}\",\n", - " f\"({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\",\n", - " f\"{results_survey.p_value:.4f}\",\n", - " ],\n", - "}, index=[\"Lift (pp)\", \"Std Error\", 
\"95% CI\", \"p-value\"])\n",
- "\n",
- "print(comparison.to_string())\n",
- "print(f\"\\nDesign effect (SE ratio): {se_ratio:.2f}x\")\n",
- "print(f\"Survey-aware standard errors are {(se_ratio - 1) * 100:.0f}% larger than naive.\")\n",
- "print(f\"\\nThe lift estimate is similar, but the naive analysis makes you think\")\n",
- "print(f\"you know it more precisely than you actually do.\")"
- ],
+ "source": "se_ratio = results_survey.se / results_naive.se\n\ncomparison = pd.DataFrame({\n    \"Naive\": [\n        f\"{results_naive.att:.2f}\",\n        f\"{results_naive.se:.3f}\",\n        f\"({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})\",\n        f\"{results_naive.p_value:.4f}\",\n    ],\n    \"Survey-Aware\": [\n        f\"{results_survey.att:.2f}\",\n        f\"{results_survey.se:.3f}\",\n        f\"({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})\",\n        f\"{results_survey.p_value:.4f}\",\n    ],\n}, index=[\"Lift (pp)\", \"Std Error\", \"95% CI\", \"p-value\"])\n\nprint(comparison.to_string())\nprint(f\"\\nSE inflation ratio: {se_ratio:.2f}x\")\nprint(f\"Survey-aware standard errors are {(se_ratio - 1) * 100:.0f}% larger than naive.\")\nprint(\"\\nThe lift estimate is similar, but the naive analysis makes you think\")\nprint(\"you know it more precisely than you actually do.\")", "id": "387a083f" }, { "cell_type": "markdown", "metadata": {}, @@ -330,7 +147,7 @@ "tend to answer similarly, so each response carries less independent information than the\n", "raw sample size suggests. The naive analysis was overconfident.\n", "\n", - "In this case, both analyses agree the campaign worked \u2014 but the survey-aware confidence\n", + "In this case, both analyses agree the campaign worked — but the survey-aware confidence\n", "interval is much wider. In a closer call, ignoring the survey design could lead you to\n", "claim a significant result when the evidence is actually inconclusive." ], @@ -342,7 +159,7 @@ "source": [ "## 6. 
Multiple Brand Metrics\n", "\n", - "Brand campaigns don't just move awareness \u2014 they should also move consideration and\n", + "Brand campaigns don't just move awareness — they should also move consideration and\n", "purchase intent. Let's measure the lift across the full brand funnel." ], "id": "60fc326a" @@ -352,35 +169,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "outcomes = [\"awareness\", \"consideration\", \"purchase_intent\"]\n", - "funnel_results = {}\n", - "\n", - "for outcome in outcomes:\n", - " did = DifferenceInDifferences()\n", - " with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\")\n", - " r = did.fit(\n", - " data,\n", - " outcome=outcome,\n", - " treatment=\"campaign_market\",\n", - " time=\"post_campaign\",\n", - " survey_design=sd,\n", - " )\n", - " funnel_results[outcome] = r\n", - "\n", - "# Results table\n", - "funnel_df = pd.DataFrame({\n", - " \"Metric\": [\"Awareness\", \"Consideration\", \"Purchase Intent\"],\n", - " \"Lift (pp)\": [funnel_results[o].att for o in outcomes],\n", - " \"SE\": [funnel_results[o].se for o in outcomes],\n", - " \"95% CI Lower\": [funnel_results[o].conf_int[0] for o in outcomes],\n", - " \"95% CI Upper\": [funnel_results[o].conf_int[1] for o in outcomes],\n", - " \"p-value\": [funnel_results[o].p_value for o in outcomes],\n", - "}).round(2)\n", - "\n", - "print(funnel_df.to_string(index=False))" - ], + "source": "outcomes = [\"awareness\", \"consideration\", \"purchase_intent\"]\nfunnel_results = {}\n\nfor outcome in outcomes:\n did = DifferenceInDifferences()\n r = did.fit(\n data,\n outcome=outcome,\n treatment=\"campaign_respondent\",\n time=\"post_campaign\",\n survey_design=sd,\n )\n funnel_results[outcome] = r\n\n# Results table\nfunnel_df = pd.DataFrame({\n \"Metric\": [\"Awareness\", \"Consideration\", \"Purchase Intent\"],\n \"Lift (pp)\": [funnel_results[o].att for o in outcomes],\n \"SE\": [funnel_results[o].se for o in outcomes],\n \"95% CI 
Lower\": [funnel_results[o].conf_int[0] for o in outcomes],\n \"95% CI Upper\": [funnel_results[o].conf_int[1] for o in outcomes],\n \"p-value\": [funnel_results[o].p_value for o in outcomes],\n}).round(2)\n\nprint(funnel_df.to_string(index=False))", "id": "f891e2f1" }, { @@ -418,7 +207,7 @@ "metadata": {}, "source": [ "The campaign moved awareness the most, consideration less, and purchase intent the\n", - "least. This is typical funnel attenuation \u2014 the message reached people but didn't fully\n", + "least. This is typical funnel attenuation — the message reached people but didn't fully\n", "convert to purchase consideration. All three effects are statistically significant." ], "id": "d3b6008d" @@ -449,20 +238,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "pt = check_parallel_trends(\n", - " data,\n", - " outcome=\"awareness\",\n", - " time=\"wave\",\n", - " treatment_group=\"campaign_market\",\n", - ")\n", - "\n", - "print(f\"Pre-campaign trend difference: {pt['trend_difference']:.3f}\")\n", - "print(f\"p-value: {pt['p_value']:.3f}\")\n", - "print(f\"\\nParallel trends {'supported' if pt['parallel_trends_plausible'] else 'NOT supported'}\")\n", - "if pt[\"parallel_trends_plausible\"]:\n", - " print(\"Before the campaign, awareness was trending at the same rate in both groups.\")" - ], + "source": "pt = check_parallel_trends(\n data,\n outcome=\"awareness\",\n time=\"wave\",\n treatment_group=\"campaign_respondent\",\n)\n\nprint(f\"Pre-campaign trend difference: {pt['trend_difference']:.3f}\")\nprint(f\"p-value: {pt['p_value']:.3f}\")\nprint(f\"\\nParallel trends {'consistent with the data' if pt['parallel_trends_plausible'] else 'NOT supported'}\")\nif pt[\"parallel_trends_plausible\"]:\n print(\"Before the campaign, awareness was trending at a similar rate in both groups.\")", "id": "3d718ede" }, { @@ -481,39 +257,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Use waves 1-4 only; split at wave 
3 as a \"placebo\" campaign launch\n", - "pre_data = data[data[\"wave\"] <= 4].copy()\n", - "pre_data[\"placebo_post\"] = (pre_data[\"wave\"] >= 3).astype(int)\n", - "\n", - "did_placebo = DifferenceInDifferences()\n", - "r_placebo = did_placebo.fit(\n", - " pre_data,\n", - " outcome=\"awareness\",\n", - " treatment=\"campaign_market\",\n", - " time=\"placebo_post\",\n", - ")\n", - "\n", - "print(f\"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})\")\n", - "if r_placebo.p_value > 0.05:\n", - " print(\"No significant effect in the pre-campaign period \u2014 the method isn't picking up spurious patterns.\")\n", - "else:\n", - " print(\"WARNING: Significant placebo effect detected \u2014 investigate further.\")" - ], + "source": "# Use waves 1-4 only; split at wave 3 as a \"placebo\" campaign launch\npre_data = data[data[\"wave\"] <= 4].copy()\npre_data[\"placebo_post\"] = (pre_data[\"wave\"] >= 3).astype(int)\n\ndid_placebo = DifferenceInDifferences()\nr_placebo = did_placebo.fit(\n pre_data,\n outcome=\"awareness\",\n treatment=\"campaign_respondent\",\n time=\"placebo_post\",\n)\n\nprint(f\"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})\")\nif r_placebo.p_value > 0.05:\n print(\"No significant effect in the pre-campaign period — the method isn't picking up spurious patterns.\")\nelse:\n print(\"WARNING: Significant placebo effect detected — investigate further.\")", "id": "ef7db9b1" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "Both checks pass: the pre-campaign trends were parallel, and the placebo test\n", - "finds no effect where none should exist. This gives us confidence the measured\n", - "lift is attributable to the campaign.\n", - "\n", - "For event study designs (Section 8 below), you can also run\n", - "[HonestDiD sensitivity analysis](../tutorials/05_honest_did.ipynb) to quantify\n", - "how robust the result is to violations of the parallel trends assumption." 
- ], + "source": "Both checks pass: the pre-campaign trends are consistent with the parallel trends\nassumption, and the placebo test finds no effect where none should exist. This is\nsupportive evidence that the measured lift is attributable to the campaign.\n\nNote that passing these checks does not *prove* parallel trends holds — it is always\nan untestable assumption about what *would have* happened. For event study designs\n(Section 8 below), HonestDiD sensitivity analysis can quantify how robust the result\nis to violations of this assumption.", "id": "5a36cef4" }, { @@ -526,7 +276,7 @@ "[Baker et al. (2025)](https://arxiv.org/pdf/2503.13323) practitioner workflow.\n", "It suggests diagnostic steps based on your estimator and results.\n", "\n", - "*Note: the code snippets in the output use placeholder column names \u2014 substitute\n", + "*Note: the code snippets in the output use placeholder column names — substitute\n", "your own.*" ], "id": "ea3733ec" @@ -547,7 +297,7 @@ "source": [ "## 8. Extension: Staggered Campaign Rollout\n", "\n", - "Many campaigns don't launch in all markets at once \u2014 they roll out in waves.\n", + "Many campaigns don't launch in all markets at once — they roll out in waves.\n", "Some markets go live in month 2, others in month 4. When this happens, basic\n", "DiD can give biased results. 
The `CallawaySantAnna` estimator handles this\n", "correctly.\n", @@ -561,39 +311,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from diff_diff import CallawaySantAnna\n", - "from diff_diff.visualization import plot_event_study\n", - "\n", - "# Campaign rolls out in two waves: some markets at wave 3, others at wave 5\n", - "stag_raw = generate_survey_did_data(\n", - " n_units=200,\n", - " n_periods=8,\n", - " cohort_periods=[3, 5],\n", - " never_treated_frac=0.4,\n", - " treatment_effect=5.0,\n", - " dynamic_effects=True,\n", - " effect_growth=0.1, # Effect builds 10% per wave (repeated exposure)\n", - " n_strata=5,\n", - " psu_per_stratum=4,\n", - " weight_variation=\"high\",\n", - " informative_sampling=True,\n", - " return_true_population_att=True,\n", - " seed=42,\n", - ")\n", - "\n", - "stag_data = stag_raw.rename(columns={\n", - " \"unit\": \"market_id\", \"period\": \"wave\", \"outcome\": \"awareness\",\n", - " \"stratum\": \"region\", \"psu\": \"cluster\", \"weight\": \"survey_weight\",\n", - " \"first_treat\": \"campaign_start_wave\",\n", - "})\n", - "stag_data[\"awareness\"] = stag_data[\"awareness\"] + 45\n", - "\n", - "print(f\"Campaign cohorts: {sorted(stag_data['campaign_start_wave'].unique())}\")\n", - "print(f\" Wave 3 launch: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 3).sum()} markets\")\n", - "print(f\" Wave 5 launch: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 5).sum()} markets\")\n", - "print(f\" Control: {(stag_data.groupby('market_id')['campaign_start_wave'].first() == 0).sum()} markets\")" - ], + "source": "from diff_diff import CallawaySantAnna\nfrom diff_diff.visualization import plot_event_study\n\n# Campaign rolls out in two waves: some markets at wave 3, others at wave 5\nstag_raw = generate_survey_did_data(\n n_units=200,\n n_periods=8,\n cohort_periods=[3, 5],\n never_treated_frac=0.4,\n treatment_effect=5.0,\n dynamic_effects=True,\n effect_growth=0.1, # 
Effect builds 10% per wave (repeated exposure)\n n_strata=5,\n psu_per_stratum=4,\n weight_variation=\"high\",\n informative_sampling=True,\n return_true_population_att=True,\n seed=42,\n)\n\nstag_data = stag_raw.rename(columns={\n \"unit\": \"respondent_id\", \"period\": \"wave\", \"outcome\": \"awareness\",\n \"stratum\": \"region\", \"psu\": \"cluster\", \"weight\": \"survey_weight\",\n \"first_treat\": \"campaign_start_wave\",\n})\nstag_data[\"awareness\"] = stag_data[\"awareness\"] + 45\n\nprint(f\"Campaign cohorts: {sorted(stag_data['campaign_start_wave'].unique())}\")\nprint(f\" Wave 3 launch: {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 3).sum()} respondents\")\nprint(f\" Wave 5 launch: {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 5).sum()} respondents\")\nprint(f\" Control: {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 0).sum()} respondents\")", "id": "7d1c9510" }, { @@ -601,29 +319,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "stag_sd = SurveyDesign(\n", - " weights=\"survey_weight\", strata=\"region\", psu=\"cluster\", fpc=\"fpc\",\n", - ")\n", - "\n", - "cs = CallawaySantAnna()\n", - "with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\")\n", - " stag_results = cs.fit(\n", - " stag_data,\n", - " outcome=\"awareness\",\n", - " unit=\"market_id\",\n", - " time=\"wave\",\n", - " first_treat=\"campaign_start_wave\",\n", - " aggregate=\"event_study\",\n", - " survey_design=stag_sd,\n", - " )\n", - "\n", - "print(stag_results)\n", - "print(f\"\\nOverall campaign lift: {stag_results.overall_att:.1f} pp\")\n", - "print(\"\\nEvent study effects (relative to campaign launch):\")\n", - "print(stag_results.to_dataframe(level=\"event_study\").round(2).to_string(index=False))" - ], + "source": "stag_sd = SurveyDesign(\n weights=\"survey_weight\", strata=\"region\", psu=\"cluster\", fpc=\"fpc\",\n)\n\n# base_period=\"universal\" is required 
for valid HonestDiD sensitivity analysis —\n# it uses a common reference period so pre-treatment coefficients are comparable.\ncs = CallawaySantAnna(base_period=\"universal\")\nstag_results = cs.fit(\n stag_data,\n outcome=\"awareness\",\n unit=\"respondent_id\",\n time=\"wave\",\n first_treat=\"campaign_start_wave\",\n aggregate=\"event_study\",\n survey_design=stag_sd,\n)\n\nprint(stag_results)\nprint(f\"\\nOverall campaign lift: {stag_results.overall_att:.1f} pp\")\nprint(\"\\nEvent study effects (relative to campaign launch):\")\nprint(stag_results.to_dataframe(level=\"event_study\").round(2).to_string(index=False))", "id": "dd607a40" }, { @@ -649,11 +345,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "The event study shows the campaign effect building over time \u2014 starting around 5pp at\n", - "launch and growing to about 7pp with sustained exposure. Pre-campaign periods show no\n", - "significant effects, confirming the parallel trends assumption holds." - ], + "source": "The event study shows the campaign effect building over time — starting around 5pp at\nlaunch and growing to about 7pp with sustained exposure. 
Pre-campaign periods show no\nsignificant effects, consistent with the parallel trends assumption.", "id": "f67896f2" }, { @@ -673,17 +365,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from diff_diff import compute_honest_did\n", - "\n", - "with warnings.catch_warnings():\n", - " warnings.simplefilter(\"ignore\")\n", - " honest = compute_honest_did(stag_results, method=\"relative_magnitude\", M=1.0)\n", - "\n", - "print(honest.summary())\n", - "print(\"\\nIn plain English: even if the pre-campaign trends were off by as much as\")\n", - "print(\"the largest observed pre-period fluctuation, the campaign effect remains positive.\")" - ], + "source": "from diff_diff import compute_honest_did\n\nhonest = compute_honest_did(stag_results, method=\"relative_magnitude\", M=1.0)\n\nprint(honest.summary())\nprint(\"\\nIn plain English: even if the pre-campaign trends were off by as much as\")\nprint(\"the largest observed pre-period fluctuation, the campaign effect remains positive.\")", "id": "0d7c096e" }, { @@ -701,31 +383,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "r = results_survey # The main 2x2 result\n", - "\n", - "print(\"=\"* 70)\n", - "print(\"EXECUTIVE SUMMARY\")\n", - "print(\"=\"* 70)\n", - "print(f\"\"\"\n", - "The brand awareness campaign increased aided awareness by {r.att:.1f}\n", - "percentage points (95% CI: {r.conf_int[0]:.1f} to {r.conf_int[1]:.1f})\n", - "in the {data.groupby('market_id')['campaign_market'].first().sum()} campaign\n", - "markets compared to the {(~data.groupby('market_id')['campaign_market'].first().astype(bool)).sum()} control markets.\n", - "\n", - "This result accounts for the complex survey sampling design and is\n", - "supported by pre-campaign trend analysis and placebo testing.\n", - "\n", - "Impact across the brand funnel:\n", - " - Awareness: +{funnel_results['awareness'].att:.1f} pp\n", - " - Consideration: +{funnel_results['consideration'].att:.1f} pp\n", - " - 
Purchase Intent: +{funnel_results['purchase_intent'].att:.1f} pp\n", - "\n", - "The effect attenuates down the funnel, suggesting the campaign\n", - "successfully raised awareness but further investment is needed to\n", - "convert awareness into purchase consideration.\n", - "\"\"\")" - ], + "source": "r = results_survey # The main 2x2 result\nn_campaign = data.groupby(\"respondent_id\")[\"campaign_respondent\"].first().sum()\nn_control = (~data.groupby(\"respondent_id\")[\"campaign_respondent\"].first().astype(bool)).sum()\n\nprint(\"=\" * 70)\nprint(\"EXECUTIVE SUMMARY\")\nprint(\"=\" * 70)\nprint(f\"\"\"\nThe brand awareness campaign increased aided awareness by {r.att:.1f}\npercentage points (95% CI: {r.conf_int[0]:.1f} to {r.conf_int[1]:.1f})\namong {n_campaign} campaign-exposed respondents compared to {n_control}\ncontrol respondents.\n\nThis result accounts for the complex survey sampling design and is\nsupported by pre-campaign trend analysis and placebo testing.\n\nImpact across the brand funnel:\n - Awareness: +{funnel_results['awareness'].att:.1f} pp\n - Consideration: +{funnel_results['consideration'].att:.1f} pp\n - Purchase Intent: +{funnel_results['purchase_intent'].att:.1f} pp\n\nThe effect attenuates down the funnel, suggesting the campaign\nsuccessfully raised awareness but further investment is needed to\nconvert awareness into purchase consideration.\n\"\"\")", "id": "2fa7cc34" }, { @@ -734,11 +392,11 @@ "source": [ "**Key points for your write-up:**\n", "\n", - "- Report the **survey-aware** estimate, not the naive one \u2014 it reflects the true uncertainty\n", - "- Include confidence intervals, not just point estimates \u2014 leadership should understand the range\n", + "- Report the **survey-aware** estimate, not the naive one — it reflects the true uncertainty\n", + "- Include confidence intervals, not just point estimates — leadership should understand the range\n", "- Distinguish **statistical significance** (is the effect real?) 
from **practical significance**\n", " (is it big enough to matter?)\n", - "- A 5pp lift in awareness from 46% to 51% may or may not justify the campaign spend \u2014\n", + "- A 5pp lift in awareness from 46% to 51% may or may not justify the campaign spend —\n", " that's a business judgment, not a statistical one" ], "id": "d4884c2b" @@ -753,7 +411,7 @@ "\n", "- **Survey design matters**: Ignoring the complex sampling structure made standard errors\n", " more than 2x too small, creating false precision\n", - "- **DiD with survey data**: `SurveyDesign` integrates directly with all diff-diff estimators \u2014\n", + "- **DiD with survey data**: `SurveyDesign` integrates directly with all diff-diff estimators —\n", " just pass `survey_design=sd` to `.fit()`\n", "- **Brand funnel analysis**: Measuring awareness, consideration, and purchase intent together\n", " reveals where the campaign effect attenuates\n", @@ -771,11 +429,11 @@ "\n", "**Related tutorials:**\n", "\n", - "- [Tutorial 16: Survey DiD](16_survey_did.ipynb) \u2014 deep dive into survey design theory,\n", + "- [Tutorial 16: Survey DiD](16_survey_did.ipynb) — deep dive into survey design theory,\n", " replicate weights, and design effect diagnostics\n", - "- [Tutorial 02: Staggered DiD](02_staggered_did.ipynb) \u2014 more on Callaway-Sant'Anna and\n", + "- [Tutorial 02: Staggered DiD](02_staggered_did.ipynb) — more on Callaway-Sant'Anna and\n", " staggered adoption designs\n", - "- [Tutorial 05: Honest DiD](05_honest_did.ipynb) \u2014 full sensitivity analysis guide" + "- [Tutorial 05: Honest DiD](05_honest_did.ipynb) — full sensitivity analysis guide" ], "id": "f3b24495" } From bc7e55712e63323f6bf86b5ab93c9433fabdd1e6 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 9 Apr 2026 12:04:01 -0400 Subject: [PATCH 3/3] Address round 2 review: survey-aware placebo, label informal checks - Pass survey_design=sd to placebo DiD fit so falsification uses the same design-based inference as the main estimate - Label 
check_parallel_trends() as informal/non-survey-aware and direct readers to HonestDiD for formal robustness assessment - Soften diagnostic summary prose: "supportive evidence" not validation - Update tutorial summary to distinguish informal checks from formal sensitivity analysis Co-Authored-By: Claude Opus 4.6 (1M context) --- .../tutorials/17_brand_awareness_survey.ipynb | 43 ++----------------- 1 file changed, 4 insertions(+), 39 deletions(-) diff --git a/docs/tutorials/17_brand_awareness_survey.ipynb b/docs/tutorials/17_brand_awareness_survey.ipynb index d0ddb74d..372d4635 100644 --- a/docs/tutorials/17_brand_awareness_survey.ipynb +++ b/docs/tutorials/17_brand_awareness_survey.ipynb @@ -225,12 +225,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "### Parallel Trends Check\n", - "\n", - "DiD assumes campaign and control markets would have continued trending the same way if\n", - "the campaign hadn't run. We can check whether the pre-campaign trends were similar." - ], + "source": "### Parallel Trends Check\n\nDiD assumes campaign and control groups would have continued trending the same way if\nthe campaign hadn't run. `check_parallel_trends()` is a quick informal check that\ncompares pre-campaign slopes — it does not account for survey design, so treat it as\na sanity check rather than a formal test. 
The formal robustness assessment comes from\nHonestDiD in Section 8.", "id": "96bbef84" }, { @@ -257,13 +252,13 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": "# Use waves 1-4 only; split at wave 3 as a \"placebo\" campaign launch\npre_data = data[data[\"wave\"] <= 4].copy()\npre_data[\"placebo_post\"] = (pre_data[\"wave\"] >= 3).astype(int)\n\ndid_placebo = DifferenceInDifferences()\nr_placebo = did_placebo.fit(\n pre_data,\n outcome=\"awareness\",\n treatment=\"campaign_respondent\",\n time=\"placebo_post\",\n)\n\nprint(f\"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})\")\nif r_placebo.p_value > 0.05:\n print(\"No significant effect in the pre-campaign period — the method isn't picking up spurious patterns.\")\nelse:\n print(\"WARNING: Significant placebo effect detected — investigate further.\")", + "source": "# Use waves 1-4 only; split at wave 3 as a \"placebo\" campaign launch\npre_data = data[data[\"wave\"] <= 4].copy()\npre_data[\"placebo_post\"] = (pre_data[\"wave\"] >= 3).astype(int)\n\n# Use survey_design here too — consistent with the main analysis\ndid_placebo = DifferenceInDifferences()\nr_placebo = did_placebo.fit(\n pre_data,\n outcome=\"awareness\",\n treatment=\"campaign_respondent\",\n time=\"placebo_post\",\n survey_design=sd,\n)\n\nprint(f\"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})\")\nif r_placebo.p_value > 0.05:\n print(\"No significant placebo effect in the pre-campaign period, as expected under parallel trends.\")\nelse:\n print(\"WARNING: Significant placebo effect detected — investigate further.\")", "id": "ef7db9b1" }, { "cell_type": "markdown", "metadata": {}, - "source": "Both checks pass: the pre-campaign trends are consistent with the parallel trends\nassumption, and the placebo test finds no effect where none should exist. This is\nsupportive evidence that the measured lift is attributable to the campaign.\n\nNote that passing these checks does not *prove* parallel trends holds — it is always\nan untestable assumption about what *would have* happened. For event study designs\n(Section 8 below), HonestDiD sensitivity analysis can quantify how robust the result\nis to violations of this assumption.", + "source": "The informal trend check is consistent with parallel trends, and the survey-aware\nplacebo test finds no effect where none should exist. Together these are supportive\nevidence, though neither formally proves parallel trends, which remains an untestable\nassumption about what *would have* happened.\n\nFor event study designs (Section 8 below), HonestDiD sensitivity analysis provides\na formal assessment of how robust the result is to violations of this assumption.", "id": "5a36cef4" }, { @@ -404,37 +399,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Summary\n", - "\n", - "**What we covered:**\n", - "\n", - "- **Survey design matters**: Ignoring the complex sampling structure made standard errors\n", - " more than 2x too small, creating false precision\n", - "- **DiD with survey data**: `SurveyDesign` integrates directly with all diff-diff estimators —\n", - " just pass `survey_design=sd` to `.fit()`\n", - "- **Brand funnel analysis**: Measuring awareness, consideration, and purchase intent together\n", - " reveals where the campaign effect attenuates\n", - "- **Diagnostics**: Parallel trends checks and placebo tests validate the result in plain terms\n", - "- **Staggered rollouts**: `CallawaySantAnna` handles campaigns that launch in waves, with\n", - " event study plots showing how the effect builds over time\n", - "- **Sensitivity**: HonestDiD quantifies how robust the result is to assumption violations\n", - "\n", - "**When to use this approach:**\n", - "\n", - "- You have survey data collected before and after a campaign or intervention\n", - "- The campaign ran in some markets/regions but not others\n", - "- Randomized A/B testing wasn't feasible\n", - "- Your survey uses stratified sampling, clustering, or weighting\n", - "\n", - "**Related tutorials:**\n", - "\n", - "- [Tutorial 16: Survey DiD](16_survey_did.ipynb) — deep dive into survey design theory,\n", - " replicate weights, and design effect diagnostics\n", - "- [Tutorial 02: Staggered DiD](02_staggered_did.ipynb) — more on Callaway-Sant'Anna and\n", - " staggered adoption designs\n", - "- [Tutorial 05: Honest DiD](05_honest_did.ipynb) — full sensitivity analysis guide" - ], + "source": "## Summary\n\n**What we covered:**\n\n- **Survey design matters**: Ignoring the complex sampling structure made standard errors\n more than 2x too small, creating false precision\n- **DiD with survey data**: `SurveyDesign` integrates directly with all diff-diff estimators —\n just pass `survey_design=sd` to `.fit()`\n- **Brand funnel analysis**: Measuring awareness, consideration, and purchase intent together\n reveals where the campaign effect attenuates\n- **Diagnostics**: Informal trend checks and survey-aware placebo tests provide supportive\n evidence; HonestDiD provides formal robustness assessment\n- **Staggered rollouts**: `CallawaySantAnna` handles campaigns that launch in waves, with\n event study plots showing how the effect builds over time\n- **Sensitivity**: HonestDiD quantifies how robust the result is to assumption violations\n\n**When to use this approach:**\n\n- You have survey data collected before and after a campaign or intervention\n- The campaign ran in some markets/regions but not others\n- Randomized A/B testing wasn't feasible\n- Your survey uses stratified sampling, clustering, or weighting\n\n**Related tutorials:**\n\n- [Tutorial 16: Survey DiD](16_survey_did.ipynb) — deep dive into survey design theory,\n replicate weights, and design effect diagnostics\n- [Tutorial 02: Staggered DiD](02_staggered_did.ipynb) — more on Callaway-Sant'Anna and\n staggered adoption designs\n- [Tutorial 05: Honest DiD](05_honest_did.ipynb) — full sensitivity analysis guide", "id": "f3b24495" } ],