feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna by igerber · Pull Request #582 · igerber/diff-diff

igerber · 2026-06-29T12:54:30Z

Summary

CallawaySantAnna now materializes non-estimable (g,t) cells as NaN entries (carrying a machine-readable skip_reason) uniformly across all estimation paths — no-covariate regression, covariate regression, IPW/DR, repeated cross-section, and survey-weighted — instead of silently dropping them from the grid. Previously only the covariate-regression singular case did this; the other paths omitted the cell.
NaN cells carry no influence-function entry, so they are excluded from every aggregation (simple/group/calendar/event-study), from balance_e, and from the bootstrap. All aggregate point estimates and standard errors — and the event-study n_groups / by-group n_periods metadata — are numerically unchanged and continue to match R did's aggte().
The "could not estimate any" guard now fires on no-finite-effect (not empty-dict), so an all-non-estimable fit still raises ValueError.
to_dataframe("group_time") now includes the NaN rows and a skip_reason column; the GroupTimeEffect dataclass documents the new field.

Methodology references (required if estimator / math changes)

Method name(s): CallawaySantAnna (Callaway & Sant'Anna 2021)
Paper / source link(s): docs/methodology/REGISTRY.md → "CallawaySantAnna" edge cases
Any intentional deviations from the source (and why): Per-cell surface deviation from R's did::att_gt, which omits non-estimable cells from its result table; diff-diff materializes them as NaN rows (with skip_reason) for inspectability. R's aggte() aggregation behavior is matched exactly — non-estimable cells contribute nothing to any aggregate. Documented with a **Deviation from R:** label in REGISTRY.md.

Validation

Tests added/updated: tests/test_staggered.py — new TestCallawaySantAnnaNonEstimableMaterialization (per-path materialization parametrized over reg/ipw/dr panel + reg RCS, skip_reason mapping, to_dataframe column, all-non-estimable raise); extended test_nan_cell_preserved_not_dropped to assert skip_reason. Existing CS NaN / aggregation / balance_e and test_methodology_callaway.py R-parity suites unchanged and green; shared-aggregation-mixin consumers (ContinuousDiD, TripleDifference, HonestDiD, ImputationDiD, EfficientDiD) and the diagnostic-report suite all green.
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…awaySantAnna Uniformly materialize a NaN entry (with a machine-readable skip_reason) for every non-estimable (g,t) group-time cell across all CS estimation paths (no-covariate regression, covariate regression, IPW/DR, repeated cross-section, survey-weighted) instead of omitting it. Previously only the covariate-singular case materialized NaN; the other paths dropped the cell silently from the grid. Cells carry no influence-function entry, so they are excluded from every aggregation (simple/group/calendar/event-study), balance_e, and bootstrap -- all aggregate point estimates and SEs, plus event-study n_groups / by-group n_periods, are numerically unchanged and continue to match R did's aggte(). A fit where no cell is estimable still raises ValueError. to_dataframe("group_time") now includes the NaN rows and a skip_reason column. Documented per-cell surface deviation from R's att_gt (which omits the rows). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T12:58:21Z

Overall Assessment

✅ Looks good. I found no unmitigated P0/P1 issues. The R did::att_gt surface deviation is explicitly documented in docs/methodology/REGISTRY.md, and the aggregate paths finite-mask materialized NaN cells.

Executive Summary

Affected method: CallawaySantAnna, including panel, RCS, IPW/DR, regression, survey-weighted, aggregation, and bootstrap surfaces.
Methodology deviation from R is documented with a **Deviation from R:** label, so it is not a defect.
Aggregation and bootstrap aggregate prep exclude non-finite effects before weighting.
No new inline inference anti-patterns found; new NaN rows use all-NaN inference fields.
Remaining items are P2/P3 hygiene: stale TODO, release metadata/version rollback, and test coverage gaps.

Methodology

Severity: P2
Finding: Non-finite regression NaN cells in the covariate-vectorized path still receive influence_func_info, despite the new registry/helper contract saying NaN cells carry no IF entry and are excluded from bootstrap. The aggregate bootstrap finite-masks them, so I do not see wrong aggregate estimates/SEs. Location: diff_diff/staggered.py:L1523-L1557, diff_diff/staggered_bootstrap.py:L223-L226, docs/methodology/REGISTRY.md:L528-L530.
Impact: Internal contract/documentation mismatch and avoidable per-cell bootstrap work for NaN cells.
Concrete fix: When nan_cell is true, store via _nan_gt_entry(..., skip_reason="non_finite_regression") and do not add influence_func_info / task_keys for that cell, or narrow the registry wording.
Severity: P3 informational
Finding: Materializing non-estimable (g,t) cells differs from R did::att_gt, but this is documented with **Deviation from R:**. Location: docs/methodology/REGISTRY.md:L528-L530.
Impact: No action required under the review rules; aggregate behavior matches the documented omit-equivalent contract.
Concrete fix: None required.

Code Quality

Severity: P3
Finding: TODO.md still lists this exact feature as actionable and “currently omitted.” Location: TODO.md:L31-L32.
Impact: Future work tracking is stale and may cause duplicate effort.
Concrete fix: Remove the row or mark it completed with this PR reference.

Performance

Severity: P3
Finding: The non-finite regression IF-entry issue above can allocate bootstrap columns for NaN cells.
Impact: Minor avoidable bootstrap memory/CPU if many cells are non-finite.
Concrete fix: Same as Methodology P2: omit IF entries for NaN cells.

Maintainability

Severity: P2
Finding: Release metadata is downgraded from 3.6.0 to 3.5.3 while adding unreleased feature content. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L8-L9, rust/Cargo.toml:L3.
Impact: If 3.6.0 is already the branch baseline, this regresses package/citation metadata.
Concrete fix: Confirm intended release baseline; if not intentional, keep version metadata at 3.6.0 or move all feature text under the correct unreleased version policy.

Tech Debt

Severity: P3
Finding: The old TODO entry for materializing CS NaN cells was not reconciled.
Impact: Tracked debt no longer reflects implementation state.
Concrete fix: Update TODO.md alongside this PR.

Security

Severity: None
Finding: No secrets, unsafe execution, or security-sensitive changes found.
Impact: None.
Concrete fix: None.

Documentation/Tests

Severity: P3
Finding: New tests cover panel no-cov regression, panel IPW/DR without covariates, and RCS regression, but not survey zero-weight-mass materialization, missing-period materialization, or covariate IPW/DR/RCS IPW/DR paths. Location: tests/test_staggered.py:L1343-L1346.
Impact: Some documented “uniform across all paths” behavior is not directly pinned.
Concrete fix: Add parametrized tests for zero_weight_mass with survey weights and covariate IPW/DR/RCS cells.

I could not run the selected tests in this environment because pytest and numpy are not installed.

#582) - The covariate-regression non-finite cell now materializes via _nan_gt_entry with NO influence-function entry, matching the other paths and the documented REGISTRY/helper contract (previously it wrote a zero-IF entry and ran batch inference). Aggregates and SEs are unchanged -- the cell was finite-masked / IF-membership-filtered out either way; now the "NaN cells carry no IF entry" invariant holds uniformly across all paths. - Extend the materialization test to cover covariate IPW/DR (panel + RCS) paths. - Remove the now-implemented CallawaySantAnna NaN-cell row from TODO.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:16:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fa149d55eabcc831551a2338af2fb5b0500d248c

Overall Assessment

⚠️ Needs changes — one unmitigated P1. No P0 findings.

Executive Summary

Affected method: CallawaySantAnna, especially non-estimable (g,t) materialization across panel, RCS, general IPW/DR/regression, aggregation, and bootstrap.
The documented R did::att_gt surface deviation is properly labeled in REGISTRY.md; that part is not a defect.
The prior covariate-vectorized IF leak is fixed, but the same non-finite ATT pattern still exists in the general panel/RCS callers.
Aggregations finite-mask NaN/non-finite effects, so I do not see wrong aggregate ATT/SE output from the issue.
pytest could not be run in this environment: /usr/bin/python: No module named pytest.

Methodology

Severity: P1 [Newly identified]
Finding: General panel and RCS paths still treat att_gt is not None as estimable, even when att_gt is non-finite. Those cells are stored with "effect": att_gt, may surface inf rather than a NaN entry, are not counted in the consolidated skip warning, and still add influence_func_info. This conflicts with the registry contract that non-estimable cells are NaN entries, carry no IF entry, and are excluded from bootstrap. Locations: diff_diff/staggered.py:L2021-L2067, diff_diff/staggered.py:L2131-L2176, diff_diff/staggered.py:L3482-L3501, docs/methodology/REGISTRY.md:L528-L530.
Impact: Non-finite regression cells can violate the documented per-cell surface and bootstrap membership contract. Aggregate estimates remain protected by finite masks, but the per-cell result/IF contract is not uniformly implemented.
Concrete fix: Before building gte_entry or adding influence_func_info, branch on not np.isfinite(att_gt) and store _nan_gt_entry(..., skip_reason="non_finite_regression"); do not add that key to influence_func_info.
Severity: P3 informational
Finding: Materializing non-estimable cells differs from R did::att_gt, but the deviation is documented with a **Deviation from R:** label. Location: docs/methodology/REGISTRY.md:L530.
Impact: No action required under the review rules.
Concrete fix: None.

Code Quality

Severity: None
Impact: No additional code-quality findings beyond the methodology item.
Concrete fix: None.

Performance

Severity: P3
Finding: The P1 issue can still allocate/bootstrap IF columns for non-finite cells that should have no IF entry. Locations: diff_diff/staggered.py:L2066-L2067, diff_diff/staggered.py:L2175-L2176, diff_diff/staggered_bootstrap.py:L223-L226.
Impact: Avoidable bootstrap work for cells that should be omitted from bootstrap membership.
Concrete fix: Same as the P1 fix: omit IF entries for all non-finite materialized cells.

Maintainability

Severity: P2
Finding: Release metadata is downgraded from 3.6.0 to 3.5.3 while the PR adds unreleased feature behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3, CHANGELOG.md:L8-L60.
Impact: Built artifacts and citation metadata can report an older version while containing newer behavior.
Concrete fix: Restore the intended release/version metadata, or clearly align the changelog and package metadata with the project’s unreleased-version policy.

Tech Debt

Severity: None
Impact: The previous stale TODO.md item for this exact feature was removed.
Concrete fix: None.

Security

Severity: None
Impact: No secrets, unsafe execution, or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P3
Finding: New registry/changelog prose says cells are excluded from “calendar” aggregation, but CallawaySantAnna.fit() documents only simple, event_study, group, and all. Locations: docs/methodology/REGISTRY.md:L529, CHANGELOG.md:L54-L56, diff_diff/staggered.py:L1652-L1658.
Impact: Documentation overstates the public aggregation surface.
Concrete fix: Remove “calendar” from the new CallawaySantAnna prose unless a calendar aggregation path is added.
Severity: P3
Finding: Tests cover many materialization paths, but not the non-finite general/RCS path with bootstrap membership.
Impact: The P1 regression was not pinned.
Concrete fix: Add tests that poison a general panel path and an RCS path to return non-finite att_gt with n_bootstrap > 0, then assert the cell is all-NaN, has skip_reason="non_finite_regression", and is absent from bootstrap_results.group_time_ses.

Path to Approval

Add a non-finite att_gt branch in both RCS and general panel callers before gte_entry construction.
Ensure non-finite cells are stored through _nan_gt_entry(..., skip_reason="non_finite_regression") and never inserted into influence_func_info.
Add focused regression tests for non-finite general panel and RCS cells with bootstrap enabled, asserting all-NaN fields and no bootstrap membership.

…/RCS paths (review #582 P1) The general (IPW/DR) and RCS estimable builders treated `att_gt is not None` as estimable even when att_gt was non-finite (NaN/inf): they stored effect=att_gt (which could surface inf), kept the influence-function entry, and did not count the cell in the consolidated skip total. Both now branch on `att_gt is None or not np.isfinite(att_gt)` first and materialize via _nan_gt_entry(skip_reason="non_finite_regression") with NO IF entry, so the documented contract (non-estimable cells are NaN entries, carry no IF, excluded from the bootstrap) holds uniformly across every path. Aggregate estimates and SEs are unchanged (these cells were finite-masked / IF-filtered out either way). Adds a regression test mimicking the inf-ATT-with-IF leak. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:24:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6910a57a846bd2af1455daa739a7c185c4c9bccf

Overall Assessment

⛔ Blocker — one unmitigated P0. The prior general panel/RCS non-finite ATT issue is fixed, but the no-covariate vectorized path still violates the new registry contract.

Executive Summary

Affected method: CallawaySantAnna, specifically non-estimable (g,t) materialization.
Prior P1 for general panel/RCS non-finite ATT handling appears resolved at diff_diff/staggered.py:L2021-L2033 and diff_diff/staggered.py:L2135-L2147.
[Newly identified] The no-covariate vectorized regression path still stores non-finite att as the cell effect and adds IF membership.
The documented R did::att_gt surface deviation is properly labeled in REGISTRY.md; that is not a defect.
I could not run tests locally: the environment is missing numpy.

Methodology

Severity: P0 [Newly identified]
Finding: _compute_all_att_gt_vectorized() detects non-finite att only by setting skip_reason, but it still stores the raw non-finite effect, inserts influence_func_info, appends the cell to batch inference, and allows bootstrap membership. This violates the registry contract that non-estimable cells have all inference fields NaN, carry no IF entry, and are excluded from bootstrap. If att=inf and se remains finite, safe_inference_batch() can produce t_stat=inf, p_value=0, and infinite CI endpoints. Locations: diff_diff/staggered.py:L1173-L1199, diff_diff/staggered.py:L1222-L1242, docs/methodology/REGISTRY.md:L528-L530.
Impact: Per-cell statistical output can be silently misleading, and bootstrap membership contradicts the documented method contract.
Concrete fix: Before building gte_entry or inf_info_gt, branch on not np.isfinite(att), store _nan_gt_entry(n_treated=n_t, n_control=n_c, skip_reason="non_finite_regression"), count the skip, and continue.
Severity: P3 informational
Finding: Materializing non-estimable cells differs from R did::att_gt, but this is documented with **Deviation from R:**. Location: docs/methodology/REGISTRY.md:L530.
Impact: No action required.
Concrete fix: None.

Code Quality

Severity: None
Impact: No separate code-quality findings beyond the P0 control-flow issue.
Concrete fix: None.

Performance

Severity: P3
Finding: The P0 issue also leaves non-finite no-covariate cells eligible for IF/bootstrap work. Location: diff_diff/staggered.py:L1199.
Impact: Avoidable bootstrap work for cells that should be omitted.
Concrete fix: Same as the P0 fix: do not add IF info for materialized NaN cells.

Maintainability

Severity: P2
Finding: Version metadata is downgraded from 3.6.0 to 3.5.3 while the PR adds unreleased behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3, CHANGELOG.md:L8-L60.
Impact: Built packages/citation metadata can report an older version than the code behavior.
Concrete fix: Restore the intended version metadata, or clearly align the changelog and package metadata with the project’s unreleased-version policy.

Tech Debt

Severity: None
Impact: The old TODO entry for this feature was removed after implementation.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive issues found.
Concrete fix: None.

Documentation/Tests

Severity: P3
Finding: Tests cover zero-control materialization and the general IPW non-finite path, but not the no-covariate vectorized non-finite path or bootstrap exclusion for that case. Location: tests/test_staggered.py:L1440-L1477.
Impact: The P0 regression is not pinned.
Concrete fix: Add a no-covariate regression test with n_bootstrap > 0 asserting the poisoned/non-finite cell is all-NaN, has skip_reason="non_finite_regression", and is absent from bootstrap_results.group_time_ses.
Severity: P3
Finding: New prose says cells are excluded from “calendar” aggregation, but CallawaySantAnna.fit() documents only simple, event_study, group, and all. Locations: CHANGELOG.md:L54-L56, docs/methodology/REGISTRY.md:L529, diff_diff/staggered.py:L1652-L1658.
Impact: Documentation overstates the public aggregation surface.
Concrete fix: Remove “calendar” from the CallawaySantAnna-specific prose unless a calendar aggregation API exists.

Path to Approval

Add the non-finite branch in _compute_all_att_gt_vectorized() before gte_entry and influence_func_info construction.
Ensure that branch writes _nan_gt_entry(..., skip_reason="non_finite_regression") and does not append to atts, ses, task_keys, or influence_func_info.
Add a focused no-covariate regression test with bootstrap enabled that asserts all-NaN cell inference and no bootstrap membership.

…view #582 P0) Completes the uniform invariant: every CallawaySantAnna estimable-cell builder (no-covariate vectorized, covariate regression, general IPW/DR, RCS) now routes a non-finite ATT(g,t) to _nan_gt_entry(skip_reason="non_finite_regression") with NO influence-function entry, skipping batch inference and bootstrap membership. The no-covariate diff-in-means ATT is finite given n_t,n_c>0, but a non-finite outcome (inf survives the NaN-only valid mask) could otherwise store inf as the effect and produce t_stat=inf / p=0 / infinite CI via safe_inference_batch. Aggregate estimates and SEs are unchanged. Adds a regression test injecting an inf outcome through the no-covariate path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:32:07Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5e499d2e1f95d1c263bbcaf1c42c8cde12d08cc2

Overall Assessment

⚠️ Needs changes — prior P0 is resolved, but I found one new unmitigated P1 aggregate-surface mismatch.

Executive Summary

Prior P0 is fixed: no-covariate vectorized non-finite ATT cells now become _nan_gt_entry(...) before IF membership/bootstrap inclusion.
The per-cell materialization deviation from R is documented in REGISTRY.md; that part is not a defect.
[Newly identified] Event-study aggregation can still emit all-NaN relative-time buckets, which contradicts the registry claim that NaN cells are excluded from aggregation and aggregate surfaces match prior omit/R aggte() behavior.
Tests cover per-cell materialization and finite overall aggregation, but not the all-NaN event-time bucket surface.
I could not run tests locally because numpy is not installed.

Methodology

Severity: P1 [Newly identified]
Impact: _aggregate_event_study() finite-filters cells, but when a relative-time bucket contains only materialized NaN cells it appends an event-study row with effect=NaN, se=NaN, and n_groups=0 instead of omitting the bucket. That is a new aggregate surface difference for cells that were previously omitted, and it conflicts with the documented contract that NaN cells are excluded from event-study aggregation and aggregate behavior matches R did::aggte() exactly. Locations: diff_diff/staggered_aggregation.py:L661-L672, docs/methodology/REGISTRY.md:L528-L530.
Concrete fix: In _aggregate_event_study(), when finite filtering leaves len(effs) == 0, continue rather than appending a NaN aggregate row. Add a test with control_group="not_yet_treated" where the final event-time has only non-estimable cells and assert that relative time is absent from event_study_effects.
Severity: P3 informational
Impact: Materializing non-estimable (g,t) cells differs from R did::att_gt, but the deviation is explicitly documented. Location: docs/methodology/REGISTRY.md:L530.
Concrete fix: None.

Code Quality

Severity: None
Impact: The previous no-covariate non-finite control-flow issue is fixed cleanly at diff_diff/staggered.py:L1177-L1184.
Concrete fix: None.

Performance

Severity: None
Impact: NaN cells no longer enter IF/bootstrap membership in the reviewed paths.
Concrete fix: None.

Maintainability

Severity: P2
Impact: Version metadata is still downgraded from 3.6.0 to 3.5.3 across package/citation/Rust metadata while the PR adds unreleased behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3.
Concrete fix: Confirm this is intentional release-management work; otherwise restore the intended unreleased/package version alignment.

Tech Debt

Severity: None
Impact: The old TODO entry for materializing CS NaN cells was removed after implementation.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P3
Impact: CallawaySantAnna docs/changelog mention “calendar” aggregation, but the public fit() aggregation options are simple, event_study, group, and all. Locations: docs/methodology/REGISTRY.md:L529, CHANGELOG.md:L51-L56, diff_diff/staggered.py:L1665-L1671.
Concrete fix: Remove “calendar” from CallawaySantAnna-specific prose unless a calendar aggregation API exists.

Path to Approval

Change _aggregate_event_study() so all-NaN relative-time buckets are omitted after finite filtering.
Add a focused event-study regression test asserting a relative time with only materialized NaN (g,t) cells is absent from event_study_effects.
Re-run the CS staggered tests, including the new non-estimable materialization tests.

…nt study (review #582 P1) _aggregate_event_study() appended an all-NaN row (effect=NaN, se=NaN, n_groups=0) for a relative-time bucket whose cells are all non-estimable, instead of omitting the bucket. With NaN-cell materialization this surfaced new all-NaN event-study rows where the bucket previously had no cells (and thus no row) -- an aggregate- surface change vs the prior omit behavior and R did::aggte(). The bucket is now dropped when finite-filtering leaves no cell (tracked via a kept-periods list so the result lists stay aligned), matching _aggregate_by_group, which already omits all-NaN groups. Adds a test asserting an all-non-estimable relative time is absent from event_study_effects. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:40:33Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 785773d34d965871e4fb3fb13b9ed7a76df79916

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Prior P1 is resolved: all-NaN event-time buckets are now omitted, and n_groups counts only finite contributing cells.
The per-cell NaN materialization deviation from R is documented in REGISTRY.md, so it is informational, not a defect.
NaN cells carry no IF entry and are filtered out of aggregation/bootstrap paths.
Remaining findings are P3 documentation/warning-surface polish only.
I could not run pytest because pytest is not installed; AST parsing of touched Python files passed.

Methodology

Severity: P3 informational
Impact: Materializing non-estimable (g,t) cells differs from R did::att_gt, but this is explicitly documented as a **Deviation from R:**; aggregate behavior still matches the registry contract. Locations: docs/methodology/REGISTRY.md:L528-L530, diff_diff/staggered.py:L225-L260.
Concrete fix: None required.
Severity: None
Impact: The prior event-study aggregate mismatch is fixed: all-NaN buckets continue, emitted periods are tracked via agg_periods, and n_groups is finite-filtered. Locations: diff_diff/staggered_aggregation.py:L662-L675, diff_diff/staggered_aggregation.py:L696-L744.
Concrete fix: None.

Code Quality

Severity: P3
Impact: Warning accounting is not fully aligned with the registry wording that the consolidated warning lists skip reasons/counts. The no-covariate vectorized non-finite path materializes skip_reason="non_finite_regression" but does not add it to _skip_info; zero-weight cells are counted under empty_cell. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L2235-L2249.
Concrete fix: Add explicit skip buckets for zero_weight_mass and non_finite_regression and include them in the consolidated warning.

Performance

Severity: None
Impact: NaN cells are excluded before bootstrap by IF membership and finite filtering. Locations: diff_diff/staggered_bootstrap.py:L223-L225, diff_diff/staggered_bootstrap.py:L261-L269, diff_diff/staggered_bootstrap.py:L762-L769.
Concrete fix: None.

Maintainability

Severity: None
Impact: Helper return arity changes are propagated to both callers of _compute_att_gt_fast() / _compute_att_gt_rc().
Concrete fix: None.

Tech Debt

Severity: None
Impact: The implemented CS NaN-materialization TODO was removed; no new untracked blocker found. Location: TODO.md:L27-L32.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P3
Impact: CallawaySantAnna prose says NaN cells are excluded from “calendar” aggregation, but the public fit() options are simple, event_study, group, and all. Locations: CHANGELOG.md:L54-L56, docs/methodology/REGISTRY.md:L529, diff_diff/staggered.py:L1665-L1671.
Concrete fix: Remove “calendar” from CallawaySantAnna-specific prose unless a calendar aggregation API is added.
Severity: P3 informational
Impact: I could not execute the focused tests because pytest is not installed. AST parsing passed for touched Python files.
Concrete fix: Run tests/test_staggered.py::TestCallawaySantAnnaNonEstimableMaterialization in CI or a dev environment with pytest installed.

…cell notes (review #582 P3) CallawaySantAnna's aggregation options are simple / event_study / group / all (there is no calendar aggregation). Remove "calendar" from the REGISTRY edge-case Note and the CHANGELOG entry listing which aggregations exclude materialized NaN cells. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:47:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 4291a25358e5a3d89e70b48fbb50f5fb65b16737

Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

Affected method: CallawaySantAnna.
The NaN materialization behavior is documented in REGISTRY.md as a Deviation from R, so it is not a methodology defect.
Prior event-study all-NaN bucket issue is resolved: empty relative-time buckets are omitted and counts use finite contributors only.
Non-blocking issues remain around warning granularity and metadata on newly materialized NaN rows.
I could not run pytest because pytest/runtime deps are unavailable; AST parsing of touched Python files passed.

Methodology

Severity: P3 informational
Impact: CallawaySantAnna now differs from R did::att_gt by surfacing non-estimable (g,t) cells as NaN rows, but this is explicitly documented with **Deviation from R:**. Aggregation finite-masks these cells and bootstrap filters to IF-backed cells, matching the registry contract for aggte() behavior. Locations: docs/methodology/REGISTRY.md:L528-L530, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_bootstrap.py:L223-L269.
Concrete fix: None required.
Severity: None
Impact: Previous P1 is resolved. Event-study aggregation now skips all-NaN relative-time buckets and counts only finite contributing cells. Group aggregation similarly counts finite periods only. Locations: diff_diff/staggered_aggregation.py:L662-L675, diff_diff/staggered_aggregation.py:L694-L701, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
Concrete fix: None.

Code Quality

Severity: P2
Impact: Some newly materialized NaN rows can report misleading n_treated / n_control metadata because helper sentinel returns use (0, 0) even after observed counts are known. This does not affect estimates or aggregation, but group_time_effects / to_dataframe("group_time") can show zero counts for cells that actually had treated observations, control observations, or zero survey-weight mass. Locations: diff_diff/staggered.py:L910-L914, diff_diff/staggered.py:L925-L929, diff_diff/staggered.py:L1373-L1378, diff_diff/staggered.py:L3378-L3384, diff_diff/staggered.py:L3399-L3404.
Concrete fix: Return the observed per-cell counts on non-estimable exits once masks have been computed; for RCS, return n_gt / n_ct for display even when the missing arm is pre-period.
Severity: P3
Impact: Warning accounting still does not fully match the registry wording that the consolidated warning lists skip reasons/counts. zero_weight_mass is bucketed under empty_cell, and vectorized non-finite no-covariate cells are materialized without adding a corresponding skip bucket. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L2234-L2255.
Concrete fix: Track skip counts by actual skip_reason and render those reason codes in the consolidated warning.

Performance

Severity: None
Impact: NaN cells are excluded before bootstrap work via IF membership and finite masks, so the materialization should not add bootstrap computation for non-estimable cells. Locations: diff_diff/staggered_bootstrap.py:L223-L269, diff_diff/staggered_bootstrap.py:L762-L769.
Concrete fix: None.

Maintainability

Severity: None
Impact: Return arity changes are propagated to the visible callers of _compute_att_gt_fast() and _compute_att_gt_rc(). Locations: diff_diff/staggered.py:L2015-L2031, diff_diff/staggered.py:L2137-L2146, diff_diff/staggered.py:L3297-L3313.
Concrete fix: None.

Tech Debt

Severity: None
Impact: The prior deferred TODO for materializing CallawaySantAnna NaN cells was removed, consistent with this PR implementing it. Location: TODO.md:L27-L32.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive changes found in the reviewed diff.
Concrete fix: None.

Documentation/Tests

Severity: P3 informational
Impact: New tests cover materialization, skip_reason, DataFrame output, all-non-estimable raising, and event-study omission behavior. Runtime execution was not possible in this sandbox because pytest is not installed and importing the package also failed on missing numpy; AST parsing passed for touched Python files. Locations: tests/test_staggered.py:L1335-L1564.
Concrete fix: Run the focused test class in CI or a dev environment with project dependencies installed.

…ells (review #582 P2) The per-cell helper sentinel returns (and the covariate-reg empty-control batch site) hardcoded n_treated=n_control=0 for materialized NaN cells even after the observation masks had been computed, so group_time_effects / to_dataframe could show zero counts for a cell that actually had treated (or control) observations. The zero-control / zero-weight exits in _compute_att_gt_fast and _compute_att_gt_rc, and the covariate-reg empty-control batch site, now return the observed counts; missing-period exits (masks not yet built) keep 0. Display-only metadata -- estimates, SEs, and aggregation are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T13:56:09Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9d57e7678721ea4bc5b4e839f4ff944aa174c97b

Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

Affected method: CallawaySantAnna.
The per-cell NaN materialization is a documented Deviation from R, so it is not a methodology defect.
Prior event-study/all-NaN bucket concern is resolved: all-NaN buckets are omitted, and n_groups / n_periods count finite contributors only.
NaN cells are created with all inference fields NaN and no IF entry, so aggregation/bootstrap paths exclude them.
Remaining issue is P3 warning/accounting granularity, not statistical correctness.
I could not run pytest/import smoke tests because this sandbox lacks pytest and numpy; AST parsing of touched Python files passed.

Methodology

Severity: P3 informational
Impact: CallawaySantAnna now materializes non-estimable (g,t) cells as NaN rows, which differs from R did::att_gt, but this is explicitly documented as a Deviation from R. Aggregation behavior remains aligned with the documented aggte() contract because simple/event-study/group aggregations finite-mask NaN effects and bootstrap filters to IF-backed cells only. Locations: docs/methodology/REGISTRY.md:L526-L531, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_aggregation.py:L662-L711, diff_diff/staggered_bootstrap.py:L223-L269.
Concrete fix: None required.
Severity: None
Impact: Previous P1 is resolved. Event-study aggregation now omits relative-time buckets with no finite contributors and counts finite cells only; group aggregation similarly counts finite post cells only. Locations: diff_diff/staggered_aggregation.py:L662-L701, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
Concrete fix: None.

Code Quality

Severity: P3 informational
Impact: The consolidated warning still does not fully list actual skip_reason codes/counts as described in the registry. zero_weight_mass is still counted under the generic empty-cell bucket, and vectorized non-finite cells can be materialized without contributing a reason-specific warning count. The result rows themselves carry skip_reason, so this is not a statistical-output bug. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L2244-L2265.
Concrete fix: Track skip counts by the actual skip_reason across all paths and render those reason codes in the consolidated warning.

Performance

Severity: None
Impact: NaN cells are excluded before bootstrap work through IF membership filtering, so materialization should not add bootstrap computation for non-estimable cells. Locations: diff_diff/staggered_bootstrap.py:L223-L269, diff_diff/staggered_bootstrap.py:L762-L769, diff_diff/staggered_bootstrap.py:L829-L835.
Concrete fix: None.

Maintainability

Severity: None
Impact: The helper return-arity changes for _compute_att_gt_fast() and _compute_att_gt_rc() are propagated to their visible callers. Locations: diff_diff/staggered.py:L2025-L2056, diff_diff/staggered.py:L2147-L2170, diff_diff/staggered.py:L3307-L3532.
Concrete fix: None.

Tech Debt

Severity: None
Impact: The prior TODO for materializing CallawaySantAnna NaN cells was removed, consistent with this PR implementing that behavior. Location: TODO.md:L27-L32.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive changes found in the reviewed diff.
Concrete fix: None.

Documentation/Tests

Severity: P3 informational
Impact: Tests were added for materialization, skip_reason, DataFrame output, non-finite effects, all-non-estimable raising, and event-study omission behavior. Local runtime validation was unavailable because pytest and numpy are not installed; AST parsing passed for touched Python files. Locations: tests/test_staggered.py:L1333-L1565, diff_diff/staggered_results.py:L39-L55, diff_diff/staggered_results.py:L430-L445.
Concrete fix: Run the focused TestCallawaySantAnnaNonEstimableMaterialization suite in CI or a dev environment with project dependencies installed.

Three test_csdid_ported.py tests relied on non-estimable (g,t) cells being OMITTED from group_time_effects; with the new NaN-cell materialization those cells are present as NaN, so the tests' membership / golden-iteration guards no longer skip them. Update them to preserve their intent on the FINITE cells: - test_some_units_treated_first_period: a first-period cohort (no base period) is now all-NaN (missing_period) rather than absent -> assert it is all-NaN. - test_zero_pretreatment_outcomes: skip NaN pre-cells (the last cohort under not_yet_treated has no controls); finite pre-cells are still ~0. - test_golden_fewer_periods: skip NaN cells (gapped panel where base g-1 is unobserved -> missing_period; R falls back to an available base) -> R-parity on the finite cells. No source change; the cells are correctly non-estimable, only now visible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-29T14:44:49Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d1f2bb44105185a2c43875f9782ea444c1639235

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Affected method: CallawaySantAnna.
The NaN materialization of non-estimable (g,t) cells is explicitly documented as a Deviation from R, so it is not a methodology defect.
Prior event-study/all-NaN bucket concern remains resolved: all-NaN relative-time buckets are omitted, and metadata counts finite contributors only.
NaN cells carry all-NaN inference fields and no IF entry, so aggregation and bootstrap paths exclude them.
Remaining items are P3 informational: warning reason-count granularity and a small survey-specific test-coverage gap.

Methodology

Severity: P3 informational
Impact: CallawaySantAnna now materializes non-estimable cells as NaN rows, differing from R did::att_gt, but this is documented with a Deviation from R note. Aggregation behavior remains aligned with the registry: simple/group/event-study finite-mask NaN effects, balance_e uses only finite anchor cells, and bootstrap filters to IF-backed cells. Locations: docs/methodology/REGISTRY.md:L526-L531, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_aggregation.py:L620-L646, diff_diff/staggered_bootstrap.py:L223-L269.
Concrete fix: None required.
Severity: None
Impact: Previous P1 is resolved. Event-study aggregation now skips relative-time buckets with no finite contributors and counts only finite cells; group aggregation similarly counts finite post cells only. Locations: diff_diff/staggered_aggregation.py:L662-L711, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
Concrete fix: None.

Code Quality

Severity: P3 informational
Impact: The consolidated warning still groups some real skip_reason values under coarse buckets. For example, vectorized zero_weight_mass contributes to empty_cell, while general/RCS non-estimable returns increment generic “insufficient data or non-estimable cells.” Result rows still carry the correct skip_reason, so this is not a statistical-output bug. Locations: diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L1637-L1640, diff_diff/staggered.py:L2044-L2056, diff_diff/staggered.py:L2158-L2170, diff_diff/staggered.py:L2244-L2265.
Concrete fix: Track warning counts from the materialized cell’s actual skip_reason across all paths.

Performance

Severity: None
Impact: NaN materialized cells do not add bootstrap work because bootstrap gt_pairs are filtered to keys present in influence_func_info. Locations: diff_diff/staggered_bootstrap.py:L223-L269.
Concrete fix: None.

Maintainability

Severity: None
Impact: The changed helper return shapes for _compute_att_gt_fast() and _compute_att_gt_rc() are propagated to visible callers; no stale arity call sites found. Locations: diff_diff/staggered.py:L819-L829, diff_diff/staggered.py:L2025-L2041, diff_diff/staggered.py:L2147-L2156, diff_diff/staggered.py:L3307-L3323.
Concrete fix: None.

Tech Debt

Severity: None
Impact: The prior TODO for materializing CallawaySantAnna NaN cells was removed consistently with this PR implementing that behavior. Location: TODO.md:L27-L32.
Concrete fix: None.

Security

Severity: None
Impact: No secrets or security-sensitive changes found in the reviewed diff.
Concrete fix: None.

Documentation/Tests

Severity: P3 informational
Impact: New tests cover materialization across main panel/RCS/reg/IPW/DR paths, DataFrame output, all-non-estimable raising, non-finite ATT handling, and all-NaN event-time omission. Survey zero-mass currently has a skip-warning test, but not a direct assertion that the new materialized cell has skip_reason="zero_weight_mass". Locations: tests/test_staggered.py:L1347-L1569, tests/test_staggered.py:L4851-L4882, tests/test_csdid_ported.py:L636-L645, tests/test_csdid_ported.py:L1212-L1224.
Concrete fix: Optional: extend the survey zero-mass test to assert the affected cells are NaN with skip_reason="zero_weight_mass".

Validation note: focused pytest could not run because this sandbox lacks pytest, numpy, pandas, and scipy; AST parsing of touched Python files passed.

igerber added the ready-for-ci Triggers CI test workflows label Jun 29, 2026

igerber merged commit 81f4e84 into main Jun 29, 2026
42 of 43 checks passed

igerber deleted the feat/cs-nan-cell-materialization branch June 29, 2026 16:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna#582

feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna#582
igerber merged 8 commits into
mainfrom
feat/cs-nan-cell-materialization

igerber commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 29, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant