Skip to content

feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna#582

Merged
igerber merged 8 commits into
mainfrom
feat/cs-nan-cell-materialization
Jun 29, 2026
Merged

feat(staggered): materialize non-estimable (g,t) cells as NaN in CallawaySantAnna#582
igerber merged 8 commits into
mainfrom
feat/cs-nan-cell-materialization

Conversation

@igerber

@igerber igerber commented Jun 29, 2026

Copy link
Copy Markdown
Owner

Summary

  • CallawaySantAnna now materializes non-estimable (g,t) cells as NaN entries (carrying a machine-readable skip_reason) uniformly across all estimation paths — no-covariate regression, covariate regression, IPW/DR, repeated cross-section, and survey-weighted — instead of silently dropping them from the grid. Previously only the covariate-regression singular case did this; the other paths omitted the cell.
  • NaN cells carry no influence-function entry, so they are excluded from every aggregation (simple/group/calendar/event-study), from balance_e, and from the bootstrap. All aggregate point estimates and standard errors — and the event-study n_groups / by-group n_periods metadata — are numerically unchanged and continue to match R did's aggte().
  • The "could not estimate any" guard now fires on no-finite-effect (not empty-dict), so an all-non-estimable fit still raises ValueError.
  • to_dataframe("group_time") now includes the NaN rows and a skip_reason column; the GroupTimeEffect dataclass documents the new field.

Methodology references (required if estimator / math changes)

  • Method name(s): CallawaySantAnna (Callaway & Sant'Anna 2021)
  • Paper / source link(s): docs/methodology/REGISTRY.md → "CallawaySantAnna" edge cases
  • Any intentional deviations from the source (and why): Per-cell surface deviation from R's did::att_gt, which omits non-estimable cells from its result table; diff-diff materializes them as NaN rows (with skip_reason) for inspectability. R's aggte() aggregation behavior is matched exactly — non-estimable cells contribute nothing to any aggregate. Documented with a **Deviation from R:** label in REGISTRY.md.

Validation

  • Tests added/updated: tests/test_staggered.py — new TestCallawaySantAnnaNonEstimableMaterialization (per-path materialization parametrized over reg/ipw/dr panel + reg RCS, skip_reason mapping, to_dataframe column, all-non-estimable raise); extended test_nan_cell_preserved_not_dropped to assert skip_reason. Existing CS NaN / aggregation / balance_e and test_methodology_callaway.py R-parity suites unchanged and green; shared-aggregation-mixin consumers (ContinuousDiD, TripleDifference, HonestDiD, ImputationDiD, EfficientDiD) and the diagnostic-report suite all green.
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…awaySantAnna

Uniformly materialize a NaN entry (with a machine-readable skip_reason) for every
non-estimable (g,t) group-time cell across all CS estimation paths (no-covariate
regression, covariate regression, IPW/DR, repeated cross-section, survey-weighted)
instead of omitting it. Previously only the covariate-singular case materialized
NaN; the other paths dropped the cell silently from the grid.

Cells carry no influence-function entry, so they are excluded from every
aggregation (simple/group/calendar/event-study), balance_e, and bootstrap -- all
aggregate point estimates and SEs, plus event-study n_groups / by-group n_periods,
are numerically unchanged and continue to match R did's aggte(). A fit where no
cell is estimable still raises ValueError. to_dataframe("group_time") now includes
the NaN rows and a skip_reason column.

Documented per-cell surface deviation from R's att_gt (which omits the rows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Overall Assessment

✅ Looks good. I found no unmitigated P0/P1 issues. The R did::att_gt surface deviation is explicitly documented in docs/methodology/REGISTRY.md, and the aggregate paths finite-mask materialized NaN cells.

Executive Summary

  • Affected method: CallawaySantAnna, including panel, RCS, IPW/DR, regression, survey-weighted, aggregation, and bootstrap surfaces.
  • Methodology deviation from R is documented with a **Deviation from R:** label, so it is not a defect.
  • Aggregation and bootstrap aggregate prep exclude non-finite effects before weighting.
  • No new inline inference anti-patterns found; new NaN rows use all-NaN inference fields.
  • Remaining items are P2/P3 hygiene: stale TODO, release metadata/version rollback, and test coverage gaps.

Methodology

  • Severity: P2
    Finding: Non-finite regression NaN cells in the covariate-vectorized path still receive influence_func_info, despite the new registry/helper contract saying NaN cells carry no IF entry and are excluded from bootstrap. The aggregate bootstrap finite-masks them, so I do not see wrong aggregate estimates/SEs. Location: diff_diff/staggered.py:L1523-L1557, diff_diff/staggered_bootstrap.py:L223-L226, docs/methodology/REGISTRY.md:L528-L530.
    Impact: Internal contract/documentation mismatch and avoidable per-cell bootstrap work for NaN cells.
    Concrete fix: When nan_cell is true, store via _nan_gt_entry(..., skip_reason="non_finite_regression") and do not add influence_func_info / task_keys for that cell, or narrow the registry wording.

  • Severity: P3 informational
    Finding: Materializing non-estimable (g,t) cells differs from R did::att_gt, but this is documented with **Deviation from R:**. Location: docs/methodology/REGISTRY.md:L528-L530.
    Impact: No action required under the review rules; aggregate behavior matches the documented omit-equivalent contract.
    Concrete fix: None required.

Code Quality

  • Severity: P3
    Finding: TODO.md still lists this exact feature as actionable and “currently omitted.” Location: TODO.md:L31-L32.
    Impact: Future work tracking is stale and may cause duplicate effort.
    Concrete fix: Remove the row or mark it completed with this PR reference.

Performance

  • Severity: P3
    Finding: The non-finite regression IF-entry issue above can allocate bootstrap columns for NaN cells.
    Impact: Minor avoidable bootstrap memory/CPU if many cells are non-finite.
    Concrete fix: Same as Methodology P2: omit IF entries for NaN cells.

Maintainability

  • Severity: P2
    Finding: Release metadata is downgraded from 3.6.0 to 3.5.3 while adding unreleased feature content. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L8-L9, rust/Cargo.toml:L3.
    Impact: If 3.6.0 is already the branch baseline, this regresses package/citation metadata.
    Concrete fix: Confirm intended release baseline; if not intentional, keep version metadata at 3.6.0 or move all feature text under the correct unreleased version policy.

Tech Debt

  • Severity: P3
    Finding: The old TODO entry for materializing CS NaN cells was not reconciled.
    Impact: Tracked debt no longer reflects implementation state.
    Concrete fix: Update TODO.md alongside this PR.

Security

  • Severity: None
    Finding: No secrets, unsafe execution, or security-sensitive changes found.
    Impact: None.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Finding: New tests cover panel no-cov regression, panel IPW/DR without covariates, and RCS regression, but not survey zero-weight-mass materialization, missing-period materialization, or covariate IPW/DR/RCS IPW/DR paths. Location: tests/test_staggered.py:L1343-L1346.
    Impact: Some documented “uniform across all paths” behavior is not directly pinned.
    Concrete fix: Add parametrized tests for zero_weight_mass with survey weights and covariate IPW/DR/RCS cells.

I could not run the selected tests in this environment because pytest and numpy are not installed.

#582)

- The covariate-regression non-finite cell now materializes via _nan_gt_entry
  with NO influence-function entry, matching the other paths and the documented
  REGISTRY/helper contract (previously it wrote a zero-IF entry and ran batch
  inference). Aggregates and SEs are unchanged -- the cell was finite-masked /
  IF-membership-filtered out either way; now the "NaN cells carry no IF entry"
  invariant holds uniformly across all paths.
- Extend the materialization test to cover covariate IPW/DR (panel + RCS) paths.
- Remove the now-implemented CallawaySantAnna NaN-cell row from TODO.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fa149d55eabcc831551a2338af2fb5b0500d248c


Overall Assessment

⚠️ Needs changes — one unmitigated P1. No P0 findings.

Executive Summary

  • Affected method: CallawaySantAnna, especially non-estimable (g,t) materialization across panel, RCS, general IPW/DR/regression, aggregation, and bootstrap.
  • The documented R did::att_gt surface deviation is properly labeled in REGISTRY.md; that part is not a defect.
  • The prior covariate-vectorized IF leak is fixed, but the same non-finite ATT pattern still exists in the general panel/RCS callers.
  • Aggregations finite-mask NaN/non-finite effects, so I do not see wrong aggregate ATT/SE output from the issue.
  • pytest could not be run in this environment: /usr/bin/python: No module named pytest.

Methodology

  • Severity: P1 [Newly identified]
    Finding: General panel and RCS paths still treat att_gt is not None as estimable, even when att_gt is non-finite. Those cells are stored with "effect": att_gt, may surface inf rather than a NaN entry, are not counted in the consolidated skip warning, and still add influence_func_info. This conflicts with the registry contract that non-estimable cells are NaN entries, carry no IF entry, and are excluded from bootstrap. Locations: diff_diff/staggered.py:L2021-L2067, diff_diff/staggered.py:L2131-L2176, diff_diff/staggered.py:L3482-L3501, docs/methodology/REGISTRY.md:L528-L530.
    Impact: Non-finite regression cells can violate the documented per-cell surface and bootstrap membership contract. Aggregate estimates remain protected by finite masks, but the per-cell result/IF contract is not uniformly implemented.
    Concrete fix: Before building gte_entry or adding influence_func_info, branch on not np.isfinite(att_gt) and store _nan_gt_entry(..., skip_reason="non_finite_regression"); do not add that key to influence_func_info.

  • Severity: P3 informational
    Finding: Materializing non-estimable cells differs from R did::att_gt, but the deviation is documented with a **Deviation from R:** label. Location: docs/methodology/REGISTRY.md:L530.
    Impact: No action required under the review rules.
    Concrete fix: None.

Code Quality

  • Severity: None
    Impact: No additional code-quality findings beyond the methodology item.
    Concrete fix: None.

Performance

  • Severity: P3
    Finding: The P1 issue can still allocate/bootstrap IF columns for non-finite cells that should have no IF entry. Locations: diff_diff/staggered.py:L2066-L2067, diff_diff/staggered.py:L2175-L2176, diff_diff/staggered_bootstrap.py:L223-L226.
    Impact: Avoidable bootstrap work for cells that should be omitted from bootstrap membership.
    Concrete fix: Same as the P1 fix: omit IF entries for all non-finite materialized cells.

Maintainability

  • Severity: P2
    Finding: Release metadata is downgraded from 3.6.0 to 3.5.3 while the PR adds unreleased feature behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3, CHANGELOG.md:L8-L60.
    Impact: Built artifacts and citation metadata can report an older version while containing newer behavior.
    Concrete fix: Restore the intended release/version metadata, or clearly align the changelog and package metadata with the project’s unreleased-version policy.

Tech Debt

  • Severity: None
    Impact: The previous stale TODO.md item for this exact feature was removed.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets, unsafe execution, or security-sensitive changes found.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Finding: New registry/changelog prose says cells are excluded from “calendar” aggregation, but CallawaySantAnna.fit() documents only simple, event_study, group, and all. Locations: docs/methodology/REGISTRY.md:L529, CHANGELOG.md:L54-L56, diff_diff/staggered.py:L1652-L1658.
    Impact: Documentation overstates the public aggregation surface.
    Concrete fix: Remove “calendar” from the new CallawaySantAnna prose unless a calendar aggregation path is added.

  • Severity: P3
    Finding: Tests cover many materialization paths, but not the non-finite general/RCS path with bootstrap membership.
    Impact: The P1 regression was not pinned.
    Concrete fix: Add tests that poison a general panel path and an RCS path to return non-finite att_gt with n_bootstrap > 0, then assert the cell is all-NaN, has skip_reason="non_finite_regression", and is absent from bootstrap_results.group_time_ses.

Path to Approval

  1. Add a non-finite att_gt branch in both RCS and general panel callers before gte_entry construction.
  2. Ensure non-finite cells are stored through _nan_gt_entry(..., skip_reason="non_finite_regression") and never inserted into influence_func_info.
  3. Add focused regression tests for non-finite general panel and RCS cells with bootstrap enabled, asserting all-NaN fields and no bootstrap membership.

…/RCS paths (review #582 P1)

The general (IPW/DR) and RCS estimable builders treated `att_gt is not None` as
estimable even when att_gt was non-finite (NaN/inf): they stored effect=att_gt
(which could surface inf), kept the influence-function entry, and did not count
the cell in the consolidated skip total. Both now branch on
`att_gt is None or not np.isfinite(att_gt)` first and materialize via
_nan_gt_entry(skip_reason="non_finite_regression") with NO IF entry, so the
documented contract (non-estimable cells are NaN entries, carry no IF, excluded
from the bootstrap) holds uniformly across every path. Aggregate estimates and
SEs are unchanged (these cells were finite-masked / IF-filtered out either way).
Adds a regression test mimicking the inf-ATT-with-IF leak.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6910a57a846bd2af1455daa739a7c185c4c9bccf


Overall Assessment

⛔ Blocker — one unmitigated P0. The prior general panel/RCS non-finite ATT issue is fixed, but the no-covariate vectorized path still violates the new registry contract.

Executive Summary

  • Affected method: CallawaySantAnna, specifically non-estimable (g,t) materialization.
  • Prior P1 for general panel/RCS non-finite ATT handling appears resolved at diff_diff/staggered.py:L2021-L2033 and diff_diff/staggered.py:L2135-L2147.
  • [Newly identified] The no-covariate vectorized regression path still stores non-finite att as the cell effect and adds IF membership.
  • The documented R did::att_gt surface deviation is properly labeled in REGISTRY.md; that is not a defect.
  • I could not run tests locally: the environment is missing numpy.

Methodology

  • Severity: P0 [Newly identified]
    Finding: _compute_all_att_gt_vectorized() detects non-finite att only by setting skip_reason, but it still stores the raw non-finite effect, inserts influence_func_info, appends the cell to batch inference, and allows bootstrap membership. This violates the registry contract that non-estimable cells have all inference fields NaN, carry no IF entry, and are excluded from bootstrap. If att=inf and se remains finite, safe_inference_batch() can produce t_stat=inf, p_value=0, and infinite CI endpoints. Locations: diff_diff/staggered.py:L1173-L1199, diff_diff/staggered.py:L1222-L1242, docs/methodology/REGISTRY.md:L528-L530.
    Impact: Per-cell statistical output can be silently misleading, and bootstrap membership contradicts the documented method contract.
    Concrete fix: Before building gte_entry or inf_info_gt, branch on not np.isfinite(att), store _nan_gt_entry(n_treated=n_t, n_control=n_c, skip_reason="non_finite_regression"), count the skip, and continue.

  • Severity: P3 informational
    Finding: Materializing non-estimable cells differs from R did::att_gt, but this is documented with **Deviation from R:**. Location: docs/methodology/REGISTRY.md:L530.
    Impact: No action required.
    Concrete fix: None.

Code Quality

  • Severity: None
    Impact: No separate code-quality findings beyond the P0 control-flow issue.
    Concrete fix: None.

Performance

  • Severity: P3
    Finding: The P0 issue also leaves non-finite no-covariate cells eligible for IF/bootstrap work. Location: diff_diff/staggered.py:L1199.
    Impact: Avoidable bootstrap work for cells that should be omitted.
    Concrete fix: Same as the P0 fix: do not add IF info for materialized NaN cells.

Maintainability

  • Severity: P2
    Finding: Version metadata is downgraded from 3.6.0 to 3.5.3 while the PR adds unreleased behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3, CHANGELOG.md:L8-L60.
    Impact: Built packages/citation metadata can report an older version than the code behavior.
    Concrete fix: Restore the intended version metadata, or clearly align the changelog and package metadata with the project’s unreleased-version policy.

Tech Debt

  • Severity: None
    Impact: The old TODO entry for this feature was removed after implementation.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive issues found.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Finding: Tests cover zero-control materialization and the general IPW non-finite path, but not the no-covariate vectorized non-finite path or bootstrap exclusion for that case. Location: tests/test_staggered.py:L1440-L1477.
    Impact: The P0 regression is not pinned.
    Concrete fix: Add a no-covariate regression test with n_bootstrap > 0 asserting the poisoned/non-finite cell is all-NaN, has skip_reason="non_finite_regression", and is absent from bootstrap_results.group_time_ses.

  • Severity: P3
    Finding: New prose says cells are excluded from “calendar” aggregation, but CallawaySantAnna.fit() documents only simple, event_study, group, and all. Locations: CHANGELOG.md:L54-L56, docs/methodology/REGISTRY.md:L529, diff_diff/staggered.py:L1652-L1658.
    Impact: Documentation overstates the public aggregation surface.
    Concrete fix: Remove “calendar” from the CallawaySantAnna-specific prose unless a calendar aggregation API exists.

Path to Approval

  1. Add the non-finite branch in _compute_all_att_gt_vectorized() before gte_entry and influence_func_info construction.
  2. Ensure that branch writes _nan_gt_entry(..., skip_reason="non_finite_regression") and does not append to atts, ses, task_keys, or influence_func_info.
  3. Add a focused no-covariate regression test with bootstrap enabled that asserts all-NaN cell inference and no bootstrap membership.

…view #582 P0)

Completes the uniform invariant: every CallawaySantAnna estimable-cell builder
(no-covariate vectorized, covariate regression, general IPW/DR, RCS) now routes a
non-finite ATT(g,t) to _nan_gt_entry(skip_reason="non_finite_regression") with NO
influence-function entry, skipping batch inference and bootstrap membership. The
no-covariate diff-in-means ATT is finite given n_t,n_c>0, but a non-finite outcome
(inf survives the NaN-only valid mask) could otherwise store inf as the effect and
produce t_stat=inf / p=0 / infinite CI via safe_inference_batch. Aggregate
estimates and SEs are unchanged. Adds a regression test injecting an inf outcome
through the no-covariate path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 5e499d2e1f95d1c263bbcaf1c42c8cde12d08cc2


Overall Assessment

⚠️ Needs changes — prior P0 is resolved, but I found one new unmitigated P1 aggregate-surface mismatch.

Executive Summary

  • Prior P0 is fixed: no-covariate vectorized non-finite ATT cells now become _nan_gt_entry(...) before IF membership/bootstrap inclusion.
  • The per-cell materialization deviation from R is documented in REGISTRY.md; that part is not a defect.
  • [Newly identified] Event-study aggregation can still emit all-NaN relative-time buckets, which contradicts the registry claim that NaN cells are excluded from aggregation and aggregate surfaces match prior omit/R aggte() behavior.
  • Tests cover per-cell materialization and finite overall aggregation, but not the all-NaN event-time bucket surface.
  • I could not run tests locally because numpy is not installed.

Methodology

  • Severity: P1 [Newly identified]
    Impact: _aggregate_event_study() finite-filters cells, but when a relative-time bucket contains only materialized NaN cells it appends an event-study row with effect=NaN, se=NaN, and n_groups=0 instead of omitting the bucket. That is a new aggregate surface difference for cells that were previously omitted, and it conflicts with the documented contract that NaN cells are excluded from event-study aggregation and aggregate behavior matches R did::aggte() exactly. Locations: diff_diff/staggered_aggregation.py:L661-L672, docs/methodology/REGISTRY.md:L528-L530.
    Concrete fix: In _aggregate_event_study(), when finite filtering leaves len(effs) == 0, continue rather than appending a NaN aggregate row. Add a test with control_group="not_yet_treated" where the final event-time has only non-estimable cells and assert that relative time is absent from event_study_effects.

  • Severity: P3 informational
    Impact: Materializing non-estimable (g,t) cells differs from R did::att_gt, but the deviation is explicitly documented. Location: docs/methodology/REGISTRY.md:L530.
    Concrete fix: None.

Code Quality

  • Severity: None
    Impact: The previous no-covariate non-finite control-flow issue is fixed cleanly at diff_diff/staggered.py:L1177-L1184.
    Concrete fix: None.

Performance

  • Severity: None
    Impact: NaN cells no longer enter IF/bootstrap membership in the reviewed paths.
    Concrete fix: None.

Maintainability

  • Severity: P2
    Impact: Version metadata is still downgraded from 3.6.0 to 3.5.3 across package/citation/Rust metadata while the PR adds unreleased behavior. Locations: pyproject.toml:L7, diff_diff/__init__.py:L304, CITATION.cff:L10-L11, rust/Cargo.toml:L3.
    Concrete fix: Confirm this is intentional release-management work; otherwise restore the intended unreleased/package version alignment.

Tech Debt

  • Severity: None
    Impact: The old TODO entry for materializing CS NaN cells was removed after implementation.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive changes found.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Impact: CallawaySantAnna docs/changelog mention “calendar” aggregation, but the public fit() aggregation options are simple, event_study, group, and all. Locations: docs/methodology/REGISTRY.md:L529, CHANGELOG.md:L51-L56, diff_diff/staggered.py:L1665-L1671.
    Concrete fix: Remove “calendar” from CallawaySantAnna-specific prose unless a calendar aggregation API exists.

Path to Approval

  1. Change _aggregate_event_study() so all-NaN relative-time buckets are omitted after finite filtering.
  2. Add a focused event-study regression test asserting a relative time with only materialized NaN (g,t) cells is absent from event_study_effects.
  3. Re-run the CS staggered tests, including the new non-estimable materialization tests.

…nt study (review #582 P1)

_aggregate_event_study() appended an all-NaN row (effect=NaN, se=NaN, n_groups=0)
for a relative-time bucket whose cells are all non-estimable, instead of omitting
the bucket. With NaN-cell materialization this surfaced new all-NaN event-study
rows where the bucket previously had no cells (and thus no row) -- an aggregate-
surface change vs the prior omit behavior and R did::aggte(). The bucket is now
dropped when finite-filtering leaves no cell (tracked via a kept-periods list so
the result lists stay aligned), matching _aggregate_by_group, which already omits
all-NaN groups. Adds a test asserting an all-non-estimable relative time is absent
from event_study_effects.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 785773d34d965871e4fb3fb13b9ed7a76df79916


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Prior P1 is resolved: all-NaN event-time buckets are now omitted, and n_groups counts only finite contributing cells.
  • The per-cell NaN materialization deviation from R is documented in REGISTRY.md, so it is informational, not a defect.
  • NaN cells carry no IF entry and are filtered out of aggregation/bootstrap paths.
  • Remaining findings are P3 documentation/warning-surface polish only.
  • I could not run pytest because pytest is not installed; AST parsing of touched Python files passed.

Methodology

  • Severity: P3 informational
    Impact: Materializing non-estimable (g,t) cells differs from R did::att_gt, but this is explicitly documented as a **Deviation from R:**; aggregate behavior still matches the registry contract. Locations: docs/methodology/REGISTRY.md:L528-L530, diff_diff/staggered.py:L225-L260.
    Concrete fix: None required.

  • Severity: None
    Impact: The prior event-study aggregate mismatch is fixed: all-NaN buckets continue, emitted periods are tracked via agg_periods, and n_groups is finite-filtered. Locations: diff_diff/staggered_aggregation.py:L662-L675, diff_diff/staggered_aggregation.py:L696-L744.
    Concrete fix: None.

Code Quality

  • Severity: P3
    Impact: Warning accounting is not fully aligned with the registry wording that the consolidated warning lists skip reasons/counts. The no-covariate vectorized non-finite path materializes skip_reason="non_finite_regression" but does not add it to _skip_info; zero-weight cells are counted under empty_cell. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L2235-L2249.
    Concrete fix: Add explicit skip buckets for zero_weight_mass and non_finite_regression and include them in the consolidated warning.

Performance

  • Severity: None
    Impact: NaN cells are excluded before bootstrap by IF membership and finite filtering. Locations: diff_diff/staggered_bootstrap.py:L223-L225, diff_diff/staggered_bootstrap.py:L261-L269, diff_diff/staggered_bootstrap.py:L762-L769.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: Helper return arity changes are propagated to both callers of _compute_att_gt_fast() / _compute_att_gt_rc().
    Concrete fix: None.

Tech Debt

  • Severity: None
    Impact: The implemented CS NaN-materialization TODO was removed; no new untracked blocker found. Location: TODO.md:L27-L32.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive changes found.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Impact: CallawaySantAnna prose says NaN cells are excluded from “calendar” aggregation, but the public fit() options are simple, event_study, group, and all. Locations: CHANGELOG.md:L54-L56, docs/methodology/REGISTRY.md:L529, diff_diff/staggered.py:L1665-L1671.
    Concrete fix: Remove “calendar” from CallawaySantAnna-specific prose unless a calendar aggregation API is added.

  • Severity: P3 informational
    Impact: I could not execute the focused tests because pytest is not installed. AST parsing passed for touched Python files.
    Concrete fix: Run tests/test_staggered.py::TestCallawaySantAnnaNonEstimableMaterialization in CI or a dev environment with pytest installed.

…cell notes (review #582 P3)

CallawaySantAnna's aggregation options are simple / event_study / group / all
(there is no calendar aggregation). Remove "calendar" from the REGISTRY edge-case
Note and the CHANGELOG entry listing which aggregations exclude materialized NaN
cells.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4291a25358e5a3d89e70b48fbb50f5fb65b16737


Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: CallawaySantAnna.
  • The NaN materialization behavior is documented in REGISTRY.md as a Deviation from R, so it is not a methodology defect.
  • Prior event-study all-NaN bucket issue is resolved: empty relative-time buckets are omitted and counts use finite contributors only.
  • Non-blocking issues remain around warning granularity and metadata on newly materialized NaN rows.
  • I could not run pytest because pytest/runtime deps are unavailable; AST parsing of touched Python files passed.

Methodology

  • Severity: P3 informational
    Impact: CallawaySantAnna now differs from R did::att_gt by surfacing non-estimable (g,t) cells as NaN rows, but this is explicitly documented with **Deviation from R:**. Aggregation finite-masks these cells and bootstrap filters to IF-backed cells, matching the registry contract for aggte() behavior. Locations: docs/methodology/REGISTRY.md:L528-L530, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_bootstrap.py:L223-L269.
    Concrete fix: None required.

  • Severity: None
    Impact: Previous P1 is resolved. Event-study aggregation now skips all-NaN relative-time buckets and counts only finite contributing cells. Group aggregation similarly counts finite periods only. Locations: diff_diff/staggered_aggregation.py:L662-L675, diff_diff/staggered_aggregation.py:L694-L701, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
    Concrete fix: None.

Code Quality

  • Severity: P2
    Impact: Some newly materialized NaN rows can report misleading n_treated / n_control metadata because helper sentinel returns use (0, 0) even after observed counts are known. This does not affect estimates or aggregation, but group_time_effects / to_dataframe("group_time") can show zero counts for cells that actually had treated observations, control observations, or zero survey-weight mass. Locations: diff_diff/staggered.py:L910-L914, diff_diff/staggered.py:L925-L929, diff_diff/staggered.py:L1373-L1378, diff_diff/staggered.py:L3378-L3384, diff_diff/staggered.py:L3399-L3404.
    Concrete fix: Return the observed per-cell counts on non-estimable exits once masks have been computed; for RCS, return n_gt / n_ct for display even when the missing arm is pre-period.

  • Severity: P3
    Impact: Warning accounting still does not fully match the registry wording that the consolidated warning lists skip reasons/counts. zero_weight_mass is bucketed under empty_cell, and vectorized non-finite no-covariate cells are materialized without adding a corresponding skip bucket. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L2234-L2255.
    Concrete fix: Track skip counts by actual skip_reason and render those reason codes in the consolidated warning.

Performance

  • Severity: None
    Impact: NaN cells are excluded before bootstrap work via IF membership and finite masks, so the materialization should not add bootstrap computation for non-estimable cells. Locations: diff_diff/staggered_bootstrap.py:L223-L269, diff_diff/staggered_bootstrap.py:L762-L769.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: Return arity changes are propagated to the visible callers of _compute_att_gt_fast() and _compute_att_gt_rc(). Locations: diff_diff/staggered.py:L2015-L2031, diff_diff/staggered.py:L2137-L2146, diff_diff/staggered.py:L3297-L3313.
    Concrete fix: None.

Tech Debt

  • Severity: None
    Impact: The prior deferred TODO for materializing CallawaySantAnna NaN cells was removed, consistent with this PR implementing it. Location: TODO.md:L27-L32.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive changes found in the reviewed diff.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3 informational
    Impact: New tests cover materialization, skip_reason, DataFrame output, all-non-estimable raising, and event-study omission behavior. Runtime execution was not possible in this sandbox because pytest is not installed and importing the package also failed on missing numpy; AST parsing passed for touched Python files. Locations: tests/test_staggered.py:L1335-L1564.
    Concrete fix: Run the focused test class in CI or a dev environment with project dependencies installed.

…ells (review #582 P2)

The per-cell helper sentinel returns (and the covariate-reg empty-control batch
site) hardcoded n_treated=n_control=0 for materialized NaN cells even after the
observation masks had been computed, so group_time_effects / to_dataframe could
show zero counts for a cell that actually had treated (or control) observations.
The zero-control / zero-weight exits in _compute_att_gt_fast and _compute_att_gt_rc,
and the covariate-reg empty-control batch site, now return the observed counts;
missing-period exits (masks not yet built) keep 0. Display-only metadata --
estimates, SEs, and aggregation are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9d57e7678721ea4bc5b4e839f4ff944aa174c97b


Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: CallawaySantAnna.
  • The per-cell NaN materialization is a documented Deviation from R, so it is not a methodology defect.
  • Prior event-study/all-NaN bucket concern is resolved: all-NaN buckets are omitted, and n_groups / n_periods count finite contributors only.
  • NaN cells are created with all inference fields NaN and no IF entry, so aggregation/bootstrap paths exclude them.
  • Remaining issue is P3 warning/accounting granularity, not statistical correctness.
  • I could not run pytest/import smoke tests because this sandbox lacks pytest and numpy; AST parsing of touched Python files passed.

Methodology

  • Severity: P3 informational
    Impact: CallawaySantAnna now materializes non-estimable (g,t) cells as NaN rows, which differs from R did::att_gt, but this is explicitly documented as a Deviation from R. Aggregation behavior remains aligned with the documented aggte() contract because simple/event-study/group aggregations finite-mask NaN effects and bootstrap filters to IF-backed cells only. Locations: docs/methodology/REGISTRY.md:L526-L531, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_aggregation.py:L662-L711, diff_diff/staggered_bootstrap.py:L223-L269.
    Concrete fix: None required.

  • Severity: None
    Impact: Previous P1 is resolved. Event-study aggregation now omits relative-time buckets with no finite contributors and counts finite cells only; group aggregation similarly counts finite post cells only. Locations: diff_diff/staggered_aggregation.py:L662-L701, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
    Concrete fix: None.

Code Quality

  • Severity: P3 informational
    Impact: The consolidated warning still does not fully list actual skip_reason codes/counts as described in the registry. zero_weight_mass is still counted under the generic empty-cell bucket, and vectorized non-finite cells can be materialized without contributing a reason-specific warning count. The result rows themselves carry skip_reason, so this is not a statistical-output bug. Locations: diff_diff/staggered.py:L1137-L1143, diff_diff/staggered.py:L1177-L1184, diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L2244-L2265.
    Concrete fix: Track skip counts by the actual skip_reason across all paths and render those reason codes in the consolidated warning.

Performance

  • Severity: None
    Impact: NaN cells are excluded before bootstrap work through IF membership filtering, so materialization should not add bootstrap computation for non-estimable cells. Locations: diff_diff/staggered_bootstrap.py:L223-L269, diff_diff/staggered_bootstrap.py:L762-L769, diff_diff/staggered_bootstrap.py:L829-L835.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: The helper return-arity changes for _compute_att_gt_fast() and _compute_att_gt_rc() are propagated to their visible callers. Locations: diff_diff/staggered.py:L2025-L2056, diff_diff/staggered.py:L2147-L2170, diff_diff/staggered.py:L3307-L3532.
    Concrete fix: None.

Tech Debt

  • Severity: None
    Impact: The prior TODO for materializing CallawaySantAnna NaN cells was removed, consistent with this PR implementing that behavior. Location: TODO.md:L27-L32.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive changes found in the reviewed diff.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3 informational
    Impact: Tests were added for materialization, skip_reason, DataFrame output, non-finite effects, all-non-estimable raising, and event-study omission behavior. Local runtime validation was unavailable because pytest and numpy are not installed; AST parsing passed for touched Python files. Locations: tests/test_staggered.py:L1333-L1565, diff_diff/staggered_results.py:L39-L55, diff_diff/staggered_results.py:L430-L445.
    Concrete fix: Run the focused TestCallawaySantAnnaNonEstimableMaterialization suite in CI or a dev environment with project dependencies installed.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 29, 2026
Three test_csdid_ported.py tests relied on non-estimable (g,t) cells being
OMITTED from group_time_effects; with the new NaN-cell materialization those
cells are present as NaN, so the tests' membership / golden-iteration guards no
longer skip them. Update them to preserve their intent on the FINITE cells:
- test_some_units_treated_first_period: a first-period cohort (no base period) is
  now all-NaN (missing_period) rather than absent -> assert it is all-NaN.
- test_zero_pretreatment_outcomes: skip NaN pre-cells (the last cohort under
  not_yet_treated has no controls); finite pre-cells are still ~0.
- test_golden_fewer_periods: skip NaN cells (gapped panel where base g-1 is
  unobserved -> missing_period; R falls back to an available base) -> R-parity on
  the finite cells.

No source change; the cells are correctly non-estimable, only now visible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d1f2bb44105185a2c43875f9782ea444c1639235


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: CallawaySantAnna.
  • The NaN materialization of non-estimable (g,t) cells is explicitly documented as a Deviation from R, so it is not a methodology defect.
  • Prior event-study/all-NaN bucket concern remains resolved: all-NaN relative-time buckets are omitted, and metadata counts finite contributors only.
  • NaN cells carry all-NaN inference fields and no IF entry, so aggregation and bootstrap paths exclude them.
  • Remaining items are P3 informational: warning reason-count granularity and a small survey-specific test-coverage gap.

Methodology

  • Severity: P3 informational
    Impact: CallawaySantAnna now materializes non-estimable cells as NaN rows, differing from R did::att_gt, but this is documented with a Deviation from R note. Aggregation behavior remains aligned with the registry: simple/group/event-study finite-mask NaN effects, balance_e uses only finite anchor cells, and bootstrap filters to IF-backed cells. Locations: docs/methodology/REGISTRY.md:L526-L531, diff_diff/staggered.py:L225-L260, diff_diff/staggered_aggregation.py:L108-L126, diff_diff/staggered_aggregation.py:L620-L646, diff_diff/staggered_bootstrap.py:L223-L269.
    Concrete fix: None required.

  • Severity: None
    Impact: Previous P1 is resolved. Event-study aggregation now skips relative-time buckets with no finite contributors and counts only finite cells; group aggregation similarly counts finite post cells only. Locations: diff_diff/staggered_aggregation.py:L662-L711, diff_diff/staggered_aggregation.py:L736-L745, diff_diff/staggered_aggregation.py:L881-L899.
    Concrete fix: None.

Code Quality

  • Severity: P3 informational
    Impact: The consolidated warning still groups some real skip_reason values under coarse buckets. For example, vectorized zero_weight_mass contributes to empty_cell, while general/RCS non-estimable returns increment generic “insufficient data or non-estimable cells.” Result rows still carry the correct skip_reason, so this is not a statistical-output bug. Locations: diff_diff/staggered.py:L1257-L1260, diff_diff/staggered.py:L1637-L1640, diff_diff/staggered.py:L2044-L2056, diff_diff/staggered.py:L2158-L2170, diff_diff/staggered.py:L2244-L2265.
    Concrete fix: Track warning counts from the materialized cell’s actual skip_reason across all paths.

Performance

  • Severity: None
    Impact: NaN materialized cells do not add bootstrap work because bootstrap gt_pairs are filtered to keys present in influence_func_info. Locations: diff_diff/staggered_bootstrap.py:L223-L269.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: The changed helper return shapes for _compute_att_gt_fast() and _compute_att_gt_rc() are propagated to visible callers; no stale arity call sites found. Locations: diff_diff/staggered.py:L819-L829, diff_diff/staggered.py:L2025-L2041, diff_diff/staggered.py:L2147-L2156, diff_diff/staggered.py:L3307-L3323.
    Concrete fix: None.

Tech Debt

  • Severity: None
    Impact: The prior TODO for materializing CallawaySantAnna NaN cells was removed consistently with this PR implementing that behavior. Location: TODO.md:L27-L32.
    Concrete fix: None.

Security

  • Severity: None
    Impact: No secrets or security-sensitive changes found in the reviewed diff.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3 informational
    Impact: New tests cover materialization across main panel/RCS/reg/IPW/DR paths, DataFrame output, all-non-estimable raising, non-finite ATT handling, and all-NaN event-time omission. Survey zero-mass currently has a skip-warning test, but not a direct assertion that the new materialized cell has skip_reason="zero_weight_mass". Locations: tests/test_staggered.py:L1347-L1569, tests/test_staggered.py:L4851-L4882, tests/test_csdid_ported.py:L636-L645, tests/test_csdid_ported.py:L1212-L1224.
    Concrete fix: Optional: extend the survey zero-mass test to assert the affected cells are NaN with skip_reason="zero_weight_mass".

Validation note: focused pytest could not run because this sandbox lacks pytest, numpy, pandas, and scipy; AST parsing of touched Python files passed.

@igerber igerber merged commit 81f4e84 into main Jun 29, 2026
42 of 43 checks passed
@igerber igerber deleted the feat/cs-nan-cell-materialization branch June 29, 2026 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant