igerber · Jun 29, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -43,6 +43,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   Korn-Graubard (1990), and Solon-Haider-Wooldridge (2015) to `docs/references.rst`.
 
 ### Changed
+- **CallawaySantAnna now materializes non-estimable `(g,t)` cells as NaN entries instead of
+  omitting them.** Cells that cannot be estimated (missing base/post period, zero
+  treated/control, zero survey-weight mass, or a non-finite regression solve) are stored in
+  `group_time_effects` as NaN entries carrying a machine-readable `skip_reason`
+  (`"missing_period"` / `"zero_treated_control"` / `"zero_weight_mass"` /
+  `"non_finite_regression"`; estimable cells carry `None`), uniformly across all estimation paths
+  (no-covariate regression, covariate regression, IPW/DR, repeated cross-section, survey-weighted).
+  Previously only the covariate-regression singular case did this; the other paths omitted the
+  cell, reporting it only in a consolidated warning. The cells are excluded from every aggregation
+  (simple/group/event-study), from `balance_e`, and from the bootstrap, so all aggregate
+  point estimates and standard errors — and the event-study `n_groups` / by-group `n_periods`
+  metadata — are numerically **unchanged** and continue to match R `did`'s `aggte()`; a fit where
+  no cell is estimable still raises `ValueError`. `to_dataframe("group_time")` now includes these
+  NaN rows and a `skip_reason` column. This is a documented per-cell surface **deviation from R**'s
+  `att_gt` (which omits the rows). See REGISTRY.md "CallawaySantAnna" edge cases.
 - **CallawaySantAnna multiplier bootstrap now tiles weight generation over draws, cutting
   peak memory at large `n_units`.** The dense `(n_bootstrap × n_units)` multiplier-weight
   matrix (the dominant allocation for the default unit-level bootstrap — `cluster=None`,

diff --git a/TODO.md b/TODO.md
@@ -29,7 +29,6 @@ The `Origin` column (Actionable tables) and the `PR` column (Deferred tables) bo
 | Issue | Location | Origin | Effort | Priority |
 |-------|----------|--------|--------|----------|
 | `SyntheticControl` cv: thread an `"infeasible"` reason-code from `_outer_solve_V_cv()` / `_placebo_fit_unit()` so `in_space_placebo()` / `leave_one_out()` distinguish a structural cv-refit exclusion (donor-indistinguishable re-aggregated window) from a genuine inner-solver non-convergence — mirror the split `in_time_placebo()` already emits. Warnings already distinguish the two causes; only the machine-readable status/count is missing. | `synthetic_control.py`, `synthetic_control_results.py` | follow-up | Mid | Low |
-| `CallawaySantAnna`: materialize NaN entries for non-estimable `(g,t)` cells in `group_time_effects` (currently omitted with a consolidated warning); requires updating downstream consumers (event study, `balance_e`, aggregation). | `staggered.py` | #256 | Mid | Low |
 | Survey-design resolution / collapse patterns are inconsistent across panel estimators — `ContinuousDiD` rebuilds unit-level design in SE code, `EfficientDiD` builds once in `fit()`, `StackedDiD` re-resolves on stacked data. Extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation. | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Mid | Low |
 | `SyntheticControl` remaining ADH-2015 §4 items: the regression-weight `W^reg = X_0'(X_0 X_0')^{-1} X_1` extrapolation diagnostic (flag implied OLS weights outside `[0,1]`) and sparse-SC subset search (`l < J`, holding `V` fixed). LOO, in-time placebo, CV `V`-selection, and inverse-variance `V` have landed; these two are the deferred tail. | `synthetic_control.py`, `synthetic_control_results.py` | ADH-2015 | Mid | Low |
 | `SyntheticControl` conformal (CWZ 2021) extensions: (a) one-sided / signed-`t` variants (§7); (b) covariates in the conformal proxy (`X_jt`, eqs 4/6 — current proxy is outcomes-only); (c) AR / innovation-permutation path (Lemmas 5-7) for time-series proxies. The joint test, pointwise CIs, and average-effect CI have landed. | `conformal.py`, `synthetic_control_results.py` | CWZ-2021 | Heavy | Low |

diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py
diff --git a/diff_diff/staggered_aggregation.py b/diff_diff/staggered_aggregation.py
@@ -651,6 +651,7 @@ def _aggregate_event_study(
         agg_ses_list = []
         agg_n_groups = []
         agg_effective_dfs = []  # Per-horizon effective df (replicate designs)
+        agg_periods = []  # Relative times that yielded an estimable aggregate row
         _psi_vectors = []  # Per-event-time combined IF vectors for VCV
         _psi_event_times = []  # Event times that contributed a psi column
         for e, effect_list in sorted_periods:
@@ -665,10 +666,12 @@ def _aggregate_event_study(
                 ns = ns[finite_mask]
                 gt_pairs = [gt for gt, m in zip(gt_pairs, finite_mask) if m]
                 if len(effs) == 0:
-                    agg_effects_list.append(np.nan)
-                    agg_ses_list.append(np.nan)
-                    agg_n_groups.append(0)
-                    agg_effective_dfs.append(None)
+                    # Every cell in this relative-time bucket is non-estimable
+                    # (materialized NaN). Omit the bucket entirely so the
+                    # event-study surface matches the prior omit behavior and R
+                    # did::aggte() (a relative time with no estimable cell yields
+                    # no row), and stays consistent with _aggregate_by_group,
+                    # which already drops all-NaN groups.
                     continue
 
             weights = ns / np.sum(ns)
@@ -690,8 +693,12 @@ def _aggregate_event_study(
 
             agg_effects_list.append(agg_effect)
             agg_ses_list.append(agg_se)
-            agg_n_groups.append(len(effect_list))
+            # Count only finite-contributing cells (gt_pairs is finite-filtered
+            # above) so materialized NaN cells don't inflate n_groups. All-NaN
+            # buckets never reach this point: they are omitted via `continue`.
+            agg_n_groups.append(len(gt_pairs))
             agg_effective_dfs.append(eff_df)
+            agg_periods.append(e)
             _psi_vectors.append(psi_e)
             _psi_event_times.append(e)
 
@@ -727,7 +734,7 @@ def _aggregate_event_study(
         )
 
         event_study_effects = {}
-        for idx, (e, _) in enumerate(sorted_periods):
+        for idx, e in enumerate(agg_periods):
             event_study_effects[e] = {
                 "effect": agg_effects_list[idx],
                 "se": agg_ses_list[idx],
@@ -887,7 +894,9 @@ def _aggregate_by_group(
             agg_se, eff_df = self._compute_aggregated_se_with_wif(
                 gt_pairs, weights, effs, groups_for_gt, influence_func_info, df, unit, precomputed
             )
-            group_data_list.append((g, agg_effect, agg_se, len(g_effects), eff_df))
+            # Count only finite-contributing cells (gt_pairs is finite-filtered
+            # above) so materialized NaN cells don't inflate n_periods.
+            group_data_list.append((g, agg_effect, agg_se, len(gt_pairs), eff_df))
 
         if not group_data_list:
             return {}
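
Reviewer note: the finite-mask pattern used in `_aggregate_event_study` can be illustrated in isolation. The sketch below is a minimal standard-library stand-in, not the library's implementation. `aggregate_bucket` and the `buckets` data are hypothetical, and the aggregate is assumed to be a size-weighted mean, in line with `weights = ns / np.sum(ns)` in the hunk above. It shows an all-NaN relative-time bucket being omitted and `n_groups` counting only finite cells.

```python
import math


def aggregate_bucket(effects, sizes):
    """Size-weighted mean over finite cells, or None if no cell is estimable.

    Hypothetical helper for illustration only.
    """
    # Keep only cells with a finite effect (materialized NaN cells drop out).
    finite = [(eff, n) for eff, n in zip(effects, sizes) if math.isfinite(eff)]
    if not finite:
        return None  # caller omits the whole bucket
    total = sum(n for _, n in finite)
    effect = sum(eff * n / total for eff, n in finite)
    return effect, len(finite)


# Hypothetical data: relative time -> (cell effects, cell sizes).
buckets = {
    -1: ([math.nan, math.nan], [10, 20]),    # every cell non-estimable
    0: ([1.0, math.nan, 3.0], [10, 5, 30]),  # one non-estimable cell
}

event_study = {}
for e, (effects, sizes) in sorted(buckets.items()):
    out = aggregate_bucket(effects, sizes)
    if out is None:
        continue  # no row for a relative time with no estimable cell
    effect, n_groups = out
    event_study[e] = {"effect": effect, "n_groups": n_groups}
```

Under these assumptions `event_study` has no entry for relative time `-1`, and the entry for `0` is built from the two finite cells only.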

diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py
@@ -36,6 +36,11 @@ class GroupTimeEffect:
         Number of treated observations.
     n_control : int
         Number of control observations.
+    skip_reason : str or None
+        ``None`` for an estimable cell; otherwise a machine-readable reason code
+        (``"missing_period"``, ``"zero_treated_control"``, ``"zero_weight_mass"``,
+        or ``"non_finite_regression"``), in which case ``effect``/``se`` are NaN.
+        Non-estimable cells are excluded from all aggregation.
     """
 
     group: Any
@@ -47,6 +52,7 @@ class GroupTimeEffect:
     conf_int: Tuple[float, float]
     n_treated: int
     n_control: int
+    skip_reason: Optional[str] = None
 
     @property
     def is_significant(self) -> bool:
@@ -433,6 +439,9 @@ def to_dataframe(self, level: str = "group_time") -> pd.DataFrame:
                     "p_value": data["p_value"],
                     "conf_int_lower": data["conf_int"][0],
                     "conf_int_upper": data["conf_int"][1],
+                    # None for estimable cells; a reason code for non-estimable
+                    # (NaN) cells materialized in group_time_effects.
+                    "skip_reason": data.get("skip_reason"),
                 }
                 if self.epv_diagnostics and (g, t) in self.epv_diagnostics:
                     row["epv"] = self.epv_diagnostics[(g, t)].get("epv")
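
Reviewer note: the `skip_reason` value surfaced here can also be read straight from `group_time_effects`, which the tests in this PR index as a dict keyed by `(g, t)` with `"effect"` and `"skip_reason"` entries. The sketch below uses a hypothetical stand-in dict rather than a real fit result, and shows one way a caller might separate estimable cells from non-estimable ones and tally the reason codes.

```python
import math
from collections import Counter

# Hypothetical stand-in for results.group_time_effects (values are made up).
group_time_effects = {
    (2, 2): {"effect": math.nan, "skip_reason": "missing_period"},
    (2, 3): {"effect": math.nan, "skip_reason": "missing_period"},
    (3, 3): {"effect": 0.8, "skip_reason": None},
    (3, 4): {"effect": 1.1, "skip_reason": None},
}

# Estimable cells carry skip_reason=None.
estimable = {
    key: cell
    for key, cell in group_time_effects.items()
    if cell["skip_reason"] is None
}

# Tally the reason codes of the non-estimable (NaN) cells.
skip_counts = Counter(
    cell["skip_reason"]
    for cell in group_time_effects.values()
    if cell["skip_reason"] is not None
)
```

Filtering on `skip_reason is None` rather than on `math.isfinite(effect)` is a choice made for this sketch; the REGISTRY note in this PR describes the library's own consumers as finite-masking on the effect.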

diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -525,8 +525,9 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
 
 *Edge cases:*
 - Groups with single observation: included but may have high variance
-- Missing group-time cells: omitted from `group_time_effects` with a consolidated warning listing skip reasons and counts
-  - **Note:** Non-estimable cells (missing base/post period, zero treated/control, insufficient data) are omitted rather than stored as NaN. A consolidated UserWarning is emitted from `fit()` across all estimation paths. R's `did` package also omits these cells from `aggte()` results.
+- Non-estimable group-time cells: materialized as NaN entries in `group_time_effects` with a consolidated warning listing skip reasons and counts
+  - **Note:** Non-estimable cells (missing base/post period, zero treated/control, zero survey-weight mass, non-finite regression solve) are stored as NaN entries — `effect`/`se`/`t_stat`/`p_value`/`conf_int` all NaN — carrying a machine-readable `skip_reason` code (`"missing_period"`, `"zero_treated_control"`, `"zero_weight_mass"`, `"non_finite_regression"`; estimable cells carry `None`). This is uniform across ALL estimation paths (no-covariate regression, covariate regression, IPW/DR, repeated cross-section, survey-weighted). A consolidated `UserWarning` is still emitted from `fit()`. The NaN cells are **excluded from every aggregation** (simple/overall, group, event-study), from `balance_e`, and from the bootstrap (they carry no influence-function entry, and all consumers finite-mask on `np.isfinite(effect)` or filter to IF members), so all aggregate point estimates and SEs — and `n_groups`/`n_periods` metadata — are **unchanged** from the prior omit behavior and match R `did`'s `aggte()` exactly. A fit where no cell is estimable (no finite effect) still raises a `ValueError`.
+  - **Deviation from R:** R's `did::att_gt` omits non-estimable cells from its result table entirely; diff-diff materializes them as NaN rows (with `skip_reason`) so the `(g,t)` grid is inspectable via `group_time_effects` / `to_dataframe("group_time")`. This is a per-cell *surface* difference only — R's `aggte()` aggregation behavior is matched exactly (non-estimable cells contribute nothing to any aggregate).
   - **Note:** When `balance_e` is specified, cohorts with NaN effects at the anchor horizon are excluded from the balanced panel
 - Anticipation: `anticipation` parameter shifts reference period
   - Group aggregation includes periods t >= g - anticipation (not just t >= g)

diff --git a/tests/test_csdid_ported.py b/tests/test_csdid_ported.py
@@ -633,9 +633,16 @@ def test_some_units_treated_first_period(self):
                 time="period",
                 first_treat="first_treat",
             )
-        # G=2 should be excluded (no pre-treatment period available)
-        groups_in_results = set(k[0] for k in results.group_time_effects.keys())
-        assert 2 not in groups_in_results, "G=2 treated in first period should be excluded"
+        # G=2 is treated in the first observed period, so it has no valid base
+        # period and all of its (g,t) cells are non-estimable. They are now
+        # materialized as NaN entries (skip_reason="missing_period") rather than
+        # omitted. This preserves the prior "excluded from the analysis" intent:
+        # G=2 is no longer silently dropped, but it still yields no finite estimate.
+        g2_cells = [v for (g, t), v in results.group_time_effects.items() if g == 2]
+        assert g2_cells, "G=2 cells should be materialized (as NaN), not silently dropped"
+        assert all(
+            np.isnan(v["effect"]) and v["skip_reason"] == "missing_period" for v in g2_cells
+        ), "G=2 (no pre-treatment period) must be all-NaN (missing_period), never estimated"
 
 
 class TestCSDIDBugFixRegressions:
@@ -774,6 +781,11 @@ def test_zero_pretreatment_outcomes(self):
         gt = results.group_time_effects
         pre_effects = {k: v for k, v in gt.items() if k[1] < k[0]}
         for (g, t), eff in pre_effects.items():
+            # Non-estimable pre-cells are now materialized as NaN (e.g. the last
+            # cohort under not_yet_treated has no controls); skip them. Finite
+            # pre-treatment cells (DiD of 0-0 vs 0-0) must still be ~0.
+            if np.isnan(eff["effect"]):
+                continue
             assert abs(eff["effect"]) < 0.01, (
                 f"Pre-treatment ATT(g={g}, t={t}) = {eff['effect']:.4f}, " "expected 0"
             )
@@ -1199,6 +1211,13 @@ def test_golden_fewer_periods(self, golden_values):
             g, t = int(g), int(t)
             if (g, t) in results.group_time_effects:
                 py_att = results.group_time_effects[(g, t)]["effect"]
+                # Skip cells we materialize as non-estimable (e.g. a gapped panel
+                # where the base period g-1 is not observed -> missing_period). R
+                # falls back to an available base and reports a value where our
+                # impl does not; compare only cells both estimate (R-parity on the
+                # finite cells, which is what this golden test pins).
+                if not np.isfinite(py_att):
+                    continue
                 r_att = r_gt["att"][i]
                 assert abs(py_att - r_att) < 0.05, (
                     f"Fewer periods ATT(g={g},t={t}): " f"Py={py_att:.4f}, R={r_att:.4f}"