Merged
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
exit-event dynamics, R-package parity (PR-C2), and survey-design support remain follow-ups.
Pure-Python validation covers the absorbing reduction, the re-entry mechanism, pre-trend
placebos, non-negative weighting, stabilized-control admission, and DGP recovery.
+- **Weighted multiple absorbed fixed effects (`absorb=[a, b, ...]`) now supported in
+`DifferenceInDifferences` / `MultiPeriodDiD`.** The prior `ValueError` rejecting multi-absorb
+with survey weights is lifted: the absorb path now uses the method of alternating projections
+(`diff_diff.utils.demean_by_groups`), the exact weighted Frisch-Waugh-Lovell residualization for
+N > 1 dimensions. New `demean_by_groups()` N-way helper; the two-way `within_transform()` now
+delegates to it. Single-absorb and balanced-panel results are byte-stable (weighted
+`within_transform` output is bit-identical; balanced multi-way matches the prior closed-form
+demean to machine precision).
- **`LPDiD` R-parity validation (absorbing).** `tests/test_methodology_lpdid.py` pins the
estimator against the method authors' own R recipes (`danielegirardi/lpdid` event-study /
reweight / premean / pooled `fixest::feols` specifications) with an `alexCardazzi/lpdid`
@@ -174,6 +182,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
numerical, or public-API change.

### Fixed
+- **Unbalanced-panel correctness: N > 1 absorbed fixed effects now use iterative alternating
+projections instead of single-pass sequential demeaning.** Affected `DifferenceInDifferences` /
+`MultiPeriodDiD` with `absorb=[a, b, ...]` and the shared unweighted two-way `within_transform`
+(used by `TwoWayFixedEffects`, `SunAbraham`, `BaconDecomposition`). A single sequential demean
+sweep is the exact Frisch-Waugh-Lovell residualization only when the fixed-effect subspaces are
+orthogonal (balanced fully-crossed panels); on unbalanced panels it was a biased approximation
+(coefficients off by ~1e-2 in tested cases). The within transformation now iterates to
+convergence (`diff_diff.utils.demean_by_groups`), matching R `fixest` / `reghdfe` / `lfe`.
+Balanced-panel and single-absorb results are unchanged to machine precision; the unweighted
+two-way path now also emits the non-convergence `UserWarning` (previously only the weighted path
+could).
- **Structural (non-covariate) matrix inverses are now rank-guarded.** The internal design-Gram
bread inversions in `ContinuousDiD` (ACRT-variance `Psi'WPsi`), `TwoStageDiD` (Stage-2
`X_2'WX_2`, both the analytical and multiplier-bootstrap surfaces), `SpilloverDiD` (Wave D
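The unbalanced-panel failure mode described in the Fixed entry can be reproduced in a few lines. The following is a minimal pure-Python sketch, not the library's `demean_by_groups` implementation: the helper names (`group_means`, `demean_once`, `max_abs_group_mean`, `alternating_projections`), the tolerance, and the toy panel are all assumptions made for illustration. It compares one sequential unit-then-time sweep against iterating the same sweep to convergence.

```python
from collections import defaultdict


def group_means(values, groups):
    """Return {group label: mean of the values observed in that group}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def demean_once(values, groups):
    """One projection: subtract each observation's group mean."""
    means = group_means(values, groups)
    return [v - means[g] for v, g in zip(values, groups)]


def max_abs_group_mean(values, groups):
    """Largest leftover group mean; zero means the dimension is fully absorbed."""
    return max(abs(m) for m in group_means(values, groups).values())


def alternating_projections(values, dims, tol=1e-12, max_iter=10_000):
    """Cycle through the grouping dimensions until a full sweep changes nothing."""
    out = list(values)
    for _ in range(max_iter):
        prev = out
        for groups in dims:
            out = demean_once(out, groups)
        if max(abs(a - b) for a, b in zip(out, prev)) < tol:
            return out
    raise RuntimeError("alternating projections did not converge")


# Hypothetical unbalanced panel: unit "b" misses period 3, unit "c" misses period 1.
unit = ["a", "a", "a", "b", "b", "c", "c"]
time = [1, 2, 3, 1, 2, 2, 3]
y = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 7.0]

single_pass = demean_once(demean_once(y, unit), time)  # one sequential sweep
converged = alternating_projections(y, [unit, time])   # iterate to convergence

print("single pass, max |unit mean|:", max_abs_group_mean(single_pass, unit))
print("converged,   max |unit mean|:", max_abs_group_mean(converged, unit))
print("converged,   max |time mean|:", max_abs_group_mean(converged, time))
```

After the single sweep, the time means are zero but the unit means are not, because demeaning by time reintroduces unit-level variation when the panel is unbalanced. The converged result has zero mean within every unit and every period. Since each step only subtracts group means, that is the condition for it to be the residual from projecting on both sets of dummies.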
3 changes: 1 addition & 2 deletions TODO.md
@@ -36,7 +36,6 @@ The `Origin` column (Actionable tables) and the `PR` column (Deferred tables) bo
| `EfficientDiD` survey-weighted Silverman bandwidth in conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std; survey-weighted statistics better reflect the population distribution (second-order refinement). | `efficient_did_covariates.py` | — | Quick | Low |
| Survey sandwich SE is not exactly invariant to zero-weight (subpopulation / padded) rows: `_compute_stratified_psu_meat`'s finite-sample correction counts zero-weight units as PSUs, so padding shifts the SE ~2e-4 relative. Point estimate is exactly invariant. Fix: count only positive-weight PSUs in the correction (cross-cutting across all survey-enabled estimators). | `survey.py` (`_compute_stratified_psu_meat`) | PR-B | Mid | Low |
| `ImputationDiD` LOO conservative-variance refinement (BJS 2024 Supp. Appendix A.9) — a finite-sample improvement to the auxiliary-model residuals reducing overfit of `tau_tilde_g` to `epsilon`. Asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation | Mid | Low |
-| Multi-absorb weighted demeaning needs iterative alternating projections for `N > 1` absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (exact only for balanced panels). | `estimators.py` | #218 | Heavy | Medium |
| `TwoWayFixedEffects(vcov_type in {hc2, hc2_bm})` with replicate-weight designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate, which doesn't compose with the full-dummy HC2/HC2-BM build — a correct impl needs per-replicate full-dummy refit. Workaround: `hc1` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Heavy | Low |
| TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper, or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch (with TWFE-specific cluster-default threading), to reduce drift risk on FE naming / survey behavior / result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Heavy | Low |
| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)` (the bare-cluster path already synthesizes a minimal SurveyDesign). Two equivalent paths = redundant surface. Mirrors the question for ImputationDiD / EfficientDiD / TwoStageDiD. | `staggered.py` | follow-up | Mid | Low |
@@ -223,7 +222,7 @@ and `TwoWayFixedEffects` accept `vcov_type ∈ {classical, hc1, hc2, hc2_bm, con
(the validated set in `linalg.py::_VALID_VCOV_TYPES`); cluster-robust variance comes from
`cluster=` alongside the heteroscedasticity kind (`hc1+cluster` ⇒ CR1 Liang-Zeger;
`hc2_bm+cluster` ⇒ CR2 Bell-McCaffrey, including the weighted WLS-CR2 port; the N>1
-absorbed-FE + weights composition remains gated by the open multi-absorb row in Actionable);
+absorbed-FE + weights composition is supported via iterative alternating-projection demeaning, #586);
wild cluster bootstrap is the separate `inference="wild_bootstrap"` path. Threading
`vcov_type` through the 8 standalone estimators is **complete** (Phase 1b); four
(`CallawaySantAnna`, `TripleDifference`, `ImputationDiD`, `EfficientDiD`) are permanently
77 changes: 31 additions & 46 deletions diff_diff/estimators.py
@@ -28,7 +28,7 @@
from diff_diff.results import DiDResults, MultiPeriodDiDResults, PeriodEffect
from diff_diff.utils import (
WildBootstrapResults,
-demean_by_group,
+demean_by_groups,
fe_dummy_names,
safe_inference,
validate_binary,
@@ -414,17 +414,9 @@ def fit(
absorbed_vars = []
n_absorbed_effects = 0

-# Reject multi-absorb with survey weights (single-pass demeaning is
-# not the correct weighted FWL projection for N > 1 dimensions). Only
-# fires when absorb is still set — i.e., the auto-route above didn't
-# consume it.
-if absorb and len(absorb) > 1 and survey_weights is not None:
-raise ValueError(
-f"Multiple absorbed fixed effects (absorb={absorb}) with survey "
-"weights is not supported. Single-pass sequential demeaning is not "
-"the correct weighted FWL projection for multiple absorbed dimensions. "
-"Use absorb with a single variable, or use fixed_effects= instead."
-)
+# Weighted multiple absorbed FE is supported: the absorb path below uses
+# iterative alternating projections (demean_by_groups), the exact weighted
+# FWL projection for N > 1 dimensions on both balanced and unbalanced panels.

# Validate vcov_type="conley" wire-up. DiD.fit() accepts `unit`
# as a fit-time arg (NOT on __init__) because cluster/unit
@@ -462,16 +454,18 @@ def fit(
float
) * working_data[time].values.astype(float)
vars_to_demean = [outcome, treatment, time, "_treat_time"] + (covariates or [])
-for ab_var in absorb:
-working_data, n_fe = demean_by_group(
-working_data,
-vars_to_demean,
-ab_var,
-inplace=True,
-weights=survey_weights,
-)
-n_absorbed_effects += n_fe
-absorbed_vars.append(ab_var)
+# Method of alternating projections: for N > 1 absorbed dimensions a
+# single sequential sweep is only exact on balanced (orthogonal-FE)
+# panels; demean_by_groups iterates to the exact (W)LS-FWL residual.
+working_data, n_fe = demean_by_groups(
+working_data,
+vars_to_demean,
+list(absorb),
+inplace=True,
+weights=survey_weights,
+)
+n_absorbed_effects += n_fe
+absorbed_vars = list(absorb)

# Extract variables (may be demeaned if absorb was used)
y = working_data[outcome].values.astype(float)
@@ -644,8 +638,7 @@ def _refit_did_absorb(w_r):
float
)
vars_dm = [outcome, treatment, time, "_treat_time"] + (covariates or [])
-for ab_var in _absorb_list:
-wd, _ = demean_by_group(wd, vars_dm, ab_var, inplace=True, weights=w_nz)
+wd, _ = demean_by_groups(wd, vars_dm, _absorb_list, inplace=True, weights=w_nz)
y_r = wd[outcome].values.astype(float)
d_r = wd[treatment].values.astype(float)
t_r = wd[time].values.astype(float)
@@ -1572,17 +1565,9 @@ def fit( # type: ignore[override]
absorb = None
n_absorbed_effects = 0

-# Reject multi-absorb with survey weights (single-pass demeaning is
-# not the correct weighted FWL projection for N > 1 dimensions).
-# Only fires when absorb is still set — i.e., the auto-route above
-# didn't consume it.
-if absorb and len(absorb) > 1 and survey_weights is not None:
-raise ValueError(
-f"Multiple absorbed fixed effects (absorb={absorb}) with survey "
-"weights is not supported. Single-pass sequential demeaning is not "
-"the correct weighted FWL projection for multiple absorbed dimensions. "
-"Use absorb with a single variable, or use fixed_effects= instead."
-)
+# Weighted multiple absorbed FE is supported: the absorb path below uses
+# iterative alternating projections (demean_by_groups), the exact weighted
+# FWL projection for N > 1 dimensions on both balanced and unbalanced panels.

# MultiPeriodDiD is intrinsically a multi-period panel estimator;
# Phase 2 panel block-decomposed Conley (matches R conleyreg) needs
@@ -1622,15 +1607,16 @@ def fit( # type: ignore[override]
+ [f"_did_interact_{p}" for p in non_ref_periods]
+ (covariates or [])
)
-for ab_var in absorb:
-working_data, n_fe = demean_by_group(
-working_data,
-vars_to_demean,
-ab_var,
-inplace=True,
-weights=survey_weights,
-)
-n_absorbed_effects += n_fe
+# Method of alternating projections (exact for unbalanced panels; a
+# single sequential sweep is exact only on balanced orthogonal-FE panels).
+working_data, n_fe = demean_by_groups(
+working_data,
+vars_to_demean,
+list(absorb),
+inplace=True,
+weights=survey_weights,
+)
+n_absorbed_effects += n_fe

# Extract outcome and treatment (may be demeaned if absorb was used)
y = working_data[outcome].values.astype(float)
@@ -1854,8 +1840,7 @@ def _refit_mp_absorb(w_r):
+ [f"_did_interact_{p}" for p in non_ref_periods]
+ (covariates or [])
)
-for ab_var_ in _absorb_list_mp:
-wd, _ = demean_by_group(wd, vars_dm_, ab_var_, inplace=True, weights=w_nz)
+wd, _ = demean_by_groups(wd, vars_dm_, _absorb_list_mp, inplace=True, weights=w_nz)
y_r = wd[outcome].values.astype(float)
d_r = wd["_did_treatment"].values.astype(float)
X_r = np.column_stack([np.ones(len(y_r)), d_r])
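The fit and refit paths above all pass `weights=` into the N-way demean. As a rough illustration of why the iterated weighted projection is the right residualization, the sketch below builds an outcome with a known slope of 2 plus additive unit and period effects on an unbalanced panel, demeans both columns with weighted alternating projections, and recovers the slope with a weighted least-squares ratio. It is a pure-Python toy under stated assumptions, not the `diff_diff` implementation: the function names (`weighted_demean_once`, `weighted_alternating_projections`, `weighted_slope`), the weights, and the data are hypothetical.

```python
from collections import defaultdict


def weighted_demean_once(values, groups, weights):
    """Subtract the weighted group mean sum(w * v) / sum(w) from each observation."""
    sums, wsum = defaultdict(float), defaultdict(float)
    for v, g, w in zip(values, groups, weights):
        sums[g] += w * v
        wsum[g] += w
    return [v - sums[g] / wsum[g] for v, g in zip(values, groups)]


def weighted_alternating_projections(values, dims, weights, tol=1e-12, max_iter=10_000):
    """Repeat the weighted demean over every dimension until a sweep is a no-op."""
    out = list(values)
    for _ in range(max_iter):
        prev = out
        for groups in dims:
            out = weighted_demean_once(out, groups, weights)
        if max(abs(a - b) for a, b in zip(out, prev)) < tol:
            return out
    raise RuntimeError("weighted alternating projections did not converge")


def weighted_slope(x, y, weights):
    """Single-regressor WLS slope, no intercept (inputs are already demeaned)."""
    num = sum(w * a * b for w, a, b in zip(weights, x, y))
    den = sum(w * a * a for w, a in zip(weights, x))
    return num / den


# Hypothetical unbalanced panel with strictly positive weights.
unit = ["a", "a", "a", "b", "b", "c", "c"]
time = [1, 2, 3, 1, 2, 2, 3]
w = [1.0, 2.0, 0.5, 1.5, 1.0, 3.0, 0.7]
x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]

# Outcome = 2 * x + unit effect + period effect, with no noise.
alpha = {"a": 1.0, "b": -2.0, "c": 0.5}
gamma = {1: 0.0, 2: 3.0, 3: -1.0}
y = [2.0 * xi + alpha[u] + gamma[t] for xi, u, t in zip(x, unit, time)]

dims = [unit, time]
x_res = weighted_alternating_projections(x, dims, w)
y_res = weighted_alternating_projections(y, dims, w)

beta = weighted_slope(x_res, y_res, w)
print("recovered slope:", beta)
```

Because the converged projection removes the additive unit and period effects up to the tolerance, the demeaned outcome is twice the demeaned regressor and the slope comes back as 2 to within that tolerance. A single unit-then-time sweep leaves part of the period effect in the outcome on a panel like this one, so the same ratio would generally not return the true slope.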