Skip to content

Compiler fixes for auxiliary constant generation#473

Open
johnchandlerburnham wants to merge 3 commits into
mainfrom
jcb/compiler
Open

Compiler fixes for auxiliary constant generation#473
johnchandlerburnham wants to merge 3 commits into
mainfrom
jcb/compiler

Conversation

@johnchandlerburnham

@johnchandlerburnham johnchandlerburnham commented Jul 4, 2026

Copy link
Copy Markdown
Member

Summary

Two related changes to the content-addressing layer and its aux-constant handling, fixing underlying issue reported in #465

1. Ixon no longer stores recr/refl/nested on Inductive. These are derivable
from constructor structure, so storing them was redundant and trusting declared values
was an adversarial surface (e.g. is_rec = false on a recursive inductive enables
improper struct-eta). The kernel now computes is_rec on demand, memoized in a new env
cache with a provisional entry to break the whnf → try_struct_eta_iota → is_struct_like
cycle; this replaces the declared-vs-computed check in check_inductive. The compile
side gains compute_lean_ind_flags to recompute Lean's block-wide
isRec/isReflexive/numNested wherever an InductiveVal is reconstructed without a
source Lean env (kernel egress, decompile), and validate_lean_ind_flags to check a
whole env against the recomputation.

2. Evaporated auxiliaries get a canonical form. When SCC splitting strands a nested
aux's spec-param inductives outside the owner's SCC, no SCC holds the joint family, and
dropping the irrelevant over-merged motives leaves exactly the external inductive's
generic recursor. Canonical treatment: rec_N claims alias <Ext>.rec (e.g.
List.rec), call sites are rebuilt onto the external telescope via head-rewrite
CallSitePlans (owner-gated, single-motive targets), and below_N/brecOn_N compile as
surgered originals like _sizeOf_N. Fixes the AuxDedup kernel-check failures (28 → 0);
AuxDedup1 now generates auxiliaries identical to AuxDedup2's canonical structure.
Documented in docs/ix_canonicity.md §6.5.

Byte-exact aux roundtrip

roundtrip_block Phase A (recompile the regenerated Lean form, compare against the
stored original address) silently failed for 1529 of 1545 aux constants — including
plain stdlib like Nat.casesOn. Root cause: every production compile path preseeds the
ref/univ tables in sorted order (preseed_expr_tables) before compiling, and the
serialized constant embeds those tables; Phase A compiled without the preseed, filling
the tables in traversal order instead — every Ref/univ index permuted, byte-different
but semantically identical constants (decode resolves through the embedded table). A
debug probe recompiling the Lean original through the identical path proved
compile(original) == compile(regen) in every case: regeneration was always faithful,
the comparison context was not.

With the preseed mirrored in Phase A the invariant holds corpus-wide, so a Phase-A
recompile-hash mismatch is now a hard error with no aux exemption, and every
roundtrip arm records failures in aux_gen_errors (recovery keeps the Lean-facing env
populated for diagnosis but is never silent). Related hardening: call-site surgery
detection is durable across serialization (Named.original.is_some() alongside the
in-memory map), shift-aware instantiate_rev in the type-walking helpers (fixes fvar
leaks in .brecOn.go bodies), and the below-def roundtrip loop filters by the
original-gated members like its sibling loops. IX_ROUNDTRIP_DEBUG now dumps hashed
component summaries and runs an original-form recompile probe on any mismatch.

Test fixes and fixtures

  • kernel-tutorial: bad_raw_consts inductive fixtures carry recomputation-honest flags
    so the whole-env validate_ind_flags no longer poisons the shared tutorial env
    (73/335 → 335/335, with the kernel rejecting each bad fixture as designed).
  • validate-aux: seeds match module-private fixture names via privateToUserName?, the
    Canonicity prefix is enabled, and Phase 4b gains per-module markers so a fully absent
    identity group fails loudly when its fixture module is loaded (previously vacuous at
    0 pass / 0 fail, now 109 pass / 0 fail).
  • New fixtures: AuxDedup1/AuxDedup2 (cross-block aux dedup), AuxDedupMixed (a perm
    mixing a canonical slot and PERM_OUT_OF_SCC for the same owner), plus a
    CompileMutualFixtures benchmark lib.

Gates

  • kernel-check-env: 201296/201296
  • rust-compile: all phases, 0 aux_gen errors / 0 mismatches / 0 Phase-A address
    divergences on the full 213k env (live and deserialized)
  • validate-aux: 0 failures at 4393-constant scope
  • rust-serialize: byte-exact; kernel-ixon-roundtrip: 143694/0
  • kernel-tutorial: 335/335; cargo test workspace and lake test green;
    cargo clippy --all-targets clean
  • lake exe ix check-rs compilemathlib.ixe: 736618/736618 passed, 0 failed (325.3s)
  • lake exe ix validate Benchmarks/Compile/CompileMathlib.lean: 0 failures (1528.33s total)

Remove the `recr`/`refl` bools and the `nested` count from the Ixon
`Inductive` constant and its serialization (Rust and Lean), and from
the `Indc` reveal-proof variant, renumbering the field-presence mask
bits. These flags are derivable from constructor structure, so storing
them was redundant and trusting declared values was an adversarial
surface (e.g. is_rec = false on a recursive inductive enables improper
struct-eta).

- kernel: KConst::Indc loses is_rec/is_refl/nested. is_rec is now
  computed on demand (computed_is_rec), memoized in the new env
  is_rec_cache with a provisional entry to break the whnf ->
  try_struct_eta_iota -> is_struct_like cycle. This replaces the
  declared-vs-computed H1 verification in check_inductive.
- compile: new compute_lean_ind_flags recomputes Lean's block-wide
  isRec/isReflexive/numNested wherever an InductiveVal is reconstructed
  without a source Lean env (kernel egress, decompile), since Ixon no
  longer stores the flags; validate_lean_ind_flags checks a whole env
  against the recomputation.
- tests/benchmarks: add AuxDedup1/AuxDedup2 mutual fixtures exercising
  aux-constant dedup across blocks (fix forthcoming); add a
  CompileMutualFixtures benchmark lib building the mutual test
  fixtures; ignore *.ixe.
Evaporated auxiliaries (over-merge splits): when SCC splitting strands a
nested aux's spec-param inductives outside the owner's SCC, no SCC holds
the joint family, and dropping the irrelevant over-merged motives leaves
exactly the external inductive's generic recursor. Canonical treatment:
`rec_N` claims alias `<Ext>.rec` (e.g. `List.rec`), call sites are
rebuilt onto the external telescope via head-rewrite CallSitePlans
(owner-gated, single-motive targets), and `below_N`/`brecOn_N` compile
as surgered originals like `_sizeOf_N`. Fixes the AuxDedup kernel-check
failures (28 -> 0); AuxDedup1 now generates identical auxiliaries to
AuxDedup2 (the canonical structure). New AuxDedupMixed fixture covers a
perm mixing a canonical slot and PERM_OUT_OF_SCC for the same owner.
Documented in docs/ix_canonicity.md 6.5.

Call-site surgery guard is now durable across serialization: aux-regen
detection accepts `Named.original.is_some()` in addition to the
in-memory `aux_name_to_addr`, so deserialized-state roundtrip recompiles
no longer misapply surgery. Shift-aware `instantiate_rev` replaces
unshifted substitution in the type-walking helpers (fixes fvar leaks in
`.brecOn.go` bodies).

Byte-exact aux roundtrip: `roundtrip_block` Phase A now preseeds the
ref/univ tables (`preseed_expr_tables`) like every production compile
path. The serialized constant embeds those tables in sorted order;
compiling without the preseed filled them in traversal order instead,
permuting every `Ref`/univ index — byte-different but semantically
identical constants (decode resolves through the embedded table). This
silently failed the Phase-A address comparison against
`Named.original.0` for 1529 of 1545 aux constants (including plain
stdlib like `Nat.casesOn`); a debug probe proved
compile(original) == compile(regen) in every case, i.e. the
regeneration itself was always faithful.

With the invariant holding corpus-wide, the Phase-A recompile-hash
mismatch is now a hard error with no aux exemption, and every roundtrip
arm records failures in `aux_gen_errors` (recovery keeps the
Lean-facing env populated for diagnosis but is never silent). Pass-2
scope hygiene: the below-def roundtrip loop filters by the
original-gated `aux_members` like its sibling loops, so evaporated
`below_N` keep their faithful Pass-1 decompile. IX_ROUNDTRIP_DEBUG now
dumps hashed component scalars/hashes and runs an original-form
recompile probe for any mismatch.

Test fixes: kernel-tutorial `bad_raw_consts` inductive fixtures carry
recomputation-honest flags so compile-side `validate_ind_flags` no
longer poisons the shared tutorial env (73/335 -> 335/335, with the
kernel rejecting each bad fixture as designed); validate-aux seeds
match module-private fixture names via `privateToUserName?` and enable
the Canonicity prefix; Phase 4b gains per-module markers so a fully
absent identity group fails loudly when its fixture module is loaded
(previously vacuous at 0 pass / 0 fail, now 109 pass).

Gates: kernel-check-env 201296/201296; rust-compile all phases with 0
aux_gen errors, 0 mismatches, and 0 Phase-A address divergences on the
full 213k env (live and deserialized); validate-aux 0 failures at
4393-constant scope; rust-serialize byte-exact; kernel-ixon-roundtrip
143694/0; kernel-tutorial 335/335; cargo test and lake test green.
Behavior-neutral cleanups flagged by `cargo clippy --all-targets`:
map_or over map+unwrap_or and slice::contains in surgery.rs, an
enumerate loop for the motive-peeling walk in aux_motive_sigs, and
let-chain collapses for the inductive-flags fixup loops in decompile.rs
and kernel_egress.rs. Plus `cargo fmt` line-wrapping drift left over
from the previous commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant