Build #97 + #99 + #100 MVPs (recommended defaults): nudges, benchmark engine, multi-device sync #110

Merged
0bserver07 merged 3 commits into main from feat/roadmap-mvps-97-99-100 on Jul 3, 2026

@0bserver07

Owner

Builds the MVPs of the three designed roadmap issues (#97, #99, #100) with the maintainer-ratified recommended defaults. Three file-disjoint commits (only cli.py is shared, in non-overlapping regions), integrated clean, full suite green.

#97 — Active-surfacing nudges (hooks/proactive.py)

Phase 0 (governance) + Phase 1 (command-cluster nudge), default-off. A deterministic should_surface gate with per-session dedupe, frequency cap, cross-session cooldown, and dismiss-based adaptive quieting; state in a file-locked ~/.stackunderflow/proactive_state.json (never the DB, no writer contention). The shipped recall.py file-risk nudge is wrapped so default-off = byte-identical shipped behavior (zero regression); the command-cluster signal is precomputed on ingest and looked up O(1) via _normalise_command. No LLM/network on the hook path. 43 tests.
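To make the gate concrete, here is a minimal sketch of a deterministic `should_surface` check covering the four governance rules named above. Only the name `should_surface` comes from this PR. The signature, the `NudgeState` container, the `record_shown` helper, the default limits, and the "each dismissal doubles the cooldown" rule are illustrative assumptions, not the shipped implementation in hooks/proactive.py.

```python
from dataclasses import dataclass, field


@dataclass
class NudgeState:
    """Hypothetical in-memory view of the persisted nudge state."""
    shown_this_session: set = field(default_factory=set)
    last_shown_at: dict = field(default_factory=dict)  # nudge key -> epoch seconds
    dismissals: dict = field(default_factory=dict)     # nudge key -> dismiss count


def should_surface(key, now, state, *, enabled=False,
                   max_per_session=3, cooldown_s=86_400.0):
    """Pure function of its inputs: no clock reads, no I/O, no randomness."""
    if not enabled:                       # default-off
        return False
    if key in state.shown_this_session:   # per-session dedupe
        return False
    if len(state.shown_this_session) >= max_per_session:  # frequency cap
        return False
    # Assumed form of adaptive quieting: each dismissal doubles the cooldown.
    effective_cooldown = cooldown_s * (2 ** state.dismissals.get(key, 0))
    last = state.last_shown_at.get(key)
    if last is not None and now - last < effective_cooldown:  # cross-session cooldown
        return False
    return True


def record_shown(key, now, state):
    """Update state after a nudge was actually displayed."""
    state.shown_this_session.add(key)
    state.last_shown_at[key] = now
```

Passing `now` in as an argument, rather than reading the clock inside the gate, is what keeps the check deterministic and easy to test.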

#99 — Comparative benchmark engine (reports/benchmark.py)

Observational, stratified (intent × size) benchmark over local history with statistical honesty: Wilson intervals, seeded bootstrap, Benjamini–Hochberg FDR, direct standardization (Simpson's-paradox-safe), sample floors, and "insufficient evidence" as a first-class verdict. services/benchmark_stats.py is stdlib-only. Rubric ratified in docs/specs/benchmark-rubric-v1.md (weights .45/.35/.20, τ=7.0, 90% CI). Route + benchmark CLI group (--json via the memory envelope) + recommend_model_for_task meta-agent tool + a Compare-tab "Which model wins" panel. Cost only ever read from session_mart. /api/benchmark warm = 1.7ms vs a 200ms budget. Move 0 unified task-classification into one canonical classify_task (tag/recommender tests kept green). 62 tests.
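As an illustration of two of the statistical tools listed above, the following stdlib-only sketch computes a Wilson score interval and applies the Benjamini–Hochberg step-up procedure. The function names, signatures, and the FDR level `alpha` are assumptions made for this example; only the 90% confidence level comes from the rubric, and none of this is claimed to match services/benchmark_stats.py.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.90):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1]; the lower bound can still be a tiny positive float.
    return max(0.0, centre - half), min(1.0, centre + half)


def benjamini_hochberg(p_values, alpha=0.10):
    """Return a reject/keep flag per hypothesis, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank  # largest rank whose p-value is under its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected
```

With zero successes the lower bound is mathematically 0 but is computed as the difference of two nearly equal floats, so tests on it should use a tolerance rather than exact equality.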

#100 — Multi-device sync MVP (stackunderflow/sync/)

Phase 1: one-way, client-side-encrypted, BYO-bucket backup of the Overview/Cost-core mart aggregates only (never transcripts/usage_events/price_book). age via pyrage; a narrow ObjectStore (boto3 + an in-memory fake); deterministic shards re-keyed local project_id → stable (provider, slug); two-phase manifest commit; skip-if-unchanged outbox. sync init/push/status. Additive v028 migration (sync_identity, sync_outbox); an import-guarded [sync] extra (version untouched). Default-off is byte-identical. 59 tests (crypto path gated on pyrage).
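The two-phase manifest commit and skip-if-unchanged behaviour can be sketched against an in-memory store. Everything here is a hypothetical illustration: the `put`/`get` interface, the `InMemoryStore` class, the `push` function, the key layout, and the use of SHA-256 digests are assumptions, and encryption with age is deliberately left out (the `payload` bytes stand in for already-prepared shard contents).

```python
import hashlib
import json
from typing import Optional, Protocol


class ObjectStore(Protocol):
    """Narrow storage interface (assumed shape)."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Fake store for tests; a real one might wrap an S3-compatible bucket."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects.get(key)


def push(store, shards):
    """Upload changed shards, then commit by writing the manifest last.

    Returns the names of shards that were actually uploaded.
    """
    raw = store.get("manifest.json")
    previous = json.loads(raw) if raw else {}
    manifest, uploaded = {}, []
    # Phase 1: write shard objects under content-derived keys.
    for name, payload in sorted(shards.items()):
        digest = hashlib.sha256(payload).hexdigest()
        manifest[name] = digest
        if previous.get(name) == digest:
            continue  # skip-if-unchanged
        store.put(f"shards/{digest}", payload)
        uploaded.append(name)
    # Phase 2: only now make the new state visible to readers.
    if manifest != previous:
        store.put("manifest.json", json.dumps(manifest, sort_keys=True).encode())
    return uploaded
```

Because the manifest is written only after every shard it references exists, an interrupted push leaves the previous manifest intact and a reader never sees a half-uploaded state.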

Verification

  • Full suite: 4001 passed, 2 skipped (the 1 transient failure was a wall-clock hook-latency p99 budget under concurrent build load — passes 5/5 in isolation, one of the known load-sensitive perf tests)
  • ruff --select E,F clean · tsc typecheck clean · vite build clean (benchmark panel bundled) · version guard green · contract validator green · test_pricing_invariants green
  • Recommended defaults applied throughout; all maintainer-owned knobs surfaced/configurable

🤖 Generated with Claude Code

Note

Supersedes #109 (its branch was deleted before merge). Fixes a platform-dependent float-precision assert in a #99 Wilson-interval test (lo == 0.0 → approx(0.0, abs=1e-9)) that failed only on the CI runner (2.8e-17 != 0.0).
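The failure mode in this note can be reproduced in isolation. The sketch below uses the stdlib `math.isclose` as a stand-in for the pytest `approx` helper so that it runs without pytest installed; the value 2.8e-17 is the one reported from the CI runner, hard-coded here purely for illustration.

```python
import math

# Lower bound as reported on the CI runner: mathematically zero,
# but off by rounding error on that platform.
lo = 2.8e-17

exact_match = (lo == 0.0)                                  # the old, brittle check
tolerant_match = math.isclose(lo, 0.0, abs_tol=1e-9)       # the tolerance-based check

print(exact_match, tolerant_match)
```

An absolute tolerance is the relevant one here: a purely relative tolerance never accepts a nonzero value as close to 0.0.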

0bserver07 and others added 3 commits July 3, 2026 10:18
…er + command-cluster nudge

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, aggregates-only, v028

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…CI-guarded, Compare-tab panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@socket-security


Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | pypi/boto3@1.43.40 | 99 | 100 | 100 | 100 | 100 |
| Added | pypi/pyrage@1.3.0 | 100 | 100 | 100 | 100 | 100 |

