Build #97 + #99 + #100 MVPs (recommended defaults): nudges, benchmark engine, multi-device sync #109
Closed
0bserver07 wants to merge 3 commits into
…er + command-cluster nudge Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, aggregates-only, v028 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…CI-guarded, Compare-tab panel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Builds the MVPs of the three designed roadmap issues (#97, #99, #100) with the maintainer-ratified recommended defaults. Three file-disjoint commits (only `cli.py` is shared, in non-overlapping regions), integrated clean, full suite green.

#97: Active-surfacing nudges (`hooks/proactive.py`)

Phase 0 (governance) + Phase 1 (command-cluster nudge), default-off. A deterministic `should_surface` gate with per-session dedupe, frequency cap, cross-session cooldown, and dismiss-based adaptive quieting; state in a file-locked `~/.stackunderflow/proactive_state.json` (never the DB, no writer contention). The shipped `recall.py` file-risk nudge is wrapped so default-off = byte-identical shipped behavior (zero regression); the command-cluster signal is precomputed on ingest and looked up O(1) via `_normalise_command`. No LLM/network on the hook path. 43 tests.

#99: Comparative benchmark engine (`reports/benchmark.py`)

Observational, stratified `(intent × size)` benchmark over local history with statistical honesty: Wilson intervals, seeded bootstrap, Benjamini–Hochberg FDR, direct standardization (Simpson's-paradox-safe), sample floors, and "insufficient evidence" as a first-class verdict. `services/benchmark_stats.py` is stdlib-only. Rubric ratified in `docs/specs/benchmark-rubric-v1.md` (weights .45/.35/.20, τ=7.0, 90% CI). Route + `benchmark` CLI group (`--json` via the memory envelope) + `recommend_model_for_task` meta-agent tool + a Compare-tab "Which model wins" panel. Cost only ever read from `session_mart`. `/api/benchmark` warm = 1.7ms vs a 200ms budget. Move 0 unified task-classification into one canonical `classify_task` (tag/recommender tests kept green). 62 tests.

#100: Multi-device sync MVP (`stackunderflow/sync/`)

Phase 1: one-way, client-side-encrypted, BYO-bucket backup of the Overview/Cost-core mart aggregates only (never transcripts/`usage_events`/`price_book`). `age` via `pyrage`; a narrow `ObjectStore` (boto3 + an in-memory fake); deterministic shards re-keyed local `project_id` → stable `(provider, slug)`; two-phase manifest commit; skip-if-unchanged outbox. `sync init/push/status`. Additive `v028` migration (`sync_identity`, `sync_outbox`); an import-guarded `[sync]` extra (`version` untouched). Default-off is byte-identical. 59 tests (crypto path gated on `pyrage`).

Verification

`ruff --select E,F` clean · `tsc` typecheck clean · vite build clean (benchmark panel bundled) · version guard green · contract validator green · `test_pricing_invariants` green

🤖 Generated with Claude Code
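
For reviewers who want a concrete picture of the #97 gate, here is a minimal sketch of how a deterministic surfacing check with per-session dedupe, a frequency cap, a cross-session cooldown, and dismiss-based quieting could fit together. It is not the code in `hooks/proactive.py`: the `NudgeState` fields, the `record_shown` helper, every parameter name and default, and the "double the cooldown per dismissal" rule are assumptions made for illustration, and state is kept in memory rather than in the file-locked JSON file.

```python
from dataclasses import dataclass, field


@dataclass
class NudgeState:
    """Hypothetical in-memory state; the field names are illustrative only."""
    shown_in_session: dict[str, set[str]] = field(default_factory=dict)  # session -> nudge keys
    last_shown_at: dict[str, float] = field(default_factory=dict)        # nudge key -> epoch seconds
    dismiss_count: dict[str, int] = field(default_factory=dict)          # nudge key -> dismissals


def should_surface(state: NudgeState, session_id: str, key: str, now: float, *,
                   enabled: bool = False, max_per_session: int = 3,
                   cooldown_s: float = 86_400.0) -> bool:
    """Pure decision function: the clock is passed in, so the result is deterministic."""
    if not enabled:                      # default-off: never surface unless opted in
        return False
    seen = state.shown_in_session.get(session_id, set())
    if key in seen:                      # per-session dedupe
        return False
    if len(seen) >= max_per_session:     # frequency cap within one session
        return False
    # Assumed quieting rule: each dismissal doubles the cross-session cooldown.
    effective_cooldown = cooldown_s * (2 ** state.dismiss_count.get(key, 0))
    last = state.last_shown_at.get(key)
    if last is not None and now - last < effective_cooldown:
        return False
    return True


def record_shown(state: NudgeState, session_id: str, key: str, now: float) -> None:
    """Record that a nudge was displayed so later calls can dedupe and cool down."""
    state.shown_in_session.setdefault(session_id, set()).add(key)
    state.last_shown_at[key] = now
```

Because the function takes `now` as an argument and reads no clock, network, or database, tests can replay any sequence of sessions and dismissals exactly.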
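
The #99 description lists Wilson intervals, sample floors, and an "insufficient evidence" verdict. The stdlib-only sketch below shows one way those pieces can combine at the 90% confidence level named in the rubric. The function names, the floor of 30, and the non-overlapping-interval rule are assumptions for illustration, not the contents of `services/benchmark_stats.py`.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.90) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def compare(a_wins: int, a_n: int, b_wins: int, b_n: int, *,
            floor: int = 30, confidence: float = 0.90) -> str:
    """Return "a", "b", or "insufficient evidence" (hypothetical decision rule)."""
    if a_n < floor or b_n < floor:           # sample floor: refuse to rank tiny samples
        return "insufficient evidence"
    a_lo, a_hi = wilson_interval(a_wins, a_n, confidence)
    b_lo, b_hi = wilson_interval(b_wins, b_n, confidence)
    if a_lo > b_hi:                          # intervals separated in A's favour
        return "a"
    if b_lo > a_hi:                          # intervals separated in B's favour
        return "b"
    return "insufficient evidence"           # overlapping intervals: no call
```

Requiring fully separated intervals is a deliberately conservative rule; it will sometimes decline to name a winner where a direct test on the difference would.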
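
Benjamini–Hochberg FDR control, also named under #99, is the standard step-up procedure: sort the p-values, find the largest rank k whose p-value is at most (k / m) · α, and reject every hypothesis up to that rank. A minimal stdlib sketch follows; the function name and the default `alpha` are assumptions, not values taken from the PR.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.10) -> list[bool]:
    """Return a reject/keep flag per hypothesis, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank that meets its threshold.
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger p-value at a higher rank passes.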
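
To show why direct standardization is described as Simpson's-paradox-safe, the sketch below reweights per-stratum success rates to a shared reference mix before comparing two models. The helper names and the counts are made up for illustration (the counts follow the classic textbook pattern, not StackUnderflow data).

```python
def crude_rate(strata: dict[str, tuple[int, int]]) -> float:
    """Pooled success rate, ignoring how tasks are distributed across strata."""
    wins = sum(w for w, _ in strata.values())
    total = sum(n for _, n in strata.values())
    return wins / total


def direct_standardize(strata: dict[str, tuple[int, int]],
                       weights: dict[str, float]) -> float:
    """Weighted mean of per-stratum rates under a shared reference mix."""
    total_w = sum(weights[s] for s in strata)
    return sum(weights[s] * w / n for s, (w, n) in strata.items()) / total_w


# Illustrative counts: stratum -> (successes, attempts).
model_a = {"easy": (81, 87), "hard": (192, 263)}
model_b = {"easy": (234, 270), "hard": (55, 80)}

# Reference mix: each stratum's share of the pooled workload of both models.
pooled = {s: model_a[s][1] + model_b[s][1] for s in model_a}
weights = {s: n / sum(pooled.values()) for s, n in pooled.items()}

crude_a, crude_b = crude_rate(model_a), crude_rate(model_b)
std_a = direct_standardize(model_a, weights)
std_b = direct_standardize(model_b, weights)
```

Here model B has the higher crude rate only because it was given mostly easy tasks; model A is better within each stratum, and the standardized rates reflect that ordering.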