From 0fe9cd1fe03ef3886ab5306723a635b0687a2c6a Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 25 Jun 2026 08:59:49 +0000
Subject: [PATCH 1/2] =?UTF-8?q?docs:=20add=20SDD=20&=20Spec=20Kit=20crash?=
 =?UTF-8?q?=20course=20(philosophy=20=E2=86=92=20shipped=20product)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive guide covering the SDD philosophy (power inversion, core
principles), the software design approach (constitution-as-law, intent vs.
mechanism layers, gates, template-driven quality, spec persistence), a
command-by-command Spec Kit how-to, and a t0-to-shipped end-to-end walkthrough
that impersonates a senior AI engineer shipping an agentic product with a
human-in-the-loop frontend. Includes anti-patterns, team workflow, calibration
guidance, and cheat sheets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7A9nEeKpPESM9ZQJcs2g1
---
 docs/guides/sdd-crash-course.md | 890 ++++++++++++++++++++++++++++++++
 1 file changed, 890 insertions(+)
 create mode 100644 docs/guides/sdd-crash-course.md

diff --git a/docs/guides/sdd-crash-course.md b/docs/guides/sdd-crash-course.md
new file mode 100644
index 0000000000..5bdf21a1fe
--- /dev/null
+++ b/docs/guides/sdd-crash-course.md
@@ -0,0 +1,890 @@
+# The SDD & Spec Kit Crash Course
+
+*From philosophy to a shipped product — how a senior engineer uses Spec-Driven
+Development as load-bearing infrastructure, not ceremony.*
+
+> **Who this is for.** Engineers who already ship software and want SDD to be a
+> real part of their lifecycle — not a demo. The running example is built by a
+> fictional senior AI engineer shipping an **agentic product with a real
+> frontend**, because that is where SDD earns its keep: non-determinism, evals,
+> human-in-the-loop review, and a UI that has to feel good.
+>
+> **How to read it.** Parts I–II are the *why* and the *mental model*. Part III
+> is the *reference* (every command, when to reach for it). Part IV is the
+> *t0 → shipped* walkthrough with real prompts. Part V is how to operate it like
+> a senior — including when **not** to use SDD.
+>
+> **A note on the code samples.** Prompts are shown in full because they are the
+> thing you actually type. Generated artifacts (`spec.md`, `plan.md`, etc.) are
+> shown as **captions and structure only** — Spec Kit writes the prose; your job
+> is to learn the *shape* so you can review it.
+
+---
+
+## Table of Contents
+
+- [Part I — The Philosophy](#part-i--the-philosophy)
+- [Part II — The Software Design Approach](#part-ii--the-software-design-approach)
+- [Part III — Spec Kit Features & How-Tos](#part-iii--spec-kit-features--how-tos)
+- [Part IV — End-to-End: t0 to a Shipped Product](#part-iv--end-to-end-t0-to-a-shipped-product)
+- [Part V — Operating SDD Like a Senior](#part-v--operating-sdd-like-a-senior)
+- [Appendix — Cheat Sheets](#appendix--cheat-sheets)
+
+---
+
+## Part I — The Philosophy
+
+### 1.1 The power inversion
+
+For decades, **code was the source of truth** and the spec was scaffolding —
+written once, approved, then quietly abandoned the moment "real work" began.
+The artifact you trusted was the one the machine ran.
+
+Spec-Driven Development **inverts that relationship**: the **specification
+becomes the durable, executable artifact**, and code becomes its *expression* —
+regenerable, replaceable, downstream. You no longer hand-translate intent into
+syntax and lose fidelity at every step. You maintain intent precisely enough
+that an AI agent can produce a faithful implementation from it, again and again.
+
+The practical consequence: when requirements change, you **change the spec and
+regenerate**, instead of archaeology-ing through code to reconstruct what was
+meant. The spec stops being documentation *about* the system and becomes the
+*lever that moves* the system.
+
+### 1.2 Why this is viable *now* (and wasn't before)
+
+This idea is old; what changed is that LLMs got good enough to do the
+"specification → working code" step with real fidelity. Three forces converge:
+
+- **Capability** — models can now interpret rich natural-language intent and
+  produce coherent, multi-file implementations.
+- **Cost of change collapsed** — re-deriving an implementation from an updated
+  spec is now minutes, not a sprint, so treating code as disposable is rational.
+- **Complexity demands it** — modern products carry constraints (compliance,
+  multi-stack, AI non-determinism) that *need* a structured intent layer to stay
+  coherent.
+
+The bottleneck moved. It is no longer "how fast can you type code." It is **"how
+precisely can you express what you actually want, and how well can you keep that
+expression honest."** SDD is the discipline for that.
+
+### 1.3 Core principles
+
+SDD rests on a handful of commitments:
+
+- **Intent before mechanism** — define the *what* and *why* completely before
+  the *how*. The stack is a downstream decision, not the starting point.
+- **Multi-step refinement over one-shot generation** — you do not "prompt once
+  and pray." You build a spec, interrogate it, plan against it, decompose it,
+  and only then implement. Each step is a checkpoint.
+- **Explicit uncertainty** — unknowns are *marked*, not silently guessed.
+  Ambiguity surfaces as `[NEEDS CLARIFICATION]`, not as a confident hallucination
+  three files deep.
+- **Guardrails over vibes** — organizational principles (the *constitution*) and
+  structured templates constrain the model toward good outcomes instead of
+  hoping the prompt was perfect.
+- **The spec is the contract** — reviews, debates, and decisions happen at the
+  spec/plan layer where they are cheap, not in a code review where they are
+  expensive and late.
+
+### 1.4 What SDD is **not**
+
+A senior recognizes SDD by what it refuses to be:
+
+- **Not "write a giant doc, then code by hand."** The artifacts are
+  *operational* — they drive generation. A spec nobody regenerates from is just
+  waterfall with extra steps.
+- **Not vibe coding.** Vibe coding optimizes for "something that runs." SDD
+  optimizes for "the thing we agreed to build, traceable to why."
+- **Not a replacement for engineering judgment.** The model drafts; **you** own
+  the constitution, the clarifications, the gate decisions, and the review. SDD
+  *concentrates* your judgment at the highest-leverage points.
+- **Not all-or-nothing.** You can run the full pipeline on a load-bearing feature
+  and skip straight to `/speckit.implement` on a throwaway. Match ceremony to
+  stakes (see §5.4).
+
+---
+
+## Part II — The Software Design Approach
+
+SDD is not just a workflow; it is an opinion about *how software should be
+designed*. Five ideas carry the weight.
+
+### 2.1 The constitution as architectural law
+
+Before any feature exists, you write a **constitution** — a small set of binding
+principles that govern *every* subsequent decision. It lives at
+`.specify/memory/constitution.md` and is referenced during specify, plan, and
+implement.
+
+Think of it as the project's **non-negotiable invariants**: code-quality rules,
+a testing mandate, UX consistency requirements, performance budgets, dependency
+discipline. Crucially, principles are **enforced as gates**, not posted as
+inspirational wallpaper. A plan that violates "Test-First" or "Minimal
+Dependencies" is supposed to *fail the gate* and force either a fix or an
+explicit, justified exception.
+
+> The original SDD essay frames these as numbered "Articles" (Library-First,
+> CLI Interface, Test-First, Simplicity, Anti-Abstraction, Integration-First,
+> etc.). The exact articles matter less than the move: **promote your hardest-won
+> engineering values into machine-checkable gates that the agent must pass
+> through.**
+
+### 2.2 Intent layer vs. mechanism layer
+
+SDD draws a hard line through your artifacts:
+
+| Layer | Artifact | Answers | Owner of truth |
+| --- | --- | --- | --- |
+| **Intent** | `spec.md` | *What* / *why* — user stories, requirements, acceptance criteria | Product + Eng, no stack |
+| **Mechanism** | `plan.md`, `research.md`, `data-model.md`, `contracts/` | *How* — stack, schemas, APIs, trade-offs | Eng |
+| **Execution** | `tasks.md` → code | *Do it* — ordered, testable units | Agent, reviewed by Eng |
+
+The discipline: **keep the stack out of the spec.** "Users can drag-and-drop
+albums" belongs in the spec. "We use React DnD Kit with optimistic updates"
+belongs in the plan. This separation is what lets the same intent be
+re-implemented in a different stack, and what keeps spec reviews about *product*
+rather than *frameworks*.
+
+### 2.3 Template-driven quality: how structure constrains the model
+
+Spec Kit's templates are not formatting — they are **guardrails that make LLMs
+produce better output**. Concretely, the templates:
+
+- **Prevent premature implementation detail** — the spec template structurally
+  pushes "how" content into later phases.
+- **Force explicit uncertainty** — `[NEEDS CLARIFICATION]` markers are *required*
+  where the input is silent, converting silent assumptions into visible
+  questions.
+- **Encode checklists and gates** — the model must walk a structured sequence
+  (requirement completeness, constitution check, simplicity/anti-abstraction
+  gates) rather than free-associate.
+- **Order file creation** — research before data-model before contracts before
+  tasks, so each artifact stands on a settled foundation.
+
+The insight worth internalizing: **a good template is a prompt that runs every
+time, on every feature, without you remembering to type it.** That is where
+consistency at scale comes from.
+
+### 2.4 The pre-implementation gates ("Phase −1")
+
+Before code is generated, the plan passes through gates derived from your
+constitution. Canonical examples:
+
+- **Simplicity gate** — are you adding more moving parts than the problem
+  requires? Default to fewer projects, fewer layers.
+- **Anti-abstraction gate** — are you wrapping frameworks in needless
+  indirection? Use the platform directly until you have a concrete reason not to.
+- **Integration-first gate** — are contracts defined and is testing wired against
+  real integrations before unit minutiae?
+
+A gate failure is a **feature, not a blocker**: it catches over-engineering and
+ambiguity while the cost of changing course is a paragraph edit, not a rewrite.
+
+### 2.5 Spec persistence: greenfield, brownfield, and evolution
+
+SDD does **not** prescribe a single way to keep `spec.md`/`plan.md`/`tasks.md`
+alive after requirements shift. You choose a persistence model:
+
+- **0→1 (greenfield)** — generate everything from high-level intent; the spec is
+  born with the code.
+- **Iterative enhancement (brownfield)** — add features to an existing system;
+  specs describe deltas and integrate with reality on disk.
+- **Spec evolution** — when requirements change, update the spec and re-derive,
+  using `/speckit.converge` to fold reality and intent back together.
+
+The senior move is to **decide your persistence model deliberately** at project
+start, and to treat the spec as a living asset under version control alongside
+the code — not a one-time generation prompt.
+
+---
+
+## Part III — Spec Kit Features & How-Tos
+
+Spec Kit is the toolkit that operationalizes SDD: a CLI (`specify`) that
+bootstraps your repo, and a set of agent slash-commands (`/speckit.*`) that drive
+the pipeline.
+
+### 3.1 Install and initialize
+
+```bash
+# Install the CLI (requires uv). Pin to the latest release tag.
+uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z
+
+# Scaffold a project wired for your agent. --integration picks the agent.
+specify init redaktor --integration copilot      # or: claude, gemini, codex, ...
+
+# Initialize inside an existing repo (brownfield):
+specify init . --here --integration copilot
+
+# Keep the CLI current:
+specify self check          # read-only: is a newer release available?
+specify self upgrade        # upgrade in place
+```
+
+> **Agent-agnostic, with one syntax note.** Most agents expose commands as
+> `/speckit.specify`, `/speckit.plan`, etc. **Codex CLI in skills mode** uses
+> `$speckit-*`; **GitHub Copilot CLI** uses `/agents` to select the agent. This
+> guide uses the `/speckit.*` form, and the examples are written against
+> **Claude Code**.
+
+What `init` gives you: the `.specify/` directory (constitution, templates,
+scripts), the `/speckit.*` commands registered with your agent, and the repo
+prepared for the `specs/NNN-feature/` layout.
+
+### 3.2 The core pipeline at a glance
+
+The spine of every feature is five commands. The supporting cast (clarify,
+analyze, checklist, converge, taskstoissues) wraps around them.
+
+```text
+/speckit.constitution   →  .specify/memory/constitution.md      (project law, once-ish)
+        │
+/speckit.specify        →  specs/NNN-feature/spec.md            (WHAT/WHY; new branch)
+        │   ⟂ /speckit.clarify   (de-risk ambiguity before planning)
+/speckit.plan           →  plan.md + research.md + data-model.md + contracts/ + quickstart.md
+        │   ⟂ /speckit.checklist (quality-test the requirements)
+/speckit.tasks          →  tasks.md                             (ordered, [P]-parallelizable)
+        │   ⟂ /speckit.analyze   (cross-artifact consistency audit, read-only)
+        │   ⟂ /speckit.taskstoissues (optional: fan out to GitHub issues)
+/speckit.implement      →  working code, task-by-task, marking [X] as it goes
+        │
+/speckit.converge       →  reconcile code-on-disk with spec; append missing work
+```
+
+Commands **hand off** to each other — after `specify`, the agent offers to jump
+to `plan` or `clarify`; after `plan`, to `tasks` or `checklist`; and so on. You
+can follow the rail or jump around as judgment dictates.
+
+### 3.3 Command-by-command how-to
+
+Each entry: **what it does · when to reach for it · example prompt · what it
+emits · senior tips.**
+
+#### `/speckit.constitution`
+
+- **What** — creates/updates the project constitution and keeps dependent
+  templates in sync.
+- **When** — once at project birth; revisit when a principle genuinely changes
+  (rare; version it).
+- **Prompt**
+
+  ```text
+  # Establish binding principles. Name the values you will NOT compromise,
+  # and say how they should arbitrate technical decisions.
+  /speckit.constitution Create principles focused on: (1) every behavior change
+  ships with a test, (2) AI components are evaluated, not just "tried" — every
+  prompt/agent has an eval set and a regression threshold, (3) the human is in
+  the loop for any irreversible action, (4) minimal dependencies and idempotent
+  file ops, (5) p95 latency budgets per endpoint. Include governance for how
+  these gate planning and implementation.
+  ```
+
+- **Emits** — `.specify/memory/constitution.md` (with a versioned "Sync Impact"
+  header).
+- **Tips** — keep it short and *enforceable*. A principle you can't imagine
+  failing a plan against is just decoration. Treat edits as semver: principle
+  changes are MAJOR.
+
+#### `/speckit.specify`
+
+- **What** — turns a natural-language feature description into a structured
+  spec; creates the feature branch and `specs/NNN-*/` directory.
+- **When** — the start of every feature.
+- **Prompt** — describe **what** and **why**; resist naming the stack.
+
+  ```text
+  # Note what's present: actors, flows, rules, edge cases. Note what's absent:
+  # frameworks, libraries, schemas. That's deliberate.
+  /speckit.specify Build a service that detects and masks personal data (PII)
+  in Turkish-language documents before they're shared. A user pastes or uploads
+  text; the system highlights detected PII (names, national IDs, IBANs, phones,
+  addresses) with a confidence level; a human reviewer can accept, reject, or
+  edit each detection; on confirmation the system returns a masked copy plus an
+  audit record. Reviewers must never be able to export a document with unresolved
+  high-confidence detections.
+  ```
+
+- **Emits** — `spec.md` with **user stories**, **functional requirements**,
+  **acceptance criteria**, and `[NEEDS CLARIFICATION]` markers where you were
+  silent. *(Caption only — learn the shape: numbered stories, testable
+  requirements, explicit edge cases.)*
+- **Tips** — over-specify the *rules and edge cases*; under-specify the
+  *mechanism*. If you catch yourself writing a library name, move it to the plan.
+
+#### `/speckit.clarify`
+
+- **What** — interrogates the spec and asks up to **5 targeted questions** about
+  underspecified areas, then encodes your answers back into `spec.md`.
+- **When** — immediately after `specify`, *before* `plan`, on anything
+  load-bearing. This is the cheapest bug-prevention in the whole pipeline.
+- **Prompt**
+
+  ```text
+  # Let it drive. Answer crisply; vague answers produce vague specs.
+  /speckit.clarify
+  ```
+
+- **Emits** — an updated `spec.md` with ambiguities resolved and clarifications
+  recorded.
+- **Tips** — if it asks something you "obviously" know, that's the point: the
+  spec didn't say it, so the implementation would have guessed. Answer it.
+
+#### `/speckit.plan`
+
+- **What** — produces the technical design from the spec and your stack input.
+- **When** — once the spec is clarified and you're ready to commit to *how*.
+- **Prompt** — *now* you bring the stack and constraints.
+
+  ```text
+  # This is where architecture lives. Be specific about stack, boundaries,
+  # and the things you've already decided.
+  /speckit.plan Build with: FastAPI + Python 3.12 backend, a hybrid PII pipeline
+  (deterministic regex/checksum validators for IDs/IBANs + an LLM extractor for
+  names/addresses), an LLM-as-judge verification pass, Postgres for audit records,
+  and a React + TypeScript review UI with optimistic accept/reject. Stream
+  detections to the UI via SSE. Keep the detection engine importable and testable
+  independently of the web layer. No PII leaves the boundary unmasked in logs.
+  ```
+
+- **Emits** — `plan.md`, plus (as warranted) `research.md` (trade-off studies),
+  `data-model.md` (schemas/entities), `contracts/` (API/event contracts), and
+  `quickstart.md` (key validation scenarios). *(Caption only.)*
+- **Tips** — read `research.md` and `contracts/` like a design doc, because that
+  is what they are. This is your cheapest chance to kill a bad abstraction.
+
+#### `/speckit.checklist`
+
+- **What** — generates a **"unit test for your requirements."** It validates the
+  *quality* of the spec (completeness, clarity, consistency, coverage) — **not**
+  the implementation.
+- **When** — after plan, before tasks, on domains where requirement quality is
+  the risk (UX, security, compliance, anything fuzzy).
+- **Prompt**
+
+  ```text
+  # Ask for a domain-scoped checklist. Think "are the requirements well-written?"
+  # not "does the button work?"
+  /speckit.checklist Generate a requirements-quality checklist for the human
+  review UX and the PII-handling rules: are confidence thresholds quantified?
+  is every detection category's masking behavior specified? are export-blocking
+  conditions unambiguous? are accessibility requirements for the reviewer
+  defined?
+  ```
+
+- **Emits** — a checklist of questions that probe whether your *spec* is
+  airtight. Items like *"Is 'high confidence' quantified?"* — not *"Test that
+  masking works."*
+- **Tips** — failing checklist items send you back to `specify`/`clarify`, not to
+  code. That's the loop working.
+
+#### `/speckit.tasks`
+
+- **What** — decomposes plan + design artifacts into an **ordered, dependency-
+  aware `tasks.md`**, marking parallelizable tasks `[P]`.
+- **When** — once the plan is settled.
+- **Prompt**
+
+  ```text
+  /speckit.tasks Break the plan into tasks
+  ```
+
+- **Emits** — `tasks.md`: setup → foundational → per-story → polish, each task
+  small and testable, with `[P]` on independent ones. *(Caption only — note the
+  phasing; you'll scope `implement` against it.)*
+- **Tips** — skim for tasks that are too big to verify; a task you can't write a
+  test for is a task that's underspecified upstream.
+
+#### `/speckit.analyze`
+
+- **What** — a **non-destructive, read-only** cross-artifact audit. Checks
+  `spec.md`, `plan.md`, and `tasks.md` for inconsistencies, gaps, and drift.
+- **When** — after tasks, before implement; and any time you suspect artifacts
+  have diverged.
+- **Prompt**
+
+  ```text
+  /speckit.analyze Run a project analysis for consistency
+  ```
+
+- **Emits** — a report of mismatches (e.g., a requirement with no task, a task
+  with no requirement, a contract the plan never mentions). It **changes
+  nothing**; you decide the fixes.
+- **Tips** — treat a clean `analyze` as your "ready to implement" gate.
+
+#### `/speckit.taskstoissues` *(optional)*
+
+- **What** — converts `tasks.md` into dependency-ordered **GitHub issues**.
+- **When** — when work is shared across a team or you want the backlog tracked in
+  GitHub rather than a file.
+- **Prompt**
+
+  ```text
+  /speckit.taskstoissues Create GitHub issues from the tasks, preserving order
+  and dependencies, labeled by phase.
+  ```
+
+- **Emits** — GitHub issues (via the GitHub MCP server) mirroring your task DAG.
+- **Tips** — great for human/AI hand-offs: a teammate or another agent can pick up
+  an issue with full spec context linked.
+
+#### `/speckit.implement`
+
+- **What** — executes `tasks.md`, building the code task-by-task and marking each
+  `[X]` as it completes.
+- **When** — after a clean analyze. This is the generation step.
+- **Prompt** — and critically, **how to scope it** (see §3.5).
+
+  ```text
+  # Full run for small features:
+  /speckit.implement Start the implementation in phases
+
+  # Scoped run for large ones (recommended — keeps context healthy):
+  /speckit.implement only execute the Setup and Foundational phases, then stop
+  and report progress
+  ```
+
+- **Emits** — working code, committed task-by-task; `tasks.md` updated with `[X]`.
+- **Tips** — review *per phase*, not at the end. The completed-task markers mean
+  you can stop, review, and resume without losing your place.
+
+#### `/speckit.converge`
+
+- **What** — assesses **code-on-disk against spec/plan/tasks**, then **appends
+  the remaining unbuilt work** as new tasks so `implement` can finish it.
+- **When** — brownfield reality checks; after manual edits; when a long
+  implementation drifted; whenever "what's actually built?" ≠ "what we said."
+- **Prompt**
+
+  ```text
+  /speckit.converge Assess the codebase against the spec and append any missing
+  work as tasks.
+  ```
+
+- **Emits** — newly appended tasks in `tasks.md` capturing the delta between
+  intent and reality. Then you re-run `implement`.
+- **Tips** — this is the *spec-evolution* workhorse. Change the spec, run
+  `converge`, run `implement` — that's how a living spec stays load-bearing.
+
+### 3.4 Extensions, presets, and bundles
+
+Spec Kit is customizable so it can match *your* lifecycle, not just the default.
+
+- **Extensions** — opt-in command packs that hook into the pipeline. Built-ins
+  include:
+  - **`git`** — commit/validate/feature/remote helpers so SDD phases map cleanly
+    to branches and commits.
+  - **`bug`** — `assess` → `test` → `fix`: a spec-driven *bug* workflow (reproduce
+    with a failing test first, then fix).
+  - **`agent-context`** — keeps the agent's working context updated as the project
+    grows.
+  - Extensions hook at defined points (`before_specify`, `before_plan`, …) via
+    `.specify/extensions.yml`.
+- **Presets** — alternative command/template sets. **`lean`** trims the pipeline
+  for speed; **`scaffold`** is a starting point for authoring your own; the
+  templates are yours to fork.
+- **Bundles** — role-based setups: **developer**, **product-manager**,
+  **business-analyst**, **security-researcher**. A bundle pre-wires the commands
+  and emphasis a given role needs.
+
+> Senior framing: extensions/presets/bundles are how you **encode your team's
+> SDLC into the tool**. The default pipeline is a strong opinion; these let you
+> make it *your* opinion.
+
+### 3.5 Handling complex features (the context problem)
+
+Big features sail through specify/plan/tasks and then **degrade mid-
+`implement`** — the agent loses the plan, skips tasks, or hallucinates as the
+context window fills and compaction kicks in. The root cause is context
+exhaustion, and there are four fixes:
+
+```text
+# 1) Scope each run — the simplest fix, works on any agent:
+/speckit.implement only execute tasks T001-T010, then stop and report progress
+
+# 2) Delegate to sub-agents (if your agent supports them):
+/speckit.implement delegate each parallel [P] task to a sub-agent
+
+# 3) Combine scoping + delegation for very large features:
+/speckit.implement execute only the Core phase, delegate [P] tasks to sub-agents
+
+# 4) Decompose into smaller specs ("spec of specs") — when even one phase is
+#    too big, split the feature into sub-features, each with its own
+#    spec/plan/tasks cycle. Highest overhead; reserve for genuinely huge work.
+```
+
+Because completed tasks are marked `[X]`, scoped runs resume cleanly. **Default
+to option 1**; reach for sub-agents when you want parallelism; decompose only
+when a single phase still overflows.
+
+---
+
+## Part IV — End-to-End: t0 to a Shipped Product
+
+> **The cast.** Deniz, a senior AI engineer, is shipping **Redaktör** — an
+> agentic Turkish-document **PII detection & masking** service with a
+> **human-in-the-loop review UI**. It's representative of real agentic-product
+> work: non-deterministic model components, an eval discipline, a human approval
+> step, and a frontend that has to feel trustworthy.
+>
+> Watch for the **load-bearing moves** (⚖️) — the moments where SDD does real
+> work a senior refuses to skip. The point isn't the product; it's the *operating
+> discipline*.
+
+### t0 — Frame the problem before touching the tool
+
+Deniz doesn't open the agent first. He writes three sentences in a notebook:
+*who hurts today, what "done" means, what must never happen.* For Redaktör:
+"Compliance teams manually redact Turkish docs; done = reviewer-approved masked
+copy + audit trail; **never** export a doc with unresolved high-confidence PII."
+
+⚖️ **Load-bearing move:** the spec is only as good as the intent behind it. Five
+minutes of human framing prevents an hour of regenerating a confidently-wrong
+spec.
+
+### Phase 1 — Bootstrap & constitution
+
+```bash
+specify init redaktor --integration claude
+cd redaktor
+```
+
+Then, before any feature, he encodes the **values that arbitrate every later
+decision** — and for an AI product, that explicitly includes *evals* and
+*human-in-the-loop*:
+
+```text
+# The constitution is where an AI product's hardest invariants become gates.
+/speckit.constitution Create binding principles: (1) every behavior change ships
+with a test; (2) NON-NEGOTIABLE — every AI component (extractor, judge) has a
+versioned eval set with a regression threshold; promotion requires passing it;
+(3) the human reviewer is the final authority — no irreversible export without
+explicit approval; (4) PII never appears in logs, traces, or error messages;
+(5) the detection engine is a library, importable and testable without the web
+layer; (6) p95 detection latency budget: 2s for a 2-page document. Add governance:
+plans that violate these must fail the gate or carry a written, justified
+exception.
+```
+
+⚖️ **Load-bearing move:** principle (2) turns "we tried the prompt and it seemed
+fine" into a *gate*. Principle (4) will later cause `analyze` and review to flag
+any logging task that touches raw text. The constitution is doing architecture.
+
+### Phase 2 — Specify the *what*, ruthlessly stack-free
+
+```text
+# Actors, flows, rules, edge cases. Zero frameworks. Zero model names.
+/speckit.specify Build a service that detects and masks personal data in
+Turkish-language documents before sharing. A reviewer pastes or uploads text;
+the system streams back detected PII (full names, T.C. national IDs, IBANs,
+phone numbers, postal addresses) each with a category and a confidence level;
+the reviewer accepts, rejects, or edits each detection inline; on confirmation
+the system returns a masked copy and writes an immutable audit record (who,
+when, what changed). Hard rule: the system must block export while any
+high-confidence detection is unresolved. Documents may be up to 20 pages.
+```
+
+The agent creates branch `001-pii-masking` and `specs/001-pii-masking/spec.md`
+with user stories, functional requirements, acceptance criteria — and a few
+`[NEEDS CLARIFICATION]` markers where Deniz was silent.
+
+> *(Caption of `spec.md`, not its contents:)* §User Stories (reviewer-centric),
+> §Functional Requirements (FR-1 detect, FR-2 stream w/ confidence, FR-3 inline
+> resolve, FR-4 export-block rule, FR-5 audit), §Edge Cases (empty doc, all-PII
+> doc, mixed-language doc), §`[NEEDS CLARIFICATION]` on retention + masking
+> format.
+
+⚖️ **Load-bearing move:** Deniz *notices* he never said how masking should look
+(`[ALİ YILMAZ]` → `[AD-SOYAD]`? black box? partial?) — but he doesn't fix it by
+editing prose. He lets `clarify` extract it, so the resolution is recorded
+systematically.
+
+### Phase 3 — Clarify: spend questions now, not bugs later
+
+```text
+/speckit.clarify
+```
+
+The agent asks five sharp questions. Deniz answers crisply:
+
+```text
+# (paraphrased Q→A — answers get encoded back into spec.md)
+Q: Confidence threshold for "high"?        → ≥ 0.85 blocks export; 0.6–0.85 warns.
+Q: Masking representation?                 → category tokens, e.g. [TCKN], [IBAN].
+Q: Retention of original text?             → originals purged after audit write;
+                                             audit stores only masked + diff.
+Q: Multi-language docs?                    → detect TR PII only; flag non-TR spans.
+Q: Who can override an export block?       → nobody in v1; it's a hard stop.
+```
+
+⚖️ **Load-bearing move:** "nobody can override the export block" is now a
+*requirement with a test target*, not a tribal assumption. This single
+clarification will shape a contract, a task, and a regression test downstream.
+
+### Phase 4 — Plan: now, and only now, the architecture
+
+```text
+# Stack, boundaries, and the decisions already made enter here.
+/speckit.plan Build with: FastAPI + Python 3.12; a HYBRID detection pipeline —
+deterministic validators (regex + checksum) for T.C. IDs/IBANs/phones, and an
+LLM extractor for names/addresses; an LLM-as-judge pass that scores each
+candidate and assigns confidence; Postgres for immutable audit records; React +
+TypeScript review UI with optimistic accept/reject and SSE streaming of
+detections. Keep `redaktor.engine` importable and web-independent. Structured
+logging with a redaction filter so raw text can never be logged. Provide an eval
+harness (golden Turkish docs with labeled PII) for both extractor and judge.
+```
+
+> *(Caption of emitted design artifacts:)*
+>
+> - `research.md` — trade study: regex-only vs. LLM-only vs. **hybrid** (chosen:
+>   hybrid for precision on structured IDs + recall on names); SSE vs.
+>   WebSocket (chosen: SSE, one-way stream).
+> - `data-model.md` — entities: `Document`, `Detection{category, span,
+>   confidence, status}`, `AuditRecord`.
+> - `contracts/` — `POST /documents`, `GET /documents/{id}/detections` (SSE),
+>   `POST /detections/{id}/resolve`, `POST /documents/{id}/export` (enforces the
+>   block rule).
+> - `quickstart.md` — the golden-path validation scenario.
+
+⚖️ **Load-bearing move:** Deniz reads `research.md` and pushes back: the judge
+adds latency. He keeps it but adds a plan note — *deterministic validators
+short-circuit the judge for checksum-valid IDs* — protecting the 2s p95 budget
+from principle (6). This is a design debate happening in `plan.md`, where it's
+free.
+
+### Phase 5 — Checklist the requirements, then task it out
+
+Because the **review UX** and **PII rules** are the real risk, he quality-tests
+the *requirements* first:
+
+```text
+/speckit.checklist Generate a requirements-quality checklist for the review UX
+and PII rules: is every detection category's masking token specified? is the
+export-block condition unambiguous and testable? are reviewer keyboard/a11y
+requirements defined? is the SSE reconnect behavior on dropped connection
+specified?
+```
+
+Two items fail — SSE reconnect behavior and a11y were never specified. He loops
+back through `clarify` to fix the *spec*, not the code. Then:
+
+```text
+/speckit.tasks Break the plan into tasks
+```
+
+> *(Caption of `tasks.md`:)* Setup (repo, CI, db) → Foundational (engine library,
+> validators, eval harness) → Story A (detect+stream) → Story B (resolve UI) →
+> Story C (export-block + audit) → Polish (a11y, latency, observability). `[P]`
+> on the independent validator and UI-component tasks.
+
+### Phase 6 — Analyze: the "ready to build" gate
+
+```text
+/speckit.analyze Run a project analysis for consistency
+```
+
+The read-only audit flags one real gap: **no task covers the "non-TR span"
+flagging** from the clarify step. Deniz adds it (re-runs `tasks`), re-analyzes,
+gets a clean report.
+
+⚖️ **Load-bearing move:** a clean `analyze` is his hard gate to implementation. A
+requirement with no task is a feature that silently won't exist — caught here for
+the price of a re-read.
+
+### Phase 7 — Implement, scoped by phase
+
+He never one-shots a feature this size. He scopes by phase to keep context
+healthy (§3.5):
+
+```text
+# 1) Foundation first — the engine + evals must exist before stories.
+/speckit.implement only execute the Setup and Foundational phases, then stop and
+report progress
+
+# (Deniz reviews: runs the eval harness, confirms validators pass on golden IDs.)
+
+# 2) Then story by story, reviewing each:
+/speckit.implement execute Story A (detect + stream), then stop
+
+# 3) Large parallelizable polish — delegate if the agent supports sub-agents:
+/speckit.implement execute the Polish phase, delegate [P] tasks to sub-agents
+```
+
+Each task lands as `[X]`. Between phases, Deniz **runs the app and the evals**,
+not just the unit tests — because for an AI component, "tests pass" ≠ "it
+detects well." The eval set from principle (2) is the real acceptance gate.
+
+⚖️ **Load-bearing move:** reviewing per phase means a bad masking decision is
+caught after Story A, not after the whole product is wired. The `[X]` markers let
+him stop and resume without losing the thread.
+
+### Phase 8 — Converge: reconcile reality with intent
+
+Mid-build, product asks for one change: *warn-level detections (0.6–0.85) should
+also be reviewable, not just high-confidence ones.* Deniz does the
+**spec-evolution loop**, not a hand-patch:
+
+```text
+# 1) Update the spec/clarify with the new rule, then:
+/speckit.converge Assess the codebase against the updated spec and append any
+missing work as tasks.
+
+# 2) converge appends the delta tasks (UI filter for warn-level, export rule
+#    stays high-confidence-only). Then finish them:
+/speckit.implement execute the newly appended tasks
+```
+
+⚖️ **Load-bearing move:** the requirement change entered through the *spec* and
+propagated *down* — not as a stray commit. The spec stays the source of truth, so
+the next engineer (or agent) reads intent that matches reality.
+
+### Phase 9 — Ship: CI, audit, and the human gate
+
+Before release, Deniz wires the lifecycle closure (often via the `git` extension
+and his CI):
+
+- **CI gates** run the test suite *and* the eval suite; a regression below the
+  eval threshold **fails the build** (principle 2, now load-bearing in CI).
+- **A redaction-filter test** asserts no raw PII reaches logs (principle 4).
+- **An export-block test** asserts the hard stop from the clarify step.
+- He has the agent fan remaining work to issues for a teammate:
+
+  ```text
+  /speckit.taskstoissues Create GitHub issues for the remaining polish tasks,
+  preserving dependency order, labeled by phase.
+  ```
+
+Redaktör ships: a reviewer pastes a document, watches detections stream in with
+confidence, resolves each inline, and exports a masked copy with an audit trail —
+and **cannot** export while a high-confidence detection is unresolved, because
+that rule has been load-bearing since Phase 3.
+
+### What made this "senior," in one breath
+
+The model wrote most of the code. **Deniz owned the leverage points**: framing
+intent (t0), promoting invariants to gates (constitution), spending clarify
+questions early, debating architecture in `plan.md`, gating on a clean `analyze`,
+reviewing *per phase* against **evals** not vibes, and routing every change
+through the spec via `converge`. SDD didn't replace his judgment — it
+*concentrated* it where it mattered.
+
+---
+
+## Part V — Operating SDD Like a Senior
+
+### 5.1 Habits that separate load-bearing SDD from theater
+
+- **Always `clarify` before `plan`** on anything that matters. It's the highest
+  ROI command in the kit.
+- **Read `research.md` and `contracts/`.** If you're not reviewing the design
+  artifacts, you're vibe coding with extra files.
+- **Gate on a clean `analyze`.** Make "no orphan requirements, no orphan tasks"
+  your definition of implementation-ready.
+- **Scope `implement` by phase, review between phases.** Don't discover problems
+  at the end.
+- **Route changes through the spec → `converge` → `implement`.** Hand-patches rot
+  the spec; once it's stale, it stops being load-bearing and you're back to
+  code-as-truth.
+
+### 5.2 Anti-patterns to refuse
+
+- **Stack in the spec.** Frameworks in `spec.md` poison the intent layer and
+  block re-implementation.
+- **Skipping clarify to "save time."** You don't save it; you move it to
+  debugging, at 10× the cost.
+- **One-shot `implement` on a big feature.** Context exhaustion will corrupt the
+  back half of the run.
+- **A constitution of platitudes.** If a principle can't fail a plan, delete it.
+- **Treating generated artifacts as write-once.** A spec you never regenerate
+  from is documentation, not SDD.
+
+### 5.3 Fitting SDD to a team
+
+- Use **`taskstoissues`** to turn a task DAG into a shared backlog; each issue
+  carries spec context, so humans and agents can both pick up work.
+- Use **bundles** (developer / PM / BA / security) so each role gets the
+  pipeline emphasis they need.
+- Use **extensions** (`git`, `bug`, `agent-context`) to bind SDD phases to your
+  real branch/commit/triage flow.
+- Keep `spec.md`/`plan.md`/`tasks.md` **in version control** next to the code and
+  review them in PRs — the spec diff *is* the design review.
+
+### 5.4 When **not** to reach for the full pipeline
+
+SDD ceremony should match stakes:
+
+| Situation | Reach for |
+| --- | --- |
+| Load-bearing feature, multiple stories, real risk | Full pipeline + clarify + checklist + analyze |
+| Small, well-understood change | specify → plan → tasks → implement (skip the wrappers) |
+| Throwaway spike / proof of concept | Straight to `/speckit.implement` with a tight prompt |
+| Bug fix | The `bug` extension (assess → failing test → fix) |
+| Requirements changed on a live feature | `clarify`/edit spec → `converge` → `implement` |
+
+The senior skill is calibration: enough structure to stay honest, not so much
+that it's theater.
+
+### 5.5 The one-sentence summary
+
+**Make intent the durable artifact, promote your invariants to gates, spend your
+judgment at the spec/plan/analyze checkpoints, and let the agent regenerate the
+code — so that when the world changes, you change the spec and the system
+follows.**
+
+---
+
+## Appendix — Cheat Sheets
+
+### A. Command map
+
+| Command | Phase | Emits | Reach for it when |
+| --- | --- | --- | --- |
+| `/speckit.constitution` | Govern | `constitution.md` | Project birth; principle change |
+| `/speckit.specify` | Intent | `spec.md` (+ branch) | Every new feature |
+| `/speckit.clarify` | Intent | updated `spec.md` | Before planning anything load-bearing |
+| `/speckit.plan` | Design | `plan.md`, `research.md`, `data-model.md`, `contracts/`, `quickstart.md` | Spec is clarified |
+| `/speckit.checklist` | Design | requirements-quality checklist | Fuzzy/high-risk domains (UX, security) |
+| `/speckit.tasks` | Decompose | `tasks.md` | Plan is settled |
+| `/speckit.analyze` | Gate | read-only consistency report | Before implement; on suspected drift |
+| `/speckit.taskstoissues` | Decompose | GitHub issues | Team/shared backlog |
+| `/speckit.implement` | Build | code + `[X]` tasks | After a clean analyze |
+| `/speckit.converge` | Evolve | appended delta tasks | Spec changed; brownfield reconcile |
+
+### B. Artifact graph
+
+```text
+constitution.md ──governs──▶ spec.md ──derives──▶ plan.md ─┬─▶ research.md
+                                  ▲                          ├─▶ data-model.md
+                            clarify│ checklist               ├─▶ contracts/
+                                  │                          └─▶ quickstart.md
+                                  │                                  │
+                                  └────────── analyze (audits all) ──┤
+                                                                     ▼
+                                                                 tasks.md ──▶ code
+                                                                     ▲          │
+                                                                     └ converge ┘
+```
+
+### C. Glossary
+
+- **Constitution** — binding project principles, enforced as plan-time gates.
+- **Gate (Phase −1)** — a pre-implementation check (simplicity, anti-abstraction,
+  integration-first) a plan must pass or explicitly waive.
+- **`[NEEDS CLARIFICATION]`** — an explicit uncertainty marker; a question, not a
+  guess.
+- **Persistence model** — your chosen strategy for keeping specs alive
+  (greenfield / brownfield / evolution).
+- **`[P]` task** — a task with no blocking dependency; parallelizable.
+- **Converge** — reconcile code-on-disk with intent by appending the missing
+  work as tasks.
+- **Bundle / Preset / Extension** — role setups / alternative pipelines / opt-in
+  command packs that tailor Spec Kit to your SDLC.
+
+---
+
+*Further reading in this repo: [`spec-driven.md`](../../spec-driven.md) (the
+full SDD essay and the constitutional articles),
+[`docs/concepts/sdd.md`](../concepts/sdd.md),
+[`docs/concepts/complex-features.md`](../concepts/complex-features.md), and
+[`docs/guides/evolving-specs.md`](./evolving-specs.md).*

From a58091e25877625eb80b8e7ee2c1129489cc5d5a Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 25 Jun 2026 09:21:56 +0000
Subject: [PATCH 2/2] docs: rewrite SDD crash course to teach judgment over
 mechanics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the feature-catalog/happy-path draft with a field guide centered on how
a senior engineer actually thinks in SDD: the model as a fast, confidently-wrong
collaborator; SDD as the apparatus that makes disagreement cheap and early; and
the irreducible human work the tool can't do. The end-to-end example now goes
wrong on purpose — a misframed problem at t0, a clean spec aimed at the wrong
target, an over-engineered plan, an architecture inverted toward determinism,
model drift caught by a pre-built trap, and a spec found wrong at demo and
recovered cheaply via converge. Adds an adversarial artifact 'smell guide', an
honest section on where SDD fights you, and process-calibration guidance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7A9nEeKpPESM9ZQJcs2g1
---
 docs/guides/sdd-crash-course.md | 1163 +++++++++++--------------------
 1 file changed, 389 insertions(+), 774 deletions(-)

diff --git a/docs/guides/sdd-crash-course.md b/docs/guides/sdd-crash-course.md
index 5bdf21a1fe..222b833e3c 100644
--- a/docs/guides/sdd-crash-course.md
+++ b/docs/guides/sdd-crash-course.md
@@ -1,890 +1,505 @@
-# The SDD & Spec Kit Crash Course
+# Thinking in SDD
 
-*From philosophy to a shipped product — how a senior engineer uses Spec-Driven
-Development as load-bearing infrastructure, not ceremony.*
+*A field guide to Spec-Driven Development for engineers who already know the
+work is in the judgment, not the typing.*
 
-> **Who this is for.** Engineers who already ship software and want SDD to be a
-> real part of their lifecycle — not a demo. The running example is built by a
-> fictional senior AI engineer shipping an **agentic product with a real
-> frontend**, because that is where SDD earns its keep: non-determinism, evals,
-> human-in-the-loop review, and a UI that has to feel good.
+> This is not a tour of Spec Kit's commands. The command list is at the bottom in
+> one table; you'll learn it in an afternoon. What takes years is knowing **what
+> to distrust, when to go backwards, and which mistakes to make cheaply** — and
+> that's what SDD is actually for. This guide teaches the thinking. The demo in
+> the middle goes wrong on purpose, because that's where the lessons live.
 >
-> **How to read it.** Parts I–II are the *why* and the *mental model*. Part III
-> is the *reference* (every command, when to reach for it). Part IV is the
-> *t0 → shipped* walkthrough with real prompts. Part V is how to operate it like
-> a senior — including when **not** to use SDD.
->
-> **A note on the code samples.** Prompts are shown in full because they are the
-> thing you actually type. Generated artifacts (`spec.md`, `plan.md`, etc.) are
-> shown as **captions and structure only** — Spec Kit writes the prose; your job
-> is to learn the *shape* so you can review it.
-
----
-
-## Table of Contents
-
-- [Part I — The Philosophy](#part-i--the-philosophy)
-- [Part II — The Software Design Approach](#part-ii--the-software-design-approach)
-- [Part III — Spec Kit Features & How-Tos](#part-iii--spec-kit-features--how-tos)
-- [Part IV — End-to-End: t0 to a Shipped Product](#part-iv--end-to-end-t0-to-a-shipped-product)
-- [Part V — Operating SDD Like a Senior](#part-v--operating-sdd-like-a-senior)
-- [Appendix — Cheat Sheets](#appendix--cheat-sheets)
+> Prompts are shown in full (they're what you type). Generated artifacts are
+> shown as captions — you should learn their *shape*, not memorize someone else's
+> spec.
 
 ---
 
-## Part I — The Philosophy
-
-### 1.1 The power inversion
-
-For decades, **code was the source of truth** and the spec was scaffolding —
-written once, approved, then quietly abandoned the moment "real work" began.
-The artifact you trusted was the one the machine ran.
-
-Spec-Driven Development **inverts that relationship**: the **specification
-becomes the durable, executable artifact**, and code becomes its *expression* —
-regenerable, replaceable, downstream. You no longer hand-translate intent into
-syntax and lose fidelity at every step. You maintain intent precisely enough
-that an AI agent can produce a faithful implementation from it, again and again.
-
-The practical consequence: when requirements change, you **change the spec and
-regenerate**, instead of archaeology-ing through code to reconstruct what was
-meant. The spec stops being documentation *about* the system and becomes the
-*lever that moves* the system.
-
-### 1.2 Why this is viable *now* (and wasn't before)
-
-This idea is old; what changed is that LLMs got good enough to do the
-"specification → working code" step with real fidelity. Three forces converge:
-
-- **Capability** — models can now interpret rich natural-language intent and
-  produce coherent, multi-file implementations.
-- **Cost of change collapsed** — re-deriving an implementation from an updated
-  spec is now minutes, not a sprint, so treating code as disposable is rational.
-- **Complexity demands it** — modern products carry constraints (compliance,
-  multi-stack, AI non-determinism) that *need* a structured intent layer to stay
-  coherent.
-
-The bottleneck moved. It is no longer "how fast can you type code." It is **"how
-precisely can you express what you actually want, and how well can you keep that
-expression honest."** SDD is the discipline for that.
-
-### 1.3 Core principles
-
-SDD rests on a handful of commitments:
-
-- **Intent before mechanism** — define the *what* and *why* completely before
-  the *how*. The stack is a downstream decision, not the starting point.
-- **Multi-step refinement over one-shot generation** — you do not "prompt once
-  and pray." You build a spec, interrogate it, plan against it, decompose it,
-  and only then implement. Each step is a checkpoint.
-- **Explicit uncertainty** — unknowns are *marked*, not silently guessed.
-  Ambiguity surfaces as `[NEEDS CLARIFICATION]`, not as a confident hallucination
-  three files deep.
-- **Guardrails over vibes** — organizational principles (the *constitution*) and
-  structured templates constrain the model toward good outcomes instead of
-  hoping the prompt was perfect.
-- **The spec is the contract** — reviews, debates, and decisions happen at the
-  spec/plan layer where they are cheap, not in a code review where they are
-  expensive and late.
-
-### 1.4 What SDD is **not**
-
-A senior recognizes SDD by what it refuses to be:
-
-- **Not "write a giant doc, then code by hand."** The artifacts are
-  *operational* — they drive generation. A spec nobody regenerates from is just
-  waterfall with extra steps.
-- **Not vibe coding.** Vibe coding optimizes for "something that runs." SDD
-  optimizes for "the thing we agreed to build, traceable to why."
-- **Not a replacement for engineering judgment.** The model drafts; **you** own
-  the constitution, the clarifications, the gate decisions, and the review. SDD
-  *concentrates* your judgment at the highest-leverage points.
-- **Not all-or-nothing.** You can run the full pipeline on a load-bearing feature
-  and skip straight to `/speckit.implement` on a throwaway. Match ceremony to
-  stakes (see §5.4).
+## The one idea everything else hangs on
+
+You are working with a collaborator that is **fast, fluent, tireless, and
+confidently wrong in predictable ways**. It optimizes for *plausibility* — output
+that looks like what a good answer looks like — not for *correctness*. It will
+hand you a spec that reads beautifully and solves the wrong problem. It will
+propose an architecture that's clean, conventional, and two layers heavier than
+your problem needs. It will agree with whatever you said last. And it never tells
+you what it doesn't know unless you force it to.
+
+Given that collaborator, the value of SDD is not "specs generate code." It's
+this: **SDD front-loads disagreement to the cheapest possible layer.** A wrong
+assumption caught in `spec.md` costs a sentence. The same assumption caught in
+code costs a day. Caught in production, it costs a customer. Every loop in the
+process — clarify, checklist, analyze, converge — exists to drag a future
+expensive mistake forward into a present cheap one.
+
+So the senior's stance toward the whole apparatus is **adversarial, not
+compliant.** You are not filling out a workflow. You are running a series of cheap
+experiments designed to find out where you and the model are wrong *before* it's
+expensive to be wrong. If a step isn't surfacing disagreement, you're doing it as
+theater.
+
+Three corollaries fall out of this, and the rest of the guide is just these three
+applied:
+
+1. **Every artifact the model produces is a suspect, not a deliverable.** Your
+   job when you read a generated spec or plan is to attack it: *what did it assume
+   that I didn't say? what failure mode does this gloss? where is it being
+   plausible instead of right?*
+
+2. **The process is non-linear because discovery is non-linear.** You will learn
+   things during planning that invalidate the spec, and things during
+   implementation that invalidate the plan. A senior expects this and *instruments
+   for it*. Going backwards isn't failure; refusing to is.
+
+3. **The hardest part isn't writing the spec — it's knowing what you actually
+   want, and you usually don't, at the start.** The real problem is often not the
+   one you typed. The process's job is to help you find the real one before you've
+   built the wrong one.
 
 ---
 
-## Part II — The Software Design Approach
-
-SDD is not just a workflow; it is an opinion about *how software should be
-designed*. Five ideas carry the weight.
-
-### 2.1 The constitution as architectural law
-
-Before any feature exists, you write a **constitution** — a small set of binding
-principles that govern *every* subsequent decision. It lives at
-`.specify/memory/constitution.md` and is referenced during specify, plan, and
-implement.
+## What the model is good at, and what is yours alone
 
-Think of it as the project's **non-negotiable invariants**: code-quality rules,
-a testing mandate, UX consistency requirements, performance budgets, dependency
-discipline. Crucially, principles are **enforced as gates**, not posted as
-inspirational wallpaper. A plan that violates "Test-First" or "Minimal
-Dependencies" is supposed to *fail the gate* and force either a fix or an
-explicit, justified exception.
+Keep this division sharp, because most wasted effort comes from blurring it.
 
-> The original SDD essay frames these as numbered "Articles" (Library-First,
-> CLI Interface, Test-First, Simplicity, Anti-Abstraction, Integration-First,
-> etc.). The exact articles matter less than the move: **promote your hardest-won
-> engineering values into machine-checkable gates that the agent must pass
-> through.**
-
-### 2.2 Intent layer vs. mechanism layer
-
-SDD draws a hard line through your artifacts:
-
-| Layer | Artifact | Answers | Owner of truth |
+| The model is genuinely good at | Which means you should | And it is bad at | Which means you must never |
 | --- | --- | --- | --- |
-| **Intent** | `spec.md` | *What* / *why* — user stories, requirements, acceptance criteria | Product + Eng, no stack |
-| **Mechanism** | `plan.md`, `research.md`, `data-model.md`, `contracts/` | *How* — stack, schemas, APIs, trade-offs | Eng |
-| **Execution** | `tasks.md` → code | *Do it* — ordered, testable units | Agent, reviewed by Eng |
-
-The discipline: **keep the stack out of the spec.** "Users can drag-and-drop
-albums" belongs in the spec. "We use React DnD Kit with optimistic updates"
-belongs in the plan. This separation is what lets the same intent be
-re-implemented in a different stack, and what keeps spec reviews about *product*
-rather than *frameworks*.
-
-### 2.3 Template-driven quality: how structure constrains the model
-
-Spec Kit's templates are not formatting — they are **guardrails that make LLMs
-produce better output**. Concretely, the templates:
-
-- **Prevent premature implementation detail** — the spec template structurally
-  pushes "how" content into later phases.
-- **Force explicit uncertainty** — `[NEEDS CLARIFICATION]` markers are *required*
-  where the input is silent, converting silent assumptions into visible
-  questions.
-- **Encode checklists and gates** — the model must walk a structured sequence
-  (requirement completeness, constitution check, simplicity/anti-abstraction
-  gates) rather than free-associate.
-- **Order file creation** — research before data-model before contracts before
-  tasks, so each artifact stands on a settled foundation.
-
-The insight worth internalizing: **a good template is a prompt that runs every
-time, on every feature, without you remembering to type it.** That is where
-consistency at scale comes from.
-
-### 2.4 The pre-implementation gates ("Phase −1")
-
-Before code is generated, the plan passes through gates derived from your
-constitution. Canonical examples:
-
-- **Simplicity gate** — are you adding more moving parts than the problem
-  requires? Default to fewer projects, fewer layers.
-- **Anti-abstraction gate** — are you wrapping frameworks in needless
-  indirection? Use the platform directly until you have a concrete reason not to.
-- **Integration-first gate** — are contracts defined and is testing wired against
-  real integrations before unit minutiae?
-
-A gate failure is a **feature, not a blocker**: it catches over-engineering and
-ambiguity while the cost of changing course is a paragraph edit, not a rewrite.
-
-### 2.5 Spec persistence: greenfield, brownfield, and evolution
-
-SDD does **not** prescribe a single way to keep `spec.md`/`plan.md`/`tasks.md`
-alive after requirements shift. You choose a persistence model:
-
-- **0→1 (greenfield)** — generate everything from high-level intent; the spec is
-  born with the code.
-- **Iterative enhancement (brownfield)** — add features to an existing system;
-  specs describe deltas and integrate with reality on disk.
-- **Spec evolution** — when requirements change, update the spec and re-derive,
-  using `/speckit.converge` to fold reality and intent back together.
-
-The senior move is to **decide your persistence model deliberately** at project
-start, and to treat the spec as a living asset under version control alongside
-the code — not a one-time generation prompt.
-
----
-
-## Part III — Spec Kit Features & How-Tos
-
-Spec Kit is the toolkit that operationalizes SDD: a CLI (`specify`) that
-bootstraps your repo, and a set of agent slash-commands (`/speckit.*`) that drive
-the pipeline.
+| Producing a complete, well-structured first draft fast | Let it draft everything; never start from a blank page | Knowing what's worth building | Outsource the "what" or the "why" |
+| Surfacing options and conventions | Use it to widen your view of the solution space | Saying "I don't know" or "this is risky" | Trust silence as a sign of safety |
+| Filling in plausible detail | Let it handle mechanism once intent is fixed | Distinguishing plausible from correct | Accept an artifact because it reads well |
+| Tireless, consistent mechanical work | Hand it the decomposition and the typing | Holding a large design in coherent context | Let it one-shot a big feature |
 
-### 3.1 Install and initialize
+The work that is *yours, irreducibly*: deciding what problem is real, deciding
+what "correct" means here, deciding what must never happen, deciding what to cut,
+and noticing when the beautiful artifact in front of you is solving a problem you
+don't have. SDD doesn't reduce that work. It *clears the noise away from it* so
+you can spend your scarce judgment there instead of on boilerplate.
 
-```bash
-# Install the CLI (requires uv). Pin to the latest release tag.
-uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z
+---
 
-# Scaffold a project wired for your agent. --integration picks the agent.
-specify init redaktor --integration copilot      # or: claude, gemini, codex, ...
+## The build that went sideways (and what each turn taught)
 
-# Initialize inside an existing repo (brownfield):
-specify init . --here --integration copilot
+> Deniz is shipping **Redaktör**: a tool that masks personal data (PII) in
+> Turkish documents so a compliance team can share them safely. It's a good
+> teaching case because "correct" here is adversarial, non-obvious, and carries
+> real liability — exactly the conditions under which plausible-but-wrong is
+> dangerous. Watch the *reasoning*, not the commands. Several turns are mistakes,
+> caught at different costs.
 
-# Keep the CLI current:
-specify self check          # read-only: is a newer release available?
-specify self upgrade        # upgrade in place
-```
+### t0 — The first framing is wrong, and finding that out is the whole job
 
-> **Agent-agnostic, with one syntax note.** Most agents expose commands as
-> `/speckit.specify`, `/speckit.plan`, etc. **Codex CLI in skills mode** uses
-> `$speckit-*`; **GitHub Copilot CLI** uses `/agents` to select the agent. This
-> guide uses the `/speckit.*` form, and the examples are written against
-> **Claude Code**.
+Deniz's first instinct is "build a PII masker." He's about to type that into
+`/speckit.specify`. He stops, because he's been burned by exactly this: a clean
+spec for the wrong product.
 
-What `init` gives you: the `.specify/` directory (constitution, templates,
-scripts), the `/speckit.*` commands registered with your agent, and the repo
-prepared for the `specs/NNN-feature/` layout.
+He asks the question that the tool will never ask for him: **what is the real
+failure here?** Not "the model misses a phone number" — the real failure is *a
+compliance officer signs off on a document, and PII leaks anyway, and now it's
+their name on the breach.* The product isn't an NLP problem. It's a **trust and
+liability** problem. The reviewer has to trust an AI's redaction enough to put
+their signature on it. That reframing changes everything downstream: it means
+auditability, explainability, and a hard human gate matter *more* than detection
+F1.
 
-### 3.2 The core pipeline at a glance
+⟶ **Lesson.** The most expensive mistakes are made before you touch the tool, in
+the gap between the problem you typed and the problem you have. No command catches
+this. It's the one part that is entirely you, and it's the part most people skip.
 
-The spine of every feature is five commands. The supporting cast (clarify,
-analyze, checklist, converge, taskstoissues) wraps around them.
+He does *not* start with the constitution as a vague "code quality, testing, UX"
+prompt (the kind of boilerplate that fails no plan). He writes principles that
+encode the *liability* framing as hard gates:
 
 ```text
-/speckit.constitution   →  .specify/memory/constitution.md      (project law, once-ish)
-        │
-/speckit.specify        →  specs/NNN-feature/spec.md            (WHAT/WHY; new branch)
-        │   ⟂ /speckit.clarify   (de-risk ambiguity before planning)
-/speckit.plan           →  plan.md + research.md + data-model.md + contracts/ + quickstart.md
-        │   ⟂ /speckit.checklist (quality-test the requirements)
-/speckit.tasks          →  tasks.md                             (ordered, [P]-parallelizable)
-        │   ⟂ /speckit.analyze   (cross-artifact consistency audit, read-only)
-        │   ⟂ /speckit.taskstoissues (optional: fan out to GitHub issues)
-/speckit.implement      →  working code, task-by-task, marking [X] as it goes
-        │
-/speckit.converge       →  reconcile code-on-disk with spec; append missing work
+# A constitution earns its place only if you can imagine a plan FAILING it.
+# These are written to fail plans, not to decorate the repo.
+/speckit.constitution Establish gates that a plan must pass or carry a written
+exception: (1) Every masked document is reproducible from its audit record alone —
+if we can't reconstruct what was redacted and why, the plan fails. (2) The human
+reviewer is the last authority; no irreversible export without an explicit
+approval action — an "auto-approve high confidence" feature fails this gate by
+construction. (3) Raw document text never enters logs, traces, or error messages.
+(4) Every model-based component carries an eval set and a regression threshold;
+"we tried it and it looked good" is not acceptance. Bias toward fewer moving parts:
+a plan that adds a service must justify why a function wouldn't do.
 ```
 
-Commands **hand off** to each other — after `specify`, the agent offers to jump
-to `plan` or `clarify`; after `plan`, to `tasks` or `checklist`; and so on. You
-can follow the rail or jump around as judgment dictates.
-
-### 3.3 Command-by-command how-to
-
-Each entry: **what it does · when to reach for it · example prompt · what it
-emits · senior tips.**
-
-#### `/speckit.constitution`
-
-- **What** — creates/updates the project constitution and keeps dependent
-  templates in sync.
-- **When** — once at project birth; revisit when a principle genuinely changes
-  (rare; version it).
-- **Prompt**
-
-  ```text
-  # Establish binding principles. Name the values you will NOT compromise,
-  # and say how they should arbitrate technical decisions.
-  /speckit.constitution Create principles focused on: (1) every behavior change
-  ships with a test, (2) AI components are evaluated, not just "tried" — every
-  prompt/agent has an eval set and a regression threshold, (3) the human is in
-  the loop for any irreversible action, (4) minimal dependencies and idempotent
-  file ops, (5) p95 latency budgets per endpoint. Include governance for how
-  these gate planning and implementation.
-  ```
-
-- **Emits** — `.specify/memory/constitution.md` (with a versioned "Sync Impact"
-  header).
-- **Tips** — keep it short and *enforceable*. A principle you can't imagine
-  failing a plan against is just decoration. Treat edits as semver: principle
-  changes are MAJOR.
-
-#### `/speckit.specify`
-
-- **What** — turns a natural-language feature description into a structured
-  spec; creates the feature branch and `specs/NNN-*/` directory.
-- **When** — the start of every feature.
-- **Prompt** — describe **what** and **why**; resist naming the stack.
-
-  ```text
-  # Note what's present: actors, flows, rules, edge cases. Note what's absent:
-  # frameworks, libraries, schemas. That's deliberate.
-  /speckit.specify Build a service that detects and masks personal data (PII)
-  in Turkish-language documents before they're shared. A user pastes or uploads
-  text; the system highlights detected PII (names, national IDs, IBANs, phones,
-  addresses) with a confidence level; a human reviewer can accept, reject, or
-  edit each detection; on confirmation the system returns a masked copy plus an
-  audit record. Reviewers must never be able to export a document with unresolved
-  high-confidence detections.
-  ```
-
-- **Emits** — `spec.md` with **user stories**, **functional requirements**,
-  **acceptance criteria**, and `[NEEDS CLARIFICATION]` markers where you were
-  silent. *(Caption only — learn the shape: numbered stories, testable
-  requirements, explicit edge cases.)*
-- **Tips** — over-specify the *rules and edge cases*; under-specify the
-  *mechanism*. If you catch yourself writing a library name, move it to the plan.
-
-#### `/speckit.clarify`
-
-- **What** — interrogates the spec and asks up to **5 targeted questions** about
-  underspecified areas, then encodes your answers back into `spec.md`.
-- **When** — immediately after `specify`, *before* `plan`, on anything
-  load-bearing. This is the cheapest bug-prevention in the whole pipeline.
-- **Prompt**
-
-  ```text
-  # Let it drive. Answer crisply; vague answers produce vague specs.
-  /speckit.clarify
-  ```
-
-- **Emits** — an updated `spec.md` with ambiguities resolved and clarifications
-  recorded.
-- **Tips** — if it asks something you "obviously" know, that's the point: the
-  spec didn't say it, so the implementation would have guessed. Answer it.
-
-#### `/speckit.plan`
-
-- **What** — produces the technical design from the spec and your stack input.
-- **When** — once the spec is clarified and you're ready to commit to *how*.
-- **Prompt** — *now* you bring the stack and constraints.
-
-  ```text
-  # This is where architecture lives. Be specific about stack, boundaries,
-  # and the things you've already decided.
-  /speckit.plan Build with: FastAPI + Python 3.12 backend, a hybrid PII pipeline
-  (deterministic regex/checksum validators for IDs/IBANs + an LLM extractor for
-  names/addresses), an LLM-as-judge verification pass, Postgres for audit records,
-  and a React + TypeScript review UI with optimistic accept/reject. Stream
-  detections to the UI via SSE. Keep the detection engine importable and testable
-  independently of the web layer. No PII leaves the boundary unmasked in logs.
-  ```
-
-- **Emits** — `plan.md`, plus (as warranted) `research.md` (trade-off studies),
-  `data-model.md` (schemas/entities), `contracts/` (API/event contracts), and
-  `quickstart.md` (key validation scenarios). *(Caption only.)*
-- **Tips** — read `research.md` and `contracts/` like a design doc, because that
-  is what they are. This is your cheapest chance to kill a bad abstraction.
-
-#### `/speckit.checklist`
-
-- **What** — generates a **"unit test for your requirements."** It validates the
-  *quality* of the spec (completeness, clarity, consistency, coverage) — **not**
-  the implementation.
-- **When** — after plan, before tasks, on domains where requirement quality is
-  the risk (UX, security, compliance, anything fuzzy).
-- **Prompt**
-
-  ```text
-  # Ask for a domain-scoped checklist. Think "are the requirements well-written?"
-  # not "does the button work?"
-  /speckit.checklist Generate a requirements-quality checklist for the human
-  review UX and the PII-handling rules: are confidence thresholds quantified?
-  is every detection category's masking behavior specified? are export-blocking
-  conditions unambiguous? are accessibility requirements for the reviewer
-  defined?
-  ```
-
-- **Emits** — a checklist of questions that probe whether your *spec* is
-  airtight. Items like *"Is 'high confidence' quantified?"* — not *"Test that
-  masking works."*
-- **Tips** — failing checklist items send you back to `specify`/`clarify`, not to
-  code. That's the loop working.
-
-#### `/speckit.tasks`
-
-- **What** — decomposes plan + design artifacts into an **ordered, dependency-
-  aware `tasks.md`**, marking parallelizable tasks `[P]`.
-- **When** — once the plan is settled.
-- **Prompt**
-
-  ```text
-  /speckit.tasks Break the plan into tasks
-  ```
-
-- **Emits** — `tasks.md`: setup → foundational → per-story → polish, each task
-  small and testable, with `[P]` on independent ones. *(Caption only — note the
-  phasing; you'll scope `implement` against it.)*
-- **Tips** — skim for tasks that are too big to verify; a task you can't write a
-  test for is a task that's underspecified upstream.
-
-#### `/speckit.analyze`
-
-- **What** — a **non-destructive, read-only** cross-artifact audit. Checks
-  `spec.md`, `plan.md`, and `tasks.md` for inconsistencies, gaps, and drift.
-- **When** — after tasks, before implement; and any time you suspect artifacts
-  have diverged.
-- **Prompt**
-
-  ```text
-  /speckit.analyze Run a project analysis for consistency
-  ```
-
-- **Emits** — a report of mismatches (e.g., a requirement with no task, a task
-  with no requirement, a contract the plan never mentions). It **changes
-  nothing**; you decide the fixes.
-- **Tips** — treat a clean `analyze` as your "ready to implement" gate.
-
-#### `/speckit.taskstoissues` *(optional)*
-
-- **What** — converts `tasks.md` into dependency-ordered **GitHub issues**.
-- **When** — when work is shared across a team or you want the backlog tracked in
-  GitHub rather than a file.
-- **Prompt**
-
-  ```text
-  /speckit.taskstoissues Create GitHub issues from the tasks, preserving order
-  and dependencies, labeled by phase.
-  ```
-
-- **Emits** — GitHub issues (via the GitHub MCP server) mirroring your task DAG.
-- **Tips** — great for human/AI hand-offs: a teammate or another agent can pick up
-  an issue with full spec context linked.
-
-#### `/speckit.implement`
-
-- **What** — executes `tasks.md`, building the code task-by-task and marking each
-  `[X]` as it completes.
-- **When** — after a clean analyze. This is the generation step.
-- **Prompt** — and critically, **how to scope it** (see §3.5).
-
-  ```text
-  # Full run for small features:
-  /speckit.implement Start the implementation in phases
-
-  # Scoped run for large ones (recommended — keeps context healthy):
-  /speckit.implement only execute the Setup and Foundational phases, then stop
-  and report progress
-  ```
-
-- **Emits** — working code, committed task-by-task; `tasks.md` updated with `[X]`.
-- **Tips** — review *per phase*, not at the end. The completed-task markers mean
-  you can stop, review, and resume without losing your place.
-
-#### `/speckit.converge`
-
-- **What** — assesses **code-on-disk against spec/plan/tasks**, then **appends
-  the remaining unbuilt work** as new tasks so `implement` can finish it.
-- **When** — brownfield reality checks; after manual edits; when a long
-  implementation drifted; whenever "what's actually built?" ≠ "what we said."
-- **Prompt**
-
-  ```text
-  /speckit.converge Assess the codebase against the spec and append any missing
-  work as tasks.
-  ```
-
-- **Emits** — newly appended tasks in `tasks.md` capturing the delta between
-  intent and reality. Then you re-run `implement`.
-- **Tips** — this is the *spec-evolution* workhorse. Change the spec, run
-  `converge`, run `implement` — that's how a living spec stays load-bearing.
-
-### 3.4 Extensions, presets, and bundles
-
-Spec Kit is customizable so it can match *your* lifecycle, not just the default.
-
-- **Extensions** — opt-in command packs that hook into the pipeline. Built-ins
-  include:
-  - **`git`** — commit/validate/feature/remote helpers so SDD phases map cleanly
-    to branches and commits.
-  - **`bug`** — `assess` → `test` → `fix`: a spec-driven *bug* workflow (reproduce
-    with a failing test first, then fix).
-  - **`agent-context`** — keeps the agent's working context updated as the project
-    grows.
-  - Extensions hook at defined points (`before_specify`, `before_plan`, …) via
-    `.specify/extensions.yml`.
-- **Presets** — alternative command/template sets. **`lean`** trims the pipeline
-  for speed; **`scaffold`** is a starting point for authoring your own; the
-  templates are yours to fork.
-- **Bundles** — role-based setups: **developer**, **product-manager**,
-  **business-analyst**, **security-researcher**. A bundle pre-wires the commands
-  and emphasis a given role needs.
-
-> Senior framing: extensions/presets/bundles are how you **encode your team's
-> SDLC into the tool**. The default pipeline is a strong opinion; these let you
-> make it *your* opinion.
-
-### 3.5 Handling complex features (the context problem)
-
-Big features sail through specify/plan/tasks and then **degrade mid-
-`implement`** — the agent loses the plan, skips tasks, or hallucinates as the
-context window fills and compaction kicks in. The root cause is context
-exhaustion, and there are four fixes:
-
-```text
-# 1) Scope each run — the simplest fix, works on any agent:
-/speckit.implement only execute tasks T001-T010, then stop and report progress
+⟶ **Lesson.** A constitution is not values; it's a set of trip-wires. Principle
+(2) is written so that a *specific tempting feature* — auto-approve — is dead on
+arrival. That's what makes it load-bearing instead of inspirational.
 
-# 2) Delegate to sub-agents (if your agent supports them):
-/speckit.implement delegate each parallel [P] task to a sub-agent
+### Specify — the model writes a good spec for the wrong product
 
-# 3) Combine scoping + delegation for very large features:
-/speckit.implement execute only the Core phase, delegate [P] tasks to sub-agents
+He describes the *what*, deliberately omitting stack, and gets back a clean
+`spec.md`. It reads well. That's the trap.
 
-# 4) Decompose into smaller specs ("spec of specs") — when even one phase is
-#    too big, split the feature into sub-features, each with its own
-#    spec/plan/tasks cycle. Highest overhead; reserve for genuinely huge work.
+```text
+/speckit.specify A reviewer uploads a Turkish document; the system detects PII
+(names, T.C. national IDs, IBANs, phones, addresses), shows each detection with a
+confidence level, lets the reviewer accept/reject/edit inline, and on confirmation
+returns a masked copy plus an audit record. Export is blocked while any
+high-confidence detection is unresolved.
 ```
 
-Because completed tasks are marked `[X]`, scoped runs resume cleanly. **Default
-to option 1**; reach for sub-agents when you want parallelism; decompose only
-when a single phase still overflows.
-
----
-
-## Part IV — End-to-End: t0 to a Shipped Product
+> *(Caption — the spec the model produced:)* tidy user stories, requirements
+> organized around **detection accuracy and the masking flow**, sensible edge
+> cases. Looks done.
 
-> **The cast.** Deniz, a senior AI engineer, is shipping **Redaktör** — an
-> agentic Turkish-document **PII detection & masking** service with a
-> **human-in-the-loop review UI**. It's representative of real agentic-product
-> work: non-deterministic model components, an eval discipline, a human approval
-> step, and a frontend that has to feel trustworthy.
->
-> Watch for the **load-bearing moves** (⚖️) — the moments where SDD does real
-> work a senior refuses to skip. The point isn't the product; it's the *operating
-> discipline*.
+Deniz reads it adversarially and the smell hits immediately: **the spec is
+optimizing detection, but his real problem was trust.** The model did exactly what
+he asked and nothing he meant. There's no requirement about *why a reviewer would
+believe a mask is correct*, nothing about what happens when the model and the
+reviewer disagree, nothing about the liability trail beyond a vague "audit
+record." It's a spec for a demo, not for someone's signature.
 
-### t0 — Frame the problem before touching the tool
+He doesn't tweak it. He rewrites the intent, because a spec aimed at the wrong
+target can't be patched into the right one:
 
-Deniz doesn't open the agent first. He writes three sentences in a notebook:
-*who hurts today, what "done" means, what must never happen.* For Redaktör:
-"Compliance teams manually redact Turkish docs; done = reviewer-approved masked
-copy + audit trail; **never** export a doc with unresolved high-confidence PII."
+```text
+# Re-aim the spec at the real problem: reviewer trust and defensible sign-off.
+/speckit.specify Reframe: the user is a compliance officer who must personally
+attest that a document is safe to share. The system's job is to make that
+attestation FAST and DEFENSIBLE, not just to detect PII. Requirements should cover:
+how a reviewer sees what is under a mask without exposing it to anyone else; how
+disagreements between the model and the reviewer are recorded; what the audit
+record must contain to defend the decision later; and what the reviewer is and
+isn't allowed to do (e.g., no bulk auto-accept). Detection is necessary but not the
+point — the point is a sign-off the reviewer can stand behind.
+```
 
-⚖️ **Load-bearing move:** the spec is only as good as the intent behind it. Five
-minutes of human framing prevents an hour of regenerating a confidently-wrong
-spec.
+⟶ **Lesson.** "The model gave me a clean spec" is a warning, not a win. A clean
+artifact aimed at the wrong problem is more dangerous than a messy one aimed at
+the right problem, because the cleanliness lowers your guard. Read every generated
+spec by asking *what problem is this actually solving, and is it mine?*
 
-### Phase 1 — Bootstrap & constitution
+### Clarify — the questions it asks matter less than the one it doesn't
 
-```bash
-specify init redaktor --integration claude
-cd redaktor
+```text
+/speckit.clarify
 ```
 
-Then, before any feature, he encodes the **values that arbitrate every later
-decision** — and for an AI product, that explicitly includes *evals* and
-*human-in-the-loop*:
+The model asks five questions. Deniz's read:
 
 ```text
-# The constitution is where an AI product's hardest invariants become gates.
-/speckit.constitution Create binding principles: (1) every behavior change ships
-with a test; (2) NON-NEGOTIABLE — every AI component (extractor, judge) has a
-versioned eval set with a regression threshold; promotion requires passing it;
-(3) the human reviewer is the final authority — no irreversible export without
-explicit approval; (4) PII never appears in logs, traces, or error messages;
-(5) the detection engine is a library, importable and testable without the web
-layer; (6) p95 detection latency budget: 2s for a 2-page document. Add governance:
-plans that violate these must fail the gate or carry a written, justified
-exception.
+# Two are real, three are filler. The filler tells me where the model is pattern-
+# matching instead of thinking — and the GAP tells me more than the questions.
+Q: Confidence threshold for "high"?           → real. ≥0.85 blocks export.
+Q: Masking token format?                      → real. Category tokens: [TCKN], [IBAN].
+Q: Supported file types?                      → filler. Doesn't change the design.
+Q: Max document size?                         → filler. Pick 20 pages, move on.
+Q: Light or dark theme?                       → noise.
 ```
 
-⚖️ **Load-bearing move:** principle (2) turns "we tried the prompt and it seemed
-fine" into a *gate*. Principle (4) will later cause `analyze` and review to flag
-any logging task that touches raw text. The constitution is doing architecture.
+The signal isn't in the answers. It's that **the model did not ask the question
+that actually decides the product**: *when the model flags PII the reviewer thinks
+is fine — or misses PII the reviewer spots — what happens, and who owns that
+decision in the audit trail?* That's the entire trust mechanism, and the tool was
+blind to it because it's not a pattern that shows up in training; it's specific to
+his liability framing.
 
-### Phase 2 — Specify the *what*, ruthlessly stack-free
+So he injects it himself, into the spec, by hand-driving the clarification:
 
 ```text
-# Actors, flows, rules, edge cases. Zero frameworks. Zero model names.
-/speckit.specify Build a service that detects and masks personal data in
-Turkish-language documents before sharing. A reviewer pastes or uploads text;
-the system streams back detected PII (full names, T.C. national IDs, IBANs,
-phone numbers, postal addresses) each with a category and a confidence level;
-the reviewer accepts, rejects, or edits each detection inline; on confirmation
-the system returns a masked copy and writes an immutable audit record (who,
-when, what changed). Hard rule: the system must block export while any
-high-confidence detection is unresolved. Documents may be up to 20 pages.
+/speckit.clarify Record this decision explicitly in the spec: every reviewer
+override of a model detection (accept-despite-low-confidence, reject-despite-high,
+manual-add) is a first-class audited event with reviewer identity and timestamp.
+The model's opinion and the reviewer's decision are both stored; on conflict, the
+reviewer's decision governs the export but the model's dissent is retained in the
+record. This is the core trust mechanism — make it a requirement, not an
+implementation detail.
 ```
 
-The agent creates branch `001-pii-masking` and `specs/001-pii-masking/spec.md`
-with user stories, functional requirements, acceptance criteria — and a few
-`[NEEDS CLARIFICATION]` markers where Deniz was silent.
-
-> *(Caption of `spec.md`, not its contents:)* §User Stories (reviewer-centric),
-> §Functional Requirements (FR-1 detect, FR-2 stream w/ confidence, FR-3 inline
-> resolve, FR-4 export-block rule, FR-5 audit), §Edge Cases (empty doc, all-PII
-> doc, mixed-language doc), §`[NEEDS CLARIFICATION]` on retention + masking
-> format.
+⟶ **Lesson.** Clarify is not "answer the model's quiz." It's a prompt to ask
+*yourself* what's underspecified. The model's questions map what's *conventional*
+to underspecify; your job is to find what's *uniquely* dangerous to leave
+unsaid here. Often the most important clarification is one the tool never raised.
 
-⚖️ **Load-bearing move:** Deniz *notices* he never said how masking should look
-(`[ALİ YILMAZ]` → `[AD-SOYAD]`? black box? partial?) — but he doesn't fix it by
-editing prose. He lets `clarify` extract it, so the resolution is recorded
-systematically.
+### Plan — reading the smell of a plausible over-design
 
-### Phase 3 — Clarify: spend questions now, not bugs later
+Now stack and constraints are fair game.
 
 ```text
-/speckit.clarify
+/speckit.plan FastAPI + Python 3.12 backend; React + TypeScript reviewer UI with
+SSE streaming of detections; Postgres for audit records. Detection engine must be
+an importable library, testable without the web layer. Structured logging with a
+redaction filter. Eval harness over a labeled Turkish corpus.
 ```
 
-The agent asks five sharp questions. Deniz answers crisply:
-
-```text
-# (paraphrased Q→A — answers get encoded back into spec.md)
-Q: Confidence threshold for "high"?        → ≥ 0.85 blocks export; 0.6–0.85 warns.
-Q: Masking representation?                 → category tokens, e.g. [TCKN], [IBAN].
-Q: Retention of original text?             → originals purged after audit write;
-                                             audit stores only masked + diff.
-Q: Multi-language docs?                    → detect TR PII only; flag non-TR spans.
-Q: Who can override an export block?       → nobody in v1; it's a hard stop.
-```
+The model returns a confident, conventional plan — and Deniz reads two smells:
 
-⚖️ **Load-bearing move:** "nobody can override the export block" is now a
-*requirement with a test target*, not a tribal assumption. This single
-clarification will shape a contract, a task, and a regression test downstream.
-
-### Phase 4 — Plan: now, and only now, the architecture
+**Smell 1: it reached for an LLM-as-judge verification pass over every detection.**
+Plausible (more checking = better, right?), and wrong for v1. He reasons about the
+*cost*, which the model never does unprompted: the judge roughly doubles latency
+and per-document cost, and — worse — adds a *second non-deterministic component he
+now has to eval and keep from regressing.* For what? Marginal recall on categories
+that are already strong. He cuts it, and crucially **writes down why**, so the
+next engineer (or the next agent) doesn't helpfully "fix" the omission:
 
 ```text
-# Stack, boundaries, and the decisions already made enter here.
-/speckit.plan Build with: FastAPI + Python 3.12; a HYBRID detection pipeline —
-deterministic validators (regex + checksum) for T.C. IDs/IBANs/phones, and an
-LLM extractor for names/addresses; an LLM-as-judge pass that scores each
-candidate and assigns confidence; Postgres for immutable audit records; React +
-TypeScript review UI with optimistic accept/reject and SSE streaming of
-detections. Keep `redaktor.engine` importable and web-independent. Structured
-logging with a redaction filter so raw text can never be logged. Provide an eval
-harness (golden Turkish docs with labeled PII) for both extractor and judge.
+/speckit.plan Remove the LLM-judge pass for v1 and record the rationale in plan.md:
+it doubles latency/cost and adds a second model to eval for marginal recall on
+categories already covered. Revisit only if eval shows a recall gap that determinism
+and a single extractor can't close. Keep the architecture as flat as the problem
+allows — no message queue, no service split; this is one process until proven
+otherwise.
 ```
 
-> *(Caption of emitted design artifacts:)*
->
-> - `research.md` — trade study: regex-only vs. LLM-only vs. **hybrid** (chosen:
->   hybrid for precision on structured IDs + recall on names); SSE vs.
->   WebSocket (chosen: SSE, one-way stream).
-> - `data-model.md` — entities: `Document`, `Detection{category, span,
->   confidence, status}`, `AuditRecord`.
-> - `contracts/` — `POST /documents`, `GET /documents/{id}/detections` (SSE),
->   `POST /detections/{id}/resolve`, `POST /documents/{id}/export` (enforces the
->   block rule).
-> - `quickstart.md` — the golden-path validation scenario.
-
-⚖️ **Load-bearing move:** Deniz reads `research.md` and pushes back: the judge
-adds latency. He keeps it but adds a plan note — *deterministic validators
-short-circuit the judge for checksum-valid IDs* — protecting the 2s p95 budget
-from principle (6). This is a design debate happening in `plan.md`, where it's
-free.
-
-### Phase 5 — Checklist the requirements, then task it out
-
-Because the **review UX** and **PII rules** are the real risk, he quality-tests
-the *requirements* first:
+**Smell 2 — and this is the real insight — the architecture is upside down.**
+While reviewing `research.md`, Deniz notices that T.C. national IDs and IBANs have
+*checksums*. The highest-liability categories don't need a model at all; a few
+lines of deterministic validation are more accurate than any LLM and trivially
+auditable. That inverts the design: **this is mostly a deterministic system, with
+AI used only where determinism fails (names, free-form addresses).** The "AI
+product" is 80% boring code, and that's a feature — boring code is auditable, and
+auditability is the actual requirement.
 
 ```text
-/speckit.checklist Generate a requirements-quality checklist for the review UX
-and PII rules: is every detection category's masking token specified? is the
-export-block condition unambiguous and testable? are reviewer keyboard/a11y
-requirements defined? is the SSE reconnect behavior on dropped connection
-specified?
+/speckit.plan Restructure detection around determinism-first: regex + checksum
+validators own T.C. IDs, IBANs, and phones (these are exact and auditable). The
+LLM extractor handles only names and free-form addresses, where determinism fails.
+Make the validator path the default and the model path the exception. Note in
+plan.md: this is deliberate — the most legally sensitive categories must be
+explainable without invoking a model.
 ```
 
-Two items fail — SSE reconnect behavior and a11y were never specified. He loops
-back through `clarify` to fix the *spec*, not the code. Then:
-
-```text
-/speckit.tasks Break the plan into tasks
-```
+⟶ **Lesson.** Two senior reflexes here. First: **the model never volunteers the
+cost of its suggestions** — latency, money, a new failure surface, eval burden —
+so you must price every "let's also add…" yourself and often decline. Second: the
+best architectural insight in an AI product is frequently *use less AI* — push
+work onto deterministic, auditable code wherever it actually works, and reserve
+the model for the irreducibly fuzzy part. Read `research.md` and `plan.md` like a
+design review, because that's the last place a bad structure is cheap to kill.
 
-> *(Caption of `tasks.md`:)* Setup (repo, CI, db) → Foundational (engine library,
-> validators, eval harness) → Story A (detect+stream) → Story B (resolve UI) →
-> Story C (export-block + audit) → Polish (a11y, latency, observability). `[P]`
-> on the independent validator and UI-component tasks.
+He overrides one of the simplicity-gate's own suggestions (it wanted a separate
+validation microservice "for scalability") with a one-line documented exception —
+*premature; one process until load data says otherwise.* Gates inform; they don't
+command. A senior overrides them **with a written reason**, which is different from
+ignoring them.
 
-### Phase 6 — Analyze: the "ready to build" gate
+### Tasks & analyze — using the cheap audit as a real gate
 
 ```text
+/speckit.tasks Break the plan into tasks
 /speckit.analyze Run a project analysis for consistency
 ```
 
-The read-only audit flags one real gap: **no task covers the "non-TR span"
-flagging** from the clarify step. Deniz adds it (re-runs `tasks`), re-analyzes,
-gets a clean report.
+`analyze` is read-only and catches one thing that matters: the override-audit
+requirement Deniz injected during clarify has **no task**. It would have silently
+not been built — and it's the core trust mechanism. He adds it and re-runs until
+clean.
 
-⚖️ **Load-bearing move:** a clean `analyze` is his hard gate to implementation. A
-requirement with no task is a feature that silently won't exist — caught here for
-the price of a re-read.
+⟶ **Lesson.** A requirement with no task is a feature that won't exist, and nobody
+will notice until it's missing in production. `analyze` is a near-free adversarial
+reader; treating "clean analyze" as a hard gate before implementing is one of the
+highest-leverage habits in the whole method. The cost of running it is a minute;
+the cost of the gap it finds is a postmortem.
 
-### Phase 7 — Implement, scoped by phase
+### Implement — it drifts, and the process (not your vigilance) catches it
 
-He never one-shots a feature this size. He scopes by phase to keep context
-healthy (§3.5):
+He never one-shots this. He scopes by phase, because he knows the model degrades
+as context fills — it starts strong and hallucinates around the edges of a long
+run:
 
 ```text
-# 1) Foundation first — the engine + evals must exist before stories.
-/speckit.implement only execute the Setup and Foundational phases, then stop and
-report progress
+# Foundation first; nothing else can be trusted until the engine + evals exist.
+/speckit.implement execute the Setup and Foundational phases only, then stop and
+report
+```
 
-# (Deniz reviews: runs the eval harness, confirms validators pass on golden IDs.)
+He runs the eval harness himself before going further, because for a model
+component **"the unit tests pass" and "it actually detects well" are different
+claims**, and only the eval set speaks to the second.
 
-# 2) Then story by story, reviewing each:
-/speckit.implement execute Story A (detect + stream), then stop
+Then, mid-build around task 7, the model does something very on-brand: in a new
+error handler, it logs the offending document snippet to help debugging. That
+violates constitution principle (3) — raw text in logs. Deniz doesn't catch it by
+reading every line; nobody reliably does that. He catches it because he'd written
+a test that greps emitted logs for PII patterns. **The failure mode he predicted,
+he instrumented for.**
 
-# 3) Large parallelizable polish — delegate if the agent supports sub-agents:
-/speckit.implement execute the Polish phase, delegate [P] tasks to sub-agents
+```text
+# He'd added this as a foundational task precisely because he knew the model
+# would eventually "helpfully" log raw text. The test is the guard, not his eyes.
+/speckit.implement the log-redaction test is failing on the new error handler —
+fix the handler to log a reference id, not document content, and confirm the test
+passes
 ```
 
-Each task lands as `[X]`. Between phases, Deniz **runs the app and the evals**,
-not just the unit tests — because for an AI component, "tests pass" ≠ "it
-detects well." The eval set from principle (2) is the real acceptance gate.
+⟶ **Lesson.** You cannot out-vigilance a tireless collaborator by reading harder.
+You beat its predictable failures by **building the trap before it walks into
+them**: a test that fails on the exact mistake you know it tends to make. The
+constitution names the invariant; a test makes the invariant self-enforcing. That
+pairing is the difference between a principle and a hope.
 
-⚖️ **Load-bearing move:** reviewing per phase means a bad masking decision is
-caught after Story A, not after the whole product is wired. The `[X]` markers let
-him stop and resume without losing the thread.
+### The spec was *still* wrong — and here's where SDD pays for itself
 
-### Phase 8 — Converge: reconcile reality with intent
+Foundation, detection, review UI, audit — all built, all green, demo to the
+compliance team. It fails. Not on a bug. The reviewers won't sign off, because
+**they can't see what's under a mask without un-masking the whole document**, and
+un-masking in front of others is exactly the exposure they're trying to avoid. The
+product is technically correct and practically useless. The trust mechanism Deniz
+was so proud of is unusable in the room where trust actually happens.
 
-Mid-build, product asks for one change: *warn-level detections (0.6–0.85) should
-also be reviewable, not just high-confidence ones.* Deniz does the
-**spec-evolution loop**, not a hand-patch:
+This is the expensive discovery — the kind that, in a code-first project, means a
+painful rewrite. Here it's a *spec* problem (he never specified *how* a reviewer
+inspects a mask safely), so he fixes it where it's cheap: he edits intent, then
+lets convergence find the delta between the new intent and the built reality.
 
 ```text
-# 1) Update the spec/clarify with the new rule, then:
-/speckit.converge Assess the codebase against the updated spec and append any
+# Fix the SPEC, then reconcile reality to it. Not a hand-patch in the UI —
+# the change has to live in intent so it survives the next person.
+/speckit.clarify Add the missing requirement: a reviewer must be able to inspect
+the original value behind a single mask, in place, visible only to them, without
+exposing the rest of the document or persisting the reveal. Specify the audit
+consequence: each reveal is itself an audited event.
+
+/speckit.converge Assess the codebase against the updated spec and append the
 missing work as tasks.
 
-# 2) converge appends the delta tasks (UI filter for warn-level, export rule
-#    stays high-confidence-only). Then finish them:
 /speckit.implement execute the newly appended tasks
 ```
 
-⚖️ **Load-bearing move:** the requirement change entered through the *spec* and
-propagated *down* — not as a stray commit. The spec stays the source of truth, so
-the next engineer (or agent) reads intent that matches reality.
-
-### Phase 9 — Ship: CI, audit, and the human gate
+⟶ **Lesson.** This is the entire argument for SDD in one event. You *will* discover
+late that the spec was wrong — that's not a process failure, it's the nature of
+building things for real humans, whose needs you can't fully know in advance. The
+question SDD answers is: **when that discovery lands, does it cost a paragraph or a
+rewrite?** Because the intent lived in a spec the system regenerates from, a
+demo-day catastrophe became an afternoon. That's the payoff — not the clean happy
+path, but the cheap recovery from the inevitable wrong turn.
 
-Before release, Deniz wires the lifecycle closure (often via the `git` extension
-and his CI):
+### Ship — the gate that lets him sleep
 
-- **CI gates** run the test suite *and* the eval suite; a regression below the
-  eval threshold **fails the build** (principle 2, now load-bearing in CI).
-- **A redaction-filter test** asserts no raw PII reaches logs (principle 4).
-- **An export-block test** asserts the hard stop from the clarify step.
-- He has the agent fan remaining work to issues for a teammate:
+CI runs tests *and* the eval suite; a recall regression below threshold fails the
+build, so the model can't silently get worse over time. The log-PII test and the
+export-block test are non-negotiable gates. The human approval step is the actual
+product. None of this is ceremony — each one is a specific way the system could
+have hurt someone, turned into a wall.
 
-  ```text
-  /speckit.taskstoissues Create GitHub issues for the remaining polish tasks,
-  preserving dependency order, labeled by phase.
-  ```
+⟶ **Lesson.** "Done" for an AI product isn't "it works." It's "I know the specific
+ways this can fail, and each one is now something that fails the build instead of
+failing a customer." That confidence is what you're actually shipping.
 
-Redaktör ships: a reviewer pastes a document, watches detections stream in with
-confidence, resolves each inline, and exports a masked copy with an audit trail —
-and **cannot** export while a high-confidence detection is unresolved, because
-that rule has been load-bearing since Phase 3.
+---
 
-### What made this "senior," in one breath
+## Reading generated artifacts adversarially: a smell guide
 
-The model wrote most of the code. **Deniz owned the leverage points**: framing
-intent (t0), promoting invariants to gates (constitution), spending clarify
-questions early, debating architecture in `plan.md`, gating on a clean `analyze`,
-reviewing *per phase* against **evals** not vibes, and routing every change
-through the spec via `converge`. SDD didn't replace his judgment — it
-*concentrated* it where it mattered.
+The single most valuable skill SDD demands is reading the model's output like a
+hostile reviewer. Concrete tells, by artifact:
 
----
+**In a spec:**
 
-## Part V — Operating SDD Like a Senior
+- It optimizes a *metric* (accuracy, speed) when your real problem is a
+  *property* (trust, safety, defensibility). → It solved the legible problem, not
+  yours.
+- Edge cases are all the *easy* ones (empty input, too long). The expensive edge
+  cases — conflict, disagreement, partial failure, liability — are absent. →
+  The model lists what's conventional, not what's dangerous *here*.
+- It reads complete and you can't find a single `[NEEDS CLARIFICATION]`. → It
+  guessed confidently instead of flagging the unknowns. Suspicious, not
+  reassuring.
 
-### 5.1 Habits that separate load-bearing SDD from theater
+**In a plan:**
 
-- **Always `clarify` before `plan`** on anything that matters. It's the highest
-  ROI command in the kit.
-- **Read `research.md` and `contracts/`.** If you're not reviewing the design
-  artifacts, you're vibe coding with extra files.
-- **Gate on a clean `analyze`.** Make "no orphan requirements, no orphan tasks"
-  your definition of implementation-ready.
-- **Scope `implement` by phase, review between phases.** Don't discover problems
-  at the end.
-- **Route changes through the spec → `converge` → `implement`.** Hand-patches rot
-  the spec; once it's stale, it stops being load-bearing and you're back to
-  code-as-truth.
+- An extra service, queue, cache, or abstraction "for scale/flexibility" with no
+  number justifying it. → Plausible over-engineering; make it justify itself or cut
+  it.
+- A new non-deterministic component added "to be safe." → Price the latency, cost,
+  and *new eval burden* it creates. Usually not worth it.
+- Heavy machinery where a checksum, a regex, or a constraint would be exact and
+  auditable. → The best move is often *less* sophistication, not more.
 
-### 5.2 Anti-patterns to refuse
+**In tasks:**
 
-- **Stack in the spec.** Frameworks in `spec.md` poison the intent layer and
-  block re-implementation.
-- **Skipping clarify to "save time."** You don't save it; you move it to
-  debugging, at 10× the cost.
-- **One-shot `implement` on a big feature.** Context exhaustion will corrupt the
-  back half of the run.
-- **A constitution of platitudes.** If a principle can't fail a plan, delete it.
-- **Treating generated artifacts as write-once.** A spec you never regenerate
-  from is documentation, not SDD.
+- A requirement you fought for during clarify has no corresponding task. →
+  `analyze` exists to catch exactly this; gate on it.
+- A task you can't imagine writing a test for. → It's underspecified upstream; the
+  vagueness is in the spec, not the task.
 
-### 5.3 Fitting SDD to a team
+The meta-tell across all three: **fluency where there should be friction.** If the
+hard, contested part of your problem came out smooth and confident, the model
+papered over it. Go look there first.
 
-- Use **`taskstoissues`** to turn a task DAG into a shared backlog; each issue
-  carries spec context, so humans and agents can both pick up work.
-- Use **bundles** (developer / PM / BA / security) so each role gets the
-  pipeline emphasis they need.
-- Use **extensions** (`git`, `bug`, `agent-context`) to bind SDD phases to your
-  real branch/commit/triage flow.
-- Keep `spec.md`/`plan.md`/`tasks.md` **in version control** next to the code and
-  review them in PRs — the spec diff *is* the design review.
+---
 
-### 5.4 When **not** to reach for the full pipeline
+## Where SDD fights you, and what a senior does about it
+
+Pretending the method is all upside is its own kind of decoration. The honest
+failure modes:
+
+- **It can make wrong decisions feel settled.** A beautifully formatted spec
+  carries false authority. Counter: treat every artifact as a draft under
+  suspicion until *you've* attacked it, regardless of how finished it looks.
+- **Clarify and analyze can devolve into pedantry** — asking about themes,
+  flagging trivia. Counter: take the two questions that matter, ignore the noise,
+  and supply the one they missed. The commands serve your judgment, not the
+  reverse.
+- **The structure can seduce you into completing the workflow instead of solving
+  the problem.** Running all ten commands feels productive. Counter: if a step
+  isn't surfacing disagreement or reducing risk, skip it. Ceremony is the enemy.
+- **Long `implement` runs rot.** The model loses the plan as context fills.
+  Counter: scope every run to a phase or a task range; review between; let the
+  `[X]` markers resume you. Never one-shot anything large.
+- **The spec can drift from the code** the moment you hand-patch. Counter: route
+  changes through the spec → `converge` → `implement`. The instant the spec stops
+  matching reality, it stops being load-bearing and you've silently reverted to
+  code-as-truth.
 
-SDD ceremony should match stakes:
+### Calibration: how much process is the right amount
 
-| Situation | Reach for |
+| Situation | What a senior actually does |
 | --- | --- |
-| Load-bearing feature, multiple stories, real risk | Full pipeline + clarify + checklist + analyze |
-| Small, well-understood change | specify → plan → tasks → implement (skip the wrappers) |
-| Throwaway spike / proof of concept | Straight to `/speckit.implement` with a tight prompt |
-| Bug fix | The `bug` extension (assess → failing test → fix) |
-| Requirements changed on a live feature | `clarify`/edit spec → `converge` → `implement` |
-
-The senior skill is calibration: enough structure to stay honest, not so much
-that it's theater.
-
-### 5.5 The one-sentence summary
+| Throwaway spike, learning something | Skip the apparatus. One tight `/speckit.implement` prompt or just write it. |
+| Small change to a known system | specify → plan → tasks → implement. Skip the wrappers; the risk is low. |
+| Load-bearing feature, real consequences | Full adversarial pass: clarify, checklist on the risky domain, gate on clean analyze, scope implement, review per phase against evals. |
+| You're not sure what you're even building | *Stop.* Spend the t0 thinking. The tool can't do this part and will happily build the wrong thing. |
+| Requirements changed on a live feature | Edit the spec → converge → implement. Never a stray commit. |
 
-**Make intent the durable artifact, promote your invariants to gates, spend your
-judgment at the spec/plan/analyze checkpoints, and let the agent regenerate the
-code — so that when the world changes, you change the spec and the system
-follows.**
+The skill isn't running more steps. It's spending exactly enough structure to keep
+yourself honest, and not one step of theater past that.
 
 ---
 
-## Appendix — Cheat Sheets
-
-### A. Command map
-
-| Command | Phase | Emits | Reach for it when |
-| --- | --- | --- | --- |
-| `/speckit.constitution` | Govern | `constitution.md` | Project birth; principle change |
-| `/speckit.specify` | Intent | `spec.md` (+ branch) | Every new feature |
-| `/speckit.clarify` | Intent | updated `spec.md` | Before planning anything load-bearing |
-| `/speckit.plan` | Design | `plan.md`, `research.md`, `data-model.md`, `contracts/`, `quickstart.md` | Spec is clarified |
-| `/speckit.checklist` | Design | requirements-quality checklist | Fuzzy/high-risk domains (UX, security) |
-| `/speckit.tasks` | Decompose | `tasks.md` | Plan is settled |
-| `/speckit.analyze` | Gate | read-only consistency report | Before implement; on suspected drift |
-| `/speckit.taskstoissues` | Decompose | GitHub issues | Team/shared backlog |
-| `/speckit.implement` | Build | code + `[X]` tasks | After a clean analyze |
-| `/speckit.converge` | Evolve | appended delta tasks | Spec changed; brownfield reconcile |
-
-### B. Artifact graph
-
-```text
-constitution.md ──governs──▶ spec.md ──derives──▶ plan.md ─┬─▶ research.md
-                                  ▲                          ├─▶ data-model.md
-                            clarify│ checklist               ├─▶ contracts/
-                                  │                          └─▶ quickstart.md
-                                  │                                  │
-                                  └────────── analyze (audits all) ──┤
-                                                                     ▼
-                                                                 tasks.md ──▶ code
-                                                                     ▲          │
-                                                                     └ converge ┘
-```
-
-### C. Glossary
-
-- **Constitution** — binding project principles, enforced as plan-time gates.
-- **Gate (Phase −1)** — a pre-implementation check (simplicity, anti-abstraction,
-  integration-first) a plan must pass or explicitly waive.
-- **`[NEEDS CLARIFICATION]`** — an explicit uncertainty marker; a question, not a
-  guess.
-- **Persistence model** — your chosen strategy for keeping specs alive
-  (greenfield / brownfield / evolution).
-- **`[P]` task** — a task with no blocking dependency; parallelizable.
-- **Converge** — reconcile code-on-disk with intent by appending the missing
-  work as tasks.
-- **Bundle / Preset / Extension** — role setups / alternative pipelines / opt-in
-  command packs that tailor Spec Kit to your SDLC.
+## If you remember five things
+
+1. **The model is a fast, confident, predictably-wrong collaborator.** Your stance
+   is adversarial: every artifact is a suspect until you've attacked it.
+2. **SDD's value is cheap, early disagreement.** Every loop drags a future
+   expensive mistake into a present cheap one. A step that surfaces no
+   disagreement is theater.
+3. **The real problem usually isn't the one you typed.** The t0 reframing — done
+   by you, before any command — prevents the most expensive class of mistake.
+4. **Instrument for the failures you can predict.** Don't out-vigilance the tool;
+   build the test that fails on the mistake you know it'll make, and let the
+   constitution name the invariant.
+5. **Going backwards is the method working.** You will discover the spec was wrong
+   late. The win condition isn't avoiding that — it's that fixing intent and
+   regenerating costs a paragraph instead of a rewrite.
 
 ---
 
-*Further reading in this repo: [`spec-driven.md`](../../spec-driven.md) (the
-full SDD essay and the constitutional articles),
+## Appendix — the commands, compressed
+
+You'll learn these by using them. They are instruments for the thinking above, not
+the point of it.
+
+| Command | Its job | The senior question it serves |
+| --- | --- | --- |
+| `constitution` | Binding principles, enforced as plan-time gates | *Which tempting wrong decisions do I want dead on arrival?* |
+| `specify` | Natural language → structured spec (+ feature branch) | *What problem am I actually solving — and is this spec aimed at it?* |
+| `clarify` | Up to 5 questions; answers encoded into the spec | *What's uniquely dangerous to leave unsaid here — including what it didn't ask?* |
+| `plan` | Spec + stack → technical design, research, contracts | *What does each choice cost, and where is this over-built?* |
+| `checklist` | "Unit tests for the requirements" — quality, not behavior | *Are the requirements in my risky domain actually airtight?* |
+| `tasks` | Plan → ordered, `[P]`-parallelizable task list | *Is every hard-won requirement represented as buildable, testable work?* |
+| `analyze` | Read-only cross-artifact consistency audit | *What did we say we'd build that has no task — or build with no reason?* |
+| `taskstoissues` | Tasks → dependency-ordered GitHub issues | *How do humans and agents share this work with full context?* |
+| `implement` | Executes tasks, marking `[X]`; scope it by phase | *How do I keep context healthy and review before it drifts?* |
+| `converge` | Reconcile code-on-disk with intent; append the delta | *Reality changed — how do I make the spec true again instead of patching code?* |
+
+Customization that lets the method match *your* SDLC rather than the default:
+**extensions** (`git`, `bug` for spec-driven debugging, `agent-context`),
+**presets** (`lean` trims the pipeline), and **bundles** (developer / PM / BA /
+security role setups).
+
+*Deeper reading in this repo: [`spec-driven.md`](../../spec-driven.md),
 [`docs/concepts/sdd.md`](../concepts/sdd.md),
-[`docs/concepts/complex-features.md`](../concepts/complex-features.md), and
+[`docs/concepts/complex-features.md`](../concepts/complex-features.md),
 [`docs/guides/evolving-specs.md`](./evolving-specs.md).*