feat(brief): pre-execution contract negotiation with output evaluation and auto-retry by Nivesh353 · Pull Request #68 · open-gitagent/gitagent

Nivesh353 · 2026-07-03T16:08:33Z

Summary

Introduces Agent Brief — a negotiation step that runs before an agent
executes a task, producing a concrete, versioned "contract" of binary
pass/fail success criteria (assertions). Once the agent produces output,
a second evaluator grades the output against that contract and can trigger
automatic retries with targeted feedback on what failed.

This closes the gap where an agent's output "looks done" but silently
violates constraints (word count, tone, banned terms, structure) that were
never made explicit or checked.

What's new (`src/brief/`, 13 files)

planner.ts — LLM call that turns a task + agent's SOUL/RULES/DUTIES
into a BriefDraft: task summary, ambiguities, binary assertions
(format/content/quality/constraint/behavior/tone), a scoring rubric, and
estimated complexity. Assertion count is capped at 15 in code (backing up
the prompt's own "never exceed 15" instruction) to bound evaluator cost.
evaluator.ts — LLM call that reviews the Planner's draft itself
(not the agent's output) for vague/unmeasurable assertions, missing
coverage, or contradictions with RULES.md, and returns critical/warning
issues.
negotiator.ts — drives Planner ⇄ Evaluator rounds until the draft is
approved or a round limit is hit.
output-evaluator.ts — after the agent runs, grades its actual output
against each approved assertion (binary, evidence-required, strict).
Hardened in this PR: explicit constraints: { temperature: 0, maxTokens: 8000 }
on the grading call (previously silently inherited the host agent's
possibly-too-small max_tokens, causing truncated/unparseable JSON), plus
the failure path now distinguishes and reports why grading failed —
LLM call error, truncated response, or empty response — instead of a
generic Unexpected end of JSON input.
orchestrator.ts — top-level flow: negotiate → stale-check →
approve/edit/regenerate/skip → save.
runner.ts (runWithBrief) — runs the agent with the brief injected
into its system prompt, evaluates output, and auto-retries (default 2x)
with a prompt that lists exactly which assertions failed and why.
storage.ts — brief persistence: save/load/list/find/version/archive
under .gitagent/briefs/, content-hash based staleness keys.
stale.ts — detects when SOUL/RULES/DUTIES changed since a brief was
approved and reports which assertions are affected.
approval.ts — interactive CLI approval flow (approve/edit/regenerate/skip)
and brief list/detail rendering.
editor.ts — opens a draft in $EDITOR for manual editing, validates
the edited JSON against the schema.
injector.ts — serializes an approved brief into a system-prompt suffix.
report.ts — renders the final pass/fail evaluation report.
types.ts — shared types (Brief, BriefDraft, BriefAssertion,
AssertionResult, OutputVerdict, etc).

Integration changes (4 files)

src/loader.ts — adds optional brief?.model to AgentManifest so an
agent.yaml can pin a specific model for brief negotiation/evaluation,
independent of the agent's own model.preferred.
src/index.ts — new gitagent brief CLI subcommand:
- gitagent brief "<task>" — negotiate + approve + run with evaluation
- --only — negotiate and save without running
- --brief-path <path> — run with an already-approved brief
- --list / --view <id> / --regenerate — manage saved briefs
- --model / -m, --dir / -d, --yes / -y
src/exports.ts — exposes the brief API (runBriefOrchestration,
generateBrief, runWithBrief, storage helpers, and all brief types) from
the SDK's public surface.
src/schedule-runner.ts — scheduled jobs now look up an approved brief
matching the schedule's prompt and inject it into the run automatically,
warning if it's gone stale.

Why

Assertions are cheap to write and binary to check, which makes "did the
agent actually do what was asked" a testable question instead of a vibe
check — especially valuable for scheduled/unattended runs where nobody is
reading the output live.

Test plan

gitagent brief "write a 500-word post on X" --dir <agent> — negotiate,
approve, confirm the assertion contract looks reasonable
- Confirm auto-retry fires when an assertion is deliberately violated
gitagent brief --list / --view <id> render correctly
Edit an agent's SOUL.md after approving a brief — confirm staleness
warning appears on next run
Scheduled job with a matching approved brief picks it up automatically
(src/schedule-runner.ts)
tsc --noEmit passes

…uccess criteria Adds a Planner/Evaluator negotiation loop that turns a task prompt into a concrete, versioned contract of binary pass/fail assertions before an agent runs, then grades the agent's actual output against that contract with auto-retry on failure. Wires it into the CLI (`gitagent brief`), the SDK (`runBriefOrchestration`/`runWithBrief`), and the scheduler.

shreyas-lyzr

Solid addition overall — the planner/evaluator negotiation loop is well-thought-out and the hardening in output-evaluator.ts (explicit temperature/maxTokens, distinct failure reasons) is good prior-work cleanup. Four issues worth addressing before merge, in rough priority order.

shreyas-lyzr · 2026-07-03T16:10:18Z

 	model?: string;
 	env?: string;
-	runPrompt: (prompt: string) => Promise<string>;
+	runPrompt: (prompt: string, briefSuffix?: string) => Promise<string>;


The runPrompt signature now has an optional second parameter (briefSuffix?: string) but callers outside this repo who implement SchedulerOptions won't know to pass it — they'll silently ignore the brief injection because nothing enforces forwarding the suffix to the underlying query call. Since this is an exported interface, either document the new parameter clearly in a JSDoc comment, or consider shipping a thin adapter so callers don't have to change: e.g. run the brief injection internally inside executeScheduledJob and keep runPrompt with its original one-argument signature. The current design requires every downstream implementor to update their callback even though the caller already has the suffix.

shreyas-lyzr · 2026-07-03T16:10:18Z

+		currentNegotiation = renegotiation;
+		finalBrief = {
+			...finalBrief,
+			version: version + attempts,


Version number computed as version + attempts is wrong when version came from nextVersion() (which already accounts for archived versions). If, say, four prior versions exist, nextVersion returns 5, then on the second regenerate attempt the in-memory brief gets version 7, skipping 6. Use nextVersion(agentDir, id) again here (or track a running counter separately) so the on-disk filename and the in-memory version field stay consistent.

shreyas-lyzr · 2026-07-03T16:10:18Z

+		"| # | Category | Assertion | How to Verify |",
+		"|---|---|---|---|",
+		...draft.assertions.map(a =>
+			`| ${a.id} | ${a.category} | ${a.assertion} | ${a.test} |`,


Assertion text and test descriptions are written raw into Markdown table cells without escaping pipe characters. If an assertion contains | (e.g. "word count must be 800 | 1000" or a regex) the table will be corrupted, and parseBrief re-reads the Markdown table as display-only anyway (the canonical data is in the draft_json HTML comment), so the table is purely presentational. Either escape pipes in those fields (.replace(/\|/g, '\\|')) or add a comment noting the table is non-authoritative. Right now a future reader could be confused about why table rows look broken.

shreyas-lyzr · 2026-07-03T16:10:18Z

+const green  = (s: string) => isTTY ? `\x1b[32m${s}\x1b[0m` : s;
+const red    = (s: string) => isTTY ? `\x1b[31m${s}\x1b[0m` : s;
+
+const CATEGORY_COLORS: Record<string, (s: string) => string> = {


CATEGORY_COLORS in approval.ts and report.ts are identical and both ignore the module-level isTTY guard — the color lambdas always emit ANSI codes regardless of TTY. The other helpers (dim, bold, etc.) correctly gate on isTTY; these don't. Also, the same map is copy-pasted in both files. Extract it to a shared colors.ts utility and apply the isTTY guard consistently.

- schedule-runner.ts: revert runPrompt to its original one-arg signature, fold the brief suffix into the prompt internally so external SchedulerOptions implementers don't silently drop brief injection - orchestrator.ts: recompute version via nextVersion() on each regenerate attempt instead of `version + attempts`, which could skip version numbers - storage.ts: escape pipe characters in assertion/test text before writing the Markdown success-criteria table - approval.ts, report.ts, colors.ts: extract duplicated CATEGORY_COLORS into a shared colors.ts, fixing the missing isTTY guard in both copies

shreyas-lyzr reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(brief): pre-execution contract negotiation with output evaluation and auto-retry#68

feat(brief): pre-execution contract negotiation with output evaluation and auto-retry#68
Nivesh353 wants to merge 2 commits into
open-gitagent:mainfrom
Nivesh353:feat/contract-negotiation

Nivesh353 commented Jul 3, 2026

Uh oh!

shreyas-lyzr left a comment

Uh oh!

shreyas-lyzr Jul 3, 2026

Uh oh!

shreyas-lyzr Jul 3, 2026

Uh oh!

shreyas-lyzr Jul 3, 2026

Uh oh!

shreyas-lyzr Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Nivesh353 commented Jul 3, 2026

Summary

What's new (src/brief/, 13 files)

Integration changes (4 files)

Why

Test plan

Uh oh!

shreyas-lyzr left a comment

Choose a reason for hiding this comment

Uh oh!

shreyas-lyzr Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

shreyas-lyzr Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

shreyas-lyzr Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

shreyas-lyzr Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

What's new (`src/brief/`, 13 files)