docs(agent-workflows): design the invoke-validation fix by mmabrouk · Pull Request #5002 · Agenta-AI/agenta

mmabrouk · 2026-07-01T17:07:44Z

Design docs only. No code.

The agent service invoke endpoint (POST {host}/services/agent/v0/invoke) does not tell a caller when a request is malformed. Send references with no revision, or send data.revision one level too shallow, and the service silently runs a seeded default (pi_core / gpt-5.5) and then 500s a few steps later. The caller gets an opaque late error instead of "your request was shaped wrong, here is the right shape." This cost the agent-creation lab real time, because a wrong guess looked like an agent runtime bug rather than a bad request.

This PR adds a plan-feature design workspace that reframes the problem as request validation (not self-hydration) and proposes validating the invoke request at the shared resolver boundary.

The proposed validation

Validate before any run starts. When the request cannot resolve to a config, return a clear 4xx that names the three valid ways to invoke:

A complete revision, nested correctly: data.revision = {"data": ...}.
No revision, plus a resolvable reference: a variant, an environment, or a revision, not a bare application (an application has many revisions; a bare app id pins nothing).
Inline configuration: data.parameters.

Three rules at the boundary:

Rule A - if data.revision is present, it must be double-nested; a single-nested revision gets a 4xx that shows the fix.
Rule B - a reference that is the intended target must be resolvable (reuses and extends the existing _validate_executable_reference_families at resolver.py:69-98, which already raises 4xx for competing reference families).
Rule C - if there is no revision, no resolvable reference, and no inline params, reject with the three valid shapes spelled out.

Scope limits (carried over from the superseded harden-invoke decision): no blanket extra="forbid" on the envelope, no OpenAPI (/inspect stays the live contract). The seeded-default self-hydration fix is complementary and can land alongside, but validation is the primary deliverable because it helps even when self-hydration is intentionally off.

Files (all under `docs/design/agent-workflows/projects/builder-agent-reliability/invoke-validation/`)

README.md - index.
context.md - symptom, lab cost, why validation is the right lens, goals / non-goals.
research.md - the mechanism with file:line citations (double-nested requirement at resolver.py:150; the agent seed at utils.py:285-287 defeating the hydration gate at resolver.py:573-577; loose envelope at workflows.py:237/:296; the API pre-hydration that makes the product path work at service.py:745-751; the three reference-kind results; the live reproduction).
plan.md - Rules A/B/C, scope, consistency across agent/completion/chat, test matrix.
status.md - decisions + open questions.

Supersedes and merges the earlier harden-invoke / silent-fallback / invoke-contract notes under scratch/console/builder-kit/.

https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa

vercel · 2026-07-01T17:07:51Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jul 3, 2026 9:44am

coderabbitai · 2026-07-01T17:07:53Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d8b20eab-c026-4e1d-a15b-491c85eb8bb5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/agent-invoke-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

mmabrouk · 2026-07-01T17:08:07Z

Design docs only, no code. Needs a design review before I implement.

Please weigh in on the open questions in status.md, specifically:

Status code: 400 vs 422 for the malformed-invoke reject, to stay consistent with how the envelope reports validation failures elsewhere. _validate_executable_reference_families uses bad_request (400) today.
Rule C over-rejection: an empty-body call to completion/chat is a legitimate "run the interface default" case. Is "the caller sent references or a malformed data.revision that did not resolve" the right signal to distinguish "intended a config" from "wants the default"? Does this hold across all three services?
Where the checks live: fold Rules A/B/C into _validate_executable_reference_families, or add a sibling validator at the same resolver boundary (my lean)? And validate the revision-nesting in the resolver vs in models/workflows.py?
Bundle with self-hydration? Should the seeded-default fix (utils.py:285-287 -> empty WorkflowRevisionData(), so a references-only agent call self-hydrates like completion/chat) land in the same change, or stay separate and keep this validation-only?

Biggest thing to sanity-check: is "three valid call shapes + a resolvable reference needs variant/environment/revision" the right contract to enforce, or is there a fourth shape I am missing?

mmabrouk · 2026-07-01T17:40:08Z

+
+## Scope of the strictness (what we are NOT doing)
+
+- **No blanket `extra="forbid"` on the envelope.** The user rejected locking the whole request


that was meant only for the parameters not the whole thing

mmabrouk · 2026-07-01T17:40:45Z

+
+## Where the code changes land (for implementation, not this PR)
+
+- `sdks/python/agenta/sdk/middlewares/running/resolver.py`: add Rule A / B / C next to


only here. should the chnage live

mmabrouk · 2026-07-01T17:41:30Z

+
+## Open questions
+
+1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors?


check and stay consistent with code base and best practices

mmabrouk · 2026-07-01T17:42:39Z

+1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors?
+   `_validate_executable_reference_families` uses `bad_request`. Pick to stay consistent with how
+   the envelope reports validation failures elsewhere.
+2. **Rule C over-rejection on completion/chat.** An empty-body call to completion or chat is a


i dont think there is a case where you would want the defaults. you either want to give a config or you want to specify a revision (maybe the latest)

mmabrouk · 2026-07-01T17:42:54Z

+   intended a config (sent `references` or a malformed `data.revision`) and it did not resolve,
+   not when the caller deliberately sent an empty body. Confirm the exact signal that
+   distinguishes "intended a config" from "wants the default."
+3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or


answered above

mmabrouk · 2026-07-01T17:43:14Z

+3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or
+   add a sibling validator at the same boundary? One function is simpler; a sibling keeps the
+   family check focused. Lean sibling.
+4. **Revision-nesting check placement.** A targeted validator on `data.revision` in


answered above. simplicity and limit to sdk

mmabrouk · 2026-07-01T17:43:22Z

+   `models/workflows.py` catches the shape early, but the resolver is where the shape is
+   consumed. Decide whether to validate in the model or in the resolver (Rule A). Lean resolver,
+   so the error can reference the same three-shapes message.
+5. **Does self-hydration land in the same change or separately?** If the seed fix lands too, a


mmabrouk

lgtm assuming comments are addressed in the implementation

mmabrouk · 2026-07-01T18:02:01Z

Implemented (still draft)

Applied the review decisions and turned this from a design doc into code + updated docs on the same branch. Pushed as 8d051cbad3.

What changed

sdks/python/agenta/sdk/middlewares/running/resolver.py — new sibling validator _validate_resolvable_config, called at the top of ResolverMiddleware.__call__, next to _validate_executable_reference_families. Two checks, scoped to the config path:

Rule A — a supplied data.revision must be double-nested ({"data": {...}}). A single-nested revision is dropped silently today and falls through to a seeded default; now it is a clear 400 with the fix.
Rule B — when references are the intended config source (no inline parameters, no usable revision), they must pin one committed config. A bare application (or workflow/evaluator) root resolves nothing and is rejected in favor of a variant, an environment, or a revision.

Both raise bad_request (400, via the existing _raise_bad_request), matching the family validator at the same boundary. The error names the two valid call shapes and why a bare application is not resolvable.

Decisions applied

Two valid shapes, not three. No "run the default" rule. An empty request is left alone, so completion and chat do not regress and config can still arrive from context.
Status: 400 (bad_request), consistent with the codebase.
SDK resolver boundary only. Not in models/workflows.py; the nesting check lives in the resolver so it shares the shapes message.
Scoped to the config/parameters path, not a blanket extra="forbid" on the envelope.
Self-hydration / seeded-default fix (utils.py:285-287) left as a separate follow-up. Validation-only.

Tests

TestResolverConfigValidation in test_resolver_middleware.py: inline config passes, double-nested revision passes, resolvable variant/environment/revision references pass, empty request passes; bare application and bare workflow reject with the clear 400; single-nested data.revision rejects; bare-application also rejects through the middleware before call_next. One existing test that used a bare application as a stand-in was updated to a variant (a bare application is now rejected).

Resolver/middleware + agents/golden/contract suites green (530 tests). ruff format + ruff check clean.

What to review

Are Rule A and Rule B the right two checks, and is dropping the "nothing to run" rule (to avoid regressing completion/chat and the context-supplied-config path) the right call?
The bare-root rule now applies to workflow/evaluator roots too, not just application. That is intended per the family symmetry; flag if any real flow relies on a bare root resolving.
Error message wording (the shared _INVOKE_CALL_SHAPES tail).

mmabrouk · 2026-07-01T20:58:12Z

Rebased onto big-agents f8765a9b89 (current base, post services/agent -> services/runner rename). Content unchanged from the prior review round; diff is exactly this project's files. Ready for review.

jp-agenta · 2026-07-02T21:44:25Z

This sounds like something that should be true and applicable for all services calling invoke and not just agents, no ?
More of it should go in the route and workflow decorators and only agent-specific validations should go in the agent handler.

…undary Rebased onto big-agents f8765a9. Claude-Session: https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa

mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jul 1, 2026

vercel Bot deployed to Preview July 1, 2026 17:08 View deployment

mmabrouk commented Jul 1, 2026

View reviewed changes

mmabrouk added lgtm This PR has been approved by a maintainer implementing Design approved; implementation queued/in progress (not awaiting review) and removed needs-review Agent updated; awaiting Mahmoud's review labels Jul 1, 2026

vercel Bot deployed to Preview July 1, 2026 18:02 View deployment

mmabrouk force-pushed the docs/agent-invoke-validation branch from 8d051cb to 6a29a0e Compare July 1, 2026 20:54

vercel Bot deployed to Preview July 1, 2026 20:56 View deployment

mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jul 1, 2026

feat(sdk): validate an invoke resolves to a config at the resolver bo…

da561b1

…undary Rebased onto big-agents f8765a9. Claude-Session: https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa

mmabrouk force-pushed the docs/agent-invoke-validation branch from 6a29a0e to da561b1 Compare July 3, 2026 09:42

vercel Bot deployed to Preview July 3, 2026 09:44 View deployment


		## Scope of the strictness (what we are NOT doing)

		- No blanket `extra="forbid"` on the envelope. The user rejected locking the whole request


		## Where the code changes land (for implementation, not this PR)

		- `sdks/python/agenta/sdk/middlewares/running/resolver.py`: add Rule A / B / C next to


		## Open questions

		1. 400 vs 422. Which status best matches the codebase's existing request-validation errors?

Uh oh!

Conversation

mmabrouk commented Jul 1, 2026

Design docs only. No code.

The proposed validation

Files (all under docs/design/agent-workflows/projects/builder-agent-reliability/invoke-validation/)

Uh oh!

vercel Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

mmabrouk commented Jul 1, 2026

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

mmabrouk commented Jul 1, 2026

Implemented (still draft)

What changed

Decisions applied

Tests

What to review

Uh oh!

mmabrouk commented Jul 1, 2026

Uh oh!

jp-agenta commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Files (all under `docs/design/agent-workflows/projects/builder-agent-reliability/invoke-validation/`)

vercel Bot commented Jul 1, 2026 •

edited

Loading

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading