Skip to content

docs(agent-workflows): design the invoke-validation fix#5002

Draft
mmabrouk wants to merge 1 commit into
big-agentsfrom
docs/agent-invoke-validation
Draft

docs(agent-workflows): design the invoke-validation fix#5002
mmabrouk wants to merge 1 commit into
big-agentsfrom
docs/agent-invoke-validation

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jul 1, 2026

Copy link
Copy Markdown
Member

Design docs only. No code.

The agent service invoke endpoint (POST {host}/services/agent/v0/invoke) does not tell a caller when a request is malformed. Send references with no revision, or send data.revision one level too shallow, and the service silently runs a seeded default (pi_core / gpt-5.5) and then 500s a few steps later. The caller gets an opaque late error instead of "your request was shaped wrong, here is the right shape." This cost the agent-creation lab real time, because a wrong guess looked like an agent runtime bug rather than a bad request.

This PR adds a plan-feature design workspace that reframes the problem as request validation (not self-hydration) and proposes validating the invoke request at the shared resolver boundary.

The proposed validation

Validate before any run starts. When the request cannot resolve to a config, return a clear 4xx that names the three valid ways to invoke:

  1. A complete revision, nested correctly: data.revision = {"data": ...}.
  2. No revision, plus a resolvable reference: a variant, an environment, or a revision, not a bare application (an application has many revisions; a bare app id pins nothing).
  3. Inline configuration: data.parameters.

Three rules at the boundary:

  • Rule A - if data.revision is present, it must be double-nested; a single-nested revision gets a 4xx that shows the fix.
  • Rule B - a reference that is the intended target must be resolvable (reuses and extends the existing _validate_executable_reference_families at resolver.py:69-98, which already raises 4xx for competing reference families).
  • Rule C - if there is no revision, no resolvable reference, and no inline params, reject with the three valid shapes spelled out.

Scope limits (carried over from the superseded harden-invoke decision): no blanket extra="forbid" on the envelope, no OpenAPI (/inspect stays the live contract). The seeded-default self-hydration fix is complementary and can land alongside, but validation is the primary deliverable because it helps even when self-hydration is intentionally off.

Files (all under docs/design/agent-workflows/projects/builder-agent-reliability/invoke-validation/)

  • README.md - index.
  • context.md - symptom, lab cost, why validation is the right lens, goals / non-goals.
  • research.md - the mechanism with file:line citations (double-nested requirement at resolver.py:150; the agent seed at utils.py:285-287 defeating the hydration gate at resolver.py:573-577; loose envelope at workflows.py:237/:296; the API pre-hydration that makes the product path work at service.py:745-751; the three reference-kind results; the live reproduction).
  • plan.md - Rules A/B/C, scope, consistency across agent/completion/chat, test matrix.
  • status.md - decisions + open questions.

Supersedes and merges the earlier harden-invoke / silent-fallback / invoke-contract notes under scratch/console/builder-kit/.

https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa

@vercel

vercel Bot commented Jul 1, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jul 3, 2026 9:44am

Request Review

@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jul 1, 2026
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d8b20eab-c026-4e1d-a15b-491c85eb8bb5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/agent-invoke-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mmabrouk

mmabrouk commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

Design docs only, no code. Needs a design review before I implement.

Please weigh in on the open questions in status.md, specifically:

  1. Status code: 400 vs 422 for the malformed-invoke reject, to stay consistent with how the envelope reports validation failures elsewhere. _validate_executable_reference_families uses bad_request (400) today.
  2. Rule C over-rejection: an empty-body call to completion/chat is a legitimate "run the interface default" case. Is "the caller sent references or a malformed data.revision that did not resolve" the right signal to distinguish "intended a config" from "wants the default"? Does this hold across all three services?
  3. Where the checks live: fold Rules A/B/C into _validate_executable_reference_families, or add a sibling validator at the same resolver boundary (my lean)? And validate the revision-nesting in the resolver vs in models/workflows.py?
  4. Bundle with self-hydration? Should the seeded-default fix (utils.py:285-287 -> empty WorkflowRevisionData(), so a references-only agent call self-hydrates like completion/chat) land in the same change, or stay separate and keep this validation-only?

Biggest thing to sanity-check: is "three valid call shapes + a resolvable reference needs variant/environment/revision" the right contract to enforce, or is there a fourth shape I am missing?


## Scope of the strictness (what we are NOT doing)

- **No blanket `extra="forbid"` on the envelope.** The user rejected locking the whole request

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was meant only for the parameters not the whole thing


## Where the code changes land (for implementation, not this PR)

- `sdks/python/agenta/sdk/middlewares/running/resolver.py`: add Rule A / B / C next to

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only here. should the chnage live


## Open questions

1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check and stay consistent with code base and best practices

1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors?
`_validate_executable_reference_families` uses `bad_request`. Pick to stay consistent with how
the envelope reports validation failures elsewhere.
2. **Rule C over-rejection on completion/chat.** An empty-body call to completion or chat is a

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think there is a case where you would want the defaults. you either want to give a config or you want to specify a revision (maybe the latest)

intended a config (sent `references` or a malformed `data.revision`) and it did not resolve,
not when the caller deliberately sent an empty body. Confirm the exact signal that
distinguishes "intended a config" from "wants the default."
3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered above

3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or
add a sibling validator at the same boundary? One function is simpler; a sibling keeps the
family check focused. Lean sibling.
4. **Revision-nesting check placement.** A targeted validator on `data.revision` in

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered above. simplicity and limit to sdk

`models/workflows.py` catches the shape early, but the resolver is where the shape is
consumed. Decide whether to validate in the model or in the resolver (Rule A). Lean resolver,
so the error can reference the same three-shapes message.
5. **Does self-hydration land in the same change or separately?** If the seed fix lands too, a

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not here

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm assuming comments are addressed in the implementation

@mmabrouk mmabrouk added lgtm This PR has been approved by a maintainer implementing Design approved; implementation queued/in progress (not awaiting review) and removed needs-review Agent updated; awaiting Mahmoud's review labels Jul 1, 2026
@mmabrouk

mmabrouk commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

Implemented (still draft)

Applied the review decisions and turned this from a design doc into code + updated docs on the same branch. Pushed as 8d051cbad3.

What changed

sdks/python/agenta/sdk/middlewares/running/resolver.py — new sibling validator _validate_resolvable_config, called at the top of ResolverMiddleware.__call__, next to _validate_executable_reference_families. Two checks, scoped to the config path:

  • Rule A — a supplied data.revision must be double-nested ({"data": {...}}). A single-nested revision is dropped silently today and falls through to a seeded default; now it is a clear 400 with the fix.
  • Rule B — when references are the intended config source (no inline parameters, no usable revision), they must pin one committed config. A bare application (or workflow/evaluator) root resolves nothing and is rejected in favor of a variant, an environment, or a revision.

Both raise bad_request (400, via the existing _raise_bad_request), matching the family validator at the same boundary. The error names the two valid call shapes and why a bare application is not resolvable.

Decisions applied

  1. Two valid shapes, not three. No "run the default" rule. An empty request is left alone, so completion and chat do not regress and config can still arrive from context.
  2. Status: 400 (bad_request), consistent with the codebase.
  3. SDK resolver boundary only. Not in models/workflows.py; the nesting check lives in the resolver so it shares the shapes message.
  4. Scoped to the config/parameters path, not a blanket extra="forbid" on the envelope.
  5. Self-hydration / seeded-default fix (utils.py:285-287) left as a separate follow-up. Validation-only.

Tests

TestResolverConfigValidation in test_resolver_middleware.py: inline config passes, double-nested revision passes, resolvable variant/environment/revision references pass, empty request passes; bare application and bare workflow reject with the clear 400; single-nested data.revision rejects; bare-application also rejects through the middleware before call_next. One existing test that used a bare application as a stand-in was updated to a variant (a bare application is now rejected).

Resolver/middleware + agents/golden/contract suites green (530 tests). ruff format + ruff check clean.

What to review

  • Are Rule A and Rule B the right two checks, and is dropping the "nothing to run" rule (to avoid regressing completion/chat and the context-supplied-config path) the right call?
  • The bare-root rule now applies to workflow/evaluator roots too, not just application. That is intended per the family symmetry; flag if any real flow relies on a bare root resolving.
  • Error message wording (the shared _INVOKE_CALL_SHAPES tail).

@mmabrouk mmabrouk force-pushed the docs/agent-invoke-validation branch from 8d051cb to 6a29a0e Compare July 1, 2026 20:54
@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jul 1, 2026
@mmabrouk

mmabrouk commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

Rebased onto big-agents f8765a9b89 (current base, post services/agent -> services/runner rename). Content unchanged from the prior review round; diff is exactly this project's files. Ready for review.

@jp-agenta

Copy link
Copy Markdown
Member

This sounds like something that should be true and applicable for all services calling invoke and not just agents, no ?
More of it should go in the route and workflow decorators and only agent-specific validations should go in the agent handler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

implementing Design approved; implementation queued/in progress (not awaiting review) lgtm This PR has been approved by a maintainer needs-review Agent updated; awaiting Mahmoud's review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants