docs(agent-workflows): design the invoke-validation fix#5002
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Design docs only, no code. Needs a design review before I implement. Please weigh in on the open questions in
Biggest thing to sanity-check: is "three valid call shapes + a resolvable reference needs variant/environment/revision" the right contract to enforce, or is there a fourth shape I am missing? |
|
|
||
| ## Scope of the strictness (what we are NOT doing) | ||
|
|
||
| - **No blanket `extra="forbid"` on the envelope.** The user rejected locking the whole request |
There was a problem hiding this comment.
that was meant only for the parameters not the whole thing
|
|
||
| ## Where the code changes land (for implementation, not this PR) | ||
|
|
||
| - `sdks/python/agenta/sdk/middlewares/running/resolver.py`: add Rule A / B / C next to |
There was a problem hiding this comment.
only here. should the chnage live
|
|
||
| ## Open questions | ||
|
|
||
| 1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors? |
There was a problem hiding this comment.
check and stay consistent with code base and best practices
| 1. **400 vs 422.** Which status best matches the codebase's existing request-validation errors? | ||
| `_validate_executable_reference_families` uses `bad_request`. Pick to stay consistent with how | ||
| the envelope reports validation failures elsewhere. | ||
| 2. **Rule C over-rejection on completion/chat.** An empty-body call to completion or chat is a |
There was a problem hiding this comment.
i dont think there is a case where you would want the defaults. you either want to give a config or you want to specify a revision (maybe the latest)
| intended a config (sent `references` or a malformed `data.revision`) and it did not resolve, | ||
| not when the caller deliberately sent an empty body. Confirm the exact signal that | ||
| distinguishes "intended a config" from "wants the default." | ||
| 3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or |
| 3. **Where the check lives.** Fold Rules A/B/C into `_validate_executable_reference_families`, or | ||
| add a sibling validator at the same boundary? One function is simpler; a sibling keeps the | ||
| family check focused. Lean sibling. | ||
| 4. **Revision-nesting check placement.** A targeted validator on `data.revision` in |
There was a problem hiding this comment.
answered above. simplicity and limit to sdk
| `models/workflows.py` catches the shape early, but the resolver is where the shape is | ||
| consumed. Decide whether to validate in the model or in the resolver (Rule A). Lean resolver, | ||
| so the error can reference the same three-shapes message. | ||
| 5. **Does self-hydration land in the same change or separately?** If the seed fix lands too, a |
mmabrouk
left a comment
There was a problem hiding this comment.
lgtm assuming comments are addressed in the implementation
Implemented (still draft)Applied the review decisions and turned this from a design doc into code + updated docs on the same branch. Pushed as What changed
Both raise Decisions applied
Tests
Resolver/middleware + agents/golden/contract suites green (530 tests). What to review
|
8d051cb to
6a29a0e
Compare
|
Rebased onto big-agents |
|
This sounds like something that should be true and applicable for all services calling invoke and not just agents, no ? |
…undary Rebased onto big-agents f8765a9. Claude-Session: https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa
6a29a0e to
da561b1
Compare
Design docs only. No code.
The agent service invoke endpoint (
POST {host}/services/agent/v0/invoke) does not tell a caller when a request is malformed. Sendreferenceswith no revision, or senddata.revisionone level too shallow, and the service silently runs a seeded default (pi_core/gpt-5.5) and then 500s a few steps later. The caller gets an opaque late error instead of "your request was shaped wrong, here is the right shape." This cost the agent-creation lab real time, because a wrong guess looked like an agent runtime bug rather than a bad request.This PR adds a plan-feature design workspace that reframes the problem as request validation (not self-hydration) and proposes validating the invoke request at the shared resolver boundary.
The proposed validation
Validate before any run starts. When the request cannot resolve to a config, return a clear 4xx that names the three valid ways to invoke:
data.revision = {"data": ...}.application(an application has many revisions; a bare app id pins nothing).data.parameters.Three rules at the boundary:
data.revisionis present, it must be double-nested; a single-nested revision gets a 4xx that shows the fix._validate_executable_reference_familiesatresolver.py:69-98, which already raises 4xx for competing reference families).Scope limits (carried over from the superseded
harden-invokedecision): no blanketextra="forbid"on the envelope, no OpenAPI (/inspectstays the live contract). The seeded-default self-hydration fix is complementary and can land alongside, but validation is the primary deliverable because it helps even when self-hydration is intentionally off.Files (all under
docs/design/agent-workflows/projects/builder-agent-reliability/invoke-validation/)README.md- index.context.md- symptom, lab cost, why validation is the right lens, goals / non-goals.research.md- the mechanism withfile:linecitations (double-nested requirement atresolver.py:150; the agent seed atutils.py:285-287defeating the hydration gate atresolver.py:573-577; loose envelope atworkflows.py:237/:296; the API pre-hydration that makes the product path work atservice.py:745-751; the three reference-kind results; the live reproduction).plan.md- Rules A/B/C, scope, consistency across agent/completion/chat, test matrix.status.md- decisions + open questions.Supersedes and merges the earlier
harden-invoke/silent-fallback/invoke-contractnotes underscratch/console/builder-kit/.https://claude.ai/code/session_01WSp2LqKrEtXnm2fsPWuQWa