yeast: Add support for a user-managed context during rule rewrites#22054
Conversation
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
| /// translation reaches inner rules that read that state. | ||
| /// | ||
| /// ```text | ||
| /// manual_rule!( |
There was a problem hiding this comment.
Actually, I think I just had a good idea (dangerous, I know). What if instead of having a separate macro that auto-translates nothing (and one that auto-translates everything), we add a way to annotate a capture as "raw" or "untranslated"?
Then we could -- at the cost of relatively little syntax -- get the best of both worlds. Non-raw captures get auto-translated before the body of the rule target is executed, everything else is up to the user.
As for the syntax, I'm thinking either @!foo or perhaps @@foo -- something that stands out sufficiently to remind the user that there's a raw capture that needs to be dealt with.
Also, in that case I would auto-wrap the entire body of (the result of expanding) manual_rule in Ok to avoid having to write Ok(...) at the end. This is what rule does currently anyway, and it wouldn't change the expressivity (since we need to return Result anyway).
There was a problem hiding this comment.
I worry about the syntax becoming a soup of symbols tbh. Can we keep manual_rule as it is for now?
There was a problem hiding this comment.
Fair enough. I'll postpone those changes for a different PR.
Renames what was previously called `__yeast_ctx` into just `ctx`, and adds a new field `user_ctx` to this context. Said field can contain a struct of any user type (necessitating making various parts of the implementation generic in said type). Through some Deref magic, field accesses are delegated to the inner struct (assuming they are not already defined on `ctx`), which should hopefully make the interface a bit more ergonomic.
This will enable us to actually capture and log errors in complicated rules (e.g. ones written in Rust) rather than just panicking.
This enables users to specify how and when these captures get translated. In conjunction with the context mechanism, this can be used to e.g. translate some piece of information (e.g. the type of something), record it in the context, and then recursively translate some other capture that relies on this information. This allows information to be cleanly passed into descendants (which can be written using context accesses in the `rule!` macro form). As a consequence of this change, we now need to pass around a TranslatorHandle to perform the manual translation. For Repeating rules, it doesn't really make sense to translate things, so in this case we simply signal an error. Also, the implementation of the `rule!` macro changes slightly (without changing semantics): it now essentially delegates to `Rule::new`, receiving raw captures, but then immediately applies the translation to those captures (which, for the majority of cases, is likely the desired behaviour).
Adds `manual_rule!` which provides a more low-level interface for defining rewrites. (I'm not entirely sold on the name, so any suggestions would be welcome.) Notably, the captures bound in the body of such rules have _not_ been translated yet -- they still come from the _input_ tree. It is the user's duty to call ctx.translate on these (which has the effect of recursively invoking the translation) before substituting them into the output. For _truly_ low-level access, the user can still construct a Rule directly, but this is now somewhat cumbersome as the closure contained therein takes quite a few parameters. Still, the possibility remains.
This was necessary since otherwise the generic type of the user-specified context (which should only be a concern for yeast) starts to bleed out into the shared extractor. Instead, we type-erase it by putting it inside the aforementioned trait.
Propagates in name and type information for various property declarations, using the context mechanism. This avoids mutating already-translated nodes in-place, and is generally much easier to read.
Extends the context with a field for keeping track of the default value. In the process, we also rename the context to SwiftContext as it now doesn't only concern itself with properties.
Avoids more "mutation after creation" via prepend_field. Also adds a test to the corpus for exercising this syntax. Although it's not evident, the test output was unchanged by this refactoring.
Same as in the preceding commit, we added a test beforehand for testing this syntax, and verified that it was unchanged by the cleanup in this commit.
Gets rid of the final uses of mutation (via prepend_field). The approach is the same as in the preceding commits: we set the appropriate fields on the context when processing the outer node, and then access these fields on the inner nodes. The repeated use of `modifier` fields is a _bit_ clunky, but since we're likely moving to an out-of-band modifier mechanism at some point, I think it's good enough for now.
(Both reduce_left and map are still supported, but we could remove them at this point.) I think this way of writing things makes the intent a lot clearer -- it avoids extending the yeast rule language with complicated constructs, pushing the complexity (such as it is) into Rust instead.
Cleans up a few places where we were constructing trees piece by piece
rather than using the `tree!` macro.
In the process, Copilot noticed an issue that should probably be
addressed: the labeled_statement rule can never fire, since there are no
such nodes in the input. This is possibly a simple as making
_labeled_statement (which _does_ exist) named, but I haven't attempted
this.
Finally, a small change to yeast makes it so that the contents of a {}
interpolation can be a Rust block (previously it could only be a single
expression). This avoids the need to double-wrap instances where you
want to interpolate a single node produced as the final value of some
block.
Format the touched Rust crates (shared/tree-sitter-extractor, shared/yeast, shared/yeast-macros, unified/extractor) so the tree-sitter-extractor CI fmt check passes. No functional changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f75cf95 to
af7ae8c
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the shared/yeast rewriting framework with a user-managed context threaded through rule transforms (with snapshot/restore semantics), and updates the tree-sitter extractor and Swift desugaring pipeline to use it. This enables Swift outer rules to publish contextual information (e.g., property name/type/modifiers) so inner rules can emit schema-valid nodes directly during recursive translation.
Changes:
- Add a typed user context to yeast transforms (Rules/Phases/Runner), including a translation handle to support controlled recursive translation from within transforms.
- Introduce
manual_rule!(no auto-translation prefix) and a type-erasedDesugarer/ConcreteDesugarerinterface for extractor integration. - Update Swift desugaring rules to use context propagation for properties/accessors (including willSet/didSet and protocol property requirements) and expand Swift corpus coverage.
Show a summary per file
| File | Description |
|---|---|
| unified/extractor/tests/corpus/swift/variables.txt | Adds corpus case for stored properties with willSet/didSet observers. |
| unified/extractor/tests/corpus/swift/types.txt | Adds corpus cases for protocol property requirements and chained enum cases. |
| unified/extractor/tests/corpus_tests.rs | Updates corpus harness to run desugaring via the new Desugarer interface when configured. |
| unified/extractor/src/languages/swift/swift.rs | Introduces SwiftContext and migrates several Swift rules to context-driven manual_rule! transforms; builds a ConcreteDesugarer. |
| unified/extractor/src/generator.rs | Formatting-only adjustments while continuing to use DesugaringConfig for schema generation. |
| unified/extractor/src/extractor.rs | Minor import formatting / wrapping (no behavior change). |
| shared/yeast/tests/test.rs | Updates tests for generic Runner/Phase/Rule and the new BuildCtx constructor signature. |
| shared/yeast/src/schema.rs | Minor formatting change for field_types. |
| shared/yeast/src/node_types_yaml.rs | Minor formatting changes in YAML→schema application. |
| shared/yeast/src/lib.rs | Core change: add user context threading + snapshot semantics, translator handle, generic Rule/Phase/Runner, and Desugarer/ConcreteDesugarer. |
| shared/yeast/src/dump.rs | Minor formatting/refactor in typed dump entrypoint. |
| shared/yeast/src/build.rs | Extend BuildCtx to carry user context + optional translator; add translate helpers and Deref to user context. |
| shared/yeast/src/bin/main.rs | Adds explicit Runner type annotation for generic inference. |
| shared/yeast/doc/yeast.md | Documents { ... } blocks inside tree!/splices now behaving like Rust blocks (multi-statement supported). |
| shared/yeast-macros/src/parse.rs | Changes implicit ctx identifier to ctx; wraps {...} expressions as blocks; adds parsing/codegen for manual_rule!. |
| shared/yeast-macros/src/lib.rs | Exposes the new manual_rule proc macro. |
| shared/tree-sitter-extractor/src/extractor/simple.rs | Switches LanguageSpec.desugar from DesugaringConfig to Box<dyn yeast::Desugarer> and updates schema selection logic. |
| shared/tree-sitter-extractor/src/extractor/mod.rs | Updates extractor plumbing to accept/use Option<&dyn yeast::Desugarer> instead of a prebuilt yeast Runner. |
Copilot's findings
- Files reviewed: 18/18 changed files
- Comments generated: 1
This provides a (hopefully) clean way to allow information to flow from nodes to their descendants.
Note that although the user context is mutable, it is also snapshotted before each recursive call (and restored afterwards). Thus, modifications that happen inside one child node cannot affect the context seen by another. This seemed like the healthiest choice and the most consistent semantics.
This PR builds on tope of #22016, which should be merged first.
It can be reviewed commit-by-commit.
(The first commit is a bit heavy due to all of the generic parameters being introduced. The interesting changes take place in
lib.rsandswift.rs.)