feat(bigframes): UDF transpiler handles some control flow #17558

Open
TrevorBergeron wants to merge 13 commits into main from tbergeron_more_py_funcs

Conversation

@TrevorBergeron (Contributor)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces control flow graph (CFG) construction and topological sorting to transpile Python bytecode with conditional branches and jumps into BigFrames expressions, supported by extensive unit tests. The review feedback highlights several critical improvements, including handling the JUMP_BACKWARD_NO_INTERRUPT instruction to prevent loops from being silently compiled, optimizing CFG building by precomputing instruction offsets to avoid O(N^2) sorting overhead, removing dead code for STORE_FAST, and correcting a misleading stack underflow error message.
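The summary above bundles three ideas: split bytecode into basic blocks using a precomputed offset-to-index map, refuse any backward jump (including `JUMP_BACKWARD_NO_INTERRUPT`) because it implies a loop, and topologically sort the resulting acyclic blocks. The following is a minimal sketch of those ideas, not the PR's code. It assumes a simplified, hypothetical `Instr` record (offset, opname, resolved jump target) instead of `dis.Instruction`, and small hypothetical opcode sets that a real transpiler would have to enumerate per Python version.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Optional

# Hypothetical, deliberately small opcode sets; real bytecode has more variants per version.
UNCONDITIONAL_JUMPS = {"JUMP_FORWARD", "JUMP_BACKWARD", "JUMP_BACKWARD_NO_INTERRUPT"}
CONDITIONAL_JUMPS = {"POP_JUMP_IF_TRUE", "POP_JUMP_IF_FALSE"}
ALL_JUMPS = UNCONDITIONAL_JUMPS | CONDITIONAL_JUMPS
TERMINATORS = {"RETURN_VALUE"}


@dataclass(frozen=True)
class Instr:
    offset: int
    opname: str
    target: Optional[int] = None  # resolved jump target offset, jumps only


def build_cfg(instrs: List[Instr]) -> Dict[int, List[int]]:
    """Map each basic block's start offset to its successor blocks' start offsets."""
    # Precompute offset -> index once, so finding each block's end is an O(1) lookup
    # instead of re-scanning or re-sorting the instruction list per block.
    index_of = {ins.offset: i for i, ins in enumerate(instrs)}
    leaders = {instrs[0].offset}
    for i, ins in enumerate(instrs):
        if ins.opname in ALL_JUMPS:
            if ins.target <= ins.offset:
                # Any backward jump (JUMP_BACKWARD_NO_INTERRUPT included) implies a loop.
                raise NotImplementedError(f"loop detected at offset {ins.offset}")
            leaders.add(ins.target)
        if ins.opname in (ALL_JUMPS | TERMINATORS) and i + 1 < len(instrs):
            leaders.add(instrs[i + 1].offset)

    starts = sorted(leaders)  # one O(N log N) sort for the whole function
    cfg: Dict[int, List[int]] = {}
    for n, start in enumerate(starts):
        end = index_of[starts[n + 1]] if n + 1 < len(starts) else len(instrs)
        last = instrs[end - 1]
        fallthrough = [instrs[end].offset] if end < len(instrs) else []
        if last.opname in UNCONDITIONAL_JUMPS:
            cfg[start] = [last.target]
        elif last.opname in CONDITIONAL_JUMPS:
            cfg[start] = [last.target] + fallthrough
        elif last.opname in TERMINATORS:
            cfg[start] = []
        else:
            cfg[start] = fallthrough
    return cfg


def topo_order(cfg: Dict[int, List[int]]) -> List[int]:
    """Order blocks so that every block comes after all of its predecessors."""
    sorter = TopologicalSorter({block: set() for block in cfg})
    for block, succs in cfg.items():
        for succ in succs:
            sorter.add(succ, block)  # succ depends on block
    return list(sorter.static_order())
```

For an `if/else` whose arms both return, `build_cfg` yields an entry block with two successor blocks and no edges between the arms; rejecting backward jumps up front is what guarantees `static_order()` never sees a cycle.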

Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
@TrevorBergeron TrevorBergeron requested a review from tswast June 24, 2026 19:42
@TrevorBergeron TrevorBergeron marked this pull request as ready for review June 24, 2026 19:42
@TrevorBergeron TrevorBergeron requested review from a team as code owners June 24, 2026 19:42
Comment thread packages/bigframes/bigframes/operations/generic_ops.py
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py
@TrevorBergeron TrevorBergeron requested a review from tswast June 26, 2026 01:26

@tswast (Contributor) left a comment

One more case statement that I worry about and a nit, but otherwise looking good, thanks!

local_vars.get(var2, expression.UnboundVariableExpression(var2))
)

case name if name.startswith("LOAD_FAST"):

Contributor

Given that we've observed at least two LOAD_FAST-prefixed instructions that push two variables onto the stack instead of one, I worry about this "startswith" check too. I'd rather fail on an unknown instruction than silently miss a value or generate incorrect code.

Contributor Author

Fixed. Agreed, it's best to be entirely explicit everywhere about the exact instruction.

stack.pop()

case name if name in _ALL_JUMP_OPNAMES:
if opname in _UNCONDITIONAL_JUMP_OPNAMES:

Contributor

Nit: Why are we nesting if statements inside the case statement instead of introducing more specific case statements? Other than the "jumped = True" line, these cases appear pretty independent. IMO at least that top level if/elif/else should be broken up into separate cases to reduce a bit of the nesting.

Alternatively, maybe we could break the jump handling into a separate function to reduce a bit of nesting that way.

Contributor Author

Flattened in the new revision. I think there is some room to factor things further, but I'll leave that for later. Numba, for instance, dispatches to methods based on the op name and mutates an object representing the VM state: https://github.com/numba/numba/blob/main/numba/core/byteflow.py
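The dispatch style mentioned above can be sketched as an interpreter that looks up an `op_<OPNAME>` method per instruction and mutates a shared state object. This is a loose illustration of the pattern, not Numba's or BigFrames' code: `VMState`, `Interpreter`, and the `(opname, arg)` instruction pairs are hypothetical, and `BINARY_OP` takes an operator symbol here rather than CPython's integer oparg. The `_pop` helper also shows one way to give stack underflow a precise error message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VMState:
    # Hypothetical VM state that the op handlers mutate.
    local_vars: Dict[str, Any]
    stack: List[Any] = field(default_factory=list)
    result: Optional[Any] = None


class Interpreter:
    """Dispatches each instruction to an op_<OPNAME> method."""

    def __init__(self, state: VMState):
        self.state = state

    def run(self, program: List[Tuple[str, Any]]) -> Any:
        for opname, arg in program:
            handler = getattr(self, f"op_{opname}", None)
            if handler is None:
                raise NotImplementedError(f"Unsupported opcode: {opname}")
            handler(arg)
        return self.state.result

    def _pop(self, opname: str) -> Any:
        # Name the instruction that needed the value, so underflow errors are not misleading.
        if not self.state.stack:
            raise ValueError(f"{opname} needs a value but the stack is empty")
        return self.state.stack.pop()

    def op_LOAD_FAST(self, name: str) -> None:
        self.state.stack.append(self.state.local_vars[name])

    def op_LOAD_CONST(self, value: Any) -> None:
        self.state.stack.append(value)

    def op_BINARY_OP(self, symbol: str) -> None:
        right = self._pop("BINARY_OP")
        left = self._pop("BINARY_OP")
        self.state.stack.append((symbol, left, right))  # build an expression-tree node

    def op_RETURN_VALUE(self, _arg: Any) -> None:
        self.state.result = self._pop("RETURN_VALUE")
```

Adding support for an opcode then means adding one method, and an unsupported opcode fails at lookup time instead of falling into a catch-all branch.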

@TrevorBergeron TrevorBergeron requested a review from tswast June 26, 2026 17:29

2 participants