fix(engine): Base64-encode Python worker startup config to survive Windows argv#5917
fix(engine): Base64-encode Python worker startup config to survive Windows argv#5917yangzhang75 wants to merge 1 commit into
Conversation
Automated Reviewer SuggestionsBased on the
|
|
It seems this branch has diverged from the current main branch. Could you clean up the branch. |
bb77e2c to
28c8df1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5917 +/- ##
============================================
+ Coverage 54.50% 54.69% +0.18%
- Complexity 2915 2952 +37
============================================
Files 1108 1110 +2
Lines 42807 42855 +48
Branches 4604 4608 +4
============================================
+ Hits 23332 23438 +106
+ Misses 18119 18050 -69
- Partials 1356 1367 +11
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 411 | 0.251 | 23,767/31,766/31,766 us | 🔴 +22.9% / 🔴 +113.4% |
| ⚪ | bs=100 sw=10 sl=64 | 817 | 0.499 | 120,635/156,208/156,208 us | ⚪ within ±5% / 🔴 +47.0% |
| ⚪ | bs=1000 sw=10 sl=64 | 912 | 0.557 | 1,100,507/1,127,471/1,127,471 us | ⚪ within ±5% / 🔴 +13.4% |
Baseline details
Latest main 1c580e5 from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 411 tuples/sec | 459 tuples/sec | 781.13 tuples/sec | -10.5% | -47.4% |
| bs=10 sw=10 sl=64 | MB/s | 0.251 MB/s | 0.28 MB/s | 0.477 MB/s | -10.4% | -47.4% |
| bs=10 sw=10 sl=64 | p50 | 23,767 us | 22,514 us | 12,542 us | +5.6% | +89.5% |
| bs=10 sw=10 sl=64 | p95 | 31,766 us | 25,853 us | 14,886 us | +22.9% | +113.4% |
| bs=10 sw=10 sl=64 | p99 | 31,766 us | 25,853 us | 17,580 us | +22.9% | +80.7% |
| bs=100 sw=10 sl=64 | throughput | 817 tuples/sec | 812 tuples/sec | 999.37 tuples/sec | +0.6% | -18.2% |
| bs=100 sw=10 sl=64 | MB/s | 0.499 MB/s | 0.496 MB/s | 0.61 MB/s | +0.6% | -18.2% |
| bs=100 sw=10 sl=64 | p50 | 120,635 us | 120,386 us | 99,687 us | +0.2% | +21.0% |
| bs=100 sw=10 sl=64 | p95 | 156,208 us | 149,966 us | 106,271 us | +4.2% | +47.0% |
| bs=100 sw=10 sl=64 | p99 | 156,208 us | 149,966 us | 115,445 us | +4.2% | +35.3% |
| bs=1000 sw=10 sl=64 | throughput | 912 tuples/sec | 924 tuples/sec | 1,036 tuples/sec | -1.3% | -12.0% |
| bs=1000 sw=10 sl=64 | MB/s | 0.557 MB/s | 0.564 MB/s | 0.632 MB/s | -1.2% | -11.9% |
| bs=1000 sw=10 sl=64 | p50 | 1,100,507 us | 1,073,848 us | 970,675 us | +2.5% | +13.4% |
| bs=1000 sw=10 sl=64 | p95 | 1,127,471 us | 1,137,266 us | 1,011,928 us | -0.9% | +11.4% |
| bs=1000 sw=10 sl=64 | p99 | 1,127,471 us | 1,137,266 us | 1,045,045 us | -0.9% | +7.9% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,486.52,200,128000,411,0.251,23766.55,31766.35,31766.35
1,100,10,64,20,2448.25,2000,1280000,817,0.499,120635.30,156207.52,156207.52
2,1000,10,64,20,21935.23,20000,12800000,912,0.557,1100507.11,1127470.78,1127470.78|
@yangzhang75 please rebase your commit history on this branch. right now it contains 4461 commits. I marked the PR into a draft for now. please reopen it when ready. |
967020a to
28c8df1
Compare
|
Sure, feel free to ping me when is ready for a review pass. thanks. |
|
By default, PR is ready for review. If not, author should mark it a draft |
|
Ok I am doing testing now |
|
It worked, thanks. |
Yicong-Huang
left a comment
There was a problem hiding this comment.
ok, approved for the change. however base64 causes the payload not readable by human. could potentially make future debugging harder. low priority now.
Thanks!
|
@yangzhang75 could you please make CI work? |
What changes were proposed in this PR?
Follow-up fix for #5597, which started passing the Python worker startup config as a
single JSON-string command-line argument. On Windows the JVM assembles argv into one
command line and the inner double quotes are stripped before Python receives them, so
json.loadsfails withJSONDecodeError: Expecting property name enclosed in double quotes.Linux/macOS are unaffected (the JVM passes argv directly, quotes survive).
This Base64-encodes the JSON on the JVM side (
encodeStartupConfig) and Base64-decodes itin
texera_run_python_worker.pybefore parsing. Base64 uses only[A-Za-z0-9+/=], so theargument carries no quotes or spaces and survives argv quoting on every platform. The 19-key
contract and all validation are unchanged.
Any related issues, documentation, discussions?
Closes #5916
How was this PR tested?
PythonWorkflowWorkerStartupConfigSpecupdated to decode Base64; added a regressiontest asserting the encoded output contains no quotes/whitespace. 5/5 pass; scalafmt clean.
test_run_python_worker.pyupdated to feed Base64 input; added a round-trip test.ruff check + format clean.
Base64.getEncoderandPython
base64.b64decode.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)