Integrate datafusion-distributed with Python#1611
Conversation
1931203 to
0d40367
Compare
0d40367 to
53a685d
Compare
| .with_analyzer_rule(Arc::new(crate::analyzer::ResolveLambdaVariables::new())); | ||
|
|
||
| if distributed { | ||
| builder = builder.with_distributed_planner(); | ||
| } |
There was a problem hiding this comment.
Rather than letting external system inject their own QueryPlanners, this allows just plumbing datafusion-distributed query planner from within the Rust world, which is actually a pretty easy thing to do.
| "rt-multi-thread", | ||
| "sync", | ||
| ] } | ||
| tonic = { workspace = true } |
There was a problem hiding this comment.
This is the kind of thing that could be easily hidden behind a flag.
| @classmethod | ||
| def from_session_builder( | ||
| cls, | ||
| session_builder: Callable[[WorkerQueryContext], SessionContext], |
There was a problem hiding this comment.
In datafusion-distributed, this callback configures DataFusion at the SessionStateBuilder level, but this project does not contain an analogous structure, the closest is just a SessionContext, so users are expected to build a full SessionContext here out of another SessionContext, even if in datafusion-distributed the contract is SessionStateBuilder in and SessionState out.
Quick Try
Run two workers in separate terminals:
Download the test dataset:
Then run the query or print the distributed plan:
Or print the distributed plan:
Which issue does this PR close?
N/A.
Rationale for this change
This lets Python users wire
datafusion-distributedintodatafusion-python: discover workers from Python, spawn worker servers, and inspect distributed plans.One integration wrinkle: upstream examples usually build from
SessionStateBuilder, while this package exposesSessionConfig/SessionContextas the main public path. This PR installs the distributed planner when aSessionContextis built from a distributed config.What changes are included in this PR?
WorkerResolver,Worker, andWorkerQueryContextsupport.ExecutionPlan.display_distributed(metrics_format=...)using upstream ASCII plan rendering.--plan.Are there any user-facing changes?
Yes. New distributed APIs are exposed from Python, plus new examples. No intended breaking changes.