[AURON #1716] fix string cast array missing null value#1750
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in array-to-string casting where null values within arrays were not being properly rendered. Previously, cast(array(c1) as string) would return [] for an array containing a null element, but in Spark 3.1+, it should return [null]. The implementation adds a custom Spark-compatible cast operation for List to Utf8 types.
Key Changes:
- Added custom array-to-string cast implementation in Rust that properly handles null elements
- Added comprehensive test coverage with 5 integration tests and 5 unit tests
- Implementation follows existing Spark-compatible patterns for complex type casting
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
native-engine/datafusion-ext-commons/src/arrow/cast.rs |
Implements array-to-string casting with proper null handling, mirroring the existing struct-to-string pattern. Includes 5 unit tests covering edge cases (nulls, empty arrays, nested arrays, string arrays). |
spark-extension-shims-spark/src/test/scala/org.apache.auron/AuronQuerySuite.scala |
Adds 5 integration tests to verify Spark compatibility for various array casting scenarios (basic arrays, nested arrays, arrays with nulls, string arrays, empty arrays). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
we should think about a better way to handle inconsistent behaviors of different spark versions. for example we can expose |
After this PR merge, I plan to refactor array / map / struct cast to string to be compatible with versions less than Spark3.1.0 |
27a9f07 to
090ef6b
Compare
weiqingy
left a comment
There was a problem hiding this comment.
Thanks for taking this on — the array arm is a clean, faithful mirror of the existing struct/map → string branches, and pairing the Rust unit tests with checkSparkAnswerAndOperator integration tests gives real native-vs-Spark coverage of the null cases. One optional thread inline.
| let num_elements = end - start; | ||
|
|
||
| if num_elements > 0 { | ||
| if values.is_null(start) { |
There was a problem hiding this comment.
The new arm renders null elements as null unconditionally — exactly Spark 3.1+ non-legacy behavior, and consistent with what the struct and map arms already do, so this isn't new divergence. The one gap worth flagging: Spark gates this on !legacyCastToStr (spark.sql.legacy.castComplexTypesToString.enabled), while the integration tests guard only on isSparkV31OrGreater, not on that flag. So on Spark 3.1+ with the legacy flag enabled, native would emit [null] where Spark emits [], and checkSparkAnswerAndOperator wouldn't catch it since the default test config leaves the flag off. Your follow-up note about pre-3.1 compatibility looks like the natural home for this — does it make sense to fold the legacy-flag case into that same refactor, or is supporting that flag out of scope here?
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Which issue does this PR close?
Closes #1716
Rationale for this change
In Spark3.1 and above, it should be
[null]What changes are included in this PR?
Are there any user-facing changes?
How was this patch tested?
Add UT