Skip to content

[AURON #1716] fix string cast array missing null value#1750

Open
cxzl25 wants to merge 3 commits into
apache:masterfrom
cxzl25:auron_1716
Open

[AURON #1716] fix string cast array missing null value#1750
cxzl25 wants to merge 3 commits into
apache:masterfrom
cxzl25:auron_1716

Conversation

@cxzl25

@cxzl25 cxzl25 commented Dec 12, 2025

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #1716

Rationale for this change

set spark.auron.enable=true;
select cast(array(c1) as string ) from test_t1 ;
[]

In Spark3.1 and above, it should be [null]

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

Add UT

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in array-to-string casting where null values within arrays were not being properly rendered. Previously, cast(array(c1) as string) would return [] for an array containing a null element, but in Spark 3.1+, it should return [null]. The implementation adds a custom Spark-compatible cast operation for List to Utf8 types.

Key Changes:

  • Added custom array-to-string cast implementation in Rust that properly handles null elements
  • Added comprehensive test coverage with 5 integration tests and 5 unit tests
  • Implementation follows existing Spark-compatible patterns for complex type casting

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
native-engine/datafusion-ext-commons/src/arrow/cast.rs Implements array-to-string casting with proper null handling, mirroring the existing struct-to-string pattern. Includes 5 unit tests covering edge cases (nulls, empty arrays, nested arrays, string arrays).
spark-extension-shims-spark/src/test/scala/org.apache.auron/AuronQuerySuite.scala Adds 5 integration tests to verify Spark compatibility for various array casting scenarios (basic arrays, nested arrays, arrays with nulls, string arrays, empty arrays).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@richox

richox commented Dec 26, 2025

Copy link
Copy Markdown
Contributor

we should think about a better way to handle inconsistent behaviors of different spark versions. for example we can expose @sparkver to rust code?

@cxzl25

cxzl25 commented Dec 26, 2025

Copy link
Copy Markdown
Contributor Author

we should think about a better way to handle inconsistent behaviors of different spark versions. for example we can expose @sparkver to rust code?

After this PR merge, I plan to refactor array / map / struct cast to string to be compatible with versions less than Spark3.1.0

@yew1eb

yew1eb commented Jan 5, 2026

Copy link
Copy Markdown
Contributor

Hi @richox @cxzl25
I'd like to help review Pull Requests in this repo. Could you please add me as a collaborator with write access so I can approve reviews?
Thanks!

@cxzl25 cxzl25 force-pushed the auron_1716 branch 3 times, most recently from 27a9f07 to 090ef6b Compare April 7, 2026 10:10

@weiqingy weiqingy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on — the array arm is a clean, faithful mirror of the existing struct/map → string branches, and pairing the Rust unit tests with checkSparkAnswerAndOperator integration tests gives real native-vs-Spark coverage of the null cases. One optional thread inline.

let num_elements = end - start;

if num_elements > 0 {
if values.is_null(start) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new arm renders null elements as null unconditionally — exactly Spark 3.1+ non-legacy behavior, and consistent with what the struct and map arms already do, so this isn't new divergence. The one gap worth flagging: Spark gates this on !legacyCastToStr (spark.sql.legacy.castComplexTypesToString.enabled), while the integration tests guard only on isSparkV31OrGreater, not on that flag. So on Spark 3.1+ with the legacy flag enabled, native would emit [null] where Spark emits [], and checkSparkAnswerAndOperator wouldn't catch it since the default test config leaves the flag off. Your follow-up note about pre-3.1 compatibility looks like the natural home for this — does it make sense to fold the legacy-flag case into that same refactor, or is supporting that flag out of scope here?

Copilot AI review requested due to automatic review settings July 1, 2026 03:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread native-engine/datafusion-ext-commons/src/arrow/cast.rs Outdated
Comment thread native-engine/datafusion-ext-commons/src/arrow/cast.rs
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings July 1, 2026 06:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings July 1, 2026 06:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

string cast array missing null value

6 participants