[refine](column) separate mutable subcolumn mutation from read-only traversal#64905
[refine](column) separate mutable subcolumn mutation from read-only traversal#64905Mryange wants to merge 3 commits into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Review result: request changes for one test coverage gap in the core column COW refactor.
Critical checkpoint conclusions:
- Goal: the PR separates read-only subcolumn traversal from recursive COW mutation. The code mostly accomplishes that by keeping
for_each_subcolumnconst-only and routingIColumn::mutatethroughmutate_subcolumns(). - Scope/focus: the source change is small and focused on BE column wrappers, with tests added for array and nullable COW behavior.
- Concurrency/lifecycle: no new threads, locks, static initialization, persistence, transaction, or configuration behavior are introduced.
- Compatibility/protocol: no serialization format, FE/BE protocol, or function symbol compatibility change found.
- Parallel paths: array, nullable, map, struct, const, and variant wrapper paths were checked. The remaining issue is that the distinct map enumeration path lacks focused mutation coverage.
- Error handling/memory: no ignored
Statusor new memory ownership issue found in the changed code. - Tests:
git diff --checkis clean for the PR file set. Formatter/style CI is green. I could not run BE UT locally becausethirdparty/installedis absent in this runner. The current macOS BE-UT CI job fails before compiling Doris withERROR: The JAVA version is 25, it must be JDK-17, so I did not treat that as PR-caused.
User focus: no additional user-provided focus points were present.
Subagent conclusions: optimizer-rewrite reported no candidates. tests-session-config proposed TEST-1; I verified and accepted the ColumnMap portion as MAIN-1 for an inline comment. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same ledger and one-comment final set.
| callback(keys_column); | ||
| callback(values_column); | ||
| callback(offsets); | ||
| void mutate_subcolumns() override { |
There was a problem hiding this comment.
This COW refactor changes ColumnMap from the old callback/defer path to a class-specific mutate_subcolumns() that must enumerate all three children (keys_column, values_column, and typed offsets_column). The new tests cover array and nullable, but there is no focused test that mutates a shared/exclusive ColumnMap and verifies all three child pointers are detached or preserved correctly. A missed or wrong entry in this method would compile and leave one map child aliased across mutation, so please add a BE unit test analogous to the new array/nullable cases for ColumnMap.
|
/review |
There was a problem hiding this comment.
I found one remaining review issue: the COW refactor still lacks direct coverage for the shared-detach path on ColumnArray, and for the new ColumnConst/ColumnStruct mutation hooks. The existing ColumnMap review thread is already known context and current head has map tests, so I did not duplicate that comment.
Critical checkpoints:
- Goal/test: the refactor is focused, but the changed composite-column mutation paths are not all directly proved by tests.
- Scope: authoritative GitHub file list is limited to BE column COW code and BE column tests.
- Concurrency/lifecycle/config/compatibility/persistence/data writes/session propagation: not involved in this PR scope.
- Parallel paths: checked Array, Map, Nullable, Const, Struct, and Variant mutation paths; only the test gap below remains substantiated.
- Test/results:
git diff --checkis clean. BE tests were not run becausethirdparty/installedis missing in this checkout.
Subagent conclusions: tests-session-config proposed TEST-1, accepted as MAIN-1 and submitted below. optimizer-rewrite reported no optimizer/Nereids semantic issue. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same MAIN-1-only comment set.
| }); | ||
| callback(offsets_column); | ||
| callback(data); | ||
| void mutate_subcolumns() override { |
There was a problem hiding this comment.
This new hook still needs a shared-detach test. The added array COW test builds the array from moved mutable children and then calls IColumn::mutate(std::move(array)), so it only exercises the exclusive fast path; it would not catch a regression where the by-value/shared IColumn::mutate(array) path forgets to detach either the nested data or the typed offsets. Please add a small test analogous to the nullable/map shared tests that keeps aliases to both child columns, calls IColumn::mutate(array) without moving the source owner, and verifies the result detached both children while the aliases remain unchanged. Since this PR also adds mutate_subcolumns() hooks for ColumnConst and ColumnStruct, small direct mutate tests for those hooks would keep the COW refactor covered across all changed composite columns.
|
/review |
There was a problem hiding this comment.
Review complete. I did not find a new distinct issue that needs an additional inline comment.
Critical checkpoint conclusions:
- Goal and scope: the PR separates mutable subcolumn mutation from read-only traversal in BE column COW code. The implementation is focused on
IColumn::mutate_subcolumns()and the composite column overrides for Array, Map, Nullable, Const, Struct, and Variant. - COW correctness: the new helper preserves recursive detach semantics. Typed child pointers for offsets/null maps are restored through concrete
WrappedPtrconversions, and the remainingfor_each_subcolumncallers are read-only traversal. - Parallel paths: Array, Map, Nullable, Const, Struct, and Variant mutation paths were checked. Variant already has
clone_finalized_deep_copies_columns, which exercisesIColumn::mutate(get_ptr())and verifies subcolumns plus sparse/doc columns are copied. - Existing review context: the prior ColumnMap and Array/Const/Struct coverage comments are not duplicated. Current head adds the missing shared/exclusive COW tests for Array, Map, Const, and Struct, with Nullable covering the typed null-map path.
- Concurrency/lifecycle/config/compatibility: no new threads, locks, static lifecycle, session/config, serialization, or protocol compatibility changes are introduced by the authoritative PR file set.
- Tests and validation: the added BE unit tests cover the relevant shared and exclusive mutation paths. I ran
git diff --checkon the authoritative changed files and it was clean. I did not run BE unit tests because this runner has nothirdparty/installedand no existing BE test binaries/output build to execute. - User focus: no additional user-provided review focus was supplied.
Subagent conclusions:
optimizer-rewrite: no candidates; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGS.tests-session-config: no candidates; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGS.- No subagent candidate became an inline comment; the only reviewed concerns were dismissed as duplicate existing threads or with code/test evidence in the shared ledger.
|
run buildall |
TPC-H: Total hot run time: 29155 ms |
TPC-DS: Total hot run time: 170761 ms |
ClickBench: Total hot run time: 25.22 s |
What problem does this PR solve?
Column wrappers used
for_each_subcolumnfor both read-only traversal and mutable subcolumn detachment during COW mutation. This mixed two different contracts in one callback API and forced typed subcolumns such asColumnNullable::_null_map,ColumnArray::offsets, andColumnMap::offsets_columnto move through temporary baseIColumn::WrappedPtrbridges.Root cause: the mutable callback accepted
IColumn::WrappedPtr&, but several subcolumns are stored as strongly typed wrappers. Binding typed wrapper references to the base wrapper callback is unsafe, so each implementation needed ad-hoc move/defer/cast code.This PR keeps
for_each_subcolumnas a const read-only traversal API and addsmutate_subcolumns()for the COW mutation path. Commonmutate_subcolumnhelpers handle generic and strongly typed subcolumns, soColumnArray,ColumnMap,ColumnNullable,ColumnStruct, andColumnVariantcan detach children without exposing a mutable traversal callback. The added BEUT cases also verify that mutating exclusive subcolumns does not introduce an extra copy.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)