test: add testing utilities for data generation, row conversion and r… #123
Open
lucasfang wants to merge 1 commit into
Pull request overview
This PR introduces a set of reusable C++ testing utilities under src/paimon/testing/utils/ to simplify integration/unit tests around table writes, scans, and Arrow result validation. It adds helpers for generating rows/record batches, collecting and normalizing read results (including dictionary-array conversion), and higher-level test orchestration (create table, write/commit, scan, validate).
Changes:
- Add `TestHelper` and supporting utilities to create test tables, write/commit data, scan, and validate read results.
- Add data/row generation utilities (`DataGenerator`, `BinaryRowGenerator`) and a dictionary-array converter for stable Arrow comparisons.
- Add unit tests for the data generation path (`data_generator_test.cpp`) and small test helpers (IO exception macros, key/value checking).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/paimon/testing/utils/timezone_guard.h | License header formatting adjustment. |
| src/paimon/testing/utils/test_helper.h | New high-level integration-test helper for creating tables, writing/committing, scanning, and validating results. |
| src/paimon/testing/utils/read_result_collector.h | New utility to collect Arrow batches, apply bitmap filtering when needed, and normalize dictionary arrays for comparisons. |
| src/paimon/testing/utils/dict_array_converter.h | New converter that deep-copies dictionary arrays into plain string/large-string arrays for comparison stability. |
| src/paimon/testing/utils/data_generator.h | Declares the test data generator interface for partition/bucket splitting into RecordBatches. |
| src/paimon/testing/utils/data_generator.cpp | Implements partition/bucket extraction and record-batch construction from BinaryRow inputs. |
| src/paimon/testing/utils/data_generator_test.cpp | Unit tests covering DataGenerator behavior. |
| src/paimon/testing/utils/binary_row_generator.h | Utility for constructing BinaryRow/InternalRow test inputs from variant-typed values. |
| src/paimon/testing/utils/key_value_checker.h | Utility for validating KeyValue sequences in tests. |
| src/paimon/testing/utils/io_exception_helper.h | Macros to simplify asserting IOHook-triggered error paths in tests. |
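The table describes `DataGenerator` as splitting input rows by partition and bucket into RecordBatches. The following is a minimal standard-library sketch of that grouping step, not the PR's implementation: `Row`, `BatchKey`, and `SplitByPartitionAndBucket` are hypothetical stand-ins for the BinaryRow/RecordBatch types, and the hash-modulo bucket assignment is an assumption that may differ from Paimon's real bucket function.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: a partition value, a bucket key, and a payload.
struct Row {
    std::string partition;
    std::string key;
    int64_t value;
};

// (partition, bucket) identifies one output batch.
using BatchKey = std::pair<std::string, int32_t>;

// Groups rows into per-(partition, bucket) batches, preserving input order
// within each batch. Assumes num_buckets > 0. The bucket formula
// hash(key) % num_buckets is an illustrative assumption only.
std::map<BatchKey, std::vector<Row>> SplitByPartitionAndBucket(
        const std::vector<Row>& rows, int32_t num_buckets) {
    std::map<BatchKey, std::vector<Row>> batches;
    for (const Row& row : rows) {
        size_t hash = std::hash<std::string>{}(row.key);
        auto bucket = static_cast<int32_t>(hash % static_cast<size_t>(num_buckets));
        batches[BatchKey(row.partition, bucket)].push_back(row);
    }
    return batches;
}
```

Using an ordered `std::map` keyed by (partition, bucket) keeps the batch order deterministic, which makes test expectations easier to write than with a hash map.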
Comment on lines +21 to +35

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "arrow/c/bridge.h"
#include "arrow/ipc/api.h"
#include "paimon/api.h"
#include "paimon/catalog/catalog.h"
#include "paimon/commit_context.h"
#include "paimon/common/data/blob_descriptor.h"
#include "paimon/common/data/blob_utils.h"
#include "paimon/common/utils/arrow/status_utils.h"
#include "paimon/common/utils/path_util.h"
```
Comment on lines +343 to +346

```cpp
Result<std::optional<Snapshot>> LatestSnapshot() const {
    auto commit_impl = dynamic_cast<FileStoreCommitImpl*>(commit_.get());
    return commit_impl->snapshot_manager_->LatestSnapshotOfUser(commit_user_);
}
```
Comment on lines +348 to +351

```cpp
Result<std::optional<std::shared_ptr<TableSchema>>> LatestSchema() const {
    auto commit_impl = dynamic_cast<FileStoreCommitImpl*>(commit_.get());
    return commit_impl->schema_manager_->Latest();
}
```
Comment on lines +353 to +356

```cpp
Result<std::string> PartitionStr(const BinaryRow& partition) const {
    auto abstract_write = dynamic_cast<AbstractFileStoreWrite*>(write_.get());
    return abstract_write->file_store_path_factory_->GetPartitionString(partition);
}
```
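The three accessors above dereference the result of `dynamic_cast` without checking it. A pointer `dynamic_cast` yields `nullptr` when the object is not of the requested type, so a `commit_` or `write_` of a different concrete type would crash the test through a null dereference instead of failing it cleanly. Below is a minimal sketch of a checked downcast that reports the mismatch as an error; `CheckedCast`, `CastResult`, and the `CommitBase`/`CommitImpl`/`OtherCommit` types are hypothetical, and the string error stands in for Paimon's own `Result`/`Status` types, whose exact API is not shown here.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for an interface and its concrete implementations.
struct CommitBase { virtual ~CommitBase() = default; };
struct CommitImpl : CommitBase { int snapshot_id = 7; };
struct OtherCommit : CommitBase {};

// Either a non-null pointer to the derived type or an error message.
template <typename Derived>
using CastResult = std::variant<Derived*, std::string>;

// Performs the dynamic_cast and turns both a null input and a failed cast
// into an error, so callers never dereference a null pointer.
template <typename Derived, typename Base>
CastResult<Derived> CheckedCast(Base* ptr, const std::string& what) {
    if (ptr == nullptr) {
        return what + " is null";
    }
    auto* derived = dynamic_cast<Derived*>(ptr);
    if (derived == nullptr) {
        return what + " has an unexpected concrete type";
    }
    return derived;
}
```

In the helper, a hypothetical call would look like `CheckedCast<FileStoreCommitImpl>(commit_.get(), "commit_")`, returning the error alternative as a failed `Result` before touching `snapshot_manager_`.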
Comment on lines +118 to +120

```cpp
if (result_array_vector.empty()) {
    return std::shared_ptr<arrow::ChunkedArray>();
}
```
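`std::shared_ptr<arrow::ChunkedArray>()` is a null pointer, so when no batches are read, any caller that dereferences the collected result without a null check would crash rather than report an empty read. One alternative is to always return a non-null, possibly empty, result. The standard-library sketch below models that together with the bitmap filtering `read_result_collector.h` is described as applying; `Chunk`, `IsBitSet`, and `CollectRows` are hypothetical, and treating the bitmap as an LSB-first "rows to keep" mask (the bit order Arrow uses for its bitmaps) is an assumption about the PR's semantics.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical chunk: row values plus an optional selection bitmap in which
// bit i (least-significant bit first within each byte) marks row i as kept.
struct Chunk {
    std::vector<int64_t> values;
    std::vector<uint8_t> selection;  // empty means "keep every row"
};

inline bool IsBitSet(const std::vector<uint8_t>& bitmap, size_t i) {
    return (bitmap[i / 8] >> (i % 8)) & 1U;
}

// Concatenates the kept rows of all chunks. The result is never null: when
// nothing is collected, callers get an empty vector they can compare against
// an empty expectation.
std::shared_ptr<std::vector<int64_t>> CollectRows(const std::vector<Chunk>& chunks) {
    auto result = std::make_shared<std::vector<int64_t>>();
    for (const Chunk& chunk : chunks) {
        for (size_t i = 0; i < chunk.values.size(); ++i) {
            if (chunk.selection.empty() || IsBitSet(chunk.selection, i)) {
                result->push_back(chunk.values[i]);
            }
        }
    }
    return result;
}
```

With Arrow types, the analogous choice would be returning an empty but typed chunked array instead of a null pointer, which requires knowing the expected data type when no chunks exist.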
Comment on lines +102 to +106

```cpp
} else if (value_type_id == arrow::Type::type::LARGE_STRING &&
           index_type_id == arrow::Type::type::INT64) {
    return ConvertDictionaryArrayToBinaryArray<
        arrow::LargeStringArray, arrow::Int64Array, arrow::StringBuilder>(
        dict_array, pool);
```
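This branch decodes LARGE_STRING dictionary values through `arrow::StringBuilder`. Assuming the third template argument builds the output array, the result would be a utf8 array with 32-bit offsets, whereas `arrow::LargeStringBuilder` produces large_utf8 with 64-bit offsets. That would change the output type and cap the total value bytes at 2^31 - 1, unlike the plain large-string output the converter is described as producing. The sketch below shows why offset width matters when flattening a dictionary; `FlatStrings` and `DecodeDictionary` are hypothetical, nulls are ignored for brevity, and the `int8_t` offset type in the test only stands in for a too-narrow offset so the overflow path can be exercised with small inputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flat string column: concatenated bytes plus offsets, where
// row i spans data[offsets[i], offsets[i + 1]). OffsetT mirrors Arrow's
// choice between 32-bit (string) and 64-bit (large_string) offsets.
template <typename OffsetT>
struct FlatStrings {
    std::string data;
    std::vector<OffsetT> offsets{0};
};

// Deep-copies dictionary values selected by `indices` into a flat column.
// Returns std::nullopt when the total byte length would not fit in OffsetT
// instead of silently wrapping. dictionary.at() throws on a bad index.
template <typename OffsetT, typename IndexT>
std::optional<FlatStrings<OffsetT>> DecodeDictionary(
        const std::vector<std::string>& dictionary, const std::vector<IndexT>& indices) {
    FlatStrings<OffsetT> out;
    for (IndexT index : indices) {
        const std::string& value = dictionary.at(static_cast<size_t>(index));
        uint64_t end = static_cast<uint64_t>(out.data.size()) + value.size();
        if (end > static_cast<uint64_t>(std::numeric_limits<OffsetT>::max())) {
            return std::nullopt;  // would overflow the offset type
        }
        out.data += value;
        out.offsets.push_back(static_cast<OffsetT>(end));
    }
    return out;
}
```

Matching the builder's offset width to the value type (64-bit for LARGE_STRING) keeps the decoded array's type equal to the dictionary's value type.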
Comment on lines +106 to +110

```cpp
for (size_t field_idx = 0; field_idx < partition_fields.size(); field_idx++) {
    int32_t id = partition_fields[field_idx].Id();
    auto type = partition_fields[field_idx].Type();
    PAIMON_RETURN_NOT_OK(WriteBinaryRow(binary_row, id, type, field_idx, &writer));
}
```
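This loop counts with `size_t` and passes `field_idx` straight to `WriteBinaryRow`. If that parameter is a 32-bit position, as is common for row writers (an assumption, since the signature is not in the excerpt), the call narrows implicitly. A minimal sketch of an explicit, checked conversion follows; `CheckedIndexCast` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Converts a container index to int32_t, returning std::nullopt if it does
// not fit, so the narrowing is explicit and checked rather than implicit.
inline std::optional<int32_t> CheckedIndexCast(size_t index) {
    if (index > static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
        return std::nullopt;
    }
    return static_cast<int32_t>(index);
}
```

An equally valid choice is to check `partition_fields.size()` once before the loop and iterate with an `int32_t` counter, which avoids a per-iteration check.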
Purpose
Linked issue: No linked issue
This change adds testing utilities for data generation, row conversion, and result validation.
Included changes:
Testing utilities (src/paimon/testing/utils/):
- test_helper.h with helpers for creating tables, writing/committing data, scanning, and validating results
- data_generator.h/.cpp for generating test data with configurable schemas
- data_generator_test.cpp with unit tests for data generation logic
- binary_row_generator.h for constructing binary rows in tests
- dict_array_converter.h for dictionary-encoded array conversions
- read_result_collector.h for collecting and validating read results
- key_value_checker.h for key-value pair validation in tests
- io_exception_helper.h for IO exception testing utilities
- timezone_guard.h with improved timezone handling for tests

Tests
Test coverage included in this change:
- data_generator_test.cpp

API and Format
No public API, storage format, or protocol changes.
Documentation
No documentation changes required.
Generative AI tooling
Migrate-by: Aone Copilot (Qwen3.7-Max)