
test: add testing utilities for data generation, row conversion and r…#123

Open
lucasfang wants to merge 1 commit into apache:main from lucasfang:migrate_9


@lucasfang

Contributor

Purpose

Linked issue: No linked issue

This change adds testing utilities for data generation, row conversion, and result validation.

Included changes:

  • Testing utilities (src/paimon/testing/utils/):
    • Adds test_helper.h with comprehensive test assertion helpers and fixture support
    • Adds data_generator.h/.cpp for generating test data with configurable schemas
    • Adds data_generator_test.cpp with unit tests for data generation logic
    • Adds binary_row_generator.h for constructing binary rows in tests
    • Adds dict_array_converter.h for dictionary-encoded array conversions
    • Adds read_result_collector.h for collecting and validating read results
    • Adds key_value_checker.h for key-value pair validation in tests
    • Adds io_exception_helper.h for IO exception testing utilities
    • Updates timezone_guard.h with improved timezone handling for tests

Tests

Test coverage included in this change:

  • data_generator_test.cpp

API and Format

No public API, storage format, or protocol changes.

Documentation

No documentation changes required.

Generative AI tooling

Migrate-by: Aone Copilot (Qwen3.7-Max)

Copilot AI review requested due to automatic review settings June 25, 2026 06:33

Copilot AI left a comment


Pull request overview

This PR introduces a set of reusable C++ testing utilities under src/paimon/testing/utils/ to simplify integration/unit tests around table writes, scans, and Arrow result validation. It adds helpers for generating rows/record batches, collecting and normalizing read results (including dictionary-array conversion), and higher-level test orchestration (create table, write/commit, scan, validate).

Changes:

  • Add TestHelper and supporting utilities to create test tables, write/commit data, scan, and validate read results.
  • Add data/row generation utilities (DataGenerator, BinaryRowGenerator) and a dictionary-array converter for stable Arrow comparisons.
  • Add unit tests for the data generation path (data_generator_test.cpp) and small test helpers (IO exception macros, key/value checking).
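
To illustrate why a dictionary-array converter helps with stable comparisons: two dictionary-encoded columns can hold the same logical values while using different dictionaries and indices, so comparing the encoded form directly can report a spurious mismatch. The sketch below uses only the standard library; `DictColumn` and `DecodeDictionary` are hypothetical stand-ins and do not reflect the Arrow API or the code in `dict_array_converter.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column:
// indices[i] selects an entry of `dictionary`; std::nullopt marks a null row.
struct DictColumn {
    std::vector<std::string> dictionary;
    std::vector<std::optional<std::int32_t>> indices;
};

// Expands the dictionary encoding into a plain column of optional strings.
// Throws std::out_of_range if an index does not refer to a dictionary entry.
inline std::vector<std::optional<std::string>> DecodeDictionary(const DictColumn& column) {
    std::vector<std::optional<std::string>> plain;
    plain.reserve(column.indices.size());
    for (const auto& index : column.indices) {
        if (!index.has_value()) {
            plain.emplace_back(std::nullopt);  // keep null rows as nulls
            continue;
        }
        if (*index < 0 || static_cast<std::size_t>(*index) >= column.dictionary.size()) {
            throw std::out_of_range("dictionary index out of range");
        }
        plain.emplace_back(column.dictionary[static_cast<std::size_t>(*index)]);
    }
    return plain;
}
```

Two columns whose dictionaries are ordered differently decode to the same plain column, which is the property a test comparison would rely on.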

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/paimon/testing/utils/timezone_guard.h | License header formatting adjustment. |
| src/paimon/testing/utils/test_helper.h | New high-level integration-test helper for creating tables, writing/committing, scanning, and validating results. |
| src/paimon/testing/utils/read_result_collector.h | New utility to collect Arrow batches, apply bitmap filtering when needed, and normalize dictionary arrays for comparisons. |
| src/paimon/testing/utils/dict_array_converter.h | New converter that deep-copies dictionary arrays into plain string/large-string arrays for comparison stability. |
| src/paimon/testing/utils/data_generator.h | Declares the test data generator interface for partition/bucket splitting into RecordBatches. |
| src/paimon/testing/utils/data_generator.cpp | Implements partition/bucket extraction and record-batch construction from BinaryRow inputs. |
| src/paimon/testing/utils/data_generator_test.cpp | Unit tests covering DataGenerator behavior. |
| src/paimon/testing/utils/binary_row_generator.h | Utility for constructing BinaryRow/InternalRow test inputs from variant-typed values. |
| src/paimon/testing/utils/key_value_checker.h | Utility for validating KeyValue sequences in tests. |
| src/paimon/testing/utils/io_exception_helper.h | Macros to simplify asserting IOHook-triggered error paths in tests. |
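
The partition/bucket splitting described for `data_generator.h` can be pictured as grouping rows by a (partition, bucket) key. The following is a minimal sketch under stated assumptions: `Row`, `BucketOf`, and `SplitByPartitionAndBucket` are hypothetical names, and `std::hash` is used only as a placeholder, so the bucket numbers will not match whatever hashing Paimon actually applies.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: a partition value, a bucket key, and a payload.
struct Row {
    std::string partition;
    std::string key;
    std::int64_t value;
};

using PartitionBucket = std::pair<std::string, std::int32_t>;

// Placeholder bucket assignment: hash of the key modulo the bucket count.
// Assumption: std::hash stands in for the real bucket function.
inline std::int32_t BucketOf(const std::string& key, std::int32_t num_buckets) {
    if (num_buckets <= 0) {
        throw std::invalid_argument("num_buckets must be positive");
    }
    const std::size_t hash = std::hash<std::string>{}(key);
    return static_cast<std::int32_t>(hash % static_cast<std::size_t>(num_buckets));
}

// Groups rows by (partition, bucket), preserving input order within each group.
inline std::map<PartitionBucket, std::vector<Row>> SplitByPartitionAndBucket(
    const std::vector<Row>& rows, std::int32_t num_buckets) {
    std::map<PartitionBucket, std::vector<Row>> groups;
    for (const Row& row : rows) {
        groups[{row.partition, BucketOf(row.key, num_buckets)}].push_back(row);
    }
    return groups;
}
```

In this sketch each group would correspond to one output batch; rows with the same partition and key always land in the same group.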


Comment on lines +21 to +35
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "arrow/c/bridge.h"
#include "arrow/ipc/api.h"
#include "paimon/api.h"
#include "paimon/catalog/catalog.h"
#include "paimon/commit_context.h"
#include "paimon/common/data/blob_descriptor.h"
#include "paimon/common/data/blob_utils.h"
#include "paimon/common/utils/arrow/status_utils.h"
#include "paimon/common/utils/path_util.h"
Comment on lines +343 to +346
Result<std::optional<Snapshot>> LatestSnapshot() const {
    auto commit_impl = dynamic_cast<FileStoreCommitImpl*>(commit_.get());
    return commit_impl->snapshot_manager_->LatestSnapshotOfUser(commit_user_);
}
Comment on lines +348 to +351
Result<std::optional<std::shared_ptr<TableSchema>>> LatestSchema() const {
    auto commit_impl = dynamic_cast<FileStoreCommitImpl*>(commit_.get());
    return commit_impl->schema_manager_->Latest();
}
Comment on lines +353 to +356
Result<std::string> PartitionStr(const BinaryRow& partition) const {
    auto abstract_write = dynamic_cast<AbstractFileStoreWrite*>(write_.get());
    return abstract_write->file_store_path_factory_->GetPartitionString(partition);
}
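
The three accessors quoted above dereference the result of `dynamic_cast` directly. A minimal sketch of a guarded variant follows, which returns an error when the downcast fails instead of dereferencing a null pointer. The types `Commit`, `CommitImpl`, `OtherCommit`, the `ResultOr` alias, and `LatestSnapshotId` are hypothetical stand-ins, not the Paimon classes or its `Result` type.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins for an interface and two concrete implementations.
struct Commit {
    virtual ~Commit() = default;
};
struct CommitImpl : Commit {
    std::optional<std::int64_t> latest_snapshot_id;
};
struct OtherCommit : Commit {};

// Minimal result type for this sketch: either a value or an error message.
template <typename T>
using ResultOr = std::variant<T, std::string>;

// Returns the snapshot id when `commit` is a CommitImpl, otherwise an error.
// dynamic_cast on a pointer yields nullptr on failure, so the check is cheap.
inline ResultOr<std::optional<std::int64_t>> LatestSnapshotId(
    const std::shared_ptr<Commit>& commit) {
    auto* impl = dynamic_cast<CommitImpl*>(commit.get());
    if (impl == nullptr) {
        return std::string("commit is not a CommitImpl");
    }
    return impl->latest_snapshot_id;
}
```

In a test helper, a failed downcast would then surface as a readable error rather than a crash.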
Comment on lines +118 to +120
if (result_array_vector.empty()) {
    return std::shared_ptr<arrow::ChunkedArray>();
}
Comment on lines +102 to +106
} else if (value_type_id == arrow::Type::type::LARGE_STRING &&
           index_type_id == arrow::Type::type::INT64) {
    return ConvertDictionaryArrayToBinaryArray<
        arrow::LargeStringArray, arrow::Int64Array, arrow::StringBuilder>(
        dict_array, pool);
Comment on lines +106 to +110
for (size_t field_idx = 0; field_idx < partition_fields.size(); field_idx++) {
    int32_t id = partition_fields[field_idx].Id();
    auto type = partition_fields[field_idx].Type();
    PAIMON_RETURN_NOT_OK(WriteBinaryRow(binary_row, id, type, field_idx, &writer));
}

2 participants