feat(append): add append-only writer, compact task and coordinator#112
Open
lucasfang wants to merge 1 commit into
Pull request overview
Adds append-only write and compaction coordination functionality to the C++ Paimon implementation, introducing a new writer for append-only tables plus a coordinator/task abstraction for scheduling compaction based on small-file scanning.
Changes:
- Introduces `AppendOnlyWriter` (implementation + unit tests) to write append-only batches, flush to files, and integrate with compaction triggering/result draining.
- Adds `AppendCompactTask` to represent a single append-only compaction rewrite and produce a `CommitMessage`.
- Adds a public `AppendCompactCoordinator` API plus implementation that scans manifests for small files and groups them into compaction tasks.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/paimon/append/append_only_writer.h | Declares AppendOnlyWriter (BatchWriter implementation) and its compaction/flush/commit interfaces. |
| src/paimon/append/append_only_writer.cpp | Implements append-only write/flush behavior, compaction triggering, and compaction result draining. |
| src/paimon/append/append_only_writer_test.cpp | Adds unit tests covering writer behavior, commit preparation, compaction sync, and close cleanup. |
| src/paimon/append/append_compact_task.h | Declares AppendCompactTask abstraction for a single compaction rewrite. |
| src/paimon/append/append_compact_task.cpp | Implements compaction rewrite via AppendOnlyFileStoreWrite::CompactRewrite and returns a commit message. |
| src/paimon/append/append_compact_coordinator.cpp | Implements coordinator logic: scan small files from latest snapshot, bin-pack into tasks, execute synchronously. |
| include/paimon/append/append_compact_coordinator.h | Public API surface for running append-only compaction coordination. |
     * limitations under the License.
     */

    #include "paimon/core/append/append_only_writer.h"

     * limitations under the License.
     */

    #include "paimon/core/append/append_only_writer.h"

     * limitations under the License.
     */

    #include "paimon/core/append/append_compact_task.h"

    #include "paimon/common/data/binary_row.h"
    #include "paimon/common/types/data_field.h"
    #include "paimon/common/utils/linked_hash_map.h"
    #include "paimon/core/append/append_compact_task.h"

Comment on lines +176 to +187

    ::ArrowSchema arrow_schema;
    ScopeGuard guard([&arrow_schema]() { ArrowSchemaRelease(&arrow_schema); });
    PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*schema, &arrow_schema));
    auto format = options_.GetFileFormat();
    PAIMON_ASSIGN_OR_RAISE(
        std::shared_ptr<WriterBuilder> writer_builder,
        format->CreateWriterBuilder(&arrow_schema, options_.GetWriteBatchSize()));
    writer_builder->WithMemoryPool(memory_pool_);

    PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*schema, &arrow_schema));
    PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FormatStatsExtractor> stats_extractor,
                           format->CreateStatsExtractor(&arrow_schema));

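The excerpt above relies on a scope guard so that the exported schema is released on every exit path, including the early returns hidden inside the `PAIMON_*` macros. A minimal sketch of that pattern follows. The real `ScopeGuard` and `ArrowSchemaRelease` used by the PR are not shown in this excerpt, so `ExportedHandle`, `ScopeGuardSketch`, `ExportInto`, `ReleaseIfOwned`, and `UseHandle` are hypothetical stand-ins. The only convention borrowed from the Arrow C data interface is that a struct is released through its `release` callback, which marks the struct as released by setting `release` to null.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a C struct that owns resources through a release
// callback. It is NOT the real ArrowSchema.
struct ExportedHandle {
    void (*release)(ExportedHandle*) = nullptr;
    int* release_count = nullptr;  // test hook only, not part of any real API
};

// Minimal scope guard: runs the stored callback when it goes out of scope.
class ScopeGuardSketch {
 public:
    explicit ScopeGuardSketch(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeGuardSketch() {
        if (fn_) fn_();
    }
    ScopeGuardSketch(const ScopeGuardSketch&) = delete;
    ScopeGuardSketch& operator=(const ScopeGuardSketch&) = delete;

 private:
    std::function<void()> fn_;
};

// Releases the handle only if it currently owns something.
inline void ReleaseIfOwned(ExportedHandle* handle) {
    if (handle->release != nullptr) {
        handle->release(handle);
    }
}

// Hypothetical export: fills the struct and installs a release callback that
// counts invocations and then marks the struct as released.
inline void ExportInto(ExportedHandle* out, int* release_count) {
    out->release_count = release_count;
    out->release = [](ExportedHandle* self) {
        ++(*self->release_count);
        self->release = nullptr;
    };
}

// The guard is declared before the export, as in the excerpt, so the handle
// is released whether the function returns early or runs to the end.
inline bool UseHandle(int* release_count, bool fail_early) {
    ExportedHandle handle;
    ScopeGuardSketch guard([&handle]() { ReleaseIfOwned(&handle); });
    ExportInto(&handle, release_count);
    if (fail_early) {
        return false;
    }
    return true;
}
```

Calling `UseHandle` with either value of `fail_early` increments the counter exactly once per call. Note that a guard of this kind releases only whatever the struct holds when the scope ends; how the struct is treated between the two `ExportSchema` calls in the excerpt depends on code that is not shown here.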
Comment on lines +231 to +242

    ::ArrowSchema arrow_schema;
    ScopeGuard guard([&arrow_schema]() { ArrowSchemaRelease(&arrow_schema); });
    PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*single_field_schema, &arrow_schema));
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<FileFormat> format,
                           FileFormatFactory::Get("blob", options_.ToMap()));
    PAIMON_ASSIGN_OR_RAISE(
        std::shared_ptr<WriterBuilder> writer_builder,
        format->CreateWriterBuilder(&arrow_schema, options_.GetWriteBatchSize()));
    writer_builder->WithMemoryPool(memory_pool_);
    PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*single_field_schema, &arrow_schema));
    PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FormatStatsExtractor> stats_extractor,
                           format->CreateStatsExtractor(&arrow_schema));

Comment on lines +93 to +97

    /// Pack small files into compaction groups using a bin-packing algorithm.
    /// Files are sorted by size ascending, then greedily packed into bins.
    /// A bin is flushed when its total size >= targetFileSize * 2 (and has > 1 file),
    /// or when it has >= minFileNum files.
    std::vector<std::vector<std::shared_ptr<DataFileMeta>>> PackFiles(

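The packing rule in the doc comment above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `FileInfo` is a hypothetical stand-in for `DataFileMeta` that carries only a size, `PackFilesSketch` and its parameters are made-up names, and the handling of leftover files (they are simply not returned) is an assumption, since the doc comment does not say what happens to a final bin that meets neither condition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataFileMeta: only the file size matters here.
struct FileInfo {
    int64_t file_size;
};

// Greedy packing as described in the doc comment:
//  1. sort files by size, ascending;
//  2. add files to the current bin one at a time;
//  3. flush the bin when it holds more than one file and its total size is
//     at least 2 * target_file_size, or when it holds at least min_file_num
//     files.
// Assumption of this sketch: files left in the last, unflushed bin are not
// returned.
inline std::vector<std::vector<FileInfo>> PackFilesSketch(std::vector<FileInfo> files,
                                                          int64_t target_file_size,
                                                          std::size_t min_file_num) {
    std::sort(files.begin(), files.end(), [](const FileInfo& a, const FileInfo& b) {
        return a.file_size < b.file_size;
    });

    std::vector<std::vector<FileInfo>> groups;
    std::vector<FileInfo> bin;
    int64_t bin_size = 0;

    for (const FileInfo& file : files) {
        bin.push_back(file);
        bin_size += file.file_size;

        const bool enough_content = bin.size() > 1 && bin_size >= target_file_size * 2;
        const bool enough_files = bin.size() >= min_file_num;
        if (enough_content || enough_files) {
            groups.push_back(std::move(bin));
            bin.clear();  // put the moved-from vector back into a known empty state
            bin_size = 0;
        }
    }
    return groups;
}
```

For example, with sizes 10, 20, 30, 200, 300, a target file size of 100, and a minimum of 3 files, the three smallest files form one group by the file-count rule, and 200 and 300 form a second group by the size rule. Sorting ascending means small files are grouped together first, so the file-count rule tends to fire on the smallest files and the size rule on the larger ones.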
    #include <string>
    #include <vector>

    #include "paimon/result.h"

Purpose
Linked issue: No linked issue
This change adds append-only table write and compaction coordination support.
Included changes:
- Public API (`include/paimon/append/`):
  - `AppendCompactCoordinator` header for coordinating append-only table compaction tasks.
- Append Module (`src/paimon/append/`):
  - `AppendOnlyWriter` for writing data to append-only tables with automatic flush and compaction triggering.
  - `AppendCompactTask` representing a single compaction task for append-only tables.
  - `AppendCompactCoordinator` implementation for managing and scheduling compaction tasks.
  - `AppendOnlyWriterTest` (710 lines) validating writer behavior, flush logic, and compaction integration.

Tests
Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.
Test coverage included in this change:
- `AppendOnlyWriterTest`

API and Format
This change adds public API in `include/paimon/append/append_compact_coordinator.h`.
No storage format or protocol changes.
Documentation
No documentation changes required.
Generative AI tooling
Migrate-by: Aone Copilot (Qwen3.7-Max)