feat(append): add append-only writer, compact task and coordinator #112

Open
lucasfang wants to merge 1 commit into apache:main from lucasfang:migrate_2

@lucasfang

Purpose

Linked issue: No linked issue

This change adds append-only table write and compaction coordination support.

Included changes:

  • Public API (include/paimon/append/):

    • Adds AppendCompactCoordinator header for coordinating append-only table compaction tasks.
  • Append Module (src/paimon/append/):

    • Adds AppendOnlyWriter for writing data to append-only tables with automatic flush and compaction triggering.
    • Adds AppendCompactTask representing a single compaction task for append-only tables.
    • Adds AppendCompactCoordinator implementation for managing and scheduling compaction tasks.
    • Adds test coverage in AppendOnlyWriterTest (710 lines) validating writer behavior, flush logic, and compaction integration.

Tests

Not run. Local compile, CMake, and gtest environment checks are not part of this PR description.

Test coverage included in this change:

  • AppendOnlyWriterTest

API and Format

This change adds public API in include/paimon/append/append_compact_coordinator.h.

No storage format or protocol changes.

Documentation

No documentation changes required.

Generative AI tooling

Migrate-by: Aone Copilot (Qwen3.7-Max)

Copilot AI review requested due to automatic review settings June 25, 2026 03:09

Copilot AI left a comment

Pull request overview

Adds append-only write and compaction coordination functionality to the C++ Paimon implementation, introducing a new writer for append-only tables plus a coordinator/task abstraction for scheduling compaction based on small-file scanning.

Changes:

  • Introduces AppendOnlyWriter (implementation + unit tests) to write append-only batches, flush to files, and integrate with compaction triggering/result draining.
  • Adds AppendCompactTask to represent a single append-only compaction rewrite and produce a CommitMessage.
  • Adds a public AppendCompactCoordinator API plus implementation that scans manifests for small files and groups them into compaction tasks.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/paimon/append/append_only_writer.h | Declares AppendOnlyWriter (BatchWriter implementation) and its compaction/flush/commit interfaces. |
| src/paimon/append/append_only_writer.cpp | Implements append-only write/flush behavior, compaction triggering, and compaction result draining. |
| src/paimon/append/append_only_writer_test.cpp | Adds unit tests covering writer behavior, commit preparation, compaction sync, and close cleanup. |
| src/paimon/append/append_compact_task.h | Declares AppendCompactTask abstraction for a single compaction rewrite. |
| src/paimon/append/append_compact_task.cpp | Implements compaction rewrite via AppendOnlyFileStoreWrite::CompactRewrite and returns a commit message. |
| src/paimon/append/append_compact_coordinator.cpp | Implements coordinator logic: scan small files from latest snapshot, bin-pack into tasks, execute synchronously. |
| include/paimon/append/append_compact_coordinator.h | Public API surface for running append-only compaction coordination. |

* limitations under the License.
*/

#include "paimon/core/append/append_only_writer.h"
* limitations under the License.
*/

#include "paimon/core/append/append_compact_task.h"
#include "paimon/common/data/binary_row.h"
#include "paimon/common/types/data_field.h"
#include "paimon/common/utils/linked_hash_map.h"
#include "paimon/core/append/append_compact_task.h"
Comment on lines +176 to +187
::ArrowSchema arrow_schema;
ScopeGuard guard([&arrow_schema]() { ArrowSchemaRelease(&arrow_schema); });
PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*schema, &arrow_schema));
auto format = options_.GetFileFormat();
PAIMON_ASSIGN_OR_RAISE(
    std::shared_ptr<WriterBuilder> writer_builder,
    format->CreateWriterBuilder(&arrow_schema, options_.GetWriteBatchSize()));
writer_builder->WithMemoryPool(memory_pool_);

PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*schema, &arrow_schema));
PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FormatStatsExtractor> stats_extractor,
                       format->CreateStatsExtractor(&arrow_schema));
Comment on lines +231 to +242
::ArrowSchema arrow_schema;
ScopeGuard guard([&arrow_schema]() { ArrowSchemaRelease(&arrow_schema); });
PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*single_field_schema, &arrow_schema));
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<FileFormat> format,
                       FileFormatFactory::Get("blob", options_.ToMap()));
PAIMON_ASSIGN_OR_RAISE(
    std::shared_ptr<WriterBuilder> writer_builder,
    format->CreateWriterBuilder(&arrow_schema, options_.GetWriteBatchSize()));
writer_builder->WithMemoryPool(memory_pool_);
PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow::ExportSchema(*single_field_schema, &arrow_schema));
PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<FormatStatsExtractor> stats_extractor,
                       format->CreateStatsExtractor(&arrow_schema));
Comment on lines +93 to +97
/// Pack small files into compaction groups using a bin-packing algorithm.
/// Files are sorted by size ascending, then greedily packed into bins.
/// A bin is flushed when its total size >= targetFileSize * 2 (and has > 1 file),
/// or when it has >= minFileNum files.
std::vector<std::vector<std::shared_ptr<DataFileMeta>>> PackFiles(
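
For readers following the review, the packing rule described in the doc comment above can be sketched as follows. This is a minimal illustration only, not the PR's implementation: `FileInfo` and `PackFilesSketch` are hypothetical names standing in for `DataFileMeta` and `PackFiles`, and dropping the leftover bin at the end is an assumption, since the comment does not say what happens to files that never fill a bin.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataFileMeta: only the file size matters for packing.
struct FileInfo {
    int64_t file_size;
};

// Greedy packing following the rule in the doc comment:
// sort ascending by size, then flush a bin when its total size reaches
// target_file_size * 2 (with more than one file) or it holds min_file_num files.
std::vector<std::vector<FileInfo>> PackFilesSketch(std::vector<FileInfo> files,
                                                   int64_t target_file_size,
                                                   std::size_t min_file_num) {
    std::sort(files.begin(), files.end(),
              [](const FileInfo& a, const FileInfo& b) { return a.file_size < b.file_size; });

    std::vector<std::vector<FileInfo>> groups;
    std::vector<FileInfo> bin;
    int64_t bin_size = 0;

    for (const FileInfo& file : files) {
        bin.push_back(file);
        bin_size += file.file_size;

        const bool enough_content = bin_size >= target_file_size * 2 && bin.size() > 1;
        const bool enough_files = bin.size() >= min_file_num;
        if (enough_content || enough_files) {
            groups.push_back(std::move(bin));
            bin.clear();  // reset the moved-from vector to a known empty state
            bin_size = 0;
        }
    }
    // Assumption: files left in an unfilled bin are not emitted as a task here.
    return groups;
}
```

With sizes 10, 50, 20, 200, 30, a target size of 100 and a minimum of 3 files, the sketch yields two groups: the three smallest files (flushed by file count) and then 50 + 200 (flushed by total size).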
#include <string>
#include <vector>

#include "paimon/result.h"