[Python] Add metadata index cache for TsFileDataFrame by ColinLeeo · Pull Request #857 · apache/tsfile

ColinLeeo · 2026-07-02T11:57:55Z

Persist each shard's MetadataCatalog to a fixed-name index file in the dataset directory so repeated loads skip the expensive native metadata walk. A new use_cache flag (default True) enables it only when a single directory is passed; single-file and list inputs are unchanged.
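For illustration, the eligibility rule described above could be sketched roughly as follows. The helper name `cache_enabled` and its signature are hypothetical and not taken from the PR; only the `use_cache` flag and the "single directory only" rule come from the description.

```python
import os


def cache_enabled(paths, use_cache=True):
    """Hypothetical helper: decide whether the metadata index cache applies.

    Assumption: the cache is used only when the caller passes one directory
    path (not a single file, and not a list of paths) and use_cache is True.
    """
    if not use_cache:
        return False
    # A list or tuple of inputs never uses the cache, even with one element.
    if isinstance(paths, (str, os.PathLike)):
        return os.path.isdir(paths)
    return False
```

Under this sketch, a single-file path or a list input returns `False`, so those inputs follow the existing uncached path.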

The cache is binary: a pickled sidecar for the small table and device tables, plus one numpy int64 structured array per shard for the bulk series stats. Writes are atomic (temp + os.replace); load falls back to a fresh build on a bad magic/version or a changed file set. Source files are not validated, per design.
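A minimal sketch of the atomic-write and fallback behaviour described above, using only the standard library. The magic bytes, version number, record layout, and function names are assumptions made for illustration; the PR's actual on-disk format (pickled sidecar plus numpy arrays) is not reproduced here.

```python
import os
import pickle
import tempfile

MAGIC = b"TSIDX"  # hypothetical magic value, not the PR's
VERSION = 1       # hypothetical format version


def write_cache_atomic(path, payload, file_set):
    """Write the cache to a temp file, then os.replace it into place."""
    record = {
        "magic": MAGIC,
        "version": VERSION,
        "files": sorted(file_set),
        "payload": payload,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so os.replace stays
    # on one filesystem and swaps the file in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(record, fh, protocol=pickle.HIGHEST_PROTOCOL)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_cache(path, current_files):
    """Return the cached payload, or None to signal a fresh build."""
    try:
        with open(path, "rb") as fh:
            record = pickle.load(fh)
    except Exception:
        # Missing, truncated, or corrupt cache: rebuild instead of failing.
        return None
    if not isinstance(record, dict):
        return None
    if record.get("magic") != MAGIC or record.get("version") != VERSION:
        return None
    if record.get("files") != sorted(current_files):
        return None  # the file set changed since the cache was written
    return record.get("payload")
```

A caller would treat a `None` result from `load_cache` as the signal to run the native metadata walk and then call `write_cache_atomic` with the result. As in the description, this sketch compares only the file set and does not inspect the source files themselves.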

ColinLeeo wants to merge 1 commit into apache:develop from ColinLeeo:opt_dataframe
