commit-reach: terminate merge-base walk when one side is exhausted#2149
Open
spkrka wants to merge 7 commits into
Open
commit-reach: terminate merge-base walk when one side is exhausted#2149spkrka wants to merge 7 commits into
spkrka wants to merge 7 commits into
Conversation
7d5b1bb to
3e1315e
Compare
Author
|
/preview |
|
Preview email sent as pull.2149.git.1781946989.gitgitgadget@gmail.com |
Author
|
/submit |
|
Submitted as pull.2149.git.1781951820.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via git@418052d. |
Author
|
/submit |
|
Submitted as pull.2149.v2.git.1782303254.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This branch is now known as |
Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation, covering the paint walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via GIT_TRACE2_EVENT. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add a paint_state struct for use by paint_down_to_common() that wraps a prio_queue with per-side commit counters. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_update() which adjusts the appropriate bucket by a signed delta. An exhaustive switch on the paint+stale bits documents all valid flag combinations in one place. Convert paint_down_to_common() to use paint_state. The loop now drains the queue via paint_queue_get() which returns NULL when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. paint_queue_get() uses a "pop first" form: it dequeues a commit, then checks the counters. This means the loop exits one iteration earlier than the old code in some topologies (the popped stale commit is never processed), so a few step counts drop by one. The existing nonstale_queue is left in place for ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became unused after the previous commit. The core nonstale_queue functions remain in use by ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier. Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
Step counts measured with trace2 on git.git with commit-graph:
merge-base --all v2.0.0 v2.55.0-rc1:
before: 72264 steps after: 44589 steps
merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
commit-reach: terminate merge-base walk when one paint side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit
large one-sided histories.
When the walk from one side reaches a commit with a very low
generation number that the other side never paints, the walk is
forced to drain most of the graph. A common trigger is a
repository import that grafts a separate history with its own root,
but any merge that introduces a low-generation commit never painted
by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive
PARENT1 and PARENT2 paint meet. This series teaches
paint_down_to_common() to stop as soon as one side has no exclusive
commits left in the queue; once one side is exhausted, no further
candidates can appear.
In the RFC thread [1], Derrick Stolee provided a criss-cross
counterexample that sharpened the halt condition, and Elijah Newren
independently discovered the same optimization and shared an
implementation in PR #2150 [2]. Patches 2-3 incorporate test
cases from Elijah's branch.
This series implements the optimization only after the walk enters
the finite-generation region, where generation ordering guarantees
that paint on visited commits is final.
Patch layout:
1/7 Documentation/technical: add paint-down-to-common doc
2/7 t6600: add test cases for side-exhaustion edge cases
3/7 t6099, t6600: add side-exhaustion regression tests
4/7 commit-reach: add trace2 instrumentation to paint_down_to_common()
5/7 commit-reach: introduce struct paint_state with per-side counters
6/7 commit-reach: remove unused nonstale_queue dedup wrappers
7/7 commit-reach: terminate merge-base walk when one paint side is exhausted
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added
in patch 4). Wall-clock times are best-of-11 runs.
2.6M-commit monorepo with commit-graph:
git.git (88k commits, commit-graph):
Changes since v2:
Moved all halt conditions inside paint_queue_get() as Stolee
suggested, with the "pop first" form: pop, check, then
decrement counters. This keeps the optimization commit's
diff minimal (just inserting the new checks between pop and
decrement).
Shortened the doc comment on paint_queue_get() to describe
what it does rather than how. Inline comments on each
return NULL explain the specific halt condition.
Replaced the manual commit-graph setup in the step-count test
with run_all_modes, which now sets GIT_TRACE2_EVENT per mode
and produces trace-mode-{none,full,half,no-gdat}.txt files.
Added a test_paint_down_steps helper for concise 4-mode step
assertions with diagnostic output on mismatch (prints
"expected X, got Y" instead of a silent grep failure).
Added step-count assertions to the single-walk edge-case
tests: in_merge_bases_many:self, pending-stale,
infinity-both-sides, mixed-finite-infinity.
Included step counts alongside wall-clock times in the
benchmark tables.
Changes since v1:
Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate
passing with old logic first.
Dropped the ahead_behind decoupling patch. paint_state is now a
NEW struct alongside nonstale_queue instead of replacing it.
ahead_behind() is completely untouched.
Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup()
(dead code after the conversion) in a separate commit.
Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function
with signed delta parameter).
Split the old paint_count_transition (which handled both old
and new flags in one call) into separate remove/add calls with
a signed delta. This eliminates the need for the case 0
handler (which tracked "not in the queue") and allows an
exhaustive switch on (PARENT1 | PARENT2 | STALE) that
documents all valid flag combinations, with BUG() in default.
Added trace2_data_intmax() instrumentation to report the number
of commits visited per paint walk (separate commit), with
deterministic step-count assertions in t6600.
Expanded switch statements to multi-line format per .clang-format.
Used !count style throughout instead of count == 0.
Updated technical documentation alongside code changes.
[1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] #2150
CC: Derrick Stolee stolee@gmail.com
CC: Elijah Newren newren@gmail.com