rev-list: use merge-base --independent algorithm when possible#2082
Open
derrickstolee wants to merge 3 commits intogitgitgadget:masterfrom
Open
rev-list: use merge-base --independent algorithm when possible#2082derrickstolee wants to merge 3 commits intogitgitgadget:masterfrom
derrickstolee wants to merge 3 commits intogitgitgadget:masterfrom
Conversation
Add a test that verifies the 'git rev-list --maximal-only' option produces the same set of commits as 'git merge-base --independent'. This equivalence was noted when the feature was first created, but we are about to update the implementation to use a common algorithm in this case where the user intention is identical. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add a performance test that compares 'git rev-list --maximal-only' against 'git merge-base --independent'. These two commands are asking essentially the same thing, but the rev-list implementation is more generic and hence slower. These performance tests will demonstrate that in the current state and also be used to show the equivalence in the future. We also add a case with '--since' to force the generic walk logic for rev-list even when we make that future change to use the merge-base algorithm on a simple walk. When run on my copy of git.git, I see these results: Test HEAD ---------------------------------------------- 6011.2: merge-base --independent 0.03 6011.3: rev-list --maximal-only 0.06 6011.4: rev-list --maximal-only --since 0.06 These numbers are low, but the --independent calculation is interesting due to having a lot of local branches that are actually independent. Running the same test on a fresh clone of the Linux kernel repository shows a larger difference between the algorithms, especially because the --independent algorithm is extremely fast when there are no independent references selected: Test HEAD ---------------------------------------------- 6011.2: merge-base --independent 0.00 6011.3: rev-list --maximal-only 0.70 6011.4: rev-list --maximal-only --since 0.70 Signed-off-by: Derrick Stolee <stolee@gmail.com>
The 'git rev-list --maximal-only' option filters the output to only independent commits. A commit is independent if it is not reachable from other listed commits. Currently this is implemented by doing a full revision walk and marking parents with CHILD_VISITED to skip non-maximal commits. The 'git merge-base --independent' command computes the same result using reduce_heads(), which uses the more efficient remove_redundant() algorithm. This is significantly faster because it avoids walking the entire commit graph. Add a fast path in rev-list that detects when --maximal-only is the only interesting option and all input commits are positive (no revision ranges). In this case, use reduce_heads() directly instead of doing a full revision walk. In order to preserve the rest of the output filtering, this computation is done opportunistically in a new prepare_maximal_independent() method when possible. If successful, it populates revs->commits with the list of independent commits and set revs->no_walk to prevent any other walk from occurring. This allows us to have any custom output be handled using the existing output code hidden inside traverse_commit_list_filtered(). A new test is added to demonstrate that this output is preserved. The fast path is only used when no other flags complicate the walk or output format: no UNINTERESTING commits, no limiting options (max-count, age filters, path filters, grep filters), no output formatting beyond plain OIDs, and no object listing flags. Running the p6011 performance test for my copy of git.git, I see the following improvement with this change: Test HEAD~1 HEAD ------------------------------------------------------------ 6011.2: merge-base --independent 0.03 0.03 +0.0% 6011.3: rev-list --maximal-only 0.06 0.03 -50.0% 6011.4: rev-list --maximal-only --since 0.06 0.06 +0.0% And for a fresh clone of the Linux kernel repository, I see: Test HEAD~1 HEAD ------------------------------------------------------------ 6011.2: merge-base --independent 0.00 0.00 = 6011.3: rev-list --maximal-only 0.70 0.00 -100.0% 6011.4: rev-list --maximal-only --since 0.70 0.70 +0.0% In both cases, the performance is indeed matching the behavior of 'git merge-base --independent', as expected. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Author
|
/submit |
|
Submitted as pull.2082.git.1775482048.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via git@41520e7. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The --maximal-only option was added to
git rev-listin b4e8f60 (revision: add --maximal-only option, 2026-01-22) and the discussion [1] included talks of how 'git rev-list --maximal-only <refs>' acts the same as 'git merge-base --independent <refs>' assuming that no other walk modifiers are provided to the revision walk. And with those assumptions, the merge-base algorithm can be faster if the refs have most of their history shared.[1] https://lore.kernel.org/git/pull.2032.v2.git.1769097958549.gitgitgadget@gmail.com/
This series updates the revision walk to use the merge-base algorithm when possible. This checks the rev_info struct for options that cause the walk to be different and also looks for negative references. If none of these appear, then the merge-base algorithm is used instead.
The series is broken into three patches that could theoretically be squashed into a single patch.
Thanks,
-Stolee
cc: gitster@pobox.com
cc: j6t@kdbg.org