Skip to content

StatusCommand: Optimize git status by deferring getObjectId() calls#9304

Merged
mbien merged 1 commit intoapache:masterfrom
OndroMih:ondromih-git-commit-dlg-optimization
Apr 11, 2026
Merged

StatusCommand: Optimize git status by deferring getObjectId() calls#9304
mbien merged 1 commit intoapache:masterfrom
OndroMih:ondromih-git-commit-dlg-optimization

Conversation

@OndroMih
Copy link
Copy Markdown
Contributor

@OndroMih OndroMih commented Mar 28, 2026

Speeds up GitClient.getStatus() by deferring expensive evaluation of object Ids, which often compute file content hash, to evaluate them lazily only when needed.

On NetBeans repository with a lot of files, this speeds up GitClient.getStatus() execution from 4 seconds to 1 second.

PR approval and merge checklist:

  1. Was this PR correctly labeled, did the right tests run? When did they run?
  2. Is this PR squashed?
  3. Are author name / email address correct? Are co-authors correctly listed? Do the commit messages need updates?
  4. Does the PR title and description still fit after the Nth iteration? Is the description sufficient to appear in the release notes?

If this PR targets the delivery branch: don't merge. (full wiki article)

@mbien mbien added git [ci] enable versioning job ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Mar 28, 2026
@apache apache locked and limited conversation to collaborators Mar 28, 2026
@apache apache unlocked this conversation Mar 28, 2026
@mbien
Copy link
Copy Markdown
Member

mbien commented Apr 1, 2026

the description makes sense to me. Unfortunately this causes several test failures which I could also reproduce locally (same module). Those would have to be investigated first before this can proceed.

@OndroMih
Copy link
Copy Markdown
Contributor Author

OndroMih commented Apr 1, 2026

@mbien , it's getting a bit more complicated :)

I wonder if the tests are really valid and it's required that files need to be present in the resulting map if they are not changed. The Commit dialog always calls the StatusCommand for a directory, not for a single file.

For now, to please the tests, and potentially some real functionality that needs to get status for a single file, I modified the solution (in the second commit) to apply only for directories (where it omits the unchanged files from the statuses map) but it keeps the file in the statuses map if called for a single file.

A better approach would be to preload file status and parallelize computing hashes with fti.isModified(indexEntry, true,...), so that it doesn't block main thread. This, however, requires a lot of refactoring, so let's try a simple approach first. I asked AI to analyze how GIT does it, and it said that it does it in a similar way - first preloads status info from filesystem in parallel, computes hashes from content if needed, and then goes through the precomputed information to build the result.

P.S. I'm adding commits, without squashing, to keep the history until find the best solution. Then I'll squash commits.

@OndroMih OndroMih force-pushed the ondromih-git-commit-dlg-optimization branch from 52ee181 to d72d934 Compare April 4, 2026 15:52
@OndroMih
Copy link
Copy Markdown
Contributor Author

OndroMih commented Apr 4, 2026

@mbien , I dug deeper and found out that there are better ways to optimize the current algorithm, without skipping any files.

I started from scratch and managed to reduce the time spent in the GitClient.getStatus() in the refreshAllRoots method by 75%, on Netbeans repo from 4s to 1s. Mostly by not executing some I/O operations, which were being executed and then their result was ignored.

This reduces the whole time it takes to compute files statuses on the Netbeans repo in the Commit dialog from about 17s to about 14s.

And I have even better news - I managed to speed up another part of the refreshAllRoots method by about 10 more seconds. I'll raise a separate PR for that because they are independent.

@OndroMih
Copy link
Copy Markdown
Contributor Author

OndroMih commented Apr 4, 2026

Here's the other PR: #9324

@mbien
Copy link
Copy Markdown
Member

mbien commented Apr 10, 2026

@OndroMih could you do a

git pull --rebase --autostash git@github.com:apache/netbeans master

so that this is getting tested on top of your other PR?

@OndroMih OndroMih force-pushed the ondromih-git-commit-dlg-optimization branch from d72d934 to 068b48c Compare April 10, 2026 18:43
@OndroMih
Copy link
Copy Markdown
Contributor Author

Done, @mbien .

@mbien mbien added this to the NB30 milestone Apr 10, 2026
Comment on lines 183 to -203
@@ -200,7 +208,6 @@
} else {
continue;
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this modification has the exact same amount of isEntryIgnored() calls like before, no?

if nWorking == TREE -> 1 call in both cases and if its not a tree -> 0 calls in both cases.

I reverted this section and left the supplier modification and got similar results (ran the action 5 times each)

reverted:
2184, 1942, 1870, 1820, 1821
PR:
2196, 1873, 1864, 1865, 1787

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I thought that this optimizes the number of calls to isEntryIgnored() but it's still being called later. To be honest, here I used some AI to help me understand what's the code doing. AI suggested this optimization and it looked OK to me, but now I see that it only shuffled the order of execution and didn't optimize anything.

I'll revert these changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

The only real thing that remains is the lazy evaluation of object ids which is the only thing that speeds the execution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI suggested this optimization and it looked OK to me, but now I see that it only shuffled the order of execution and didn't optimize anything.

thanks for reverting. I suspected some form of LLM involvement when I saw that.

will merge later once CI is green so that this is getting in for NB 30

Speeds up GitClient.getStatus() by deferring expensive evaluation of object Ids, which often compute file content hash, to evaluate them lazily only when needed.

On Netbeans repository with a lot of files, this speeds up GitClient.getStatus() execution from 4 seconds to 1 second.
@OndroMih OndroMih force-pushed the ondromih-git-commit-dlg-optimization branch from 068b48c to d98f086 Compare April 10, 2026 23:10
@mbien mbien changed the title Optimization of Git commit dialog finding changed files StatusCommand: Optimize git status by deferring getObjectId() calls for when they are needed Apr 11, 2026
@mbien mbien changed the title StatusCommand: Optimize git status by deferring getObjectId() calls for when they are needed StatusCommand: Optimize git status by deferring getObjectId() calls Apr 11, 2026
@mbien mbien merged commit 0b30443 into apache:master Apr 11, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) git [ci] enable versioning job performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants