indexer CPU usage increased by factor 5 after 1.5 to 1.7 upgrade #3585
What kind of repositories are indexed? How many of them? It would be helpful if you could extract the partial times from the indexer logs (entries with …).

This is very likely related to the conversion of Git handling to JGit.

Also, getting the stack traces of the indexer using … could help.

Actually, another candidate could be the XML serialization restrictions.
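For reference, gathering those stack traces could look roughly like the sketch below; it assumes the indexer JVM runs on the same host and shows up in `jps` under the main class name used later in this thread.

```bash
# Minimal sketch: sample stack traces of a running indexer JVM.
# Assumes the process is owned by the same user and is listed by jps -l
# as org.opengrok.indexer.index.Indexer.
PID=$(jps -l | awk '/org.opengrok.indexer.index.Indexer/ {print $1}')

# Take a few samples a couple of seconds apart to see where the time goes.
for i in 1 2 3; do
    jstack "$PID" > "indexer-stack-$i.txt"
    sleep 2
done
```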
Sorry for not giving more information. We index 223 Git repositories of varying complexity. Here are some log entries:

…

Also, we run the Docker container with …

Here's the …
I think Git bisect is in order.
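A bisect between the two releases mentioned in this issue could be set up roughly as follows; this is only a sketch, with `1.5.12` and `1.7.3` taken from the report and `test.sh` being a hypothetical script that builds the checked-out revision and fails when a reindex of a test repository exceeds an agreed time limit.

```bash
# Sketch of the suggested bisect; test.sh is hypothetical.
git bisect start
git bisect bad 1.7.3      # slow release reported in this issue
git bisect good 1.5.12    # last known fast release

# git bisect run checks out each candidate revision and uses the exit
# code of test.sh (build + timed reindex) to mark it good or bad.
git bisect run ./test.sh
```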
Instead of bisecting I ran the indexer for the individual releases first:

```bash
#!/bin/bash
set -e

git config --local advice.detachedHead false
git checkout -f -q master

git tag --sort="version:refname" | grep '^1\.[5-7]' | while read tag; do
    echo -n "\"$tag\" "
    git checkout -f -q "$tag"
    # Maven messes with stdin, corrupting the while cycle, so point it to the right place.
    mvn -q -DskipTests=true -Dcheckstyle.skip clean package </dev/null >/dev/null 2>&1
    rm -rf /var/opengrok/data.openssl
    /usr/bin/time -o /dev/stdout --format '%e' \
        java -Xmx8g -classpath 'distribution/target/dist/*' \
        -Djava.util.logging.config.file=logging.properties-SEVERE \
        org.opengrok.indexer.index.Indexer \
        -s /var/opengrok/src.openssl -d /var/opengrok/data.openssl \
        -c /usr/local/bin/ctags -H -S -P \
        2>/dev/null
    git checkout -f -q master
done
```

Not sure how I missed this in #3589 (comment), but there seems to be roughly a 3x slowdown in 1.7.4 when reindexing from scratch. The 1.7.4 release contained mainly the per partes history changes plus some JGit finishing touches, so it seems the performance regression is caused by the former (now a bisect would help to drill down further).

In 1.7.11 some of the performance was brought back by the parallelization of history cache creation for individual files in PR #3636. That helps only if enough CPU cores are available, though. Anyhow, I think the regression would only be seen when reindexing from scratch or when reindexing a repository with a large number of incoming changesets.
Experimenting with …

This confirms that the per partes changes indeed caused the long indexing times. I need to figure out the sweet spot for large repositories (like Linux, which has more than a million changesets) w.r.t. speed and heap. Also, the count should probably be configurable.
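For illustration, that experiment could be scripted like the loop below, reusing the invocation from the timing script earlier in this thread; the `-Dopengrok.history.chunkCount` property is purely hypothetical (the thread does not name the actual knob), and the tested values are arbitrary.

```bash
# Hypothetical sketch: time a from-scratch reindex for several per partes
# chunk sizes. The property name opengrok.history.chunkCount is made up
# for illustration; paths and indexer options are reused from the timing
# script above.
for count in 128 512 2048 8192; do
    rm -rf /var/opengrok/data.openssl
    echo -n "$count "
    /usr/bin/time -o /dev/stdout --format '%e' \
        java -Xmx8g -Dopengrok.history.chunkCount="$count" \
        -classpath 'distribution/target/dist/*' \
        org.opengrok.indexer.index.Indexer \
        -s /var/opengrok/src.openssl -d /var/opengrok/data.openssl \
        -c /usr/local/bin/ctags -H -S -P 2>/dev/null
done
```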
Describe the bug
I recently upgraded our OpenGrok instance by simply changing the tag of the Docker image from 1.5.12 to 1.7.3. Since then the duration of the periodic reindex rose from 3.5 minutes to 17 minutes. I was aware that the 1.7.0 release mentioned …
Since the subsequent reindex times aren't back to normal, I guess this is a bug. This is how the CPU usage looks in Grafana since the upgrade:

To Reproduce
Steps to reproduce the behavior:
Expected behavior
Continued normal operation, i.e. reindex times comparable to 1.5.12.