Indexer hangs but no error message #3067


Closed
Z4rc opened this issue Mar 9, 2020 · 20 comments


Z4rc commented Mar 9, 2020

Hi,

I am using OpenGrok 1.3.7 and Tomcat 9.0.31 on CentOS 7.6.
After upgrading Tomcat from 9.0.2 to 9.0.31, the OpenGrok indexer hangs no matter which source I try to index, but I get no error message:

Can't set level for java.util.logging.ConsoleHandler
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.configuration.Configuration read
INFO: Reading configuration from /grok/etc/read-only.xml
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer parseOptions
INFO: Indexer options: [-P, -H, -S, --depth, 2, -T, 48, --progress, --renamedHistory, on, -r, on, --canonicalRoot, tools/, -s, /grok/source, -d, /grok/data, -c, /grok/FOSS/ctags, -W, /grok/etc/configuration.xml, -R, /grok/etc/read-only.xml, -U, http://lab3:8080/grok]
Mar 09, 2020 3:20:28 PM org.glassfish.jersey.internal.ServiceFinder <clinit>
CONFIG: Running in a non-OSGi environment
Mar 09, 2020 3:20:28 PM org.glassfish.jersey.internal.util.ExtendedLogger config
CONFIG: Selected ExecutorServiceProvider implementation [org.glassfish.jersey.client.DefaultClientAsyncExecutorProvider] to be used for injection of executor qualified by [org.glassfish.jersey.client.ClientAsyncExecutor] annotation.
Mar 09, 2020 3:20:28 PM org.glassfish.jersey.internal.util.ExtendedLogger config
CONFIG: Selected ScheduledExecutorServiceProvider implementation [org.glassfish.jersey.client.DefaultClientBackgroundSchedulerProvider] to be used for injection of scheduler qualified by [org.glassfish.jersey.client.ClientBackgroundScheduler] annotation.
Mar 09, 2020 3:20:28 PM org.glassfish.jersey.client.internal.HttpUrlConnector <init>
CONFIG: Restricted headers are not enabled using [sun.net.http.allowRestrictedHeaders] system property (setting only takes effect on connections created after the property has been set/changed).
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer main
SEVERE: Couldn't notify the webapp on http://lab3:8080/grok.
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:260)
	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:254)
	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:743)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:741)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:404)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:300)
	at org.opengrok.indexer.index.IndexerUtil.enableProjects(IndexerUtil.java:60)
	at org.opengrok.indexer.index.Indexer.main(Indexer.java:327)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:366)
	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:258)
	... 11 more

Couldn't notify the webapp on http://lab3:8080/grok: java.net.ConnectException: Connection refused (Connection refused).
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer main
INFO: Indexer version 1.3.7 (6eb8dd46b4f2d0fd90d9aee05e6a8b5f676279f1)
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.util.CtagsUtil validate
INFO: Using ctags: Universal Ctags 0.0.0(f43463a), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Oct  9 2019, 12:53:45
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +packcc
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Scanning for repositories...
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.history.HistoryGuru addRepositories
CONFIG: Adding <org.opengrok.indexer.history.GitRepository> repository: </gitworkspace/tools>
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.util.Statistics report
INFO: Done scanning for repositories, found 1 repositories (took 51 ms)
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Generating history cache for all repositories ...
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.history.HistoryGuru createCacheReal
INFO: Creating historycache for 1 repositories
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.history.HistoryGuru createCache
INFO: Creating historycache for /grok/source/tools (GitRepository) with renamed file handling
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.util.Statistics report
INFO: Done historycache for /grok/source/tools (took 10 ms)
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.util.Statistics report
INFO: Done historycache for all repositories (took 15 ms)
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Done...
Mar 09, 2020 3:20:28 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Starting indexing
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Waiting for the executors to finish
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.history.HistoryGuru createCache
INFO: Creating historycache for /grok/source/tools (GitRepository) with renamed file handling
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.util.Statistics report
INFO: Done historycache for /grok/source/tools (took 9 ms)
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.index.IndexDatabase update
INFO: Starting traversal of directory /tools
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.util.Statistics report
INFO: Need to process: 3024 files for /tools (took 528 ms)
Mar 09, 2020 3:20:29 PM org.opengrok.indexer.index.IndexDatabase update
INFO: Starting indexing of directory /tools
Mar 09, 2020 3:20:30 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1 (0.033%) for /tools
Mar 09, 2020 3:20:30 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 200 (6.614%) for /tools
Mar 09, 2020 3:20:30 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 300 (9.921%) for /tools
Mar 09, 2020 3:20:31 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 400 (13.228%) for /tools
Mar 09, 2020 3:20:31 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 500 (16.534%) for /tools
Mar 09, 2020 3:20:32 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 600 (19.841%) for /tools
Mar 09, 2020 3:20:33 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 700 (23.148%) for /tools
Mar 09, 2020 3:20:33 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 800 (26.455%) for /tools
Mar 09, 2020 3:20:34 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,100 (36.376%) for /tools
Mar 09, 2020 3:20:34 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,200 (39.683%) for /tools
Mar 09, 2020 3:20:34 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,300 (42.989%) for /tools
Mar 09, 2020 3:20:35 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,400 (46.296%) for /tools
Mar 09, 2020 3:20:36 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,600 (52.91%) for /tools
Mar 09, 2020 3:20:36 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,700 (56.217%) for /tools
Mar 09, 2020 3:20:36 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 1,900 (62.831%) for /tools
Mar 09, 2020 3:20:37 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 2,000 (66.138%) for /tools
Mar 09, 2020 3:20:38 PM org.opengrok.indexer.util.Progress logLoop
INFO: Progress: 2,100 (69.444%) for /tools

Since I saw the problem after switching to the new Tomcat version, I stopped the Tomcat server, but the Indexer still hangs.

When I strace the Indexer process, I see an endless loop running, e.g.:

[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 145760106}, ffffffff <unfinished ...>
[pid  1831] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 195894179}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 246028007}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 296156552}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 346283802}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 396413742}, ffffffff <unfinished ...>
[pid  1831] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 446540146}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1831] futex(0x7fea202def28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1831] futex(0x7fea202def54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1559237, 496667410}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
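As an aside, futex wait/wake pairs ending in ETIMEDOUT are ordinary JVM timed parks, not network errors; the "(Connection timed out)" text is just strace's rendering of ETIMEDOUT. To map a traced LWP id back to a Java thread, it can be converted to the hexadecimal `nid` that jstack prints (a small sketch, using the pid from the trace above):

```shell
# convert the decimal LWP id reported by strace to the hex "nid" used in jstack dumps
printf 'nid=0x%x\n' 1831
# → nid=0x727
```

Searching a jstack dump for that `nid` identifies the Java thread behind the traced LWP.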

Do you have any idea what's going on?

@vladak vladak added the question label Mar 9, 2020

vladak commented Mar 9, 2020

I see quite steady progress in the log. Does that mean there is nothing printed to the log after the last line? If so, for how long?

What happens if you omit the -U option?


vladak commented Mar 9, 2020

Also, you can try increasing the log level: https://github.com/oracle/opengrok/wiki/Debugging#web-application
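For the indexer specifically, the log level is controlled by the java.util.logging properties file passed via -Djava.util.logging.config.file. A minimal sketch (handler and property names follow the standard JUL properties format; adjust to taste):

```properties
# logging.properties — send everything, down to the finest level, to the console
handlers = java.util.logging.ConsoleHandler
.level = ALL
java.util.logging.ConsoleHandler.level = ALL
org.opengrok.level = ALL
```

Note that the very first line of the output above ("Can't set level for java.util.logging.ConsoleHandler") suggests the console handler could not be configured, which is worth double-checking.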


Z4rc commented Mar 10, 2020

"Does it mean there is nothing printed to the log after the last line? If so, for how long?"
Exactly, there is nothing printed after the last line. I cancelled the indexer after about six hours.

The log level is already set to "ALL", but the output also stops without any error. For example, these are the last lines:

<record>
  <date>2020-03-09T15:20:38</date>
  <millis>1583763638330</millis>
  <sequence>8893</sequence>
  <logger>org.opengrok.indexer.index</logger>
  <level>FINE</level>
  <class>org.opengrok.indexer.index.DefaultIndexChangedListener</class>
  <method>fileAdd</method>
  <thread>64</thread>
  <message>Add: /tools/RH6-32/share/man/man1/git-clean.1 (TroffAnalyzer)</message>
</record>
<record>
  <date>2020-03-09T15:20:38</date>
  <millis>1583763638348</millis>
  <sequence>8897</sequence>
  <logger>org.opengrok.indexer.index</logger>
  <level>FINE</level>
  <class>org.opengrok.indexer.index.DefaultIndexChangedListener</class>
  <method>fileAdd</method>
  <thread>64</thread>
  <message>Add: /tools/RH6-32/share/man/man1/git-clone.1 (TroffAnalyzer)</message>
</record>
<record>
  <date>2020-03-09T15:20:38</date>
  <millis>1583763638411</millis>
  <sequence>8901</sequence>
  <logger>org.opengrok.indexer.index</logger>
  <level>FINE</level>
  <class>org.opengrok.indexer.index.DefaultIndexChangedListener</class>
  <method>fileAdd</method>
  <thread>64</thread>
  <message>Add: /tools/RH6-32/share/man/man1/git-column.1 (TroffAnalyzer)</message>
</record>
</log>

It hangs on a different file with each run, so it is not a problem with "/tools/RH6-32/share/man/man1/git-column.1" itself.

Omitting the -U option makes no difference.


vladak commented Mar 10, 2020

Time to attach a debugger and see what the indexer is doing, I'd say. The sources for 1.3.7 are available at https://github.com/oracle/opengrok/releases/tag/1.3.7

Also, the output of lsof on the Tomcat process might be of interest.


vladak commented Mar 10, 2020

Or, do some jstack sampling on the indexer process.
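Sampling here means taking several dumps a few seconds apart and comparing which threads stay in the same frames (a sketch; the PID is a placeholder, find the real one with jps or pgrep):

```shell
pid=13071          # placeholder: the indexer JVM's PID (e.g. from jps or pgrep -f opengrok.jar)
for i in 1 2 3; do
  # each sample goes to its own file; the error branch fires if jstack is absent or the PID is wrong
  jstack "$pid" > "jstack.$i.txt" 2>/dev/null || echo "jstack failed for pid $pid"
  sleep 2
done
```

Threads that are BLOCKED or RUNNABLE in the same stack across all samples are the ones to look at.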


Z4rc commented Mar 10, 2020

I attached a jstack trace: grok.jstack.txt


vladak commented Mar 10, 2020

I see some 46 threads waiting on a lock in org.apache.lucene.index.ConcurrentMergeScheduler.merge, e.g.:

"ForkJoinPool-1-worker-101" #46 daemon prio=5 os_prio=0 cpu=458.34ms elapsed=114.24s tid=0x00007f57c8003000 nid=0x3494 waiting for monitor entry  [0x00007f58615ea000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:499)
        - waiting to lock <0x000000008b84d870> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2158)
        at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5120)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1597)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:774)
        at org.opengrok.indexer.index.IndexDatabase.lambda$null$2(IndexDatabase.java:1271)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$301/0x0000000800ca4040.apply(Unknown Source)
        at java.util.stream.Collectors.lambda$groupingByConcurrent$59(java.base@13.0.2/Collectors.java:1297)
        at java.util.stream.Collectors$$Lambda$303/0x0000000800ca3c40.accept(java.base@13.0.2/Unknown Source)
        at java.util.stream.ReferencePipeline.lambda$collect$1(java.base@13.0.2/ReferencePipeline.java:575)
        at java.util.stream.ReferencePipeline$$Lambda$304/0x0000000800ca3040.accept(java.base@13.0.2/Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@13.0.2/ForEachOps.java:183)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@13.0.2/ArrayList.java:1621)
        at java.util.stream.AbstractPipeline.copyInto(java.base@13.0.2/AbstractPipeline.java:484)
        at java.util.stream.ForEachOps$ForEachTask.compute(java.base@13.0.2/ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(java.base@13.0.2/CountedCompleter.java:746)
        at java.util.concurrent.ForkJoinTask.doExec(java.base@13.0.2/ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@13.0.2/ForkJoinPool.java:1016)
        at java.util.concurrent.ForkJoinPool.scan(java.base@13.0.2/ForkJoinPool.java:1665)
        at java.util.concurrent.ForkJoinPool.runWorker(java.base@13.0.2/ForkJoinPool.java:1598)
        at java.util.concurrent.ForkJoinWorkerThread.run(java.base@13.0.2/ForkJoinWorkerThread.java:177)

and then one thread with:

"ForkJoinPool-1-worker-119" #63 daemon prio=5 os_prio=0 cpu=344.15ms elapsed=114.24s tid=0x00007f57a0001000 nid=0x34a5 runnable  [0x00007f58604d9000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.fs.UnixNativeDispatcher.stat0(java.base@13.0.2/Native Method)
        at sun.nio.fs.UnixNativeDispatcher.stat(java.base@13.0.2/UnixNativeDispatcher.java:301)
        at sun.nio.fs.UnixFileAttributes.get(java.base@13.0.2/UnixFileAttributes.java:70)
        at sun.nio.fs.UnixFileStore.devFor(java.base@13.0.2/UnixFileStore.java:57)
        at sun.nio.fs.UnixFileStore.<init>(java.base@13.0.2/UnixFileStore.java:72)
        at sun.nio.fs.LinuxFileStore.<init>(java.base@13.0.2/LinuxFileStore.java:53)
        at sun.nio.fs.LinuxFileSystem.getFileStore(java.base@13.0.2/LinuxFileSystem.java:128)
        at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(java.base@13.0.2/UnixFileSystem.java:211)
        at sun.nio.fs.UnixFileSystem$FileStoreIterator.hasNext(java.base@13.0.2/UnixFileSystem.java:222)
        - locked <0x00000000920b2910> (a sun.nio.fs.UnixFileSystem$FileStoreIterator)
        at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:595)
        at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:539)
        at org.apache.lucene.util.IOUtils.spins(IOUtils.java:528)
        at org.apache.lucene.util.IOUtils.spins(IOUtils.java:503)
        at org.apache.lucene.index.ConcurrentMergeScheduler.initDynamicDefaults(ConcurrentMergeScheduler.java:412)
        - locked <0x000000008b84d870> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:501)
        - locked <0x000000008b84d870> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2158)
        at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5120)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1597)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:774)
        at org.opengrok.indexer.index.IndexDatabase.lambda$null$2(IndexDatabase.java:1271)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$301/0x0000000800ca4040.apply(Unknown Source)
        at java.util.stream.Collectors.lambda$groupingByConcurrent$59(java.base@13.0.2/Collectors.java:1297)
        at java.util.stream.Collectors$$Lambda$303/0x0000000800ca3c40.accept(java.base@13.0.2/Unknown Source)
        at java.util.stream.ReferencePipeline.lambda$collect$1(java.base@13.0.2/ReferencePipeline.java:575)
        at java.util.stream.ReferencePipeline$$Lambda$304/0x0000000800ca3040.accept(java.base@13.0.2/Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@13.0.2/ForEachOps.java:183)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@13.0.2/ArrayList.java:1621)
        at java.util.stream.AbstractPipeline.copyInto(java.base@13.0.2/AbstractPipeline.java:484)
        at java.util.stream.ForEachOps$ForEachTask.compute(java.base@13.0.2/ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(java.base@13.0.2/CountedCompleter.java:746)
        at java.util.concurrent.ForkJoinTask.doExec(java.base@13.0.2/ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@13.0.2/ForkJoinPool.java:1016)
        at java.util.concurrent.ForkJoinPool.scan(java.base@13.0.2/ForkJoinPool.java:1665)
        at java.util.concurrent.ForkJoinPool.runWorker(java.base@13.0.2/ForkJoinPool.java:1598)
        at java.util.concurrent.ForkJoinWorkerThread.run(java.base@13.0.2/ForkJoinWorkerThread.java:177)

which seems to be stuck in a stat syscall for 100+ seconds while holding the lock that the other threads are waiting for.

Further, there are some 45 threads doing Ctags processing with stacks similar to:

"Thread-26" #120 daemon prio=5 os_prio=0 cpu=0.20ms elapsed=114.01s tid=0x00007f5740052800 nid=0x34fe runnable  [0x00007f570f0d1000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(java.base@13.0.2/Native Method)
        at java.io.FileInputStream.read(java.base@13.0.2/FileInputStream.java:273)
        at java.io.BufferedInputStream.read1(java.base@13.0.2/BufferedInputStream.java:283)
        at java.io.BufferedInputStream.read(java.base@13.0.2/BufferedInputStream.java:344)
        - locked <0x000000008e044588> (a java.lang.ProcessImpl$ProcessPipeInputStream)
        at sun.nio.cs.StreamDecoder.readBytes(java.base@13.0.2/StreamDecoder.java:297)
        at sun.nio.cs.StreamDecoder.implRead(java.base@13.0.2/StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(java.base@13.0.2/StreamDecoder.java:188)
        - locked <0x000000008e157468> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(java.base@13.0.2/InputStreamReader.java:185)
        at java.io.BufferedReader.fill(java.base@13.0.2/BufferedReader.java:161)
        at java.io.BufferedReader.readLine(java.base@13.0.2/BufferedReader.java:326)
        - locked <0x000000008e157468> (a java.io.InputStreamReader)
        at java.io.BufferedReader.readLine(java.base@13.0.2/BufferedReader.java:392)
        at org.opengrok.indexer.analysis.Ctags.lambda$run$0(Ctags.java:234)
        at org.opengrok.indexer.analysis.Ctags$$Lambda$306/0x0000000800d3e040.run(Unknown Source)
        at java.lang.Thread.run(java.base@13.0.2/Thread.java:830)

Finally there is the "main" indexer thread:

"main" #1 prio=5 os_prio=0 cpu=1088.38ms elapsed=117.63s tid=0x00007f58d0019000 nid=0x3310 waiting on condition  [0x00007f58d9ee2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@13.0.2/Native Method)
        - parking to wait for  <0x000000008b4019b0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@13.0.2/LockSupport.java:235)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@13.0.2/AbstractQueuedSynchronizer.java:1079)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@13.0.2/AbstractQueuedSynchronizer.java:1369)
        at java.util.concurrent.CountDownLatch.await(java.base@13.0.2/CountDownLatch.java:278)
        at org.opengrok.indexer.index.Indexer.doIndexerExecution(Indexer.java:1069)
        at org.opengrok.indexer.index.Indexer.main(Indexer.java:347)

waiting for the worker threads to finish.
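The RUNNABLE thread above is inside Lucene's ConcurrentMergeScheduler.initDynamicDefaults, which (per the IOUtils.spins frames) walks the system's FileStore list and stats each entry to guess whether the index lives on spinning media. A single slow or unresponsive mount anywhere on the host can therefore stall the first merge while the other workers queue behind the scheduler lock. A quick way to look for such a mount (a sketch; the 5-second bound is an arbitrary assumption about how long a healthy stat should take):

```shell
# stat each mounted filesystem with a timeout; mounts that time out are candidates
# for what the FileStoreIterator above is stuck on
awk '{print $2}' /proc/self/mounts | while read -r mp; do
  timeout 5 stat -f -- "$mp" > /dev/null 2>&1 || echo "slow or unreadable: $mp"
done
```

If this prints nothing, every mount answers stat promptly and the stall must be intermittent or elsewhere.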


vladak commented Mar 10, 2020

What does pstree say when run on the indexer process? Are the ctags processes still around? Could you strace the ctags processes to see what they are doing?


vladak commented Mar 10, 2020

Plus, the lsof output from indexer could be handy.


vladak commented Mar 10, 2020

What file system is the /tools directory on? Is it locally mounted? What about /grok/data?
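Both questions can be answered in one go by printing the filesystem type backing each directory (a sketch; the paths are the ones from this thread and may not exist elsewhere):

```shell
# print the filesystem type for the source and data directories
for d in /tools /grok/data; do
  if [ -d "$d" ]; then
    # second line of df -PT, column 2, is the filesystem type (e.g. zfs, xfs, nfs)
    echo "$d: $(df -PT "$d" | awk 'NR==2 {print $2}')"
  else
    echo "$d: not present on this host"
  fi
done
```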


Z4rc commented Mar 10, 2020

Yes, the ctags processes are still around:

$ pstree -a 13071
java -Djava.util.logging.config.file=/grok/logging.properties -jar /grok/FOSS/opengrok-1.3.7/lib/opengrok.jar -P -H -S--dept
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  ctags --kinds-c=+l --kinds-java=+l --kinds-sql=+l --kinds-Fortran=+L --kinds-C++=+l --file-scope=yes -u --filter=yes --filter-terminator=__ctags_done_with_file__\012 --fields=-af+iKnS--exc
  173*[{java}] 

But unfortunately I can't see any activity on the ctags processes, e.g.:

$ strace -f -p 13528
strace: Process 13528 attached
read(0,

$ strace -f -p 13637
strace: Process 13637 attached
read(0, 

$ strace -f -p 14169
strace: Process 14169 attached
read(0, 

If I strace the java processes, I get the same FUTEX_WAIT_BITSET_PRIVATE and FUTEX_WAKE_PRIVATE calls I showed in the first post.
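(Aside: the futex waits by themselves only show that JVM threads are parked on locks. As a minimal sketch, /proc can confirm what a process is blocked in without attaching strace at all; the PID here is a placeholder for the indexer's java process.)

```shell
# Hypothetical: substitute the PID of the hung indexer's java process.
PID=$$

# Current state and thread count of the process.
grep -E '^(State|Threads):' /proc/$PID/status

# The wait channel names the kernel function the task is sleeping in
# (e.g. a futex wait function for a parked JVM thread).
cat /proc/$PID/wchan; echo
```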

@vladak (Member) commented Mar 10, 2020

But unfortunately I can't see any activity on the ctags processes, e.g.:

$ strace -f -p 13528
strace: Process 13528 attached
read(0,

This means that the ctags processes are waiting for a new job on standard input; the indexer, however, is not feeding them any.
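(This can be verified from /proc without lsof: fd 0 of each ctags worker should resolve to a pipe created by the indexer. A sketch, with a placeholder PID:)

```shell
# Hypothetical PID of one of the hung ctags workers, e.g. 13528
# from the strace output above.
PID=$$

# fd 0 should resolve to something like "pipe:[123456]", i.e. the
# pipe on which the indexer feeds file names to ctags.
readlink /proc/$PID/fd/0

# List everything the process has open, without running lsof.
ls -l /proc/$PID/fd
```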

@Z4rc (Author) commented Mar 10, 2020

I am not able to get any lsof output; it seems that the command hangs as well.
/tools is a local ZFS filesystem. I can try to move the sources to a local XFS filesystem.

@vladak (Member) commented Mar 10, 2020

I am not able to get any lsof output; it seems that the command hangs as well.

That's strange. It could indicate a file-system or kernel issue.

/tools is a local ZFS filesystem. I can try to move the sources to a local XFS filesystem.

That may be worth trying.

@vladak (Member) commented Mar 10, 2020

Another idea would be to strace the indexer from the beginning. This will generate a lot of output.
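(A sketch of such a run; the java command line below is illustrative only — use your real indexer invocation, as logged at startup:)

```shell
# Follow child processes (-f) and write one trace file per PID (-ff),
# so the ctags workers' traces end up in separate files.
# The jar path and options here are placeholders.
strace -f -ff -o /tmp/indexer.strace \
    java -jar /grok/dist/lib/opengrok.jar -s /grok/source -d /grok/data

# Afterwards, look for syscalls that never completed:
grep -l 'unfinished' /tmp/indexer.strace.*
```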

@Z4rc (Author) commented Mar 10, 2020

Yay, that was the key. While stracing the indexer from the beginning, I saw many "unfinished" lstat() calls, so I then straced the hanging lsof command as well.
It was hanging on a broken NFS mount (the remote machine is currently down); after an umount, everything worked again: both lsof and the OpenGrok indexer.
So this was not an OpenGrok issue. Many thanks to @vladak for your support!
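(For anyone hitting this later: a quick way to find a dead mount before lsof or the indexer trips over it is to stat every mount point under a timeout. A sketch; the 3-second timeout is an arbitrary choice:)

```shell
# Stat each mount point with a timeout, so a dead NFS server cannot
# hang this check the way it hung lsof and the indexer.
while read -r _ mp _; do
    timeout 3 stat -f -- "$mp" >/dev/null 2>&1 \
        || echo "possibly stale: $mp"
done < /proc/mounts

# A dead mount can then be detached lazily, without contacting
# the unreachable server:
#   umount -l /path/of/stale/mount
```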


@vladak (Member) commented Mar 10, 2020

Great to hear that!

@vladak closed this as completed Mar 10, 2020
@louie0817 (Contributor) commented Mar 11, 2020

I am having the exact same issue with 1.3.6; all tools, source, and output are on a local disk, and the disk is XFS. Can we re-open, please?

@vladak (Member) commented Mar 12, 2020

I am having the exact same issue with 1.3.6; all tools, source, and output are on a local disk, and the disk is XFS. Can we re-open, please?

Please open a new issue.
