
Possible bug while running a lot of batch ingestions. InterruptedException: sleep interrupted #6081

Closed
fernandosoto138 opened this issue Jul 31, 2018 · 3 comments

@fernandosoto138

Hi, I'm trying to load 100 batch files of 1.5 GB each, with 20 million records in each one.

My setup is the following; each bullet is a different server:

  • ZooKeeper
  • Druid Broker
  • Druid Coordinator + Overlord
  • PostgreSQL + Historical + MiddleManager (with 3 workers); this one has 6 cores and 32 GB of RAM
  • Hadoop NameNode
  • Five Hadoop DataNodes with 200 GB each

Almost all of the tasks finish with SUCCESS, but some random tasks throw the following message:

task[HadoopIndexTask{id=index_hadoop_sales2_2018-07-30T20:35:48.876Z, type=index_hadoop, dataSource=sales2}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:238) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.jar:0.12.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
Caused by: java.lang.RuntimeException: java.lang.InterruptedException: sleep interrupted
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:229) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
Caused by: java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_181]
	at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1353) ~[?:?]
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1311) ~[?:?]
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:212) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
2018-07-31T06:59:15,123 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_sales2_2018-07-30T20:35:48.876Z] status changed to [FAILED].
2018-07-31T06:59:15,139 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_sales2_2018-07-30T20:35:48.876Z",
  "status" : "FAILED",
  "duration" : 589783
}

Is this an issue caused by the load average? It's oscillating between 8 and 11.

If this is a bug, is there a way to re-run the failed tasks automatically?
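
As a workaround I could probably script the retry myself against the Overlord HTTP API. Here is a minimal sketch of what I have in mind, assuming the standard `/druid/indexer/v1/task` submission endpoint and `/druid/indexer/v1/task/{id}/status` status endpoint on the default Overlord port; the host name and the task spec file name are made up, and the status field name may differ between Druid versions:

```python
# Sketch of a submit-and-retry loop against the Druid Overlord HTTP API.
# Assumptions: Overlord reachable at OVERLORD below, standard
# /druid/indexer/v1/task endpoints, and a local JSON task spec file.
import json
import time
import urllib.request

OVERLORD = "http://overlord-host:8090"      # assumption: default Overlord port
TASK_SPEC_FILE = "hadoop_index_task.json"   # hypothetical path to the task spec
MAX_RETRIES = 3
POLL_SECONDS = 60


def submit_task(spec_bytes):
    """POST the task spec to the Overlord and return the new task id."""
    req = urllib.request.Request(
        OVERLORD + "/druid/indexer/v1/task",
        data=spec_bytes,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task"]


def task_status(task_id):
    """Return the task state string (e.g. RUNNING, SUCCESS, FAILED)."""
    url = OVERLORD + "/druid/indexer/v1/task/" + task_id + "/status"
    with urllib.request.urlopen(url) as resp:
        status = json.load(resp)["status"]
        # 0.12.x reports the state under "status"; newer versions use "statusCode"
        return status.get("statusCode", status.get("status"))


def run_with_retries():
    spec = open(TASK_SPEC_FILE, "rb").read()
    for attempt in range(1, MAX_RETRIES + 1):
        task_id = submit_task(spec)
        print("attempt %d: submitted %s" % (attempt, task_id))
        while True:
            state = task_status(task_id)
            if state == "SUCCESS":
                return task_id
            if state == "FAILED":
                break                       # resubmit on the next attempt
            time.sleep(POLL_SECONDS)        # still running, keep polling
    raise RuntimeError("task still FAILED after %d attempts" % MAX_RETRIES)


if __name__ == "__main__":
    run_with_retries()
```

But it would be better to know whether Druid itself can do this, or whether these random failures point to a real bug.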

@stale

stale bot commented Jun 20, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale bot added the stale label Jun 20, 2019
@gianm removed the stale label Jul 4, 2019
@stale

stale bot commented Apr 9, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale bot added the stale label Apr 9, 2020
@stale

stale bot commented May 7, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
