"Kill task" failure when interval data partitioned #6132

Closed
revverse opened this issue Aug 9, 2018 · 2 comments
revverse commented Aug 9, 2018

I have partitioned data in HDFS deep storage for this time range:

| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z   | e-001 | 2017-11-22T02:35:47.071Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z |           1 | 2017-11-22T01:47:42.266Z |    0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":0,"partitions":0},"binaryVersion":9,"size":29458664,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z"} |

| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1 | e-001 | 2017-11-22T02:35:47.070Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z |           1 | 2017-11-22T01:47:42.266Z |    0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/1_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":1,"partitions":0},"binaryVersion":9,"size":32192241,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1"} |

| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2 | e-001 | 2017-11-22T02:35:47.069Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z |           1 | 2017-11-22T01:47:42.266Z |    0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/2_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":2,"partitions":0},"binaryVersion":9,"size":21045793,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2"} |

| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3 | e-001 | 2017-11-22T03:35:54.100Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z |           1 | 2017-11-22T01:47:42.266Z |    0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/3_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":3,"partitions":0},"binaryVersion":9,"size":32177515,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3"} |

| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4 | e-001 | 2017-11-22T03:35:54.099Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z |           1 | 2017-11-22T01:47:42.266Z |    0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/4_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":4,"partitions":0},"binaryVersion":9,"size":3594087,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4"} |
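
These rows appear to come from the segment metadata table. For reference, the same segment list (identifier, shardSpec, loadSpec) can also be pulled through the Coordinator metadata API; a minimal sketch in Python, where coordinator:8081 is a placeholder for the real Coordinator address:

# List all used segments of the datasource, including the full payload
# (identifier, interval, shardSpec, loadSpec), via the Coordinator metadata API.
# coordinator:8081 is a placeholder host/port.
import requests

COORDINATOR = "http://coordinator:8081"
DATASOURCE = "e-001"

resp = requests.get(
    f"{COORDINATOR}/druid/coordinator/v1/metadata/datasources/{DATASOURCE}/segments",
    params={"full": ""},
)
resp.raise_for_status()

for segment in resp.json():
    print(segment["identifier"], segment["shardSpec"], segment["loadSpec"]["path"])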

When I run the kill task:

{
  "type": "kill",
  "id": "clean001-2017-11-22T01",
  "interval": "2017-11-22T01:00:00Z/2017-11-22T02:00:01Z",
  "dataSource": "e-001"
}
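
(For reference, a minimal sketch of how this spec is submitted to the Overlord's standard task endpoint, assuming the spec above is saved as kill-task.json and the Overlord is reachable at overlord:8090; both names are placeholders:)

# Submit the kill task spec above to the Overlord's task endpoint.
# overlord:8090 and kill-task.json are placeholders.
import json
import requests

OVERLORD = "http://overlord:8090"

with open("kill-task.json") as f:
    task_spec = json.load(f)

resp = requests.post(f"{OVERLORD}/druid/indexer/v1/task", json=task_spec)
resp.raise_for_status()
print(resp.json())  # expected: {"task": "clean001-2017-11-22T01"}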

I see that only the first partition ("0_index.zip") is removed from storage, and the task starts failing with these errors:

2018-08-09T12:54:14,089 INFO [main] io.druid.query.lookup.LookupReferencesManager - Coordinator is unavailable. Loading saved snapshot instead
2018-08-09T12:54:14,089 INFO [main] io.druid.query.lookup.LookupReferencesManager - No lookups to be loaded at this point
2018-08-09T12:54:14,090 INFO [main] io.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is started.
2018-08-09T12:54:14,090 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@601f264d].
2018-08-09T12:54:14,114 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/http:1.1.1.1:7109]
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4
2018-08-09T12:54:15,329 INFO [task-runner-0-priority-0] io.druid.storage.hdfs.HdfsDataSegmentKiller - Killing segment[e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z] mapped to path[hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip]
2018-08-09T12:54:15,595 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[clean001-2017-11-22T01]: SegmentNukeAction{segments=[DataSegment{size=29458664, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[{....}], version='2017-11-22T01:47:42.266Z', loadSpec={type=>hdfs, path=>hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip}, interval=2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z, dataSource='e-001', binaryVersion='9'}]}
2018-08-09T12:54:15,605 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[clean001-2017-11-22T01] to overlord: [SegmentNukeAction{segments=[DataSegment{size=29458664, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[{....}], version='2017-11-22T01:47:42.266Z', loadSpec={type=>hdfs, path=>hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip}, interval=2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z, dataSource='e-001', binaryVersion='9'}]}].
2018-08-09T12:54:15,613 WARN [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Exception submitting action for task[clean001-2017-11-22T01]
io.druid.java.util.common.IOE: Scary HTTP status returned: 500 Server Error. Check your overlord logs for exceptions.
	at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:95) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.KillTask.run(KillTask.java:104) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.jar:0.12.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
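
(The failure can also be confirmed through the task status API; a minimal sketch, assuming the same placeholder Overlord address as above:)

# Poll the kill task's status until it leaves the RUNNING state.
# The exact field name of the state ("status" vs. "statusCode") can differ
# between Druid versions.
import time
import requests

OVERLORD = "http://overlord:8090"
TASK_ID = "clean001-2017-11-22T01"

while True:
    status = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/status").json()
    state = status["status"]["status"]  # e.g. RUNNING, SUCCESS, FAILED
    if state != "RUNNING":
        print("task finished with state:", state)
        break
    time.sleep(5)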

On storage (HDFS listing: permission, owner, group, size, last modified, replication, block size, name):

-rw-r--r--	druid	hdfs	1.03 KB	22.11.2017, 04:34:31	3	128 MB	1_descriptor.json
-rw-r--r--	druid	hdfs	19.7 MB	22.11.2017, 04:34:31	3	128 MB	1_index.zip
-rw-r--r--	druid	hdfs	1.03 KB	22.11.2017, 04:33:50	3	128 MB	2_descriptor.json
-rw-r--r--	druid	hdfs	12.88 MB	22.11.2017, 04:33:50	3	128 MB	2_index.zip
-rw-r--r--	druid	hdfs	1.03 KB	22.11.2017, 05:34:20	3	128 MB	3_descriptor.json
-rw-r--r--	druid	hdfs	19.7 MB	22.11.2017, 05:34:20	3	128 MB	3_index.zip
-rw-r--r--	druid	hdfs	1.03 KB	22.11.2017, 05:33:35	3	128 MB	4_descriptor.json
-rw-r--r--	druid	hdfs	2.15 MB	22.11.2017, 05:33:35	3	128 MB	4_index.zip

Before the task started, the files for the first partition were present here as well.

Overlord logs:

2018-08-09T11:39:06,178 WARN [qtp1651923692-164] org.eclipse.jetty.servlet.ServletHandler - /druid/indexer/v1/action
io.druid.java.util.common.ISE: Segments not covered by locks for task: clean001-2017-11-22T01
	at io.druid.indexing.common.actions.TaskActionPreconditions.checkLockCoversSegments(TaskActionPreconditions.java:45) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.SegmentNukeAction.perform(SegmentNukeAction.java:70) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.SegmentNukeAction.perform(SegmentNukeAction.java:40) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.LocalTaskActionClient.submit(LocalTaskActionClient.java:64) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:345) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:334) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource.asLeaderWith(OverlordResource.java:672) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource.doAction(OverlordResource.java:331) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.GeneratedMethodAccessor179.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]

Version: druid-0.12.1

stale bot commented Jun 20, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale bot added the stale label on Jun 20, 2019
gianm removed the stale label on Jul 4, 2019
stale bot commented Apr 9, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
