
Simplify BatchingBolt implementation to just use tick tuples #125

Closed
dan-blanchard opened this issue Apr 14, 2015 · 6 comments · Fixed by #137
@dan-blanchard
Member

It just occurred to me that now that we have a process_tick method (#124), we could just use that for BatchingBolt instead of process_batch (or make process_tick call process_batch if we don't want to break the API). I think we could also do away with the threading complications entirely with that approach. @kbourgoin, what do you think?
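A minimal sketch of what that could look like. Only the process/process_tick/process_batch hook names mirror the library; the plain-object base, class name, and buffer/flush logic are illustrative, not the actual BatchingBolt code:

```python
# Minimal sketch, not the real streamparse BatchingBolt. In streamparse this
# would subclass Bolt; everything here besides the hook names is illustrative.
class TickFlushingBolt(object):
    def initialize(self, conf, ctx):
        self._batch = []

    def process(self, tup):
        # Buffer tuples instead of handing them off to a background thread.
        self._batch.append(tup)

    def process_tick(self, tick_tup):
        # Flush whenever Storm sends a tick tuple; no timer thread needed.
        if self._batch:
            batch, self._batch = self._batch, []
            self.process_batch(batch)

    def process_batch(self, batch):
        # Subclasses implement this, just like the existing BatchingBolt API.
        raise NotImplementedError
```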

@amontalenti
Contributor

Not necessarily a smooth transition. You can only set one tick frequency for the whole topology. I suppose if you always set it to e.g. "1", then the BatchingBolt could do something like "every N ticks, flush". But if you wanted different batch rates for different bolts, things might get complicated.

I do agree that part of the purpose of tick tuples is to avoid the need for multi-threaded "batch/flush" cycles. Perhaps we could just improve BatchingBolt such that you can provide ticks_between_batches instead of secs_between_batches?
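One way the ticks_between_batches idea could sit on top of the sketch above; the attribute name follows the proposal in this comment, not a shipped API:

```python
# Hedged sketch of "every N ticks, flush": count tick tuples and only flush
# the buffer every ticks_between_batches ticks.
class CountingTickBolt(TickFlushingBolt):
    ticks_between_batches = 5  # e.g. with a 1-second tick frequency, ~5s batches

    def initialize(self, conf, ctx):
        super(CountingTickBolt, self).initialize(conf, ctx)
        self._tick_count = 0

    def process_tick(self, tick_tup):
        self._tick_count += 1
        if self._tick_count >= self.ticks_between_batches:
            self._tick_count = 0
            # Delegate the actual flush to the base sketch.
            super(CountingTickBolt, self).process_tick(tick_tup)
```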

@dan-blanchard
Member Author

> Perhaps we could just improve BatchingBolt such that you can provide ticks_between_batches instead of secs_between_batches?

I like that idea. I also think this would make it a lot easier for us to test BatchingBolt, because currently the tests require sleeps to make sure enough time has gone by, which is prone to issues when the Travis machines are overloaded.
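For illustration, a test against the sketches above needs no sleeps at all, which is the point being made here; this uses plain unittest rather than any streamparse test machinery:

```python
# Illustrative test: drive the bolt with explicit tick calls instead of
# waiting for wall-clock time to pass.
import unittest


class RecordingBolt(TickFlushingBolt):
    def initialize(self, conf, ctx):
        super(RecordingBolt, self).initialize(conf, ctx)
        self.flushed = []

    def process_batch(self, batch):
        self.flushed.append(batch)


class TickBatchingTest(unittest.TestCase):
    def test_flush_on_tick(self):
        bolt = RecordingBolt()
        bolt.initialize({}, None)
        bolt.process("tuple-1")
        bolt.process("tuple-2")
        self.assertEqual(bolt.flushed, [])   # nothing flushed yet
        bolt.process_tick(None)              # simulate Storm delivering a tick
        self.assertEqual(bolt.flushed, [["tuple-1", "tuple-2"]])
```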

@kbourgoin
Member

I've been working on some other ideas regarding the AsyncBolt I showed Dan this morning, but I think for the interim using the ticks could work. I'm still not a huge fan of only having time-based batching, but eliminating the threading could definitely work. Have we verified that tick tuples keep coming through even when the topology is shutting down? The lack of tuples coming in is the current reason for threaded batching.

@dan-blanchard
Member Author

> Have we verified that tick tuples keep coming through even when the topology is shutting down?

Is this actually a problem? I mean, isn't the normal use case for Storm "turn it on and never turn it off"? And if you're shutting down your topology, Storm will just kill your Bolts along with your Spouts, so how would you guarantee the buffered tuples were processed even with the threading?

@kbourgoin
Member

There's a topology shutdown period, equal to your tuple timeout value, where it waits for everything in-flight to finish. It stops reading new tuples from the spout, so there's no new work coming in, but it still gives you an opportunity to finish anything in-flight. If you're waiting for more tuples to come in before checking the time and releasing the next batch, those tuples never show up, so the tuples already sitting in the batch are never handled -- the bolt just locks up.

This doesn't sound too bad if you have good bookkeeping of what's been acked, but if you're using Kafka with auto-ack, those tuples have now fallen into a black hole.

@dan-blanchard
Member Author

> if you're using Kafka with auto-ack, those tuples have now fallen into a black hole.

Ah, that explains it then. When I was at ETS, our system had a kind of crazy ack setup, so I never used auto-ack.
