Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

billonahill · 2016-10-28T23:07:35Z

Previously the scaling int test spouts did the following:

emit 10 tuples
wait for the topology update to happen
emit 10 tuples
send terminal

The problem with this approach is that tuples might not be sent during the scaling changes. This patches changes those tests to do the following:

emit a tuple every second continuously
wait for the topology update to happen
continue emitting a tuple every second for 10 more seconds
send terminal
post the tuples sent to the state server

Upon topology completion, the test runner would then fetch the dynamic set of expected results from the state server, instead of from a static expected results file.

…hout scaling event

kramasamy · 2016-11-01T22:57:32Z

👍

billonahill · 2016-11-02T18:13:24Z

IntegrationTest_BasicTopologyOneTaskScaleUpAndDown failed. I was able to reproduce, but not yet sure of the fix. Capturing my thoughts here re what's happening.

The test topology is Spout -> Identity Bolt -> Aggregator Bolt. Parallelism is S(1), IB(2), A(1) and we scale S up and IB down to produce S(2), IB(1), A(1). This is the sequence of events:

Launch topology.
S emits A,B,A
IB[1] receives and relays A, A. IB[2] does the same for B.
Scaling event occurs, stream managers restarted, S[2] started, IB[2] killed.
Spouts emit 41 more tuples, all processed by IBs and As.

The 3 tuples emitted in steps 2-3 go unaccounted for on A. Of 44 total tuples emitted, only 41 are received.

Still trying to understand why...

kramasamy · 2016-11-02T18:25:31Z

@billonahill - is this atmost once or atleast once semantics?

billonahill · 2016-11-03T00:42:33Z

The topology is at least once. I've located the issue and verified with @maosongfu. The current integration test spout tracks how many tuples it sends and decrements that count back to zero to determine when it should send terminals, which indicate the job is done. The decrement occurs in both the ack and fail methods of the spout and there is no resend mechanism implemented in fail to re-emit. This opens up two failure scenarios that I've seen:

In the scenario I reported above the stream manager restarts and fails to emit some tuples to the aggregator bolt. The final tuple counts are low because there is no re-emit implemented.
The stream manager successfully relays all tuples, but upon restart it loses it's state and reports a failures back upstream. In this cases all tuples are received, but the fail method is invoked (false negative) on the spout. This only results in a test failure if I comment out the decrement in the fail method, since what's sent and received matches on both ends. By not decrementing the topology will never finish.

I'm going to fix by doing the following:

Change the spout to re-emit tuples when fail is called.
Change the spout to not decrement the number-of-remaining-items-to-ack count on fail, only on ack.
Change test assertion to verify at least once (as opposed to exactly once), since we could get re-transmits.

… result checker

billonahill · 2016-11-03T21:58:18Z

Updated per my previous comment. We now have the ability to test at least once, which I use for the scaling tests.

objmagic · 2016-11-04T18:33:48Z

integration-test/src/python/test_runner/main.py

+          map(lambda (k, v): "%s(%s)" % (str(k), v), actual_counts.iteritems())))
+      logging.info("Expected value frequencies ---------- \n" + ', '.join(
+          map(lambda (k, v): "%s(%s)" % (str(k), v), expected_counts.iteritems())))
+      return "fail"


Can we return False and True, instead of strings...?

Yeah, it's painful that these strings are being returned here and we should fix that. I'd like to contain the scope of changes in this PR though, and change all the true/falses in a PR of it's own.

objmagic · 2016-11-04T21:39:05Z

Besides one minor comment, this PR looks good to me.

…hout scaling event (#1524) * check for null before setting topology urls * Changing int test spout to be at least once, added at least once test result checker

Change scaling int tests to constantly emit a stream of tuples throug…

102ad89

…hout scaling event

billonahill added this to the 0.14.5 milestone Oct 28, 2016

billonahill self-assigned this Oct 28, 2016

check for null before setting topology urls

ac26390

Changing int test spout to be at least once, added at least once test…

639a56f

… result checker

objmagic suggested changes Nov 4, 2016

View reviewed changes

objmagic approved these changes Nov 5, 2016

View reviewed changes

billonahill merged commit 1123f73 into apache:master Nov 5, 2016

billonahill deleted the billg/add_scaling_int_tests branch November 5, 2016 03:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

billonahill commented Oct 28, 2016

kramasamy commented Nov 1, 2016

billonahill commented Nov 2, 2016

kramasamy commented Nov 2, 2016

billonahill commented Nov 3, 2016

billonahill commented Nov 3, 2016

objmagic Nov 4, 2016

billonahill Nov 4, 2016

objmagic Nov 5, 2016

objmagic commented Nov 4, 2016

Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

Conversation

billonahill commented Oct 28, 2016

kramasamy commented Nov 1, 2016

billonahill commented Nov 2, 2016

kramasamy commented Nov 2, 2016

billonahill commented Nov 3, 2016

billonahill commented Nov 3, 2016

objmagic Nov 4, 2016

Choose a reason for hiding this comment

billonahill Nov 4, 2016

Choose a reason for hiding this comment

objmagic Nov 5, 2016

Choose a reason for hiding this comment

objmagic commented Nov 4, 2016