Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Change scaling int tests to constantly emit a stream of tuples throughout scaling event #1524

Merged
merged 3 commits into from
Nov 5, 2016

Conversation

billonahill
Copy link
Contributor

Previously the scaling int test spouts did the following:

  • emit 10 tuples
  • wait for the topology update to happen
  • emit 10 tuples
  • send terminal

The problem with this approach is that tuples might not be sent during the scaling changes. This patches changes those tests to do the following:

  • emit a tuple every second continuously
  • wait for the topology update to happen
  • continue emitting a tuple every second for 10 more seconds
  • send terminal
  • post the tuples sent to the state server

Upon topology completion, the test runner would then fetch the dynamic set of expected results from the state server, instead of from a static expected results file.

@billonahill billonahill added this to the 0.14.5 milestone Oct 28, 2016
@billonahill billonahill self-assigned this Oct 28, 2016
@kramasamy
Copy link
Contributor

👍

@billonahill
Copy link
Contributor Author

IntegrationTest_BasicTopologyOneTaskScaleUpAndDown failed. I was able to reproduce, but not yet sure of the fix. Capturing my thoughts here re what's happening.

The test topology is Spout -> Identity Bolt -> Aggregator Bolt. Parallelism is S(1), IB(2), A(1) and we scale S up and IB down to produce S(2), IB(1), A(1). This is the sequence of events:

  1. Launch topology.
  2. S emits A,B,A
  3. IB[1] receives and relays A, A. IB[2] does the same for B.
  4. Scaling event occurs, stream managers restarted, S[2] started, IB[2] killed.
  5. Spouts emit 41 more tuples, all processed by IBs and As.

The 3 tuples emitted in steps 2-3 go unaccounted for on A. Of 44 total tuples emitted, only 41 are received.

Still trying to understand why...

@kramasamy
Copy link
Contributor

@billonahill - is this atmost once or atleast once semantics?

@billonahill
Copy link
Contributor Author

The topology is at least once. I've located the issue and verified with @maosongfu. The current integration test spout tracks how many tuples it sends and decrements that count back to zero to determine when it should send terminals, which indicate the job is done. The decrement occurs in both the ack and fail methods of the spout and there is no resend mechanism implemented in fail to re-emit. This opens up two failure scenarios that I've seen:

  1. In the scenario I reported above the stream manager restarts and fails to emit some tuples to the aggregator bolt. The final tuple counts are low because there is no re-emit implemented.
  2. The stream manager successfully relays all tuples, but upon restart it loses it's state and reports a failures back upstream. In this cases all tuples are received, but the fail method is invoked (false negative) on the spout. This only results in a test failure if I comment out the decrement in the fail method, since what's sent and received matches on both ends. By not decrementing the topology will never finish.

I'm going to fix by doing the following:

  1. Change the spout to re-emit tuples when fail is called.
  2. Change the spout to not decrement the number-of-remaining-items-to-ack count on fail, only on ack.
  3. Change test assertion to verify at least once (as opposed to exactly once), since we could get re-transmits.

@billonahill
Copy link
Contributor Author

Updated per my previous comment. We now have the ability to test at least once, which I use for the scaling tests.

map(lambda (k, v): "%s(%s)" % (str(k), v), actual_counts.iteritems())))
logging.info("Expected value frequencies ---------- \n" + ', '.join(
map(lambda (k, v): "%s(%s)" % (str(k), v), expected_counts.iteritems())))
return "fail"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return False and True, instead of strings...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's painful that these strings are being returned here and we should fix that. I'd like to contain the scope of changes in this PR though, and change all the true/falses in a PR of it's own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@objmagic
Copy link
Contributor

objmagic commented Nov 4, 2016

Besides one minor comment, this PR looks good to me.

@billonahill billonahill merged commit 1123f73 into apache:master Nov 5, 2016
@billonahill billonahill deleted the billg/add_scaling_int_tests branch November 5, 2016 03:51
nicknezis pushed a commit that referenced this pull request Sep 14, 2020
…hout scaling event (#1524)



* check for null before setting topology urls

* Changing int test spout to be at least once, added at least once test result checker
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants