Skip to content

Conversation

@dragos
Copy link
Contributor

@dragos dragos commented Jan 29, 2015

This fixes two sources of non-deterministic failures in this test:

  • wait for a receiver to be up before pushing data through MQTT
  • gracefully handle the case where the MQTT client is overloaded. There’s
    a hard-coded limit of 10 in-flight messages, and this test may hit it.
    Instead of crashing, we retry sending the message.

Both of these are needed to make the test pass reliably on my machine.

According to the MQTT client docs the callback needs to be installed
before the client connects or subscribes to a topic. Otherwise
messages may be lost.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@tdas
Copy link
Contributor

tdas commented Jan 30, 2015

Jenkins, this is okay to test.

@tdas
Copy link
Contributor

tdas commented Jan 30, 2015

Can you add "[FIX]" to the title, so that it is clear that this fixes SPARK-631

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change, but why this change for this fix? Was it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was not necessary, I just noticed that it wasn't needed, so I thought I should be a boy scout :)

@tdas
Copy link
Contributor

tdas commented Jan 30, 2015

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26366 has started for PR 4270 at commit 261653f.

  • This patch merges cleanly.

@tdas
Copy link
Contributor

tdas commented Jan 30, 2015

@Blina @prabeesh Please take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining this function inside the "if" looks a little weird, too nested. Might be better to just have it in for loop. Code would look less complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26366 has finished for PR 4270 at commit 261653f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26366/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can modify the code like this.

for (i <- 0 to 100) {
  try {
    msgtopic.publish(message)
  } catch {
    case e: MqttException => Thread.sleep(1)
  }
}

I think tread sleep 1 second is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but it won't retry the same message, instead if would just ignore a failure. In that case the total number of messages will be less than 101.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think waitForReceiverToStart() method is sufficient. Once the receiver is started, since the qos is set to 1 , as soon as the receiver receives the message, message will be flushed out from queue. If still problem exists the Thread.Sleep in the catch can solve the problem. There is no need for tryPublish (I guess)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll check that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try this also.
Along with waitForReceiverToStart(), change the 'for loop' range from 0 to 9 . It is just for testing.
I am getting the error only if I change the MqttDefaultFilePersistence to memoryPersistence. I solved the problem by applying the solution provided above.

@dragos dragos changed the title [SPARK-4631][streaming] Wait for a receiver to start before publishing test data. [SPARK-4631][streaming][FIX] Wait for a receiver to start before publishing test data. Jan 30, 2015
@dragos
Copy link
Contributor Author

dragos commented Jan 30, 2015

@Bilna, @tdas what is the preferred protocol for reviews? Should I push a new commit, or amend the existing one and force-push to my repo?

…g test data.

This fixes two sources of non-deterministic failures in this test:

- wait for a receiver to be up before pushing data through MQTT
- gracefully handle the case where the MQTT client is overloaded. There’s
a hard-coded limit of 10 in-flight messages, and this test may hit it.
Instead of crashing, we retry sending the message.

Both of these are needed to make the test pass reliably on my machine.
@dragos dragos force-pushed the issue/fix-flaky-test-SPARK-4631 branch from 261653f to f66c482 Compare January 30, 2015 09:50
@dragos
Copy link
Contributor Author

dragos commented Jan 30, 2015

OK, I went for the amend (the way we do in our projects). This way the commit history is cleaner.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26397 has started for PR 4270 at commit f66c482.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26397 has finished for PR 4270 at commit f66c482.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26397/
Test PASSed.

@srowen
Copy link
Member

srowen commented Jan 30, 2015

@dragos you can just add a commit. It is easier to review and commits are squashed on merge anyway.

@dragos
Copy link
Contributor Author

dragos commented Jan 30, 2015

@srowen if it's not too bad I'll do that the next time. I already amended the last commit, and the change is pretty small, it shouldn't be too hard to re-review.

@dragos
Copy link
Contributor Author

dragos commented Feb 2, 2015

I think I addressed all reviewer's comments. @tdas @Bilna Is there anything else I should do?

@tdas
Copy link
Contributor

tdas commented Feb 2, 2015

Merging this. Thanks for catching and fixing this.

@asfgit asfgit closed this in e908322 Feb 2, 2015
asfgit pushed a commit that referenced this pull request Feb 13, 2015
…ishing test data.

This fixes two sources of non-deterministic failures in this test:

- wait for a receiver to be up before pushing data through MQTT
- gracefully handle the case where the MQTT client is overloaded. There’s
a hard-coded limit of 10 in-flight messages, and this test may hit it.
Instead of crashing, we retry sending the message.

Both of these are needed to make the test pass reliably on my machine.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #4270 from dragos/issue/fix-flaky-test-SPARK-4631 and squashes the following commits:

f66c482 [Iulian Dragos] [SPARK-4631][streaming] Wait for a receiver to start before publishing test data.
d408a8e [Iulian Dragos] Install callback before connecting to MQTT broker.

(cherry picked from commit e908322)
Signed-off-by: Sean Owen <sowen@cloudera.com>

# Conflicts:
#	external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015
…ishing test data.

This fixes two sources of non-deterministic failures in this test:

- wait for a receiver to be up before pushing data through MQTT
- gracefully handle the case where the MQTT client is overloaded. There’s
a hard-coded limit of 10 in-flight messages, and this test may hit it.
Instead of crashing, we retry sending the message.

Both of these are needed to make the test pass reliably on my machine.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes apache#4270 from dragos/issue/fix-flaky-test-SPARK-4631 and squashes the following commits:

f66c482 [Iulian Dragos] [SPARK-4631][streaming] Wait for a receiver to start before publishing test data.
d408a8e [Iulian Dragos] Install callback before connecting to MQTT broker.

(cherry picked from commit e908322)
Signed-off-by: Sean Owen <sowen@cloudera.com>

# Conflicts:
#	external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants