Skip to content

Conversation

@kevinconaway
Copy link
Contributor

Fixes for https://issues.apache.org/jira/browse/STORM-2608

@abhishekagarwal87 or @revans2 please let me know your thoughts here.

@kevinconaway
Copy link
Contributor Author

I've added a test case for this. Its a bit complicated due to the specific nature of how the bug occurs so I'm open to suggestions on how to improve it.

@HeartSaVioR
Copy link
Contributor

@kevinconaway
I don't mind the change is not covered by test since the change is really small, but adding a new test is better for sure.
The change looks good, but I didn't have time to check the test code. I'll revisit this soon.

Btw, I'd like to see the PR against master branch too. Please note that checkstyle is applied to the master branch.

@@ -0,0 +1,202 @@
package org.apache.storm.kafka;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache license header is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@kevinconaway
Copy link
Contributor Author

I'd like to see the PR against master branch too. Please note that checkstyle is applied to the master branch.

Sure, I'll wait until this is finalized and then submit a PR with the changes against master. Does that work?

@HeartSaVioR
Copy link
Contributor

@kevinconaway Sure. I'll review the test code and let you know.

@HeartSaVioR
Copy link
Contributor

@kevinconaway
Builds in Travis CI seem to fail consistently at this point:

classname: org.apache.storm.kafka.PartitionManagerTest / testname: test2608
java.lang.NullPointerException: null
	at org.apache.storm.kafka.PartitionManagerTest.shutdown(PartitionManagerTest.java:102)

I think this is really odd but happens anyway. Seems like running test cluster inside of test seems not working on Travis CI build.
https://travis-ci.org/apache/storm/jobs/250442626

If you're willing to resolve the build failure, please enable Travis CI in your fork, and fix it. I'm even OK to remove the test code and just having one-liner change.

@HeartSaVioR
Copy link
Contributor

@kevinconaway
Seems like it is failing intermittently, but high probability. I also saw test failing on my local, but also saw test succeed on my local.
Could you run tests several times and see how many time it fails?

@kevinconaway
Copy link
Contributor Author

It succeeds for me locally. The NPE that you mentioned is actually due to another error in the @Before method

test2608(org.apache.storm.kafka.PartitionManagerTest)  Time elapsed: 1.158 sec  <<< ERROR!
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/testTopic/partitions
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
	at org.apache.storm.kafka.DynamicBrokersReader.getNumPartitions(DynamicBrokersReader.java:110)
	at org.apache.storm.kafka.DynamicBrokersReader.getBrokerInfo(DynamicBrokersReader.java:83)
	at org.apache.storm.kafka.trident.ZkBrokerReader.<init>(ZkBrokerReader.java:44)
	at org.apache.storm.kafka.PartitionManagerTest.setup(PartitionManagerTest.java:81)

test2608(org.apache.storm.kafka.PartitionManagerTest)  Time elapsed: 1.159 sec  <<< ERROR!
java.lang.NullPointerException: null
	at org.apache.storm.kafka.PartitionManagerTest.shutdown(PartitionManagerTest.java:102)

The ZkBrokerReader isn't seeing the topic information in Zookeeper which makes me think its a race condition between when the topic is created and then ZkBrokerReader tries to read the topic / partition info

@HeartSaVioR
Copy link
Contributor

@kevinconaway Your last change seemed to make compilation failure. Could you ensure that it succeeds to build with both JDK7 and JDK8?

@kevinconaway
Copy link
Contributor Author

The constructor for ZkCoordinator changed in the upstream branch. Will fix.

@HeartSaVioR
Copy link
Contributor

Your patch seems not remedy the NPE issue. Same exception raised for newer build.

https://travis-ci.org/apache/storm/jobs/251340422
https://travis-ci.org/apache/storm/jobs/251340423

classname: org.apache.storm.kafka.PartitionManagerTest / testname: test2608
java.lang.NullPointerException: null
	at org.apache.storm.kafka.PartitionManagerTest.shutdown(PartitionManagerTest.java:103)

@kevinconaway
Copy link
Contributor Author

Thanks for your patience on this. I did some debugging on Travis on my fork and found 2 issues

  • The behavior of binding to localhost seems somewhat inconsistent for Zookeeper and Kafka. I noticed a bunch of ZK connection issues trying to connect to the hostname of the travis machine that wouldn't resolve. Using 127.0.0.1 fixes this
  • More importantly, there were timing issues related to creating the topic and the partition leaders being assigned. DynamicBrokersReader#getNumPartitions will fail if the topic doesn't yet have leaders assigned to its partitions and this was the root of the NPE.

I fixed the second issue by waiting until the topic is fully ready in kafka (exists in zookeeper and all partitions have leaders) before attempting to use it. I also adjusted the logic around emitting messages from the partition manager to allow for timing delays on the broker end as this was another cause of spurious failures.

I've also adjusted the integration-test parent pom version to match the current one in the branch. It looks like this was missed when the version was bumped to 1.2.0-SNAPSHOT

@HeartSaVioR
Copy link
Contributor

@kevinconaway
Great. Thanks for the patience on fixing all the issues. +1
Could you squash the commits into one? And please craft another PR against master branch.
Thanks in advance!

HeartSaVioR and others added 12 commits July 8, 2017 21:39
* Implement HBase state backend
  * picked Redis state backend implementation
  * also implemented state iterator
* Added 'how to set up' to State-checkpointing doc
* address @arunmahadevan's review comment
STORM-2608 Add test case

STORM-2608 Add missing license header

STORM-2608 Ensure that the topic is created and available in ZK before returning from #createTopic

STORM-2608 Update to latest from 1.x branch

STORM-2608 Fix IT pom version

STORM-2608 Ensure that partitions have leaders before returning from createTopic

STORM-2608 Remove any pending offsets that are no longer valid
@kevinconaway
Copy link
Contributor Author

I borked the history trying to squash the commits. I've created a separate PR:

#2190

Closing this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants