[STORM-1565] Multi-Lang Performance Improvements #1136
You may want to look at the Pyleus project's
Also, this would be amazing, as I was trying to convince the Pyleus folks to contribute their serializer to Storm anyway in Yelp/pyleus#159.
conf/defaults.yaml
@@ -226,7 +226,7 @@ topology.eventlogger.executors: null
 topology.tasks: null
 # maximum amount of time a message has to complete before it's considered failed
 topology.message.timeout.secs: 30
-topology.multilang.serializer: "org.apache.storm.multilang.JsonSerializer"
+topology.multilang.serializer: "org.apache.storm.multilang.MessagePackSerializer"
While having the MessagePackSerializer built in would be amazing, making it the default serializer would break a lot of multi-lang libraries, so I don't think that's a great idea. Let people opt in.
Thanks, @dan-blanchard. I'd love to hear from Pyleus contributors.
cc @HeartSaVioR You might be interested in this improvement.
@@ -202,6 +202,7 @@
 <clj-time.version>0.8.0</clj-time.version>
 <curator.version>2.9.0</curator.version>
 <json-simple.version>1.1</json-simple.version>
+<msgpack.version>0.6.12</msgpack.version>
This would be much more useful if it were implemented using a newer version of the msgpack-core library (they changed the name after 0.6.12), because 0.7 and above support the BINARY format, which lets you send arbitrary bytes. Without that, you won't be able to send tuples containing arbitrary bytes with this serializer. This is also a problem with the JSON serializer (because JSON strings can't contain non-Unicode characters), but it would be great if we didn't have the problem here.
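To illustrate the point about arbitrary bytes, here is a minimal Python sketch using only the standard library. It is not code from this PR: `payload`, `json_result`, and `pack_bin8` are hypothetical names, and the `bin 8` framing is hand-encoded from the MessagePack specification (a `0xc4` marker, a one-byte length, then the raw data) rather than produced by a real msgpack library.

```python
import json
import struct

# Arbitrary bytes that are not valid UTF-8 (hypothetical tuple field).
payload = bytes([0x00, 0xFF, 0xFE, 0x80])


def json_result(value):
    """Return the JSON text for a one-field tuple, or the error name."""
    try:
        return json.dumps({"tuple": [value]})
    except TypeError as exc:
        return type(exc).__name__


def pack_bin8(data):
    """Hand-encode bytes as MessagePack bin 8: 0xc4, one length byte, raw data."""
    if len(data) > 0xFF:
        raise ValueError("bin 8 holds at most 255 bytes")
    return struct.pack("BB", 0xC4, len(data)) + data


print(json_result(payload))      # json.dumps rejects bytes objects
print(pack_bin8(payload).hex())  # framed bytes, payload carried unchanged
```

The JSON path has no native way to carry the bytes, while the binary format simply prefixes them with a marker and a length. A real serializer would also need the larger `bin 16` and `bin 32` variants for longer payloads.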
Annoyingly, they changed the whole API with the change from msgpack to msgpack-core, so the template approach won't work anymore. You can see new usage examples here.
@dan-blanchard Thanks. I will update.
@vesense
Personally, I feel that we're stuck with keeping the default implementations low-maintenance and free of library dependencies. I can take care of the Python and Ruby implementations, though I'm not an expert in either, but I have no idea about Node.js.
So, rather than continuing to add things, we may first need to discuss the current state of multi-lang support.
@HeartSaVioR I agree with you. We should have a discussion about multi-lang support. (We can open a thread on the dev@ mailing list.)
I'm not on the Storm dev mailing list yet (I guess I should probably fix that), but we actually already have a Python implementation of msgpack serialization in pystorm, the Python multi-lang implementation that powers streamparse. I've been meaning to propose for a while that, instead of having your own default Python multi-lang implementation in Storm that very few people use (because it's not very Pythonic or production-ready), you should at the very least point people to pystorm. It provides all the functionality that the
@dan-blanchard |
@dan-blanchard Please note that these opinions are my own, so if we really want to do this, we should raise it for discussion and let the community weigh in.
I work full-time for a company that uses pystorm and streamparse in production, and we have a team dedicated to maintaining these projects, so I don't think you need to worry about that much.
OK, great. Could you subscribe to the dev@ mailing list? I occasionally start discussions there (about multilang, too), so you might want to join the conversation about multilang.
Yup. I subscribed yesterday.
@vesense Any update on this? It looks like pystorm/streamparse are waiting for this to be merged.
# Conflicts: # pom.xml
Another problem is that JSON serialization requires a character encoding to be specified (the default is UTF-8), and it raises an exception if the object contains non-UTF-8 characters. Will MessagePackSerializer solve this problem?
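The failure described here can be reproduced with a short Python sketch. This is only an illustration under assumptions: `raw`, `decode_utf8`, `to_json_safe`, and `from_json_safe` are hypothetical names, and the base64 wrapper is one common workaround for JSON, not something this PR or Storm's JSON serializer is known to do.

```python
import base64
import json

# "café" encoded as Latin-1; the lone 0xE9 byte is not valid UTF-8.
raw = b"caf\xe9"


def decode_utf8(data):
    """Decode as UTF-8, returning None when the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def to_json_safe(data):
    """Wrap raw bytes in base64 so they survive a JSON round trip."""
    return json.dumps({"b64": base64.b64encode(data).decode("ascii")})


def from_json_safe(text):
    """Recover the original bytes from the base64-wrapped JSON text."""
    return base64.b64decode(json.loads(text)["b64"])


print(decode_utf8(raw))                        # decoding fails for this input
print(from_json_safe(to_json_safe(raw)) == raw)  # round trip preserves bytes
```

The base64 detour costs extra space and an extra encode/decode step on both sides, which is part of why a binary-capable format is attractive for this case.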
I would prefer to have this in the multilang module rather than in core, since upgrading a Storm cluster is not convenient.
It will if this code is updated to use the latest version of the
That's great. I hope this pull request can be accepted and released soon.
So is there any update on merging this pull request?
Was chatting with @roshannaik and @dan-blanchard today, and this PR came up. Someone on the Storm team may benefit from taking a look at this as part of the performance revisions being done for Storm 2.0. As mentioned in #428 in streamparse (the community project for running Python multi-lang topologies with Storm), getting this merged somewhere in the Storm codebase would open up the possibility of switching the serializer from Kryo & JSON to msgpack throughout, which would speed up multi-lang use cases considerably. This PR includes a pure Java implementation of a msgpack serializer, as well as pointers to the right msgpack library in the Java community; it just needs to be reviewed, tested, and merged.
https://issues.apache.org/jira/browse/STORM-1565