-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rdd execution support for function Items #353
Conversation
CanBerker
commented
Oct 31, 2019
•
edited
Loading
edited
- Function Items are refactored to store RuntimeIterators rather than body expressions
- FunctionItemIterator renamed to FunctionRuntimeIterator for consistency
- Types references by FunctionItem (FunctionItem, FunctionIdentifier, SequenceType, SequenceType.arity and ItemType) added to kryo serialization
- Dynamic Function Call - FunctionItem input verification improved
- forClauseSparkIterator's was failing to pass Dynamic context properly. Implementation is changed to match other clause iterators.
- RuntimeIterator's hasNext, isOpen and DynamicContext fields are made transient and removed from Kryo serialization.
- FunctionItem refactored to use Java.io serialization instead of Kryo for serializing the runtimeIterator of the function body.
@@ -45,12 +45,15 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check whether Kryo was even used before? It may be that iterators were automatically serialized with Java IO when used in Spark closures, and that Kryo serialization was never tested. I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried setting breakpoints for the write and read methods of RuntimeIterator but nothing was hit when i tried a few queries manually and our regular tests as well. I also tried to falsify the values in the read method (so that isOpen and hasNext would be set to true upon object instantiation) but code ran normally again.
When I tried to set similar breakpoints for write and read methods of BooleanItem, the breakpoints were actually hit in some of the tests so I believe that the kryo serialization for RuntimeIterators are in fact not used.
Please let me know if you have a better testing method in mind or if there is something missing in my understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A place to look at would be the Spark configuration. There are two things that should be configurable: the serialization of results (the Items in the RDDs), which Stefan made use Kryo already during his thesis (plus you did the same for manual serialization in DataFrames), and on the other hand the serialization of all closures passed to Spark in transformations (and this is were the Iterators are serialized), which we are unsure uses Java IO or Kryo.
While I do not have the exact parameters on my mind, this may be a useful reading to start with:
https://ogirardot.wordpress.com/2015/01/09/changing-sparks-default-java-serialization-to-kryo/
However, since for now the matter with FunctionItems is fixed with using Java IO for the iterator, there is no urgency in this. You can look into this when you see fit.
src/main/java/sparksoniq/jsoniq/runtime/iterator/RuntimeIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/sparksoniq/jsoniq/runtime/iterator/RuntimeIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/sparksoniq/jsoniq/runtime/iterator/RuntimeIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/sparksoniq/jsoniq/runtime/iterator/RuntimeIterator.java
Outdated
Show resolved
Hide resolved
src/main/java/sparksoniq/jsoniq/runtime/iterator/RuntimeIterator.java
Outdated
Show resolved
Hide resolved
e3f7b24
to
eba9586
Compare
Java serializer works like a charm. Putting the PR up for review now In a quick test using writeClassAndObject was causing a classNotFound exception as before for the subclass of RuntimeIterator that was used. I didn't investigate this further for the time being. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good.
Can you please make sure there is an open issue so that we'll come back on making Kryo work at some indeterminate point in the future?
Thanks!
Yes, this can be left aside for now. You can include this problem in the issue to not lose this information. |
eba9586
to
5474eb3
Compare