Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rdd execution support for function Items #353

Merged
merged 8 commits into from
Nov 2, 2019
Merged

Conversation

CanBerker
Copy link
Collaborator

@CanBerker CanBerker commented Oct 31, 2019

  • Function Items are refactored to store RuntimeIterators rather than body expressions
  • FunctionItemIterator renamed to FunctionRuntimeIterator for consistency
  • Types references by FunctionItem (FunctionItem, FunctionIdentifier, SequenceType, SequenceType.arity and ItemType) added to kryo serialization
  • Dynamic Function Call - FunctionItem input verification improved
  • forClauseSparkIterator's was failing to pass Dynamic context properly. Implementation is changed to match other clause iterators.
  • RuntimeIterator's hasNext, isOpen and DynamicContext fields are made transient and removed from Kryo serialization.
  • FunctionItem refactored to use Java.io serialization instead of Kryo for serializing the runtimeIterator of the function body.

@@ -45,12 +45,15 @@

Copy link
Member

@ghislainfourny ghislainfourny Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check whether Kryo was even used before? It may be that iterators were automatically serialized with Java IO when used in Spark closures, and that Kryo serialization was never tested. I am not sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried setting breakpoints for the write and read methods of RuntimeIterator but nothing was hit when i tried a few queries manually and our regular tests as well. I also tried to falsify the values in the read method (so that isOpen and hasNext would be set to true upon object instantiation) but code ran normally again.

When I tried to set similar breakpoints for write and read methods of BooleanItem, the breakpoints were actually hit in some of the tests so I believe that the kryo serialization for RuntimeIterators are in fact not used.

Please let me know if you have a better testing method in mind or if there is something missing in my understanding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A place to look at would be the Spark configuration. There are two things that should be configurable: the serialization of results (the Items in the RDDs), which Stefan made use Kryo already during his thesis (plus you did the same for manual serialization in DataFrames), and on the other hand the serialization of all closures passed to Spark in transformations (and this is were the Iterators are serialized), which we are unsure uses Java IO or Kryo.

While I do not have the exact parameters on my mind, this may be a useful reading to start with:

https://ogirardot.wordpress.com/2015/01/09/changing-sparks-default-java-serialization-to-kryo/

However, since for now the matter with FunctionItems is fixed with using Java IO for the iterator, there is no urgency in this. You can look into this when you see fit.

@CanBerker CanBerker force-pushed the functionRefactoring branch from e3f7b24 to eba9586 Compare October 31, 2019 17:00
@CanBerker
Copy link
Collaborator Author

CanBerker commented Oct 31, 2019

Java serializer works like a charm. Putting the PR up for review now

In a quick test using writeClassAndObject was causing a classNotFound exception as before for the subclass of RuntimeIterator that was used. I didn't investigate this further for the time being.

@CanBerker CanBerker changed the title WIP: Rdd execution support for function Items Rdd execution support for function Items Oct 31, 2019
Copy link
Member

@ghislainfourny ghislainfourny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good.

Can you please make sure there is an open issue so that we'll come back on making Kryo work at some indeterminate point in the future?

Thanks!

@ghislainfourny
Copy link
Member

Java serializer works like a charm. Putting the PR up for review now

In a quick test using writeClassAndObject was causing a classNotFound exception as before for the subclass of RuntimeIterator that was used. I didn't investigate this further for the time being.

Yes, this can be left aside for now. You can include this problem in the issue to not lose this information.

@CanBerker CanBerker force-pushed the functionRefactoring branch from eba9586 to 5474eb3 Compare November 2, 2019 11:19
@CanBerker CanBerker merged commit 8452870 into master Nov 2, 2019
@CanBerker CanBerker deleted the functionRefactoring branch November 2, 2019 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants