[SNAP-1136] Pooled version of Kryo serializer which works for closures #426
Conversation
- new PooledKryoSerializer that pools Kryo objects (performance is bad if a new instance is created for every call, since each must register and walk tons of classes)
- has an overridden version for ASCII strings to fix EsotericSoftware/kryo#128; currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and for Externalizable classes having a readResolve() method; the latter is used for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead)
- fixed index column determination in RowFormatRelation (was off by one due to 0-based vs 1-based indexing)
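The pooling approach in the first bullet can be sketched generically. This is not the actual PooledKryoSerializer from the PR, just a minimal illustration of the idea: serializer construction is expensive (heavy class registration), so instances are borrowed from a bounded pool and returned after use rather than created per call. The `ExpensiveSerializer` class is a stand-in for a real `com.esotericsoftware.kryo.Kryo` instance.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class SerializerPool {
    // Stand-in for a Kryo instance; real code would pool Kryo objects,
    // whose construction must register and walk many classes.
    public static class ExpensiveSerializer {
        ExpensiveSerializer() { /* imagine heavy class registration here */ }
    }

    private final ArrayBlockingQueue<ExpensiveSerializer> pool;

    public SerializerPool(int capacity) {
        pool = new ArrayBlockingQueue<>(capacity);
    }

    /** Borrow a pooled instance, creating a new one only when the pool is empty. */
    public ExpensiveSerializer borrow() {
        ExpensiveSerializer s = pool.poll();
        return (s != null) ? s : new ExpensiveSerializer();
    }

    /** Return an instance to the pool; silently drop it if the pool is full. */
    public void release(ExpensiveSerializer s) {
        pool.offer(s);
    }
}
```

Pooling (rather than a single shared instance) also matters because Kryo instances are not thread-safe, so each caller needs exclusive use of the instance it borrows.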
run precheckin
…ce it does not use Lead API formatting changes and fixed some compiler warnings
- Kryo serialization for RowFormatScanRDD, SparkShellRowRDD, ColumnarStorePartitionedRDD, SparkShellCachedBatchRDD and MultiBucketExecutorPartition
- added base RDDKryo to encapsulate serialization of the bare minimum RDD fields (using reflection where required)
- removed unused SparkShellRDDHelper.mapBucketsToPartitions
- corrected an issue in SparkSQLExecuteImpl
- avoid sending ConnectionProperties in RowFormatScanRDD for cases where it is not required in execution
- updated log4j.properties for core/cluster tests
- removed LaunchTasks from PooledKryoSerializer since it is not yet in the code base
- changed Attribute to StructField in column decoders, since StructType has an efficient serializer and this is also cleaner in not depending on Attribute (which carries a potentially invalid ExprId for a remote node, though those fields are not used)
- removed the "a." alias in ORDER BY for Query36 in NorthWind tests (due to the fix for SPARK-17863, as reported in SNAP-1183)
- renamed the passed partitioning to "parts" to avoid a conflict with RDD.partitions that caused an infinite loop in getPartitions
- renamed TypeUtils to TypeUtilities to fix a docs error (due to a conflict with Spark's TypeUtils)
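The infinite-loop bug behind the rename can be illustrated with a hypothetical sketch (names and structure are illustrative, not the actual snappydata/Spark code): when the member holding the passed partitions resolves back to the inherited `partitions()` accessor instead of the intended value, the accessor and `getPartitions()` call each other forever.

```java
// Mirrors the RDD shape: a caching public accessor delegating to a subclass hook.
abstract class BaseRDD {
    private Object[] cached;

    // Public accessor (mirrors RDD.partitions); lazily fills the cache.
    public final Object[] partitions() {
        if (cached == null) cached = getPartitions();
        return cached;
    }

    protected abstract Object[] getPartitions();
}

class LoopingRDD extends BaseRDD {
    @Override
    protected Object[] getPartitions() {
        // BUG: intended to return the partitions passed to the constructor,
        // but this resolves to the inherited accessor, which calls back here.
        return partitions();
    }
}

class FixedRDD extends BaseRDD {
    private final Object[] parts; // renamed to avoid the name clash

    FixedRDD(Object[] parts) { this.parts = parts; }

    @Override
    protected Object[] getPartitions() { return parts; }
}
```

Calling `new LoopingRDD().partitions()` recurses until the stack overflows, while `FixedRDD` returns the stored array directly.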
- updated the spark link to fix AQP dunits with the new kryo serializer
- skip DUnitSingleTest in the aqp test target, since those really are dunits which should not be run like normal junit tests
- re-create the snappy catalog connection for MetaException failures too (the message says "... we don't support retries ...")
- clear the serializer/codec system properties when stopping Spark so that they are not carried over to subsequent tests in the same JVMs
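The second bullet is a test-hygiene fix: JVM-wide system properties set by one test would otherwise leak into later tests running in the same JVM. A minimal sketch of the idea, with illustrative property names (the exact keys cleared in the PR may differ):

```java
// Clear JVM-wide Spark overrides on shutdown so they do not leak into
// subsequent tests in the same JVM. Property names here are stand-ins.
public class SparkStopCleanup {
    static final String[] LEAKY_PROPS = {
        "spark.serializer",          // e.g. set to the pooled Kryo serializer
        "spark.io.compression.codec" // e.g. a codec chosen for one test
    };

    /** Remove properties that an earlier test may have set process-wide. */
    public static void clearSparkOverrides() {
        for (String key : LEAKY_PROPS) {
            System.clearProperty(key);
        }
    }
}
```

Hooking this into the Spark stop path means later tests start from the JVM's default configuration instead of inheriting a previous test's serializer or codec.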
run precheckin
For my understanding, could you please explain how the kryo serialization in core is compatible with stock Spark, where none of the kryo serialization in our fork exists yet?
In upstream Spark core, kryo serialization has been disabled for closures (SPARK-12414, apache/spark#11150) and is used only for data. Our changes to snappy-spark make it usable for closures, as well as for Spark Netty messages, which was not configurable earlier. Secondly, the PooledKryoSerializer has the fix for the string-overwrite issue (EsotericSoftware/kryo#128) and has been updated to kryo 4.0.0 to get the fix for EsotericSoftware/kryo#342, so it corrects those issues where they can be hit. With only data serialization these issues are not hit either, so this works with upstream Spark. In addition, this is not enabled by default when working with upstream Spark: it gets set only in LeadImpl and ExecutorInitiator by default. Users can enable it for split mode too using the "spark.serializer" property, but that has not been tested (though I believe it should work fine, as per the comments above).
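Enabling it for split mode as described would be a one-line configuration change along these lines. The fully qualified class name below is an assumption for illustration, and as the comment above notes, this path is untested:

```properties
# Hypothetical spark-defaults.conf fragment for split mode; the package
# of PooledKryoSerializer is assumed, not confirmed by this PR.
spark.serializer=org.apache.spark.serializer.PooledKryoSerializer
```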
Changes proposed in this pull request
- new PooledKryoSerializer that pools Kryo objects (performance is bad if a new instance is created for every call, which needs to register and walk tons of classes)
- overridden version for ASCII strings to fix EsotericSoftware/kryo#128; currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and for Externalizable classes having a readResolve() method; using the latter for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead)
- fixed index column determination in RowFormatRelation (was off by one due to 0-based vs 1-based indexing)
Patch testing
precheckin with spark; precheckin in progress
ReleaseNotes.txt changes
NA
Other PRs
TIBCOSoftware/snappy-spark#27