Increase in SnappyOutputStream output size after #82

It appears that the size of the compressed output generated by SnappyOutputStream increased between versions 1.1.1.1 and 1.1.1.2. To see this, I ran a microbenchmark which serializes 1000 integers using Java serialization, compresses the result using a SnappyOutputStream, and reports the size of the compressed output.
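For readers who want the shape of such a measurement, here is a minimal sketch. It is not the actual benchmark source, and several details are assumptions: the names `CompressedSizeBenchmark`, `StreamWrapper`, and `compressedSize` are hypothetical, and so is the choice to write the integers 0 to 999 one at a time with `writeObject`. To keep the sketch runnable without snappy-java on the classpath, it uses the JDK's `DeflaterOutputStream` as a stand-in compressor, so the sizes it prints will not match the numbers reported in this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;

class CompressedSizeBenchmark {

    /** Hypothetical hook: wraps a raw stream in a (possibly compressing) stream. */
    @FunctionalInterface
    interface StreamWrapper {
        OutputStream wrap(OutputStream out) throws IOException;
    }

    /**
     * Serializes the integers 0..count-1 with Java serialization through the
     * given wrapper and returns the number of bytes that reached the sink.
     * How the integers are serialized is an assumption, not the original setup.
     */
    static int compressedSize(int count, StreamWrapper wrapper) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes the wrapped stream,
        // which lets the compressor flush any buffered data into the sink.
        try (ObjectOutputStream oos = new ObjectOutputStream(wrapper.wrap(sink))) {
            for (int i = 0; i < count; i++) {
                oos.writeObject(i); // autoboxed to java.lang.Integer
            }
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        // Baseline: no compression at all.
        System.out.println("uncompressed  " + compressedSize(1000, out -> out));
        // Stand-in compressor from the JDK.
        System.out.println("deflate       " + compressedSize(1000, DeflaterOutputStream::new));
        // With snappy-java on the classpath, the same call would be:
        // compressedSize(1000, org.xerial.snappy.SnappyOutputStream::new)
    }
}
```

Passing the compressor as a wrapper keeps the serialization step identical across runs, so any difference in the reported size comes from the compressing stream alone.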

You can find the full source of my benchmark at https://gist.github.com/JoshRosen/f2b568662c3c6011df08.  I've included a script that runs my benchmark against all recently-published snappy-java versions.  Here are the results:

```
1.1.1.6    489
1.1.1.5    489
1.1.1.4
1.1.1.3    489
1.1.1.2    489
1.1.1.1    386
1.1.1    386
1.1.1-M4    386
1.1.1-M3    386
1.1.1-M2    386
1.1.1-M1    386
1.1.0.1    386
1.1.0    386
1.1.0-M4    386
1.1.0-M3    386
1.1.0-M2    386
1.1.0-M1    386
1.0.x
1.0.5.4    386
1.0.5.3    386
1.0.5.2    386
1.0.5.1    386
1.0.5    386
1.0.5-M4    386
1.0.5-M3    386
1.0.5-M2    386
1.0.5-M1    386
```

Based on this, the compressed size grew from 386 to 489 (about 27% larger) between 1.1.1.1 and 1.1.1.2. When I compare the commits between these versions (https://github.com/xerial/snappy-java/compare/1.1.1...1.1.1.2), it looks like the only change was #82.

This result might be workload-dependent, so it may be worth investigating this with other benchmarks.  I discovered this issue while investigating https://issues.apache.org/jira/browse/SPARK-5081, a Spark bug in which the size of shuffle data increased across Spark versions.

