-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable compressRunOnSerialization for Roaring bitmaps. #3228
Conversation
dd7a743
to
be4d494
Compare
This is important. |
Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. |
👍 |
|Field|Type|Description|Required| | ||
|-----|----|-----------|--------| | ||
|`type`|String|Must be `roaring`.|yes| | ||
|`compressRunOnSerialization`|Boolean|Use a run-length encoding where it is estimated as more space efficient.|no (default == `false`)| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not enable RLE by default ?
LGTM, 👍 , would be nice to make RLE enable by default in case it does not have any significant overheads. |
@nishantmonu51 The main reason I didn't make it default was that the discussion in metamx/bytebuffer-collections#15 indicated that the RLE containers are not readable by pre 0.5 versions of roaring. Druid updated to 0.5 in 0.8.2 (#1759), so that means bitmaps written with this option won't be readable by Druid 0.8.1 or earlier. Maybe that's fine? @lemire is the above correct? |
No, there should be no significant downside to activating run compression. In theory, there is a tiny overhead during serialization, but it is small, and often more than compensated by storage savings so that the end result is faster overall serialization. In the worst case, run compression brings little storage benefit and slightly lower performance, but there should be many cases where you have a large net win. We recommend it as the default. As for compatibility problems... I think that what you are saying is correct...
... but I am not sure there is a problem. Roaring is backward compatible. So the library will happily load bitmaps without run compression. The older versions of the library will balk at bitmaps with run compression, but then why would it ever have to read them? It would only be a problem if someone upgraded Druid, rewrote the indexes, and then downgraded Druid... then you'd get (clean) exceptions (something about an unrecognized cookie). If this ever happened and you wanted to make you index "go back in time", that is, undo the added compression, the Roaring API got you covered if you ever want to support this... https://github.com/RoaringBitmap/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/buffer/MutableRoaringBitmap.java#L1194 Please note that Roaring uses header cookies. This comes at a tiny cost storage-wise, but it has the benefit that the Roaring bitmaps are "self described". It limits the risk regarding data corruption. That is, you don't get Roaring trying to load and interpret garbage... instead, it balks after reading one word and throws up an exception which you can quickly interpret. |
Yeah, that's the case I was thinking of. It can happen when someone updates a cluster, decides something about it isn't working for them after a few days, and rolls it back. Maybe that's fine though, we could point this out in the release notes and anyone concerned about that could set the property to false until they're confident they won't need to roll back. I'm OK with setting the default to "true" and including something in the release notes. Anyone care to chime in on that plan? |
@gianm 👍 We only officially support version-to-version upgrades, not jumping around. So as long as a rollback can occur between, say, 0.9.2 and 0.9.1, I think its OK. Anyone expecting to roll back to 0.8.1 from 0.9.2 has other problems. |
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
be4d494
to
4223e7e
Compare
@fjy @nishantmonu51 @drcrallen @lemire I changed the default to true and updated the docs |
thanks @gianm |
To get the benefit of the changes in metamx/bytebuffer-collections#15 we need to pass compressRunOnSerialization = true.