zstd: Custom Dictionary Compression Support #140
Comments
Support for using dictionaries is on the horizon - but not on the top of the list. Doesn't seem like a huge task, mainly a question of initializing the encoder/decoder. Decoder: Line 339 in 415e534
Line 432 in 415e534
Encoder: Line 59 in 54eaa44
The encoder models would also need to have a function added that indexes a blob of bytes as history. Should be fairly trivial. I don't have plans for creating dictionaries. I wouldn't expect this to be a trivial task. |
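The "index a blob of bytes as history" step mentioned above can be pictured roughly as follows. This is a minimal sketch, not the library's encoder code: `historyIndex`, `indexHistory` and `findInHistory` are hypothetical names, and a plain Go map stands in for whatever hash table the real encoder models use.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// historyIndex maps a 4-byte sequence to the last offset at which it
// occurs in the dictionary. Hypothetical type, for illustration only.
type historyIndex map[uint32]int

// indexHistory records every 4-byte window of dict, so that input seen
// later can find matches that point back into the dictionary.
func indexHistory(dict []byte) historyIndex {
	idx := make(historyIndex)
	for i := 0; i+4 <= len(dict); i++ {
		idx[binary.LittleEndian.Uint32(dict[i:])] = i
	}
	return idx
}

// findInHistory reports the dictionary offset at which the first four
// bytes of src occur, if they occur at all.
func findInHistory(idx historyIndex, src []byte) (int, bool) {
	if len(src) < 4 {
		return 0, false
	}
	off, ok := idx[binary.LittleEndian.Uint32(src)]
	return off, ok
}

func main() {
	dict := []byte(`{"user":"","status":"ok"}`)
	idx := indexHistory(dict)

	// The start of this payload also appears in the dictionary.
	off, ok := findInHistory(idx, []byte(`"status":"error"`))
	fmt.Println("match found:", ok, "at dictionary offset:", off)
}
```

With the dictionary indexed up front, the very first bytes of a small payload can already be encoded as a match instead of literals, which is where the benefit for small inputs comes from.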
@klauspost Ah ok. Creating dictionaries is what I need. Maybe I'll try and find the time to investigate that task and see how hard it would be. Thanks! |
Decompression dictionary support has been added: https://github.com/klauspost/compress/tree/master/zstd#dictionaries |
Any timeline for encoding support? |
@rs No, no concrete timeline. |
@richardartoul @rs You can test out #281 - fuzz tests are now stable, but of course any testing helps. |
Performed a quick test and it works. Performance is bad compared to a simple deflate tho, I'm not sure why. I'm working on small payloads. I tried with different compression levels and smaller dictionaries: the compression ratio stays better than deflate, but compression time is two orders of magnitude slower, even with the fastest compression level and a tiny dictionary. |
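For anyone wanting to reproduce this kind of small-payload comparison, the deflate side can be sketched with the standard library's preset-dictionary support (`flate.NewWriterDict` and `flate.NewReaderDict`). This is only a deflate baseline, not the zstd code under test in #281, and the helper names, sample payload and dictionary are made up for illustration.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// deflateWithDict compresses payload with DEFLATE. An empty dict means
// no preset dictionary is used. Hypothetical helper for illustration.
func deflateWithDict(payload, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	var w *flate.Writer
	var err error
	if len(dict) == 0 {
		w, err = flate.NewWriter(&buf, flate.DefaultCompression)
	} else {
		w, err = flate.NewWriterDict(&buf, flate.DefaultCompression, dict)
	}
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(payload); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflateWithDict reverses deflateWithDict. The same dict must be passed.
func inflateWithDict(data, dict []byte) ([]byte, error) {
	r := flate.NewReaderDict(bytes.NewReader(data), dict)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// Assumed sample data: a dictionary holding the boilerplate shared
	// by many small JSON payloads.
	dict := []byte(`{"user":"","status":"ok","role":"admin","region":"eu-west-1"}`)
	payload := []byte(`{"user":"alice","status":"ok","role":"admin","region":"eu-west-1"}`)

	plain, err := deflateWithDict(payload, nil)
	if err != nil {
		panic(err)
	}
	withDict, err := deflateWithDict(payload, dict)
	if err != nil {
		panic(err)
	}
	out, err := inflateWithDict(withDict, dict)
	if err != nil {
		panic(err)
	}

	fmt.Println("original bytes:", len(payload))
	fmt.Println("deflate bytes, no dictionary:", len(plain))
	fmt.Println("deflate bytes, with dictionary:", len(withDict))
	fmt.Println("round trip ok:", bytes.Equal(out, payload))
}
```

Wrapping the two calls to `deflateWithDict` in a timing loop gives a per-operation figure that can be set against the zstd numbers for the same payloads.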
@rs Yes, it will be slower since more state needs to be initialized. What is the actual difference you are seeing? I do believe initialization is currently being done twice for small blocks, so some can be clawed back. Other stuff is unavoidable. Since small blocks now suddenly have a (potentially big) history, we cannot take some of the shortcuts we do for standalone blocks. Edit: Of course your actual code would also help. |
@rs Fixed most of the things
Same with no dictionaries:
|
So the typical setup "price" is around 0.01ms/operation. An interesting side point is that |
Apologies for commenting on an old issue, but before opening a new issue I wanted to ask here, are there any plans, or has anyone already written the ability to train a zstd dict in Go? I have a use case I'd love to try zstd dicts for but would prefer to avoid calling out to the binary. |
Follow up in #682 |
Thanks so much for the super fast reply! I think I can validate whether dicts would work for my use case without, but if they do and I happen to look into implementing them I’ll make sure to communicate it on the discussion! |
First off, thanks for writing a pure Go implementation! My team has wanted to use zstd in our project for a long time now, but we have been trying to avoid any cgo dependencies.
We have one use-case in particular that would really benefit from the ability to train and use custom dictionaries on the fly.
Is that feature on your roadmap anytime soon? If not, how challenging do you think it would be for me to try to upstream it? I'm happy to contribute some engineering work.
Cheers,
Richie