[Enhancement] Allow custom string dictionary + use location of repeated strings #122

joetex · 2023-11-09T03:47:56Z

I didn't know this package existed and wrote my own, doh. But mine is scoped too heavily to my project. I want to switch to something more flexible that has community support, and would love to bring over some of the size reductions I implemented. With msgpackr, I've been unable to get any boost for strings from bundleStrings, which was odd.

Enhancements:

Allow a custom string dictionary: an array of commonly used strings that is supplied identically to both Packr and Unpackr. A lookup against this table should take only two bytes per string for a dictionary of up to 255 entries.
Store the location of repeated strings instead of encoding the same string twice. If "hello" gets encoded at byte position 53 and the serializer sees "hello" again later, it should encode just the position 53 for that second "hello". Again, this takes only 2 bytes, or more if the distance is greater than 255.
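To make the two ideas concrete, here is a minimal sketch of both on a flat list of strings. It is a toy format invented for illustration, not msgpackr's wire format and not the acos-json-encoder implementation; the tag values and the names `encodeStrings` and `decodeStrings` are assumptions.

```javascript
// Toy format (hypothetical, NOT msgpackr's encoding). Each entry starts with a tag:
//   0x00 len bytes...  literal UTF-8 string (len <= 255)
//   0x01 index         lookup in the shared dictionary (2 bytes total)
//   0x02 pos           back-reference to a literal at byte position pos (< 256)
//   0x03 hi lo         back-reference with a 16-bit byte position
const LITERAL = 0x00, DICT = 0x01, REF8 = 0x02, REF16 = 0x03;

function encodeStrings(strings, dictionary = []) {
  if (dictionary.length > 256) throw new RangeError("dictionary index must fit in one byte");
  const utf8 = new TextEncoder();
  const dictIndex = new Map(dictionary.map((s, i) => [s, i]));
  const seenAt = new Map(); // string -> byte position of its first literal entry
  const out = [];
  for (const s of strings) {
    if (dictIndex.has(s)) {
      out.push(DICT, dictIndex.get(s));
    } else if (seenAt.has(s)) {
      const pos = seenAt.get(s);
      if (pos < 256) out.push(REF8, pos);
      else out.push(REF16, pos >> 8, pos & 0xff);
    } else {
      const bytes = utf8.encode(s);
      if (bytes.length > 255) throw new RangeError("sketch only handles short strings");
      // Only remember positions that a 16-bit back-reference can address.
      if (out.length <= 0xffff) seenAt.set(s, out.length);
      out.push(LITERAL, bytes.length, ...bytes);
    }
  }
  return Uint8Array.from(out);
}

function decodeStrings(buf, dictionary = []) {
  const utf8 = new TextDecoder();
  // Read the literal entry that starts at byte position pos.
  const readLiteral = (pos) => utf8.decode(buf.subarray(pos + 2, pos + 2 + buf[pos + 1]));
  const result = [];
  let i = 0;
  while (i < buf.length) {
    const tag = buf[i];
    if (tag === LITERAL) { result.push(readLiteral(i)); i += 2 + buf[i + 1]; }
    else if (tag === DICT) { result.push(dictionary[buf[i + 1]]); i += 2; }
    else if (tag === REF8) { result.push(readLiteral(buf[i + 1])); i += 2; }
    else if (tag === REF16) { result.push(readLiteral((buf[i + 1] << 8) | buf[i + 2])); i += 3; }
    else throw new Error("unknown tag " + tag);
  }
  return result;
}

// Usage: the same dictionary array must be given to both sides.
const dictionary = ["type", "player"];
const input = ["type", "hello", "player", "hello", "hello"];
const packed = encodeStrings(input, dictionary);
const unpacked = decodeStrings(packed, dictionary);
console.log(packed.length, unpacked);
```

In this sketch only the first "hello" is written out in full; dictionary hits and near repeats cost two bytes each, and repeats that point past byte 255 cost three.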

Feel free to see my own awful implementation, acos-json-encoder.
Edit: link goes to line where I implemented


lmachens · 2024-04-08T11:00:34Z

I am looking for this enhancement too.
@kriszyp can you check this request?

I think bundleStrings could be optimized by not storing the same string multiple times.

kriszyp · 2024-04-08T12:05:46Z

You might consider using CBOR packing, which was designed for this purpose:
https://github.com/kriszyp/cbor-x?tab=readme-ov-file#cbor-packing
However, this will only find exact string value duplicates. It does not find duplicates within strings, so it won't do any compression of {foo: 'hello', bar: 'hello world'}. More general string deduping is kind of the whole point of RLE compression, and there are plenty of great compression formats and tools that are much better than anything msgpack could offer.
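As a rough illustration of that last point, the sketch below runs a general-purpose compressor over data whose strings overlap only as substrings. It assumes Node.js and uses DEFLATE from the built-in `node:zlib` module as a stand-in for "a compression format"; the sample records are hypothetical, and JSON is used only to keep the example dependency-free (in practice the input would be the buffer returned by the serializer).

```javascript
const zlib = require("node:zlib");

// Hypothetical sample data: 'hello' also appears inside every 'hello world ...' value,
// which exact-duplicate string matching alone would not exploit.
const records = Array.from({ length: 200 }, (_, i) => ({
  foo: "hello",
  bar: "hello world " + (i % 7),
}));

const raw = Buffer.from(JSON.stringify(records));
const deflated = zlib.deflateSync(raw);                 // compress the serialized bytes
const restored = zlib.inflateSync(deflated).toString(); // decompress to verify the roundtrip

console.log(`raw: ${raw.length} bytes, deflated: ${deflated.length} bytes`);
```

The compressor works on the byte stream, so it can reuse repeated substrings and repeated keys regardless of how the serializer framed them.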

lmachens · 2024-04-08T12:37:23Z

@kriszyp Thank you very much!
This is exactly what I was looking for. I only need to find exact string duplicates.

Great results!
original size: 386275 bytes
msgpackr (with useRecords): 101464 bytes
cbor (with useRecords and pack): 61865 bytes
