Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ULEs to be bit-packed rather than byte-packed #1487

Open
sffc opened this issue Jan 9, 2022 · 6 comments
Open

Allow ULEs to be bit-packed rather than byte-packed #1487

sffc opened this issue Jan 9, 2022 · 6 comments
Labels
A-performance Area: Performance (CPU, Memory) C-zerovec Component: Yoke, ZeroVec, DataBake discuss Discuss at a future ICU4X-SC meeting help wanted Issue needs an assignee S-large Size: A few weeks (larger feature, major refactoring) T-enhancement Type: Nice-to-have but not required
Milestone

Comments

@sffc
Copy link
Member

sffc commented Jan 9, 2022

Currently, all ULE values are aligned to a byte boundary. We should consider making them aligned to a bit boundary instead.

This would solve two problems:

  1. Inefficient storage: for example, Script requires 10 bits, which means it needs to use a 16-bit unit, wasting 37% of space. (Restrict Script (and ScriptWithExt) to specific range #1404)
  2. Inability for the upcoming #[derive(ULE)] to efficiently pack a struct containing many small fields. (Proc macros for AsULE and AsVarULE #1079)

There are crates like bitpacking which may provide some prior art. However, I believe that our bitpacking can be less sophisticated, since ZeroVec already returns stack copies of elements in the array.

This may be suited to be a new type, like ZeroVecPacked, rather than re-writing the current class, which is optimized for its current byte-aligned use case.

CC @Manishearth

@sffc sffc added T-enhancement Type: Nice-to-have but not required good first issue Good for newcomers help wanted Issue needs an assignee A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters S-large Size: A few weeks (larger feature, major refactoring) labels Jan 9, 2022
@Manishearth
Copy link
Member

I'm fine with a new type, strongly against making this change to ZeroVec itself since it would trade off performance for size in a much stronger way. We could even make this work in ZeroMap by introducing a Packed wrapper type (i.e. ZeroMap<Packed<Script>, str>)

To me, the primary value proposition of ZeroVec is zero-copy deserialization, efficient encoding is nice to have but if we're trading off performance for it I'd prefer not to.

It's going to be tricky to implement; I suspect what we'll need to do is:

  • add a PACKED_BITS const to ULE (we already plan to add BITS)
  • the ULE trait invariants also say that the first PACKED_BITS bits of Self are the important ones, if there are any leftover they must be zero.
  • ZeroVecPacked has a slightly different internal implementation that does not explicitly store ULEs and does more on-the-fly transmutes

@iainireland
Copy link
Contributor

To me, the primary value proposition of ZeroVec is zero-copy deserialization, efficient encoding is nice to have but if we're trading off performance for it I'd prefer not to.

I agree with this.

@sffc
Copy link
Member Author

sffc commented Jan 10, 2022

As part of the work on this issue, the performance impact of bitpacking should be measured, and weighed against the resulting impact on data size.

@sffc sffc added the backlog label Apr 28, 2022
@sffc sffc added this to the Backlog milestone Dec 22, 2022
@sffc sffc added C-zerovec Component: Yoke, ZeroVec, DataBake and removed backlog C-data-infra Component: provider, datagen, fallback, adapters labels Dec 22, 2022
@sffc
Copy link
Member Author

sffc commented Mar 14, 2024

As a step in this direction, we could add a byte-sized type that is valid for only certain bits, which could be composed into a ULE type, such that the only manual impl required for bit-packing is a (safe) AsULE impl. I think the safe transmute people are working on something like this.

@sffc
Copy link
Member Author

sffc commented Sep 25, 2024

I think we should pull this back up and discuss it in the context of zerovec 1.0 and the holistic design.

@sffc sffc modified the milestones: Backlog ⟨P4⟩, Utilities 1.0 Sep 25, 2024
@sffc sffc added discuss Discuss at a future ICU4X-SC meeting and removed good first issue Good for newcomers labels Sep 25, 2024
@Manishearth
Copy link
Member

I kind of feel like this would probably be a third trait PackedULE and BitPackedZeroVec, I have a hard time seeing how this can work within ZeroVec without a tradeoff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-performance Area: Performance (CPU, Memory) C-zerovec Component: Yoke, ZeroVec, DataBake discuss Discuss at a future ICU4X-SC meeting help wanted Issue needs an assignee S-large Size: A few weeks (larger feature, major refactoring) T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

No branches or pull requests

3 participants