Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce stack size of ShortVec in icu_locid, helping reduce Locale & LanguageIdentifier #2084

Closed
sffc opened this issue Jun 16, 2022 · 11 comments
Assignees
Labels
A-performance Area: Performance (CPU, Memory) blocked A dependency must be resolved before this is actionable C-locale Component: Locale identifiers, BCP47 help wanted Issue needs an assignee S-epic Size: Major project (create smaller child issues) T-enhancement Type: Nice-to-have but not required

Comments

@sffc
Copy link
Member

sffc commented Jun 16, 2022

Currently, we define ShortVec to be

enum ShortVec<T> {
    Empty,
    Single(T),
    Multi(Vec<T>),
}

This requires 32 bytes on x86_64: Vec<T> is 3*usize (24 bytes), and the discriminant requires one additional word.

Pending some dependencies, for T = NonZeroUsize and smaller, 16 bytes should be achievable as follows:

enum ShortBoxSlice<T> {
    ZeroOne(Option<T>),
    Multi(Box<[T]>),
}

(thanks @Manishearth and @mikebenfield)

Some notes:

  1. This depends on the Rust upstream PR Use niche-filling optimization even when multiple variants have data. rust-lang/rust#94075 landing first
  2. For this to work for Variant, which is the motivating use case, we need to fix Make Option<TinyAsciiStr> be the same size as TinyAsciiStr #2083 first
  3. It hopefully won't be necessary to expand the Box into an explicit NonZeroPtr pointer and NonZeroUsize length, but that is an option we could pursue if needed
@sffc sffc added T-enhancement Type: Nice-to-have but not required help wanted Issue needs an assignee C-locale Component: Locale identifiers, BCP47 A-performance Area: Performance (CPU, Memory) blocked A dependency must be resolved before this is actionable S-epic Size: Major project (create smaller child issues) labels Jun 16, 2022
@robertbastian
Copy link
Member

If we want to go further than the ShortVec regression we could put the Extensions struct on the heap and cut the size of Locale by 75% (216 to 56 bytes with the current ShortVec).

Once on the heap we can make Extensions ?Sized, which would unlock some optimizations that could offset the extra deref, such as storing the most common extension type (unicode?) inside the struct instead of its own allocation.

@sffc sffc added the backlog label Jun 23, 2022
@pdogr pdogr self-assigned this Aug 11, 2022
@robertbastian
Copy link
Member

robertbastian commented Oct 12, 2022

Edited, had a typo that messed up results before. I didn't sleep much.

I just tried this with today's nightly:

enum ShortVec<T> {
    Empty,
    Single(T),
    Multi(Vec<T>),
}

assert_eq!(size_of::<ShortVec<usize>>(), 32);
assert_eq!(size_of::<ShortVec<NonZeroUsize>>(), 32);

enum ShortVec2<T> {
    Empty,
    Single(T),
    Multi(Box<[T]>),
}

assert_eq!(size_of::<ShortVec2<usize>>(), 24);
assert_eq!(size_of::<ShortVec2<NonZeroUsize>>(), 24);

enum ShortVec3<T> {
    ZeroOne(Option<T>),
    Multi(Box<[T]>),
}

assert_eq!(size_of::<ShortVec3<usize>>(), 24);
assert_eq!(size_of::<ShortVec3<NonZeroUsize>>(), 16); // this is 24 on 1.64

@robertbastian
Copy link
Member

Don't celebrate, this seems very brittle and won't work with TinyAsciiStr:

https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=89c6184a24634bf80c245ffbcbfc0928

@sffc

This comment was marked as resolved.

@sffc
Copy link
Member Author

sffc commented Oct 13, 2022

Oh, wait, it's fine. It should be an array of NonZeroU8 not NonZeroUsize

https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=cd9d9465aa1b58927f7668ac1148e8e5

@robertbastian
Copy link
Member

Oh yeah. But will it work with a struct around the array? 🤷‍♂️

@mikebenfield
Copy link

mikebenfield commented Oct 13, 2022

FWIW, if it's strongly preferable for you all to maintain the 3 variants like

enum ShortVec<T> {
    Empty,
    Single(T),
    Multi(Box<[T]>),
}

and for ShortVec<NonZeroUsize> to still be 16 bytes...

That would actually be achievable with the changes from this PR plus one more easy optimization that maybe I should add to it anyway. However, no guarantees about when that PR gets merged; I'm kind of waiting on something else before I continue work on it.

@robertbastian
Copy link
Member

Yes that would be ideal. I filed rust-lang/rust#102997 about this.

@sffc sffc added this to the Backlog milestone Dec 22, 2022
@sffc sffc removed the backlog label Dec 22, 2022
@robertbastian
Copy link
Member

Let's go with ShortVec3 and not wait on Rust. cc @sffc

@robertbastian
Copy link
Member

@pdogr implemented ShortVec3 but used Vec instead of Box, as otherwise pushing would always reallocate.

If we use Box we probably want to remove push, remove, and insert. @pdogr to investigate the impact of this. Afaict we only push in one location (Value::try_from_bytes), which we can probably customize to not use push but somehow assemble from the iterator with a single allocation.

@robertbastian
Copy link
Member

Closed by #3220

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-performance Area: Performance (CPU, Memory) blocked A dependency must be resolved before this is actionable C-locale Component: Locale identifiers, BCP47 help wanted Issue needs an assignee S-epic Size: Major project (create smaller child issues) T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

No branches or pull requests

4 participants