Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding::encode_mut is very code-size heavy #97

Closed
GnomedDev opened this issue Feb 1, 2024 · 15 comments
Closed

Encoding::encode_mut is very code-size heavy #97

GnomedDev opened this issue Feb 1, 2024 · 15 comments
Milestone

Comments

@GnomedDev
Copy link

Hello, I've just ran cargo-bloat on my project as I'm working on reducing compile times and binary sizes can be a proxy for compile times, and found that data_encoding::Encoding::encode_mut is taking up 76.9kb , the third largest function in my entire project (that pulls reqwest, tungstenite, etc).

If code size isn't a concern for this project, I'm entirely fine with this being closed, but thought that you might want a heads up that this library is somewhat code-size heavy.

@ia0
Copy link
Owner

ia0 commented Feb 1, 2024

Thanks for opening an issue! Yes, binary size is a concern for this library, but only with LTO (because it heavily relies on the actual encoding being known at compile time). Are you measuring bloat with full LTO enabled? If yes, could you provide me reproduction steps? There wasn't any heavy benchmarks in that regard and also no CI to check for regressions, so might be a valid issue :-/

@GnomedDev
Copy link
Author

The reproduction I have is that project, but I don't have enough energy to look into this much further. It uses thin lto, but it's probably a bad idea to rely on LTO in the first place. My suggestion would be to feature-lock each encoding to allow dependents to only enable what they need, avoiding the need to rely on LTO.

@ia0
Copy link
Owner

ia0 commented Feb 18, 2024

I'm back from vacation and could take a look. However I'm not able to reproduce. I tried cargo bloat --release in your repo but I only see:

0.0%   0.1%  16.1KiB   data_encoding data_encoding::Encoding::encode_mut

Could you provide me with exact instructions to reproduce (including commit hash). I have an idea that could easily fix the issue but I want to confirm it actually fixes your problem first.

Thanks!

@ia0
Copy link
Owner

ia0 commented Mar 4, 2024

Thanks! I was able to reproduce on your project with cargo bloat -p route-weaver-router --release. I'll take a look when I get time. I'm quite busy (and a bit sick) at the moment. I'll ping the issue when I have something.

ia0 added a commit that referenced this issue Mar 23, 2024
@ia0
Copy link
Owner

ia0 commented Mar 23, 2024

Sorry for the long delay. I tried to #[inline(always)] the API but the dead code is still not eliminated, so I went with the manual solution of providing features. In the route-weaver-router reproduction case, this means changing the dependency from:

[dependencies]
data-encoding = "2.5"

to:

[dependencies.data-encoding]
version = "2.6"  # not released yet
default-features = false
features = ["std", "base16"]

The code size measured with the text column of cargo size -p route-weaver-router --release reduces from 2029548 to 1965156 bytes (64392 less). The data-encoding functions measured with cargo bloat -p route-weaver-router --release --crates -n50 | grep data_encoding reduce from 72.2KiB to 12.6KiB (59.6KiB less).

It is probably possible to do better but it would require quite some refactoring. Would this change be enough for you?

The change is in #101 and I'll keep it open until I decide whether it should be a breaking change or not. It is technically one for those who disable default features.

@ia0
Copy link
Owner

ia0 commented Mar 24, 2024

Actually, I'll probably postpone submitting that PR until I experiment with another idea that would bring much more benefit. I'm just afraid that it might make the API a bit less usable by having many type parameters (a little bit like RustCrypto does). Because if I'll have to make a major version bump, then let's try to make the most out of it.

@ia0
Copy link
Owner

ia0 commented Mar 29, 2024

Ok, my experiment using StaticEncoding<B: Static<usize>> (and making Encoding a dyn object) seems to provide similar benefits, but doesn't need features. It might still be a breaking change because I'll have to change the constants from const to static. This also would need more work, so not sure how long it will take.

@ia0
Copy link
Owner

ia0 commented Mar 30, 2024

I don't think so. The current design assumes inlining and dead-code elimination to work cross-crate. This seems to work rather well with LTO but not otherwise. The new design I have in mind (I didn't fully test it though) doesn't rely on inlining by adding one layer of indirection (which can be optimized by inlining) and simplifies dead-code elimination by introducing a symbol for each combination, so the linker can always do it regardless of LTO.

ia0 added a commit that referenced this issue Apr 28, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
@ia0
Copy link
Owner

ia0 commented Apr 28, 2024

Ok, I got some time to look into this again. I've created the v3-preview branch for now. It will eventually be merged in the main branch. It essentially provides a v3-preview feature that provides a v3_preview module. This module would eventually be the 3.0.0 of data-encoding (but for now is unstable and uses a higher MSRV).

I did the following change:

diff --git a/common/Cargo.toml b/common/Cargo.toml
-data-encoding = "2.5"
+data-encoding = { git = "https://github.com/ia0/data-encoding.git", branch = "v3-preview", features = ["v3-preview"] }
diff --git a/common/src/noise.rs b/common/src/noise.rs
-use data_encoding::HEXLOWER_PERMISSIVE;
+use data_encoding::v3_preview::HEXLOWER_PERMISSIVE;

For the following measurements:

cargo size cargo bloat
before 2093644 72.1KiB
after 2017828 219B
diff 75816 73611

Where "cargo size" is cargo size -p route-weaver-router --release and "cargo bloat" is cargo bloat -p route-weaver-router --release --crates -n100 | grep data_encoding.

So this looks pretty good so far. I'll have to finish adding all features to the v3_preview module and see if I want to add other things. I'll ping here when the v3-preview branch makes it to main, then I'll wait a month or so and do a release, at which point I'll probably close this issue.

ia0 added a commit that referenced this issue Apr 29, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
ia0 added a commit that referenced this issue May 1, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
ia0 added a commit that referenced this issue May 5, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
ia0 added a commit that referenced this issue May 5, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
@ia0 ia0 added this to the v3.0.0 milestone May 5, 2024
ia0 added a commit that referenced this issue May 5, 2024
This new module addresses the following breaking changes of #72:
- `static` instead of `const` for pre-defined encodings
- Private `Encoding` implementation with `unsafe` constructor
- Use `MaybeUninit` internally and expose `_uninit` functions
- Use static dispatch instead of dynamic dispatch (fixing #97)
@ia0
Copy link
Owner

ia0 commented May 5, 2024

This is now merged in main. It will eventually be part of 3.0.0 (tracked by #106) but will probably be released as a "preview" module too (to catch possible usability issues).

@ia0 ia0 closed this as completed May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
@ia0 @GnomedDev and others