-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include "simple" UTF-8 validation and transcoding logic for interpreted and low-footprint scenarios #48006
Comments
Tagging subscribers to this area: @tarekgh, @krwq, @eiriktsarpalis, @layomia Issue DetailsSee comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs. @marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?
|
I'm wondering if we should control the size optimizations with a feature switch instead of hardcoding it for specific platforms. We used browser, android, ios, tvos as size-sensitive targets but even on these platforms, someone could prefer speed over size. Related PR #47918 @jkotas @stephentoub @eerhardt thoughts? |
I'd be ok with that as long as the speed-focused targets trim out the size-focused paths as well and not just the other way around, and as long as it doesn't negatively impact performance for the speed-focused targets, e.g. with extra cruft being left in that can't be successfully and cheaply removed by the JIT. |
We should quantify how much we are talking about here. A lot of the vectorized code should be removed by linker already on platforms that do not have hardware accelerated Vectors or hardware intrinsics. There is some small tax left after that, but I do not think it will add up to much. We are paying similar tax by e.g. artificially splitting methods between platform neutral and platform specific parts that is much more pervasive pattern and that it may be more worthy to do something about. |
@jkotas If I get codegen numbers for various changes would that help? My numbers would be for x64 since that's what I build. But presumably we could extrapolate to wasm and other archs. |
I am not sure how you can reliably extrapolate from x64 codegen numbers to impact on wasm. |
It would be something like "the [non-vectorized] x64 codegen went from xxx bytes to yyy bytes." It's not directly comparable to wasm, but I think as a first-order approximation any reduction as a percentage of the original footprint would be consistent across archs. |
Hi @jkotas, @GrabYourPitchforks, There is first attempt PR on simple implementation, which actually passed all unit tests un the CI. It also includes your code review feedback so far. When I build it locally on my ubuntu with Original larger size: 3044864 Open questions
Thanks |
Unfortunately I don't think that would have significant payoff. Much of the complicated code is hidden behind an "is SIMD enabled?" switch, and that entire branch should be pruned for wasm builds. The code that remains should already be fairly small.
I wish I could answer that question. 🙂 There are other people (@eerhardt?) who have a deeper understanding of the size-on-disk requirements. Maybe they could chime in either here or in the original linked issue? |
@pavelsavara - would it be possible to get size measurements (before and after) of your change in a default In general a 6KB uncompressed IL savings turns into roughly 2KB .br compressed savings. But the question is how much of that savings will still be there after trimming the entire app? If the change was "free", I'd definitely think a 1-2KB .br size savings would be "worth it". But as the expense/maintenance cost goes up, the savings need to go up as well. |
Very good estimate @eerhardt :-) Old size
New size
Brotli Difference: 2370 bytes |
I'd definitely take a 2KB size reduction in the default Blazor WASM app! So I guess it is up to the UTF-8 owners (@GrabYourPitchforks?) to decide if the maintenance cost isn't unbearable. If the answer is "the maintenance cost is really great here", I don't think I'd push super hard for 2KB. But if the answer is "yeah, this should be OK to maintain", I'd like to see us move forward with the change. |
Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar, @tannergooding, @CoffeeFlux Issue DetailsSee comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs. @marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?
|
I don't think it adds significant maintenance burden. It's a clean change with separate implementations kept in separate files. And we have good regression test coverage here. |
Resolved by #49372. |
See comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs.
@marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?
The text was updated successfully, but these errors were encountered: