Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider expanding number formatting cache to larger values #76888

Closed
Tracked by #79053
stephentoub opened this issue Oct 11, 2022 · 8 comments · Fixed by #79061
Closed
Tracked by #79053

Consider expanding number formatting cache to larger values #76888

stephentoub opened this issue Oct 11, 2022 · 8 comments · Fixed by #79061
Assignees
Labels
Milestone

Comments

@stephentoub
Copy link
Member

Several releases ago, we added to number formatting a cache of strings for the values 0 through 9:

private static readonly string[] s_singleDigitStringCache = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };

Since with the default format non-negative values aren't impacted by culture, this cache is used for all numerical types for all single digit values as part of value.ToString(). We should consider expanding the cache to larger values, e.g. up to 300 to account for other frequently used values on success paths, like HTTP success status codes.

The upside to this is significantly less allocation as well as better throughput for formatting such values. The primary downside is increased memory costs for a larger table, e.g. caching all values 0 through 299 would add ~12K in allocated objects (caching all through 1000 would add ~39K). If the full cost is deemed prohibitive, it could be made lazy, such that at the expense of an additional null check on each access, we could populate only the strings actually used. We could also clear the table under certain conditions if it was deemed an issue. And we could choose to not include the table in corelib builds where we expect memory to be at a premium.

cc: @marklio

@stephentoub stephentoub added this to the 8.0.0 milestone Oct 11, 2022
@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Several releases ago, we added to number formatting a cache of strings for the values 0 through 9:

private static readonly string[] s_singleDigitStringCache = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };

Since with the default format non-negative values aren't impacted by culture, this cache is used for all numerical types for all single digit values as part of value.ToString(). We should consider expanding the cache to larger values, e.g. up to 300 to account for other frequently used values on success paths, like HTTP success status codes.

The upside to this is significantly less allocation as well as better throughput for formatting such values. The primary downside is increased memory costs for a larger table, e.g. caching all values 0 through 299 would add ~12K in allocated objects (caching all through 1000 would add ~39K). If the full cost is deemed prohibitive, it could be made lazy, such that at the expense of an additional null check on each access, we could populate only the strings actually used. We could also clear the table under certain conditions if it was deemed an issue. And we could choose to not include the table in corelib builds where we expect memory to be at a premium.

cc: @marklio

Author: stephentoub
Assignees: -
Labels:

area-System.Runtime, tenet-performance

Milestone: 8.0.0

@jkotas

This comment was marked as resolved.

@stephentoub

This comment was marked as resolved.

@jkotas

This comment was marked as resolved.

@stephentoub
Copy link
Member Author

stephentoub commented Oct 11, 2022

I think it would end up looking something like this:
main...stephentoub:runtime:smallnumberscache

[HideColumns("Job", "Error", "StdDev", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Bytes
{
    [Params(123)]
    public byte Value { get; set; }

    [Benchmark]
    public string ValueToString() => Value.ToString();
}

[HideColumns("Job", "Error", "StdDev", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Integers
{
    [Params(1, 12, 123, 1234, 12345)]
    public int Value { get; set; }

    [Benchmark]
    public string ValueToString() => Value.ToString();
}
Type Method Toolchain Value Mean Median Ratio Allocated Alloc Ratio
Integers ValueToString \main\corerun.exe 1 1.393 ns 1.392 ns 1.00 - NA
Integers ValueToString \pr\corerun.exe 1 2.344 ns 2.335 ns 1.68 - NA
Integers ValueToString \main\corerun.exe 12 10.003 ns 9.952 ns 1.00 32 B 1.00
Integers ValueToString \pr\corerun.exe 12 2.351 ns 2.318 ns 0.24 - 0.00
Bytes ValueToString \main\corerun.exe 123 11.340 ns 11.200 ns 1.00 32 B 1.00
Bytes ValueToString \pr\corerun.exe 123 1.183 ns 1.185 ns 0.10 - 0.00
Integers ValueToString \main\corerun.exe 123 13.434 ns 11.441 ns 1.00 32 B 1.00
Integers ValueToString \pr\corerun.exe 123 2.191 ns 2.181 ns 0.11 - 0.00
Integers ValueToString \main\corerun.exe 1234 11.536 ns 11.439 ns 1.00 32 B 1.00
Integers ValueToString \pr\corerun.exe 1234 10.444 ns 10.339 ns 0.91 32 B 1.00
Integers ValueToString \main\corerun.exe 12345 11.341 ns 11.260 ns 1.00 32 B 1.00
Integers ValueToString \pr\corerun.exe 12345 11.300 ns 11.150 ns 1.00 32 B 1.00

@EgorBo
Copy link
Member

EgorBo commented Oct 11, 2022

👍
image

from a memory dump of bingsnr (string objects - number of them in the current snapshot). But I can't say if they come from this formatting.

@stephentoub
Copy link
Member Author

stephentoub commented Oct 12, 2022

@EgorBo shared with me the dump he was looking at, and I did some querying of it using clrmd.

It has 8.7M strings, of which 177K are parseable with long.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out long value), and of those, ~25% are for values < 300. Of the 5.2MB consumed by all the number strings, ~1.2MB are consumed by these strings for values <300.

If I increase the threshold from 300 to 1000, that ~25% goes up to ~40%.

@geeknoid
Copy link
Member

Can the size of the cache be determined at startup based on some environmental state?

Clearly, the cache memory is a trivial cost for a service application so in that environment it is warranted to make it large. But for constrained environments, the extra memory may be problematic.

@stephentoub stephentoub self-assigned this Nov 8, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 30, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 2, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants