BigInteger parsing optimizations #47842

Merged: 4 commits, May 9, 2021

@jfd16 jfd16 commented Feb 4, 2021

Improves performance of BigInteger parsing from decimal and hex strings (both speed and memory allocations), according to the benchmark that I have included. The difference is particularly significant for large decimal strings, given that the current implementation allocates two BigInteger instances for each input digit.
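
To illustrate why per-digit allocation is costly, here is a minimal sketch and not the code from this PR. `ParseNaive` builds the value one digit at a time, creating intermediate `BigInteger` values on every step. `ParseChunked` is one possible alternative: it folds up to nine digits at a time into a `uint[]` buffer in place. The class and method names are hypothetical, and the input is assumed to contain only ASCII digits with no sign or whitespace.

```csharp
using System;
using System.Buffers.Binary;
using System.Numerics;

public static class DecimalParseSketch
{
    // Naive approach: each digit costs one multiply and one add,
    // and each of those produces a new BigInteger value.
    public static BigInteger ParseNaive(string digits)
    {
        BigInteger result = BigInteger.Zero;
        foreach (char c in digits)
        {
            result = result * 10 + (c - '0');
        }
        return result;
    }

    // Chunked approach (assumption: ASCII digits only).
    // 10^9 fits in a uint, so up to 9 digits are folded in per pass.
    public static BigInteger ParseChunked(string digits)
    {
        // Little-endian 32-bit limbs; 9 decimal digits need fewer than 32 bits.
        uint[] limbs = new uint[digits.Length / 9 + 1];
        int used = 0;

        for (int pos = 0; pos < digits.Length; )
        {
            int take = Math.Min(9, digits.Length - pos);
            uint chunk = 0;
            uint scale = 1;
            for (int i = 0; i < take; i++)
            {
                chunk = chunk * 10 + (uint)(digits[pos + i] - '0');
                scale *= 10;
            }
            pos += take;

            // In place: limbs = limbs * scale + chunk
            ulong carry = chunk;
            for (int i = 0; i < used; i++)
            {
                ulong t = (ulong)limbs[i] * scale + carry;
                limbs[i] = (uint)t;
                carry = t >> 32;
            }
            if (carry != 0)
            {
                limbs[used++] = (uint)carry;
            }
        }

        // Serialize the limbs as little-endian bytes and build the result once.
        byte[] bytes = new byte[used * 4];
        for (int i = 0; i < used; i++)
        {
            BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(i * 4), limbs[i]);
        }
        return new BigInteger(new ReadOnlySpan<byte>(bytes), isUnsigned: true);
    }
}
```

Both methods should agree with `BigInteger.Parse` on digit-only input; the chunked version allocates a limb buffer, a byte buffer, and the final result rather than temporaries per digit.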

Benchmark code

using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Numerics;
using BenchmarkDotNet.Attributes;

public class BigIntegerParseBenchmarks
{
    public IEnumerable<object> NumberStrings =>
        new string[]
        {
            "123",
            int.MinValue.ToString(),
            string.Concat(Enumerable.Repeat("1234567890", 20)),
            string.Concat(Enumerable.Repeat("654719003", 57)),
        };

    public IEnumerable<object> NumberStringsHex =>
        new string[]
        {
            "12A",
            int.MinValue.ToString("X"),
            string.Concat(Enumerable.Repeat("1234567890abcdefffff", 10)),
            string.Concat(Enumerable.Repeat("18a473f5d", 57)),
        };

    [Benchmark]
    [ArgumentsSource(nameof(NumberStrings))]
    public BigInteger Parse(string numberString) => BigInteger.Parse(numberString);

    [Benchmark]
    [ArgumentsSource(nameof(NumberStringsHex))]
    public BigInteger ParseHex(string numberString) => BigInteger.Parse(numberString, NumberStyles.HexNumber);
}

Benchmark results (on master branch)

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.18363.1316 (1909/November2019Update/19H2)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.102
  [Host]     : .NET 6.0.0 (6.0.21.10203), X64 RyuJIT
  Job-NSGODO : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1  
| Method | numberString | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parse | 123 | 186.5 ns | 3.43 ns | 2.87 ns | 187.0 ns | 181.9 ns | 192.6 ns | 0.0246 | - | - | 104 B |
| Parse | -2147483648 | 418.0 ns | 16.69 ns | 18.55 ns | 419.0 ns | 359.6 ns | 442.6 ns | 0.0312 | - | - | 136 B |
| Parse | 123456789012(...)901234567890 [200] | 17,866.9 ns | 193.43 ns | 180.93 ns | 17,901.3 ns | 17,527.7 ns | 18,163.3 ns | 13.1653 | - | - | 55,176 B |
| Parse | 654719003654(...)003654719003 [513] | 69,284.7 ns | 643.04 ns | 536.97 ns | 69,112.3 ns | 68,447.0 ns | 70,545.0 ns | 65.5556 | - | - | 274,272 B |
| ParseHex | 12A | 121.8 ns | 0.64 ns | 0.50 ns | 121.7 ns | 121.3 ns | 123.2 ns | 0.0323 | - | - | 136 B |
| ParseHex | 80000000 | 183.5 ns | 1.28 ns | 1.07 ns | 183.6 ns | 182.4 ns | 185.5 ns | 0.0319 | - | - | 136 B |
| ParseHex | 1234567890ab(...)90abcdefffff [200] | 2,520.9 ns | 17.84 ns | 14.90 ns | 2,511.0 ns | 2,505.7 ns | 2,544.7 ns | 0.2609 | - | - | 1,128 B |
| ParseHex | 18a473f5d18a(...)f5d18a473f5d [513] | 6,069.5 ns | 83.14 ns | 77.77 ns | 6,045.6 ns | 5,938.0 ns | 6,202.7 ns | 0.7287 | - | - | 3,128 B |

Results after changes

BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.18363.1316 (1909/November2019Update/19H2)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.102
  [Host]     : .NET 6.0.0 (6.0.21.10203), X64 RyuJIT
  Job-WMUKDU : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1  
| Method | numberString | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parse | 123 | 126.0 ns | 1.81 ns | 1.51 ns | 125.8 ns | 123.7 ns | 128.9 ns | 0.0248 | - | - | 104 B |
| Parse | -2147483648 | 176.5 ns | 3.55 ns | 3.65 ns | 174.7 ns | 171.9 ns | 182.6 ns | 0.0322 | - | - | 136 B |
| Parse | 123456789012(...)901234567890 [200] | 1,843.5 ns | 21.83 ns | 20.42 ns | 1,843.6 ns | 1,819.9 ns | 1,875.5 ns | 0.2339 | - | - | 984 B |
| Parse | 654719003654(...)003654719003 [513] | 5,197.3 ns | 23.14 ns | 20.51 ns | 5,206.4 ns | 5,134.6 ns | 5,210.8 ns | 0.6656 | - | - | 2,792 B |
| ParseHex | 12A | 116.3 ns | 1.23 ns | 1.09 ns | 116.2 ns | 114.9 ns | 118.8 ns | 0.0245 | - | - | 104 B |
| ParseHex | 80000000 | 166.1 ns | 1.76 ns | 1.56 ns | 165.9 ns | 164.2 ns | 169.3 ns | 0.0320 | - | - | 136 B |
| ParseHex | 1234567890ab(...)90abcdefffff [200] | 1,705.1 ns | 10.51 ns | 8.77 ns | 1,703.0 ns | 1,692.7 ns | 1,715.7 ns | 0.2362 | - | - | 1,000 B |
| ParseHex | 18a473f5d18a(...)f5d18a473f5d [513] | 3,905.5 ns | 34.60 ns | 30.67 ns | 3,907.6 ns | 3,866.8 ns | 3,975.6 ns | 0.6660 | - | - | 2,840 B |

ghost commented Feb 4, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Author: jfd16
Assignees: -
Labels: area-System.Numerics

Milestone: -

dnfadmin commented Feb 4, 2021

CLA assistant check
All CLA requirements met.

@danmoseley
Member

The use of ArrayPool is much more common than ArrayPool

Grepped under libraries/*/src/**cs for interest and I see

object - 2
IntPtr, GCHandle - 4 each
char - 118
byte - 301

there are no uses I see for int or uint.

@tannergooding
Member

> Grepped under libraries/*/src/**cs for interest and I see

I wonder if there is something we can do here to promote better sharing. Particularly when using Span the underlying data type doesn't really matter too much (provided it has the correct alignment)
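
As a rough illustration of the sharing idea, one option is to rent from the widely used `ArrayPool<byte>` and reinterpret the bytes as `uint` through `MemoryMarshal.Cast`. This is only a sketch under assumptions: the class and method names are hypothetical, it is not what the PR does, and it relies on the rented array's data being suitably aligned for `uint`, which is the caveat raised above.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

public static class PooledUIntBufferSketch
{
    // Fills a pooled buffer with 1..n as uint values and returns their sum.
    public static ulong SumFirstN(int n)
    {
        int byteCount = n * sizeof(uint);

        // Rent may return a larger array than requested, so slice to the exact size.
        byte[] rented = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            // View the same memory as uint elements; no copy is made.
            Span<uint> values = MemoryMarshal.Cast<byte, uint>(rented.AsSpan(0, byteCount));

            for (int i = 0; i < values.Length; i++)
            {
                values[i] = (uint)(i + 1);
            }

            ulong sum = 0;
            foreach (uint v in values)
            {
                sum += v;
            }
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if the work above throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The appeal of this pattern is that callers working with `int` or `uint` data would draw from the same shared pool as the many existing `byte` users instead of populating a separate, rarely used pool.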

@jfd16
Contributor Author

jfd16 commented Feb 5, 2021

> The use of ArrayPool is much more common than ArrayPool
>
> Grepped under libraries/*/src/**cs for interest and I see
>
> object - 2
> IntPtr, GCHandle - 4 each
> char - 118
> byte - 301
>
> there are no uses I see for int or uint.

ArrayPool<int> is being used indirectly in some places through ValueListBuilder: https://github.com/dotnet/runtime/search?q=ValueListBuilder+path%3Asrc

@jfd16 jfd16 force-pushed the bigint-parse-perf branch 3 times, most recently from 3041198 to 2383bd3 Compare February 9, 2021 01:42
@jfd16 jfd16 force-pushed the bigint-parse-perf branch from 2383bd3 to 7b5eb62 Compare February 9, 2021 01:44
{
char c = number.digits[i];
Member

@stephentoub, is this an opportunity for an analyzer? Indexing into StringBuilder can be deceptively slow due to its internal chunking.
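
The difference between the two access patterns can be sketched as follows. This is an illustrative example, not code from the PR, and the class and method names are hypothetical. The first method uses the `StringBuilder` indexer, which may have to locate the right internal chunk on each call; the second walks the chunks once via `StringBuilder.GetChunks()` and scans each chunk as a span.

```csharp
using System;
using System.Text;

public static class StringBuilderScanSketch
{
    // Indexer-based scan: every sb[i] may need to search the chunk list.
    public static int CountDigitsByIndex(StringBuilder sb)
    {
        int count = 0;
        for (int i = 0; i < sb.Length; i++)
        {
            if (char.IsDigit(sb[i]))
            {
                count++;
            }
        }
        return count;
    }

    // Chunk-based scan: each chunk is visited once and read as a contiguous span.
    public static int CountDigitsByChunk(StringBuilder sb)
    {
        int count = 0;
        foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
        {
            foreach (char c in chunk.Span)
            {
                if (char.IsDigit(c))
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Both methods return the same count; the chunk-based version avoids repeated chunk lookups, which matters most for builders that have grown across many appends.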

isNegative = true;
foreach (ReadOnlyMemory<char> digitsChunkMem in number.digits.GetChunks())
{
ReadOnlySpan<char> chunkDigits = digitsChunkMem.Span;
Member

Likewise, I wonder if we should have a (internal for now) CurrentSpan property that is slightly more efficient.

We don't really need to return a ReadOnlyMemory when we only want to extract the underlying Span<T>, particularly when the ROM<T> is just being constructed over a backing array.

It's, in practice, no more unsafe than say CollectionsMarshal.AsSpan

Member

@tannergooding tannergooding left a comment

LGTM. I think it can be merged provided @stephentoub has no additional feedback

@danmoseley
Member

> I wonder if there is something we can do here to promote better sharing. Particularly when using Span the underlying data type doesn't really matter too much (provided it has the correct alignment)

Worth an issue, do you think, @tannergooding?

Base automatically changed from master to main March 1, 2021 09:07
@jeffhandley
Member

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jeffhandley jeffhandley merged commit b3278ca into dotnet:main May 9, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@jfd16 jfd16 deleted the bigint-parse-perf branch May 21, 2021 04:16
@jfd16 jfd16 restored the bigint-parse-perf branch June 16, 2021 02:12
@ghost ghost locked as resolved and limited conversation to collaborators Jul 16, 2021