Improve performance of BigInteger.Multiply(large, small) #92208

kzrnm · 2023-09-18T06:29:14Z

BigInteger.Multiply is based on the Karatsuba algorithm. If implemented correctly, the computational complexity of multiplication is $\Theta(n^{\log_2 3})$, where $n$ is the number of digits.

However, the current implementation does not achieve this, because it splits at half the length of the smaller value when it should use half the length of the larger one.

runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs

Line 214 in 353d5ea

int n = right.Length >> 1;
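For readers unfamiliar with the recursion, here is a minimal sketch of Karatsuba multiplication that takes its split point from the larger operand. It is not the BigIntegerCalculator code: it works on whole `BigInteger` values at bit granularity rather than on `uint` spans, handles only non-negative inputs, and the names `KaratsubaMultiply` and `ThresholdBits` are hypothetical.

```csharp
using System;
using System.Numerics;

// Illustrative Karatsuba on non-negative values; an assumption-laden sketch,
// not the implementation in BigIntegerCalculator.SquMul.cs.
static BigInteger KaratsubaMultiply(BigInteger x, BigInteger y)
{
    // Hypothetical cutoff below which the built-in operator is used directly.
    const int ThresholdBits = 64;

    if (x.Sign < 0 || y.Sign < 0)
        throw new ArgumentOutOfRangeException(nameof(x), "Sketch handles non-negative inputs only.");

    long xBits = x.GetBitLength();
    long yBits = y.GetBitLength();
    if (xBits <= ThresholdBits || yBits <= ThresholdBits)
        return x * y;

    // Split point: ceiling of half the bit length of the LARGER operand.
    int n = (int)((Math.Max(xBits, yBits) + 1) >> 1);
    BigInteger mask = (BigInteger.One << n) - 1;

    BigInteger xLow = x & mask, xHigh = x >> n;
    BigInteger yLow = y & mask, yHigh = y >> n;

    // Three recursive products instead of four.
    BigInteger low = KaratsubaMultiply(xLow, yLow);
    BigInteger high = KaratsubaMultiply(xHigh, yHigh);
    BigInteger mid = KaratsubaMultiply(xLow + xHigh, yLow + yHigh) - low - high;

    return (high << (2 * n)) + (mid << n) + low;
}

BigInteger a = BigInteger.Pow(3, 2000);   // large operand
BigInteger b = BigInteger.Pow(7, 100);    // much smaller operand
Console.WriteLine(KaratsubaMultiply(a, b) == a * b);
```

When the smaller operand fits entirely below the split point, its high half is zero and that branch of the recursion ends immediately.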

Benchmark


BenchmarkDotNet v0.13.8, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100-rc.1.23415.11
  [Host]   : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3

Method	largeLength	smallLength	Mean	Error	StdDev	Ratio	RatioSD
PR_Multiply	1000	1000	0.0375 ms	0.0442 ms	0.0024 ms	1.37	0.09
Multiply	1000	1000	0.0274 ms	0.0002 ms	0.0000 ms	1.00	0.00

PR_Multiply	10000	1000	0.3143 ms	0.0750 ms	0.0041 ms	0.58	0.01
Multiply	10000	1000	0.5409 ms	0.0160 ms	0.0009 ms	1.00	0.00

PR_Multiply	10000	10000	1.1306 ms	0.2769 ms	0.0152 ms	1.29	0.02
Multiply	10000	10000	0.8797 ms	0.0351 ms	0.0019 ms	1.00	0.00

PR_Multiply	100000	1000	3.0304 ms	0.1180 ms	0.0065 ms	0.52	0.00
Multiply	100000	1000	5.7730 ms	0.6991 ms	0.0383 ms	1.00	0.00

PR_Multiply	100000	10000	11.0477 ms	0.7075 ms	0.0388 ms	0.23	0.00
Multiply	100000	10000	48.9797 ms	1.8604 ms	0.1020 ms	1.00	0.00

PR_Multiply	100000	100000	39.4357 ms	0.2381 ms	0.0131 ms	1.25	0.00
Multiply	100000	100000	31.6607 ms	0.2959 ms	0.0162 ms	1.00	0.00

PR_Multiply	1000000	1000	30.2207 ms	1.4899 ms	0.0817 ms	0.48	0.00
Multiply	1000000	1000	63.0998 ms	10.1509 ms	0.5564 ms	1.00	0.00

PR_Multiply	1000000	10000	120.0384 ms	8.3355 ms	0.4569 ms	0.21	0.00
Multiply	1000000	10000	582.8313 ms	39.3648 ms	2.1577 ms	1.00	0.00

PR_Multiply	1000000	100000	448.9208 ms	3.7954 ms	0.2080 ms	0.09	0.00
Multiply	1000000	100000	5,091.4759 ms	151.9431 ms	8.3285 ms	1.00	0.00

PR_Multiply	1000000	995000	1,608.9999 ms	75.3231 ms	4.1287 ms	1.09	0.00
Multiply	1000000	995000	1,482.7404 ms	58.9864 ms	3.2332 ms	1.00	0.00

PR_Multiply	1000000	1000000	1,627.7224 ms	63.8822 ms	3.5016 ms	1.35	0.01
Multiply	1000000	1000000	1,209.0913 ms	45.2731 ms	2.4816 ms	1.00	0.00

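// Requires the BenchmarkDotNet package. PrBigInteger is not defined in this snippet;
// presumably it is a copy of BigInteger built from this PR's sources.
using System;
using System.Collections.Generic;
using System.Numerics;
using BenchmarkDotNet.Attributes;

public class BigIntegerMultiplyBenchmark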
{
    static ReadOnlySpan<byte> MakeBytes(int length)
    {
        var random = new Random(918);
        var bytes = new byte[length];
        random.NextBytes(bytes);
        return bytes;
    }

    public IEnumerable<object[]> LengthArguments()
    {
        var lengths = new int[] { 1000, 10000, 100000, 1000000 };
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            if (i == lengths.Length - 1)
            {
                yield return new object[] { lengths[i], (int)(0.995 * lengths[i]), };
            }
            for (int j = i; j >= 0; j--)
            {
                yield return new object[] { lengths[i], lengths[j], };
            }
        }
    }

    [Benchmark]
    [ArgumentsSource(nameof(LengthArguments))]
    public PrBigInteger PR_Multiply(int largeLength, int smallLength)
    {
        return (new PrBigInteger(MakeBytes(smallLength)) * new PrBigInteger(MakeBytes(largeLength)));
    }

    [Benchmark(Baseline = true)]
    [ArgumentsSource(nameof(LengthArguments))]
    public BigInteger Multiply(int largeLength, int smallLength)
    {
        return (new BigInteger(MakeBytes(smallLength)) * new BigInteger(MakeBytes(largeLength)));
    }
}

ghost · 2023-09-18T06:29:25Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

BigInteger.Multiply is based on the Karatsuba algorithm. If implemented correctly, the computational complexity of multiplication is $\Theta(n^{\log_2 3})$, where $n$ is the number of digits.

However, the current implementation does not achieve this, because it splits at half the length of the smaller value when it should use half the length of the larger one.

runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs

Line 214 in 353d5ea

int n = right.Length >> 1;

In this PR, the larger one is used. The split point is the ceiling of half the length, which ensures that rightLow.Length is greater than or equal to rightHigh.Length.

https://github.com/dotnet/runtime/blob/ccc9ccfb51df6c914ae8e51f04e49e1aa8b41a16/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs#L214
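The effect of rounding up can be seen with an odd-length operand. The snippet below is only an illustration with a hypothetical five-element array. The variable names are assumptions and it is not taken from the PR.

```csharp
using System;

uint[] digits = { 1, 2, 3, 4, 5 };   // hypothetical 5-element operand

int floorHalf = digits.Length >> 1;        // rounds down
int ceilHalf = (digits.Length + 1) >> 1;   // rounds up

// Splitting at the ceiling keeps the low part at least as long as the high part.
ReadOnlySpan<uint> low = digits.AsSpan(0, ceilHalf);
ReadOnlySpan<uint> high = digits.AsSpan(ceilHalf);

Console.WriteLine($"floor split:   low={floorHalf}, high={digits.Length - floorHalf}");
Console.WriteLine($"ceiling split: low={low.Length}, high={high.Length}");
```

With the floor, the high part would be the longer one (2 and 3 elements). With the ceiling, the low part is (3 and 2 elements).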

Benchmark


BenchmarkDotNet v0.13.8, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100-rc.1.23415.11
  [Host]   : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3

Method	LargeLength	SmallLength	Mean	Error	StdDev
PR_Multiply	100000	1000	2.971 ms	0.0429 ms	0.0024 ms
Multiply	100000	1000	5.473 ms	0.0598 ms	0.0033 ms

PR_Multiply	100000	10000	11.227 ms	0.6197 ms	0.0340 ms
Multiply	100000	10000	48.625 ms	0.5162 ms	0.0283 ms

PR_Multiply	100000	100000	40.675 ms	0.7527 ms	0.0413 ms
Multiply	100000	100000	31.146 ms	0.9128 ms	0.0500 ms

PR_Multiply	500000	1000	15.168 ms	0.3428 ms	0.0188 ms
Multiply	500000	1000	28.169 ms	1.0945 ms	0.0600 ms

PR_Multiply	500000	10000	61.152 ms	1.8457 ms	0.1012 ms
Multiply	500000	10000	265.648 ms	9.8196 ms	0.5382 ms

PR_Multiply	500000	100000	232.004 ms	7.7583 ms	0.4253 ms
Multiply	500000	100000	2,119.005 ms	73.9840 ms	4.0553 ms

PR_Multiply	1000000	1000	30.607 ms	2.0196 ms	0.1107 ms
Multiply	1000000	1000	60.214 ms	3.2863 ms	0.1801 ms

PR_Multiply	1000000	10000	122.832 ms	7.3378 ms	0.4022 ms
Multiply	1000000	10000	576.569 ms	104.5699 ms	5.7318 ms

PR_Multiply	1000000	100000	462.791 ms	27.2802 ms	1.4953 ms
Multiply	1000000	100000	5,094.210 ms	349.3127 ms	19.1470 ms

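// Requires the BenchmarkDotNet package. PrBigInteger is not defined in this snippet;
// presumably it is a copy of BigInteger built from this PR's sources.
using System;
using System.Numerics;
using BenchmarkDotNet.Attributes;

public class Benchmark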
{
    [Params(100000, 500000, 1000000)]
    public int LargeLength { get; set; }

    [Params(1000, 10000, 100000)]
    public int SmallLength { get; set; }

    byte[] bytes1, bytes2;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(918);
        bytes1 = new byte[LargeLength];
        bytes2 = new byte[SmallLength];
        random.NextBytes(bytes1);
        random.NextBytes(bytes2);
    }

    [Benchmark]
    public PrBigInteger PR_Multiply()
    {
        return (new PrBigInteger(bytes1) * new PrBigInteger(bytes2));
    }

    [Benchmark]
    public BigInteger Multiply()
    {
        return (new BigInteger(bytes1) * new BigInteger(bytes2));
    }
}

Author:	kzrnm
Assignees:	-
Labels:	`area-System.Numerics`, `community-contribution`
Milestone:	-

tannergooding · 2023-09-20T17:14:23Z

Could you add the benchmark to https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Runtime.Numerics/Perf.BigInteger.cs (or ensure the existing Multiply benchmark sufficiently covers the scenario)?

Changes in general LGTM, just want to ensure we have some perf numbers before we merge so it can be correctly tracked in our historical data.

adamsitnik

The changes LGTM.

I've used the benchmarks provided by @kzrnm in dotnet/performance#3361 and ran them on my PC. For large inputs where right is half the size of left, the gains are up to 60%. For the other test cases, the difference is within the range of error.

BenchmarkDotNet v0.13.10-nightly.20231019.90, Windows 11 (10.0.22621.2428/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
  [Host]     : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
          PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
        main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Job	arguments	Mean	Ratio
Multiply	PR	1024,1024 bits	879.885 ns	1.01
Multiply	main	1024,1024 bits	868.360 ns	1.00

Multiply	PR	1024,512 bits	496.243 ns	1.02
Multiply	main	1024,512 bits	484.839 ns	1.00

Multiply	PR	16,16 bits	8.754 ns	0.99
Multiply	main	16,16 bits	9.107 ns	1.00

Multiply	PR	16,8 bits	8.457 ns	0.88
Multiply	main	16,8 bits	9.658 ns	1.00

Multiply	PR	65536,32768 bits	517,364.993 ns	0.38
Multiply	main	65536,32768 bits	1,364,740.916 ns	1.00

Multiply	PR	65536,65536 bits	776,079.607 ns	1.01
Multiply	main	65536,65536 bits	771,191.496 ns	1.00

Thank you for your contribution @kzrnm !

adamsitnik · 2023-11-03T15:34:46Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs

+                    ulong carry = 0UL;
+                    for (int j = 0; j < left.Length; j++)
+                    {
+                        ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
+                        ulong digits = elementPtr + carry + (ulong)left[j] * right[i];


nit: we could most likely get a minor improvement by hoisting right[i] out of the inner loop (however, I am not 100% sure that the JIT does not already perform this optimization)

Suggested change

-                    ulong carry = 0UL;
-                    for (int j = 0; j < left.Length; j++)
-                    {
-                        ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
-                        ulong digits = elementPtr + carry + (ulong)left[j] * right[i];
+                    ulong carry = 0UL;
+                    uint right_i = right[i];
+                    for (int j = 0; j < left.Length; j++)
+                    {
+                        ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
+                        ulong digits = elementPtr + carry + (ulong)left[j] * right_i;
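As a self-contained illustration of the hoisting idea, the sketch below writes the schoolbook loop with ordinary span indexing instead of `Unsafe.Add`. The function name `MultiplySchoolbook` and the local `rightDigit` are hypothetical, and whether the hoist is measurably faster depends on what the JIT already does.

```csharp
using System;

// Schoolbook multiplication sketch. result must have left.Length + right.Length
// elements and must be zeroed before the call.
static void MultiplySchoolbook(ReadOnlySpan<uint> left, ReadOnlySpan<uint> right, Span<uint> result)
{
    for (int i = 0; i < right.Length; i++)
    {
        uint rightDigit = right[i];   // hoisted: read once per outer iteration
        ulong carry = 0UL;
        for (int j = 0; j < left.Length; j++)
        {
            // Cannot overflow: (2^32-1) + (2^32-1) + (2^32-1)^2 == 2^64 - 1.
            ulong digits = result[i + j] + carry + (ulong)left[j] * rightDigit;
            result[i + j] = unchecked((uint)digits);
            carry = digits >> 32;
        }
        result[i + left.Length] = (uint)carry;
    }
}

uint[] x = { 0xFFFFFFFF, 0xFFFFFFFF };   // 2^64 - 1, least significant element first
uint[] y = { 0xFFFFFFFF };               // 2^32 - 1
uint[] product = new uint[x.Length + y.Length];
MultiplySchoolbook(x, y, product);
Console.WriteLine(string.Join(", ", product));
```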

adamsitnik · 2023-11-03T15:51:32Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs

+                        upperRight.Clear();
+
+                        Multiply(left, rightHigh, upperRight);


Multiply does not use upperRight as an input, and it is going to overwrite all values from index 0 to left.Length:

runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs

Lines 143 to 149 in e2ce987

    for ( ; i < left.Length; i++)
    {
        ulong digits = (ulong)left[i] * right + carry;
        bits[i] = unchecked((uint)digits);
        carry = digits >> 32;
    }
    bits[i] = (uint)carry;

So we can reduce the clear to only the last element (this span has left.Length + 1 elements)

Suggested change

-                        upperRight.Clear();
-
-                        Multiply(left, rightHigh, upperRight);
+                        // Multiply has set 0..left.Length elements, the size is left.Length+1
+                        // We need to zero the last element to make sure it does not contain any garbage.
+                        Multiply(left, rightHigh, upperRight);
+                        upperRight[^1] = 0;
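The claim about the quoted loop can be checked in isolation. The sketch below copies that single-digit loop into a hypothetical helper named `MultiplyByDigit` and runs it on a buffer pre-filled with garbage. It covers only the loop quoted above, and says nothing about other Multiply overloads or about the buffer sizes used in the PR.

```csharp
using System;

// Mirrors the quoted loop; bits must hold left.Length + 1 elements.
static void MultiplyByDigit(ReadOnlySpan<uint> left, uint right, Span<uint> bits)
{
    ulong carry = 0UL;
    int i = 0;
    for (; i < left.Length; i++)
    {
        ulong digits = (ulong)left[i] * right + carry;
        bits[i] = unchecked((uint)digits);
        carry = digits >> 32;
    }
    bits[i] = (uint)carry;   // index left.Length receives the final carry
}

uint[] left = { 0xFFFFFFFF, 0x00000001 };
uint[] dirty = { 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF };   // stale contents
uint[] clean = new uint[3];                               // zeroed buffer

MultiplyByDigit(left, 10, dirty);
MultiplyByDigit(left, 10, clean);

// Every index 0..left.Length is assigned, so prior contents do not matter.
Console.WriteLine(dirty.AsSpan().SequenceEqual(clean));
```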

dotnet-issue-labeler bot added the area-System.Numerics label Sep 18, 2023

ghost added the community-contribution label (indicates that the PR has been added by a community member) Sep 18, 2023

kzrnm force-pushed the fix/BigIntegerMultiply branch 2 times, most recently from 396fa2d to 20c5d77 Compare September 18, 2023 06:38

Improve performance of BigInteger.Multiply(large, small)

68e9567

kzrnm force-pushed the fix/BigIntegerMultiply branch 2 times, most recently from 87bc8fc to 68e9567 Compare September 18, 2023 10:26

Optimize Karatsuba boundary

bc1aa19

kzrnm force-pushed the fix/BigIntegerMultiply branch from dc4f0b6 to bc1aa19 Compare September 18, 2023 15:49

build-analysis bot mentioned this pull request Sep 18, 2023

[8.0] Expected and actual version of WASI SDK does not match. Please delete /usr/local/wasi-sdk/ folder to provision a new version. #92233

Closed

kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023

Add BigInteger.Multiply benchmark for dotnet/runtime#92208

150f7ba

kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023

Add BigInteger.Multiply benchmark for dotnet/runtime#92208

c244a91

kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023

Add BigInteger.Multiply benchmark for dotnet/runtime#92208

723814c

kzrnm mentioned this pull request Sep 21, 2023

Add BigInteger.Multiply benchmark for dotnet/runtime#92208 dotnet/performance#3361

Merged

cincuranet pushed a commit to dotnet/performance that referenced this pull request Sep 21, 2023

Add BigInteger.Multiply benchmark for dotnet/runtime#92208 (#3361)

f56b57a

adamsitnik approved these changes Nov 3, 2023


adamsitnik added the tenet-performance label (performance related issue) Nov 3, 2023

adamsitnik added this to the 9.0.0 milestone Nov 3, 2023

adamsitnik closed this Nov 3, 2023

adamsitnik reopened this Nov 3, 2023

adamsitnik merged commit e733539 into dotnet:main Nov 6, 2023

Rob-Hague mentioned this pull request Nov 9, 2023

[Perf] Linux/x64: 5 Improvements on 10/31/2023 4:24:48 PM dotnet/perf-autofiling-issues#24205

Closed

kzrnm deleted the fix/BigIntegerMultiply branch November 9, 2023 12:21

EgorBo mentioned this pull request Nov 9, 2023

[Perf] Linux/arm64: 2 Improvements on 11/6/2023 6:34:58 PM dotnet/perf-autofiling-issues#24307

Closed

tannergooding mentioned this pull request Nov 10, 2023

System numerics abort at System.Numerics.BigIntegerCalculator.Multiply #94610

Closed

This was referenced Nov 14, 2023

[Perf] Windows/x64: 1 Improvement on 11/6/2023 12:32:51 PM dotnet/perf-autofiling-issues#24436

Open

[Perf] Windows/x64: 1 Improvement on 11/6/2023 12:32:51 PM dotnet/perf-autofiling-issues#24480

Open

github-actions bot locked and limited conversation to collaborators Dec 10, 2023
