Commit 40a72a0
1 parent a5dd32e

feat: update apfloat-bigdec with plots and more tests.

18 files changed: +15342, -98 lines

content/blog/2025/apfloat-bigdecimal.md (49 additions, 77 deletions)
````diff
@@ -10,6 +10,8 @@ featuredpath = "date"
 type = "post"
 +++
 
+*{{< sp orange >}}Edit (2025-05-08):{{</ sp >}} I changed some test parameters and re-ran the tests. Adding bar plots.*
+
 I recently set out to compare the performance of [`Apfloat`](http://www.apfloat.org) and [`BigDecimal`](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/math/BigDecimal.html) for arbitrary precision arithmetic in Java. I use arbitrary precision floating point numbers in key places of the update cycle in Gaia Sky, so it made sense to explore this. My initial approach was a naive benchmark: a simple `main()` method running arithmetic operations in a loop and measuring the time taken. The results were strongly in favor of `BigDecimal`, even for large precision values. This was unexpected, as the general consensus I [found](https://stackoverflow.com/questions/277309/java-floating-point-high-precision-library) [online](https://groups.google.com/g/javaposse/c/YDYDPbzxntc?pli=1) [suggested](http://www.apfloat.org/apfloat_java/) that `Apfloat` is more performant, especially for higher precision operations (hundreds of digits).
 
 To get more accurate and reliable measurements, I decided to implement a proper [JMH](@ "Java Microbenchmark Harness") benchmark. The benchmark project source is available in [this repository](https://codeberg.org/langurmonkey/java-arbitrary-precision-benchmark). The benchmarks test addition, subtraction, multiplication, division, power, natural logarithm, and sine for both `Apfloat` and `BigDecimal` at different precision levels.
````
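The naive benchmark mentioned in the intro isn't shown in the post. As a rough sketch of what such a `main()`-loop measurement looks like, and why it is unreliable (no warm-up, so JIT compilation and GC pauses land inside the timed region), consider something like this. Values and loop count are illustrative, not the author's actual code:

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical reconstruction of a naive timing loop; JMH exists
// precisely because measurements like this are easy to get wrong.
public class NaiveBenchmark {
    public static void main(String[] args) {
        MathContext mc = new MathContext(1000);
        BigDecimal a = new BigDecimal("12345.6789", mc);
        BigDecimal b = new BigDecimal("98765.4321", mc);

        long start = System.nanoTime();
        BigDecimal acc = BigDecimal.ZERO;
        for (int i = 0; i < 100_000; i++) {
            acc = acc.add(a.multiply(b, mc), mc);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop is not trivially dead code.
        System.out.printf("digits=%d, elapsed=%.3f ms%n", acc.precision(), elapsed / 1e6);
    }
}
```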
````diff
@@ -23,15 +25,15 @@ JMH is a benchmarking framework specifically designed for measuring performance
 ### The Benchmark Implementation
 
 The JMH benchmark project is structured to measure the average time taken for each arithmetic operation over several iterations and precision levels. Here's the structure:
-- Separate benchmarks for **addition**, **subtraction**, **multiplication**, **division**, **natural logarithm**, **power**, and **sine**.
+- Separate benchmarks for **addition**, **subtraction**, **multiplication**, **division**, **natural logarithm**, **power**, and **sine**, in addition to an **allocation** test.
 - Each benchmark tests `Apfloat` and `BigDecimal`.
-- Create the actual objects at benchmark level to factor out allocation costs. Later on I provide a test with in-loop allocations.
-- Settled on two precision levels, representative of *low* and *high* precision settings. They are **25** and **1000**.
+- Create the actual objects at benchmark level to factor out allocation costs. A specific benchmark tests the allocation overhead.
+- Settled on four precision levels, ranging from *low* to *high* precision, given as numbers of digits. They are **25**, **50**, **500**, and **1000** digits.
 - Average time mode.
-- 200 in-test iterations.
-- Two warm-up iterations of two seconds each to minimize JVM effects.
-- Two main iterations of two seconds each in the main test.
-- Finally, send result into `Blackhole` to prevent JIT optimizations.
+- Every benchmark function runs one operation exactly once. The allocation test creates a couple of objects and consumes them.
+- One warm-up iteration of one second to minimize JVM effects (`@Warmup(iterations = 1, time = 1)`).
+- Three main iterations of five seconds each for the measurement (`@Measurement(iterations = 3, time = 5)`).
+- Send results into `Blackhole` to prevent JIT optimizations.
 
 Here is an example for the `Sin` benchmark:
 
````
````diff
@@ -40,115 +42,85 @@ Here is an example for the `Sin` benchmark:
 
 ### The Results
 
-I have run the benchmark with Java 21 and JMH 1.37. Below are the specs of my laptop and the specific software versions.
+Below are the specs of the system I used to run the tests, and the specific software versions used. Only the CPU and the memory should play a significant role.
 
 ```
 # JMH version: 1.37
 # VM version: JDK 21.0.7, OpenJDK 64-Bit Server VM, 21.0.7+6
 
-CPU: Intel(R) Core(TM) i7-8550U (8) @ 4.00 GHz
-GPU: Intel UHD Graphics 620 @ 1.15 GHz [Integr]
-Memory: 16.00 GiB
+CPU: Intel(R) Core(TM) i7-7700 (8) @ 4.20 GHz
+GPU 1: NVIDIA GeForce GTX 1070 [Discrete]
+GPU 2: Intel HD Graphics 630 [Integrated]
+Memory: 32.00 GiB
 Swap: 8.00 GiB
 ```
 
 And here are the benchmark results.
 
 **Addition**
 
-```
-Benchmark                       (precision)  Mode  Cnt  Score  Error  Units
-Addition.testApfloatAddition             25  avgt    2  0.058         ms/op
-Addition.testApfloatAddition           1000  avgt    2  0.058         ms/op
-Addition.testBigDecimalAddition          25  avgt    2  0.006         ms/op
-Addition.testBigDecimalAddition        1000  avgt    2  0.007         ms/op
-```
+{{< fig src="/img/2025/05/jmh-result-Addition.svg" class="fig-center" width="100%" title="Addition results" loading="lazy" >}}
+
+We already see that `BigDecimal` is much faster at all precisions. It is not even close.
 
````
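For reference, the operation under test on each side boils down to calls like the following (operand values are illustrative; the real ones live in the linked repository). The API difference is worth noting: `BigDecimal` takes the precision per operation through a `MathContext`, while `Apfloat` attaches a precision, in digits, to each value.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import org.apfloat.Apfloat;

public class AdditionExample {
    public static void main(String[] args) {
        // BigDecimal: precision is supplied per operation via MathContext.
        MathContext mc = new MathContext(25);
        BigDecimal xBD = new BigDecimal("12345.6789", mc);
        BigDecimal sumBD = xBD.add(xBD, mc);

        // Apfloat: the precision (in digits) travels with the value.
        Apfloat xAF = new Apfloat("12345.6789", 25);
        Apfloat sumAF = xAF.add(xAF);

        System.out.println(sumBD + " / " + sumAF);
    }
}
```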

````diff
 **Subtraction**
-```
-Benchmark                              (precision)  Mode  Cnt  Score  Error  Units
-Subtraction.testApfloatSubtraction              25  avgt    2  0.082         ms/op
-Subtraction.testApfloatSubtraction            1000  avgt    2  0.083         ms/op
-Subtraction.testBigDecimalSubtraction           25  avgt    2  0.006         ms/op
-Subtraction.testBigDecimalSubtraction         1000  avgt    2  0.007         ms/op
-```
 
-Surprising. With both addition and subtraction `BigDecimal` comes out on top.
+{{< fig src="/img/2025/05/jmh-result-Subtraction.svg" class="fig-center" width="100%" title="Subtraction results" loading="lazy" >}}
+
+In the subtraction benchmark, `BigDecimal` comes out on top as well.
 
 **Multiplication**
-```
-Benchmark                                    (precision)  Mode  Cnt  Score  Error  Units
-Multiplication.testApfloatMultiplication              25  avgt    2  0.142         ms/op
-Multiplication.testApfloatMultiplication            1000  avgt    2  0.143         ms/op
-Multiplication.testBigDecimalMultiplication           25  avgt    2  0.008         ms/op
-Multiplication.testBigDecimalMultiplication         1000  avgt    2  0.009         ms/op
-```
+
+{{< fig src="/img/2025/05/jmh-result-Multiplication.svg" class="fig-center" width="100%" title="Multiplication results" loading="lazy" >}}
+
+The same story repeats for multiplication.
 
 **Division**
-```
-Benchmark                       (precision)  Mode  Cnt  Score  Error  Units
-Division.testApfloatDivision             25  avgt    2  1.629         ms/op
-Division.testApfloatDivision           1000  avgt    2  8.568         ms/op
-Division.testBigDecimalDivision          25  avgt    2  0.067         ms/op
-Division.testBigDecimalDivision        1000  avgt    2  1.730         ms/op
-```
 
-Same story here. Division is a notoriously costly operation, but `BigDecimal` still comes out comfortably on top.
-Now, let's test some more involved arithmetic operation like the natural logarithm, sine, and power. Those are implemented directly in the `Apfloat` package. We use the [`big-math` project](https://github.com/eobermuhlner/big-math) for `BigDecimal`.
+{{< fig src="/img/2025/05/jmh-result-Division.svg" class="fig-center" width="100%" title="Division results" loading="lazy" >}}
+
+Again, division is a notoriously costly operation, but `BigDecimal` still comes out comfortably on top.
+
+Now, let's test some more involved arithmetic operations: the natural logarithm, the sine, and the power function. In `Apfloat`, those are directly implemented in the library. For `BigDecimal`, we use the [`big-math` project](https://github.com/eobermuhlner/big-math).
 
````
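For reference, the calls being measured look roughly like this (a sketch with illustrative values; `ApfloatMath` ships with `Apfloat` itself, while `BigDecimalMath` comes from the big-math library):

```java
import java.math.BigDecimal;
import java.math.MathContext;

import org.apfloat.Apfloat;
import org.apfloat.ApfloatMath;

import ch.obermuhlner.math.big.BigDecimalMath;

public class ElementaryFunctions {
    public static void main(String[] args) {
        MathContext mc = new MathContext(50);
        BigDecimal xBD = new BigDecimal("12345.6789", mc);
        Apfloat xAF = new Apfloat("12345.6789", 50);

        // Apfloat implements these functions directly in ApfloatMath.
        Apfloat logAF = ApfloatMath.log(xAF);
        Apfloat sinAF = ApfloatMath.sin(xAF);
        Apfloat powAF = ApfloatMath.pow(xAF, new Apfloat("2.5", 50));

        // BigDecimal relies on big-math's BigDecimalMath.
        BigDecimal logBD = BigDecimalMath.log(xBD, mc);
        BigDecimal sinBD = BigDecimalMath.sin(xBD, mc);
        BigDecimal powBD = BigDecimalMath.pow(xBD, new BigDecimal("2.5"), mc);

        System.out.println(logAF + " " + sinAF + " " + powAF);
        System.out.println(logBD + " " + sinBD + " " + powBD);
    }
}
```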

````diff
 **Log**
-```
-Benchmark              (precision)  Mode  Cnt     Score  Error  Units
-Log.testApfloatLog              25  avgt    2   112.835         ms/op
-Log.testApfloatLog            1000  avgt    2  3977.143         ms/op
-Log.testBigDecimalLog           25  avgt    2    15.191         ms/op
-Log.testBigDecimalLog         1000  avgt    2  6006.199         ms/op
-```
 
-The log is roughly twice as fast with `Apfloat` in the high precision setting, but it is much faster in `BigDecimal` in low precision.
+{{< fig src="/img/2025/05/jmh-result-Log.svg" class="fig-center" width="100%" title="Log results" loading="lazy" >}}
+
+The logarithm is faster with `Apfloat` at the higher precision settings, but `BigDecimal` still wins at the lower precisions.
 
 **Sin**
-```
-Benchmark              (precision)  Mode  Cnt      Score  Error  Units
-Sin.testApfloatSin              25  avgt    2    610.609         ms/op
-Sin.testApfloatSin            1000  avgt    2  27157.444         ms/op
-Sin.testBigDecimalSin           25  avgt    2      7.516         ms/op
-Sin.testBigDecimalSin         1000  avgt    2   4504.473         ms/op
-```
 
-The sine is much faster in `BigDecimal` in both precision settings.
+{{< fig src="/img/2025/05/jmh-result-Sin.svg" class="fig-center" width="100%" title="Sin results" loading="lazy" >}}
+
+The sine is much faster in `BigDecimal` at all precision settings.
 
 **Pow**
-```
-Benchmark              (precision)  Mode  Cnt  Score  Error  Units
-Pow.testApfloatPow              25  avgt    2  0.311         ms/op
-Pow.testApfloatPow            1000  avgt    2  0.350         ms/op
-Pow.testBigDecimalPow           25  avgt    2  0.194         ms/op
-Pow.testBigDecimalPow         1000  avgt    2  0.036         ms/op
-```
+
+{{< fig src="/img/2025/05/jmh-result-Pow.svg" class="fig-center" width="100%" title="Pow results" loading="lazy" >}}
 
 And finally, the power repeats the same story, with `BigDecimal` sitting comfortably on the throne again.
 
````
````diff
-I also wanted to test the overhead due to allocation, so I prepared the **AdditionAlloc** test, which creates the operand instances in the loop.
 
-**Addition (in-loop allocation)**
-```
-Benchmark                                      (precision)  Mode  Cnt  Score  Error  Units
-AdditionAllocation.testApFloatAdditionAlloc             25  avgt    2  0.210         ms/op
-AdditionAllocation.testApFloatAdditionAlloc           1000  avgt    2  0.234         ms/op
-AdditionAllocation.testBigDecimalAdditionAlloc          25  avgt    2  0.281         ms/op
-AdditionAllocation.testBigDecimalAdditionAlloc        1000  avgt    2  0.170         ms/op
-```
+**Allocation**
 
-Here we clearly see that the allocation overhead dominates the results. Surprisingly, `BigDecimal` seems faster when using 1000 digits of precision than when it uses only 25. The results are otherwise similar for both libraries.
+For science, I thought it would be cool to test the allocation overhead, so I prepared the **Allocation** test, which allocates two instances of either `Apfloat` or `BigDecimal` and consumes them.
 
+{{< fig src="/img/2025/05/jmh-result-Allocation.svg" class="fig-center" width="100%" title="Allocation results" loading="lazy" >}}
 
-### Analysis
+We see that allocation is very costly in both libraries. However, while `Apfloat` seems to be roughly constant with the precision, `BigDecimal` shows a higher cost at 25 digits, the lowest precision setting. I thought this was weird, so I re-ran the test a bunch of times, with the same result. I'm not sure what the root cause is, but it is surprising nonetheless.
+
+Since both `Apfloat` and `BigDecimal` are immutable, allocation costs need to be factored in: new objects need to be allocated every time new operands are needed.
 
````
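A small sketch of why immutability forces these allocations (illustrative; neither library offers an in-place mutation API):

```java
import org.apfloat.Apfloat;

public class AllocationExample {
    public static void main(String[] args) {
        // Every operand is a fresh object; operations return new objects too.
        Apfloat a = new Apfloat("1.23456789", 500); // allocation
        Apfloat b = new Apfloat("9.87654321", 500); // allocation
        Apfloat c = a.add(b);                       // a and b unchanged; c is new

        // There is no a.setValue(...): with changing inputs (say, once per
        // frame in a simulation update loop), the allocations above repeat
        // on every update.
        System.out.println(c);
    }
}
```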

````diff
-Contrary to expectations, `BigDecimal` consistently outperformed `Apfloat` across all operations and precision levels, including the higher precisions (500 and 1000 digits) where `Apfloat` was expected to excel. There is a single case when `Apfloat` is faster, and that is in the high precision natural logarithm benchmark. It's safe to say that this is due to the particular implementation or algorithm being used. Otherwise, the disparity is particularly noticeable in division and sine operations, where `Apfloat` is significantly slower than `BigDecimal`.
 
+### Analysis
+
+Contrary to expectations, `BigDecimal` consistently outperformed `Apfloat` across all operations and precision levels, including the higher precisions (500 and 1000 digits) where `Apfloat` was expected to excel. There is a single case where `Apfloat` is faster, and that is the high precision natural logarithm benchmark. I think it's safe to say that this is due to the particular implementation or algorithm being used. Otherwise, the disparity is particularly noticeable in the division and sine operations, where `Apfloat` is significantly slower than `BigDecimal`.
 Specifically, `BigDecimal` was several times faster than `Apfloat` in most operations and precisions. Those are, in my opinion, significant results.
 
+Finally, allocation seems to be faster with `Apfloat`, and `BigDecimal` shows a dependency on the precision that I find strange.
+
+
 ### Questions and Next Steps
 
 I was genuinely surprised by the outcome of these benchmarks, as it contradicts the general consensus regarding `Apfloat`'s supposed performance advantage in high-precision arithmetic. I am reaching out to the community to validate my methodology and results. Are these findings trustworthy, or did I overlook something crucial in my benchmarking approach? Feedback and insights are very much welcome.
````

static/code/2025/SinBenchmark.java (14 additions, 21 deletions)
````diff
@@ -1,46 +1,39 @@
 @BenchmarkMode(Mode.AverageTime)
-@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@OutputTimeUnit(TimeUnit.NANOSECONDS)
 @Fork(value = 1)
-@Warmup(iterations = 2, time = 2)
-@Measurement(iterations = 2, time = 2)
+@Warmup(iterations = 1, time = 1)
+@Measurement(iterations = 3, time = 5)
 public abstract class BaseBenchmark {
 
-    protected static final int ITERATIONS = 200;
-
     @State(Scope.Thread)
     public static class BenchmarkState {
         MathContext mc;
-        BigDecimal aBD;
-        Apfloat aAF;
+        BigDecimal aBD, bBD;
+        Apfloat aAF, bAF;
 
-        @Param({ "25", "1000" }) // Add different precision levels here
+        @Param({ "25", "50", "500", "1000" }) // Add different precision levels here
        int precision;
 
         @Setup(Level.Trial)
         public void setUp() {
             mc = new MathContext(precision);
-            aBD = new BigDecimal("12345.6789012345678901234567890123456789", mc);
-            aBD = new Apfloat("12345.6789012345678901234567890123456789", precision);
+            aBD = new BigDecimal("12345.678901234567890123456789012345678934343434343434343434343434343434", mc);
+            aAF = new Apfloat("12345.678901234567890123456789012345678934343434343434343434343434343434", precision);
         }
     }
 }
 
 public class Sin extends BaseBenchmark {
 
     @Benchmark
-    public void testBigDecimalSin(BenchmarkState state, Blackhole bh) {
-        for (int i = 0; i < ITERATIONS; i++) {
-            var result = BigDecimalMath.sin(state.aBD, state.mc);
-            bh.consume(result);
-        }
+    public void BigDecimalSin(BenchmarkState state, Blackhole bh) {
+        var result = BigDecimalMath.sin(state.aBD, state.mc);
+        bh.consume(result);
     }
 
     @Benchmark
-    public void testApfloatSin(BenchmarkState state, Blackhole bh) {
-        for (int i = 0; i < ITERATIONS; i++) {
-            var result = ApfloatMath.sin(state.aBD);
-            bh.consume(result);
-        }
+    public void ApfloatSin(BenchmarkState state, Blackhole bh) {
+        var result = ApfloatMath.sin(state.aAF);
+        bh.consume(result);
     }
-
 }
````
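As a usage note, JMH suites like this one are typically launched either through the JMH uber-jar produced by the build (`java -jar target/benchmarks.jar`) or programmatically via the JMH runner API. A generic programmatic entry point, sketched here with an illustrative class name and include pattern (the repository may use its own launcher), looks like this:

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        // Select benchmarks by regex over their fully qualified names.
        Options opts = new OptionsBuilder()
                .include("Sin")
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}
```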
