Faster Rational-like type #11522
One option, suggested by @StefanKarpinski in #8672, is to relax the normalization requirement. We could take this even further and get rid of the coprime requirement altogether. We could keep the current operations |
We could have a function that returns the same value in reduced form on demand. |
Another option (perhaps encompassing Stefan's proposal) would be to do fast, non-cancelling operations by default, and only fall back on cancelling operations when overflow is detected. |
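A rough sketch of that fallback scheme (the `FastRat` type and `add_fast` name are hypothetical; the `Base.Checked` overflow-bit helpers are from later Julia versions):

```julia
using Base.Checked: add_with_overflow, mul_with_overflow

struct FastRat{T<:Integer}
    num::T
    den::T
end

function add_fast(x::FastRat{T}, y::FastRat{T}) where {T}
    # Fast path: num = x.num*y.den + y.num*x.den, den = x.den*y.den, no gcd.
    a, f1 = mul_with_overflow(x.num, y.den)
    b, f2 = mul_with_overflow(y.num, x.den)
    n, f3 = add_with_overflow(a, b)
    d, f4 = mul_with_overflow(x.den, y.den)
    if f1 | f2 | f3 | f4
        # Slow path, only on overflow: reduce via the exact Rational type
        # (which itself errors if even the reduced result overflows T).
        r = (x.num // x.den) + (y.num // y.den)
        return FastRat{T}(numerator(r), denominator(r))
    end
    return FastRat{T}(n, d)
end
```

Note that the fast path never cancels, so results accumulate common factors until an overflow forces a reduction.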
@timholy Do you have some suggestions for useful benchmarks? |
I've posted https://github.com/timholy/Ratios.jl as a playground (and because I need this for Interpolations, and as @tlycken pointed out it's better not to bury it inside Interpolations). Feel free to play here or elsewhere. |
Operationally, I'd say anything fast enough so that it's not a bottleneck for Interpolations is currently the benchmark I care about 😄. |
This was the first thing that sprang to mind too. When coupled with simplification only when absolutely needed (i.e. display, or querying the numerator/denominator), it could be quite nice. I assume (LOL) that it'd be closer to Ratios.jl performance than the current Rationals, but... not sure. The Ratios code is so simple that it should be blisteringly fast (SIMD-able even) |
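For reference, the stripped-down approach can be sketched like this (illustrative names, not Ratios.jl's actual API):

```julia
# Minimal rational-like type: no gcd/div anywhere in the arithmetic,
# so + and * compile to a handful of integer ops (SIMD-friendly).
struct SimpleRatio{T<:Integer} <: Real
    num::T
    den::T
end

Base.:+(x::SimpleRatio, y::SimpleRatio) =
    SimpleRatio(x.num * y.den + y.num * x.den, x.den * y.den)
Base.:*(x::SimpleRatio, y::SimpleRatio) =
    SimpleRatio(x.num * y.num, x.den * y.den)

# Reduce only when the canonical form is actually needed
# (display, querying numerator/denominator).
canonical(x::SimpleRatio) = (g = gcd(x.num, x.den); SimpleRatio(x.num ÷ g, x.den ÷ g))
```

The trade-off is exactly the one discussed above: intermediate results silently accumulate common factors and can overflow much sooner.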
I tested out the idea. Unfortunately, the resulting performance is somewhat disappointing: using @timholy's test on JuliaMath/Interpolations.jl#37, it makes Interpolations slightly slower than Grid, though nowhere near as slow as using the Rational type (see here for the test script). |
I've also added a
Given that we still get an order of magnitude speedup, I think this is worth pursuing. We could also then add (possibly unexported) |
Pretty compelling to me. It looks like you are using exceptions, so is it plausible that an Int-specific version that checks for overflow without exceptions could get to something like 10x slower? |
+1 for the experiment, and the 10x speedup. Interpolations will still use the blisteringly-fast unchecked variants, but I agree this is quite promising. |
The main issue with the checked stuff is that we only expose it via exceptions, which are a total performance trap. We need to expose some way of doing operations and then checking the overflow bit. |
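Later Julia versions expose exactly this via `Base.Checked`: each operation returns the result together with the overflow bit, with no exception on the hot path. A sketch of the pattern (`sum_or_widen` is a made-up example):

```julia
using Base.Checked: add_with_overflow

# Sum that inspects the overflow bit directly instead of catching an
# exception; widening to Int128 is the illustrative slow-path fallback.
function sum_or_widen(xs::Vector{Int})
    acc = 0
    for x in xs
        acc, bad = add_with_overflow(acc, x)
        bad && return sum(widen.(xs))  # slow path only when the bit is set
    end
    return acc
end
```

(The return type is not stable here, Int vs. Int128; a real implementation would pick one representation, but the exception-free control flow is the point.)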
There are currently packages where EDIT: As an explicit example of potential subtle breakage, |
I don't really have time to play around with this much more at the moment, but I did arrive at this:

function null_checked_add(x::Int, y::Int)
    n, x = Base.llvmcall("""
        %3 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %0, i64 %1)
        %4 = extractvalue { i64, i1 } %3, 1
        %5 = zext i1 %4 to i8
        %6 = extractvalue { i64, i1 } %3, 0
        %7 = insertvalue { i8, i64 } undef, i8 %5, 0
        %8 = insertvalue { i8, i64 } %7, i64 %6, 1
        ret { i8, i64 } %8""",
        Tuple{Bool,Int64}, Tuple{Int64,Int64}, x, y)
end

This returns a (Bool, Int64) tuple: the overflow flag and the sum. |
I was thinking along the same directions, minus all the |
Not that I know of: the current |
This is a great use of llvmcall. We could adjust the intrinsics to return the overflow bit, and then throw the exception in a julia-level definition, but we don't want to add many more intrinsics. |
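The split Jeff describes ("intrinsic returns the bit, Julia throws") can be sketched at the Julia level, using the overflow-bit helpers that later landed in `Base.Checked` (`my_checked_add` is a hypothetical name):

```julia
using Base.Checked: add_with_overflow

# Julia-level definition: the primitive just hands back the overflow
# bit; the exception is raised here, outside the intrinsic.
function my_checked_add(x::T, y::T) where {T<:Integer}
    n, overflowed = add_with_overflow(x, y)
    overflowed && throw(OverflowError("$x + $y overflows $T"))
    return n
end
```

Callers who want exception-free control flow use `add_with_overflow` directly; callers who want the old behavior use the throwing wrapper.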
+1 to this – I was thinking that as well. |
Is #11604 needed to use |
Apparently not: I assume because they've already been declared by |
sadd works, but when trying the above with smul I get an error.
It may be that I'm doing something else wrong! |
Ah, sorry I missed that. Yes, you're right (though if you run it a second time it does work correctly). |
@JeffBezanson This is one thing I have often wondered: do we actually need most of the intrinsics? Would there be any disadvantage to using llvmcall for a lot of those (once #11604 is ironed out)? |
A rough plan for this issue:
|
@simonbyrne any thoughts on the issue I raised above? Renaming the fields, perhaps? Also, I assume calling |
Could keep a flag of whether it's been reduced or not and have a reduce function that returns the same value in reduced form. |
Renaming the fields seems reasonable. The idea of a flag seems reasonable, though perhaps worth having some examples of where this might be a problem. |
Could use the sign of the denominator or something like that. We've also talked about having a separate powers of two field, which would give bigger range and make it possible to represent all floating-point values, which would be pretty useful. |
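Stefan's sign-bit idea could look roughly like this (hypothetical names; a canonical rational keeps `den > 0`, so the denominator's sign is free to carry a "known reduced" flag without growing the type):

```julia
struct FlaggedRat
    num::Int
    den::Int   # den < 0 means "known reduced"; abs(den) is the denominator
end

is_reduced(x::FlaggedRat) = signbit(x.den)
denom(x::FlaggedRat) = abs(x.den)

# Reduce at most once: a second call sees the flag and returns immediately.
function reduced(x::FlaggedRat)
    is_reduced(x) && return x
    g = gcd(x.num, x.den)
    FlaggedRat(x.num ÷ g, -(x.den ÷ g))   # negate den to mark "reduced"
end
```

Display and numerator/denominator queries would call `reduced` first; arithmetic would work on the unreduced form and clear the flag.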
A quick update: I've managed to get the llvm-checked operations working, (on the llvm-checked branch), and it's down to 6x slower than completely unchecked ops. |
Really nice! That's a heck of an improvement from 320x slower! Sounds like Base material to me (assuming we aren't planning on moving Rationals out of Base). |
What happened with this? A 20x performance increase would be a bad thing to lose to the sands of time. |
Note that |
I spent the afternoon playing with this. It seems to me that one may carry an unreduced rational if it becomes reduced on these occasions:
For all other calculation processing, the use of unreduced rationals would be ok.
I found this to be marginally faster than the current version:
Is it acceptable to use two Val{} types as a second parameter, encoding IS_REDUCED or MAY_REDUCE? That is a way to work without a state field, letting calculations with unreduced rationals proceed unless there is overflow. The only other way that respects the type's size, as I read above, appropriates the denominator's sign bit as the state bit ( signbit(den) ? IS_REDUCED : MAY_REDUCE ). To date, Julia base has stayed away from reclaiming an internal bit of a built-in numeric type (I have). |
Nice work, @JeffreySarnoff. It would be great to have a faster rational type based on this approach. I'm not enthused about the type parameter indicating reduction status, but maybe it would be ok? At that point, we could actually just have reduced and unreduced rational types. I.e. this:

abstract Rational{T<:Integer} <: Real
immutable ReducedRational{T<:Integer} <: Rational{T} ... end
immutable UnreducedRational{T<:Integer} <: Rational{T} ... end

Then some operations would produce reduced rationals, while others would produce unreduced ones. Of course, the trouble is that you can't always predict statically when you'll get which, which is why I don't think it really helps. Instead, I think having some sort of reduced flag to avoid repeated reduction would be the way to go. |
👍 to run-time checking of the flag (I'd bet money that using the type system for this would make things worse). |
This is a proof of concept. To keep the type constant, there is no widening. With element types of Int64 or Int32 the speedups are utilitarian. |
Any progress on this? As @oscardssmith said, "A 20x performance increase would be a bad thing to lose to the sands of time." |
Which of these approaches should we use to handle overflow? Rationals tend to grow their significant digits.
|
I was trying to compute the 1000th harmonic number exactly. I have to use BigInt in this case, and it seems to be a bit slow. |
Try this for calculating harmonic numbers:
I get 20x, and 10x for n = 1_000. |
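For reference, here is the straightforward exact computation alongside a divide-and-conquer variant of the kind that tends to give speedups like the one reported (function names are illustrative):

```julia
# H_n = 1 + 1/2 + ... + 1/n, computed exactly with BigInt rationals.
harmonic(n) = sum(big(1)//k for k in 1:n)

# Pairwise (divide-and-conquer) summation keeps intermediate numerators
# and denominators small, so each gcd/normalization step is cheaper.
harmonic_pair(lo, hi) = lo == hi ? big(1)//lo :
    (mid = (lo + hi) >>> 1; harmonic_pair(lo, mid) + harmonic_pair(mid + 1, hi))
```

`harmonic_pair(1, n)` returns the same value as `harmonic(n)`; the difference is purely in the size of the intermediate fractions.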
Indeed much faster. See benchmark here. Thanks. https://newptcai.github.io/harmonic-number-and-zeta-functions-explained-in-julia-part-1.html |
Great! @StefanKarpinski |
Hi! I’m trying to use the FastRationals package for long computations associated with the harmonic numbers. |
Are you following the guidelines? Can you use Q64 staying around the tabulated sweet spot? If so, that is worth doing. What is the result when you run this:
|
I have no idea why your results are unstable.
Repeating that five times, I see these results (showing a slowdown!). Using FastQ64 tells another story (n <= 46, as FastQ64 overflows this calculation at n = 47).
|
|
I placed a large CAUTION about FastQBig at the top of the readme. |
Over in JuliaMath/Interpolations.jl#36 (comment) and JuliaMath/Interpolations.jl#37 it was discovered that doing computations with Rational is slow, because basically every usage calls gcd and div. The advantage of calling gcd and div is that it makes the type much less vulnerable to overflow, and that is a Good Thing. But as we discovered, certain computations may not need that kind of care, so there may be room for a faster variant. Switching to a stripped-down variant provided an approximate 50-fold speed boost.

I suspect certain computations may demand an implementation that is as minimalistic as that Ratio type. There may also be an area of intermediate interest, where a Rational-like object is represented in terms of pre-factorized numbers, perhaps numerator and denominator each being a Dict{Int,Int} representing the base and power of the factors.
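The pre-factorized representation could be prototyped like this (hypothetical names; only multiplication is shown, since addition would force conversion back to a common denominator anyway):

```julia
# A positive rational as prime => exponent; negative exponents are
# denominator factors. Multiplication is just exponent addition.
const FacRat = Dict{Int,Int}

function mul!(x::FacRat, y::FacRat)
    for (p, e) in y
        x[p] = get(x, p, 0) + e
        x[p] == 0 && delete!(x, p)   # cancellation falls out for free
    end
    return x
end

# Convert back to an ordinary rational for display or comparison.
to_rational(x::FacRat) =
    prod(Rational{BigInt}(p)^e for (p, e) in x; init = big(1)//1)
```

Multiplication and division never need gcd in this form; the cost moves to addition, which is why this sits in the "intermediate interest" territory the issue describes.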