
parse x^literal as call to ^(x, Val{literal})? #20527

Closed
stevengj opened this issue Feb 8, 2017 · 22 comments
Labels
maths Mathematical functions parser Language parsing and surface syntax speculative Whether the change will be implemented is speculative

Comments

stevengj commented Feb 8, 2017

When working on #20484, it occurred to me that it might be nice if x^literal were parsed as a call to ^(x, Val{literal}), at least for small integer literals (say, between -32 and 32). There would still be a fallback ^(x, ::Type{Val{p}}) where {p} = x^p that just called the regular ^ function, of course.
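To make the mechanism concrete, here is a minimal sketch of a fallback plus a specialized method. The name literal_pow and the use of Val instances (rather than ::Type{Val{p}}) are illustrative choices for this sketch, not the actual lowering target in the PR:

```julia
# Hypothetical stand-in for the lowered call: under the proposal, x^3
# would be rewritten into something like literal_pow(x, Val(3)).
# Generic fallback: just call the ordinary ^ with the unwrapped exponent.
literal_pow(x, ::Val{p}) where {p} = x^p

# A type (or Base) can then specialize common small exponents,
# e.g. making squaring exactly x*x:
literal_pow(x::Float64, ::Val{2}) = x * x

literal_pow(2.0, Val(3))  # hits the generic fallback, i.e. 2.0^3
literal_pow(3.0, Val(2))  # hits the specialized x*x method
```

The specialized Val{2} method is strictly more specific than the fallback in both arguments, so ordinary dispatch picks it with no ambiguity.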

This way:

  • Dimensionful calculations (e.g. Unitful) could be type-stable even if they involve x^2 and similar common small exponents.

  • We could make x^-n faster for small literal n (^(-n) is slow #8939).

  • We could implement the optimal addition-chain exponentiation algorithm for small integer exponents, completely unrolled. (For the hardware numeric types, IIRC we call pow, which lowers to LLVM's powi intrinsic, which probably does something like this. But this does not happen for other types. See also ^ is slow #2741.)

  • Types which support certain cases especially efficiently can implement their own methods. e.g. MPFR has an optimized mpfr_sqr for x^2 that could be used by BigFloat.

  • We could make x^-n work in a type-stable way for integer x in the common case of literal n (Exponentiation by a negative power throws a DomainError #3024).
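The dimensionful bullet can be illustrated with a toy quantity type (a hypothetical stand-in for Unitful, with made-up names): because the unit exponent of x^p depends on the value of p, a runtime exponent is inherently type-unstable, whereas an exponent carried in the type domain is not:

```julia
# Toy dimensionful type: Meters{1} is a length, Meters{2} an area, etc.
struct Meters{N}
    val::Float64
end

# Runtime exponent: the result type Meters{N*p} depends on the *value* of p,
# so inference cannot assign this call a concrete return type.
unstable_pow(x::Meters{N}, p::Integer) where {N} = Meters{N*p}(x.val^p)

# Exponent in the type domain: Meters{N*p} is computable from the types alone,
# which is exactly what dispatching on Val{literal} would buy.
stable_pow(x::Meters{N}, ::Val{p}) where {N,p} = Meters{N*p}(x.val^p)

area = stable_pow(Meters{1}(3.0), Val(2))
typeof(area)  # Meters{2}, known to the compiler
```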

cc @ajkeller34, @timholy

@ararslan ararslan added maths Mathematical functions parser Language parsing and surface syntax speculative Whether the change will be implemented is speculative labels Feb 8, 2017
StefanKarpinski commented Feb 8, 2017

The main issue, as always with special syntactic treatment of literals, is that x^3 might do something different from p = 3; x^p, which seems potentially quite confusing. We could consider that a documentation issue and just say that x^literal has a different meaning than x^expression.

stevengj commented Feb 8, 2017

@StefanKarpinski, I don't see that as being a problem in practice. The main practical effect is that the performance and the roundoff errors could be different in the two cases, but we can document this and I would think that most practitioners would be willing to accept that in order to get the benefits of hard-coded small powers.

Of course, someone could define a screwy method like ^(x, ::Type{Val{3}}) = x^2, but there are already lots of ways that people could potentially override Base functions to do something unexpected.

stevengj commented Feb 8, 2017

Just submitted a PR to make this concrete. The parser (actually lowering) changes are quite easy.

StefanKarpinski commented:
I guess the next natural question is if we should allow small negative literal powers since they could be done in a type-stable way with this change. This would probably eliminate 99% of complaints about things like n^-2 not working...

StefanKarpinski commented Feb 8, 2017

I.e. on this PR we can do the following:

julia> 5^-2
ERROR: DomainError:
Cannot raise an integer x to a negative power -n.
Make x a float by adding a zero decimal (e.g. 2.0^-n instead of 2^-n), or write 1/x^n, float(x)^-n, or (x//1)^-n.
Stacktrace:
 [1] power_by_squaring(::Int64, ::Int64) at ./intfuncs.jl:170
 [2] ^(::Int64, ::Type{Val{-2}}) at ./intfuncs.jl:201

julia> Base.:^(x::Integer, ::Type{Val{p}}) where {p} =
           p < 0 ? 1/Base.power_by_squaring(x,-p) : Base.power_by_squaring(x,p)

julia> 5^-2
0.04

julia> 5^-31
1.2448081382545486e-19

julia> 5^-32
ERROR: DomainError:
Cannot raise an integer x to a negative power -n.
Make x a float by adding a zero decimal (e.g. 2.0^-n instead of 2^-n), or write 1/x^n, float(x)^-n, or (x//1)^-n.
Stacktrace:
 [1] power_by_squaring(::Int64, ::Int64) at ./intfuncs.jl:170
 [2] ^(::Int64, ::Int64) at ./intfuncs.jl:194

julia> p = -2
-2

julia> 5^p
ERROR: DomainError:
Cannot raise an integer x to a negative power -n.
Make x a float by adding a zero decimal (e.g. 2.0^-n instead of 2^-n), or write 1/x^n, float(x)^-n, or (x//1)^-n.
Stacktrace:
 [1] power_by_squaring(::Int64, ::Int64) at ./intfuncs.jl:170
 [2] ^(::Int64, ::Int64) at ./intfuncs.jl:194

julia> 5^-p
25

The power of #265 always makes me smile :)

There is something quite different about ^ compared to other numeric operators. I think it's the fact that it is so often used with a literal argument, especially as shorthand for writing things like x*x and x*x*x.

stevengj commented Feb 8, 2017

Now that I think about it, there's no reason we should restrict this to small integers. We might as well do it for all literal integers, since code generation only occurs for the integers that someone actually uses.

StefanKarpinski commented:
That strikes me as somewhat dangerous. It seems almost inevitable that someone will generate code that uses a lot of literal powers and ends up generating absurd amounts of code. OTOH, arbitrary cutoffs like this are pretty annoying.

stevengj commented Feb 9, 2017

Since the amount of code generated for ^ methods is (at worst) proportional to the amount of user code, I don't see the problem. It is just a fixed percentage overhead, at worst. In practice, I doubt users will be generating tons of code that just computes different powers, so the extra code generated seems likely to be quite small.

StefanKarpinski commented:
I worry about code that generates code that calls x^p with literal powers, but I guess in that case you want an efficient inlined implementation anyway, regardless of what p is.

stevengj commented Feb 9, 2017

@StefanKarpinski, I still don't see your worry. Say that I generate a long function that calls x^p for a whole bunch of different literal p values. Maybe it's a 2000-line function, and calls x^p for 1000 different p values. Yes, this will compile 1000 x^Val{p} methods. But it's compiling my 2000-line function anyway. What's the fractional overhead of those x^Val{p} methods? 30%? 50%? And in a more realistic example I would think that the x^p calls would be a smaller fraction of the generated code.

vtjnash commented Feb 9, 2017

I generally think that implementing compiler optimizations via parser rules is a bit iffy. We already have the optimization that x^3 --> x*x*x during inlining. (This slightly reduces the accuracy of pow in this case, but the max error is still small.) In fact, we can already (and do) optimize any power, except that the only ones that can actually be optimized are p = 2, p = -1, and Base.FastMath.pow_fast(x, p). Beyond that, the errors start to become observable (e.g. it appears that this proposal would break my yet-to-be-merged test for the correctness of p = 3: https://github.com/JuliaLang/julia/pull/19890/files).

vtjnash commented Feb 9, 2017

MPFR has an optimized mpfr_sqr for x^2 that could be used by BigFloat

It already dispatches to that method internal to MPFR. Given the cost of computing x*x, I would be surprised if the extra runtime test was a measurable cost.

stevengj commented Feb 9, 2017

What about all the cases that currently call power_by_squaring, like Complex^Int or Int^Int, which could be inlined? And there's still the issue of enabling type-stable dimensionful code that uses x^2 etc., and enabling type-stable negative literal powers of integers.

StefanKarpinski commented Feb 9, 2017

I really do think there's merit to the idea that x^p with literal exponents is fundamentally different. In other words, this isn't just a syntax-level (premature) optimization. You can view it as a philosophical assertion that x^2 is a fundamentally different operation than x^y. I think it's arguable. The fact that one generally wants x^2 to mean exactly x*x is part of the case, and the desire for x^-2 to mean an entirely different (but related) operation for integer x is another part of the case. The fact that literal powers are so incredibly common is yet another part of the case.

andyferris commented Feb 9, 2017

TLDR - it's probably better to skip this post and the next (the point I wanted to make here is 4 posts below).

This isn't the only case where one parameter is typically static and controls the action of the function (and possibly its output type), and I feel it would be nice to solve this generically rather than adding parser tricks.

If inference, constant propagation and inlining occurred together (or iterated a few times, or whatever), then we could follow this pattern:

^(a, b::Int) = power_function_builder(typeof(a), b)(a)

where power_function_builder can be @pure (or whatever) so that it returns a constant function or callable whenever b is a compile-time constant. In theory, power_function_builder could run arbitrary code, a bit like a @generated function can.

In practice, you also have to consider the case where b is not a compile-time constant, so you probably only want a small amount of logic/branches in power_function_builder such as b being negative, 0, 1, 2 or larger.

In a case like ^(::Float64, ::Int) you probably need to make sure the return of power_function_builder is type-stable, by wrapping a function pointer or something like that.

In a case like dimensionful ^ int you probably only care about optimizing when int is a compile-time constant, so standard Julia functions would suffice.

This approach opens up many new possible API options and optimizations both within Base and for end-users. For me personally, I would use this pattern in StaticArrays, TypedTables (or something like it), and more. You could potentially get rid of some of the tfuncs.

EDIT: Sorry, I got completely muddled in the above. Not that it is wrong, but function_builder is complete and utter overkill. In most cases we simply need to get the constant inside a standard inlined function, so that branch elimination, type inference, etc. are enabled on its value – e.g. in a method like this: @inline ^(x::Float64, n::Int) = n == 2 ? x*x : pow(x, n). See my post 4 comments below.

(Apologies - I really shouldn't post when I'm tired!)

andyferris commented:
A couple of observations:

  • My suggestion seems to have the danger of introducing a "design pattern". If you subscribe to the thought that patterns indicate some feature missing from a language, you could alternatively implement a mechanism where you can annotate inputs to a function as being "specialized on value" and have the compiler generate code for each constant input, as well as a generic version for when the input is not constant. Speculating about the case of ^, the generic version might have branches that would be eliminated when the exponent is known.

  • Do generated functions fit into this "pattern" by having f(x...) = function_builder(map(typeof, x)...)(x...)? Can we do that already?

StefanKarpinski commented Feb 9, 2017

I don't really see how the function_builder business would work or why it's particularly better. Isn't that just a generated function with dispatch on values? Seems like a step in the wrong direction, given that we're trying to discourage the use of generated functions.

I normally 100% agree with the sentiment about "parser tricks", but you're missing my core point: the difference between x^2 and x^y is semantically significant – the former is shorthand for x*x, whereas the latter computes an entirely different function that happens to approximately coincide when y = 2. You can think of the ^2 part as its own syntactic feature. The question is whether this is appropriate and beneficial or not. At one point we even considered parsing ^- (as in x^-2) as a different operator to allow exponentiation by negative exponents to work as expected, but decided that was a little too weird and tricky. This isn't cut and dried, but there's something to the fact that the difference between x^n for literal n and x^y for variable y keeps cropping up as a problem over and over again. The same thing does not happen for x + y or x * y – or any other arithmetic operation or function, for that matter.

stevengj commented Feb 9, 2017

Just a couple of data points: for f(x) = x^5 and g(x) = let y=x*x; y*y*x; end, I get:

julia> @btime f(3); @btime g(3);
  5.382 ns (0 allocations: 0 bytes)
  1.377 ns (0 allocations: 0 bytes)

julia> @btime f(3.0); @btime g(3.0);
  1.648 ns (0 allocations: 0 bytes)
  1.377 ns (0 allocations: 0 bytes)

julia> x = 3.0+4im; @btime f($x); @btime g($x);
  10.057 ns (0 allocations: 0 bytes)
  4.469 ns (0 allocations: 0 bytes)

Not surprisingly, the biggest benefits of inlining small powers are precisely in the cases where we currently use power_by_squaring and the individual multiplications are cheap (so the bookkeeping overhead of the non-inlined case is large).

But I think the non-performance implications for dimensionful code are more important, and I agree with @StefanKarpinski that literal integer exponents are conceptually a somewhat distinct operation.

andyferris commented Feb 9, 2017

You're completely right @StefanKarpinski about the function_builder being overkill. I apologize - I got a bit muddled thinking about the higher-order functional and generated function aspects of recursive inference/inlining. You certainly don't need function_builder for this (or StaticArrays, or TypedTables).

The simple (and correct) version of the point I was making was this: with constants injected into inlined functions, then e.g. @inline ^(x::Float64, n::Int) = n == 2 ? x*x : pow(x, n) could simplify whenever inference knows the value of n as 2. Of course that implementation isn't sufficient for all n, but I was wondering if taking advantage of existing branch elimination might yield the kind of performance that a parser transformation might?
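That branch-elimination sketch can be written out standalone. To keep it self-contained, this uses a hypothetical mypow rather than redefining Base.:^, and x^n in place of the raw pow call:

```julia
# If the constant n == 2 is propagated into the inlined body, the branch
# folds away and only x*x remains; otherwise we fall through to the
# generic ^ path.
@inline mypow(x::Float64, n::Int) = n == 2 ? x*x : x^n

square(x) = mypow(x, 2)  # n is a compile-time constant at this call site
fifth(x)  = mypow(x, 5)  # generic path

square(3.0)  # 9.0
fifth(2.0)   # 32.0
```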

I understand what you are saying about x^y and x^2 seeming semantically distinct. But allowing different behavior/results (even different floating-point errors) for the former when y = 2 than for the latter would seem to break a fundamental expectation about the pureness of functions in programming languages. I don't think previous parser transformations have done this, have they? (Does Ac_mul_B count? Maybe it is the same.) If we think of the two as fundamentally distinct, it leads me to speculate whether a separate symbol/operator is the way to go, e.g. x ^^ n requires n to be a literal integer and always expands out (in the parser) to an explicit power-by-squaring method (perhaps with inv somewhere for negative n). The other thing I usually do is mentally equate literals and constants (so ideally we'd get the same behavior for n=2; x^n as we do for x^2), but unfortunately the parser can't do that.

(Finally, I'm wondering if at least part of the perception of a conceptual difference when the base is dimensionful is simply because it breaks type inference? I don't think that is insurmountable – e.g. maybe we could use "lispy" recursion to append extra *x to the end, in a method like ^(x, ::Val{n}) where {n} (or a more complex version for building power-by-squaring), and use a constant propagation -> inlining -> constant propagation step to transform n -> Val{n} (and similarly for ntuple(), which would be nice too...).)

andyferris commented:
@stevengj those are some nice speedups!

stevengj commented:
@andyferris, "pureness" is not the right word here. This is not a violation of function purity. It is classic syntactic sugar: the meaning is determined purely by the "spelling" of the code and not by anything else.

^literal is special because it seems to be the only common function where there is a big gain in functionality (not just performance) from being able to dispatch differently when the value of one of the arguments is known at compile time. In contrast, one commonly writes x*literal as well, but the generic method is perfectly fine there.

andyferris commented:

Right, that makes sense. The thing I was aiming to achieve here was expanding "literal" to "compile-time constant" (I note you use the two interchangeably in your post, but strictly speaking, literals are a subset of constants).

I completely support the Val transformation here – but I also get jealous writing packages where I can't implement my own special sugar to make everything work nicely without asking the user to explicitly write Val or whatever. The parser trick is a great quick win for exponentiation, but I aimed to point out that there exists a possible compiler optimization where we can all win in a variety of settings. That's no reason to hold this up for v0.6, however.
