Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: A path forward on --check-bounds #50239

Closed
wants to merge 4 commits into from
Closed

RFC: A path forward on --check-bounds #50239

wants to merge 4 commits into from

Conversation

Keno
Copy link
Member

@Keno Keno commented Jun 20, 2023

In 1.9, --check-bounds=no has started causing significant performance regressions (e.g. #50110). This is because we switched a number of functions that used to be @pure to new effects-based infrastructure, which very closely tracks the the legality conditions for concrete evaluation. Unfortunately, disabling bounds checking completely invalidates all prior legality analysis, so the only realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe flags are highly problematic for a language for several reasons:

  • Code is written with the assumption of a particular mode being chosen, so it is in general not possible or unsafe to compose libraries (which in a language like julia is a huge problem).

  • Unsafe semantics are often harder for the compiler to reason about, causing unexpected performance issues (although the 1.9 --check-bounds=no issues are worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the --check-bounds= option entirely (#48245), but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function Core.should_check_bounds(boundscheck::Bool) = boundscheck. This function is passed the result of Expr(:boundscheck) (which is now purely determined by the inliner based on @inbounds, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this function to either true or false (unconditionally) depending on the value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can be addressed by adding logic to loading that loads a pkgimage with appropriate definitions to cure the invalidations. The special logic we have now now to take account of the --check-bounds flag in .ji selection, would be replaced by automatically injecting the special pkgimage as a dependency to every loaded image. This part isn't implemented in this PR, but I think it's reasonable to do.

I think with that, the --check-bounds flag remains functional, while having much more well defined behavior, as it relies on the standard world age mechanisms.

A major benefit of this approach is that it can be scoped appropriately using overlay tables. For exmaple:

julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically

Similar logic applies to GPUCompiler, which already supports overlay tables.

@gbaraldi
Copy link
Member

This allows the compiler to run with checkbounds on when --check-bounds=no is set?

@gbaraldi
Copy link
Member

Or is it still bad with the global flag but initiates the machinery for a cassete overlay like package?

@Keno
Copy link
Member Author

Keno commented Jun 20, 2023

The compiler no longer has a notion of checkbounds, only the explicit bounds check arguments to the intrinsic, which gets overriden by the new function in Core, so you can mix and match using overlays or world age.

@gbaraldi
Copy link
Member

Right, but what if I start julia with --check-bounds = no and it attemps concrete eval? Or does it check the argument to the intrinsic and doesn't do the concrete eval?

@oscardssmith
Copy link
Member

oscardssmith commented Jun 20, 2023

This doesn't make --checkbounds=no better, but it allows for users of --checkbounds=no to hopefully move to setting this in their program (which will mean it doesn't affect things like typejoin).

@staticfloat
Copy link
Member

staticfloat commented Jun 20, 2023

I think this can be addressed by adding logic to loading that loads a pkgimage with appropriate definitions to cure the invalidations.

What if we used compile-time preferences for this? E.g. we set a "compile-time" preference for some package (potentially.... Core.Compiler?) that causes Julia to reject the previously-built pkgimages (maybe even the sysimg itself?) because the new set of preferences does not match those baked into the header of the pkgimages.

So concretely, what I'm proposing is that we map a command-line flag to setting a special compile-time preference that is then read from by should_check_bounds(). This automatically takes care of invalidating package images that depend on this compile-time preference, and leaves alone packages that don't dabble with this.

@vchuravy
Copy link
Member

What if we used compile-time preferences for this? E.g. we set a "compile-time" preference for some package (potentially.... Core.Compiler?) that causes Julia to reject the previously-built pkgimages (maybe even the sysimg itself?) because the new set of preferences does not match those baked into the header of the pkgimages.

That would generally be a good way forward for these kind of options, but I think independent of this?

I am a bit worried about the caching semantics, the current way of embedding it into the selection flags is a bit of a crutch and compile-time preference may be better, but ideally we would use something akin to multi-versioning to prevent the proliferation of cache files.

To that extend (and for the FMA&co) I have been thinking about the notion of Rusty's Target-Feature flags (https://rust-lang.github.io/rfcs/2045-target-feature.html).
This would be something like conditional dispatch/overlay methods with filters.

A method table would be instantiated with a set of target flags and methods would be made available if they match the set of target flags. If you then could do something like "many set inference" you could create cache files that match a set of combination of Cache-Files. This is of course vastly more complicated than this proposal, but it would open up the potential for Cross-Compilation and dealing with these pesky thinks like has_fma, preferred vector-width,...

@Keno
Copy link
Member Author

Keno commented Jun 20, 2023

Right, but what if I start julia with --check-bounds = no and it attemps concrete eval? Or does it check the argument to the intrinsic and doesn't do the concrete eval?

Yes, having anything other than Const(true) on the boundscheck argument of the intrinsic taints :consistent.

base/client.jl Outdated
Comment on lines 278 to 280
Core.eval(Main, :(Core.should_check_bounds(boundscheck::Bool) = true))
else
Core.eval(Main, :(Core.should_check_bounds(boundscheck::Bool) = false))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this we could have a lib like:

module BaseInbounds

using PrecompileTools

@recompile_invalidations begin
    Core.should_check_bounds(boundscheck::Bool) = true
end

end

and load that here?

That definition does take a while to compile. Loading it is 329 ms and on-disk size is 111M.

Right now this doesn't quite seem to work due to:

┌ BaseInbounds [e8e3d1da-c89d-4456-81ea-eb3240fbf100]
│  WARNING: Method definition should_check_bounds(Bool) in module Core at boot.jl:855 overwritten in module BaseInbounds at /home/vchuravy/src/julia/stdlib/BaseInbounds/src/BaseInbounds.jl:6.
│    ** incremental compilation may be fatally broken for this module **
└ 
julia> using BaseInbounds

julia> f(A) = A[100]
f (generic function with 1 method)

julia> @code_typed f([1])
CodeInfo(
1 ─ %1 = Base.arrayref(true, A, 100)::Int64
└──      return %1
) => Int64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this we could have a lib like:

Yes, I mentioned that in the original post, but it needs some special loading logic

@vchuravy
Copy link
Member

So I quite like this as the first step towards a full solution. In particular it would also allow us to naturally implement @cuda inline=no using overlaying-overlay tables.

I think my biggest worry is the caching implication.

Right-now we don't provide a sysimg that was compiled with --check-bounds=no and ignore that request for any code in the sysimage (I guess that's a bug, but who is counting). So the redefinition would invalidate large swaths of Base right now. I guess the compiler etc. are guarded from that invalidation.

Using this branch as-is just starting the REPL takes quite a bit longer.

Right now we are in a situation where users of --check-bounds=no pay for it themselves using disk-space (since it splits the pkg cache fully), so would we ship "BaseInbounds" as part of Julia?

@JeffBezanson
Copy link
Member

Triage thinks this is definitely moving in the right direction. The caching situation still needs to be decided on though.

@Keno
Copy link
Member Author

Keno commented Jul 18, 2023

Caching now implemented. Before caching:
Before caching:

keno@deepsea3:~/julia$ time ./julia --check-bounds=no -e 'exit()'

real	0m2.617s
user	0m2.967s
sys	0m1.589s

After caching:

time ./julia --check-bounds=no -e 'exit()'

real	0m0.635s
user	0m0.647s
sys	0m0.670s

REPL feels snappy also.

Keno added a commit that referenced this pull request Jul 19, 2023
In #48919, the tid selection logic inside `enq_task` gained a
`!GC.in_finalizer()` condition. However, this made it possible
for `workqueue_at` to be reached with `tid==0`, which would
attempt and out-of-bounds write under `@inbounds`, corrupting
memory. This was not caught in the test suite despite
`--check-bounds=yes`, because our `--check-bounds=yes` is currently
best effort. That would be fixed by #50239, which exposed this
bug.
@@ -1981,7 +1984,7 @@ static int is_replacing(char ambig, jl_value_t *type, jl_method_t *m, jl_method_
return 1;
}

JL_DLLEXPORT void jl_method_table_insert(jl_methtable_t *mt, jl_method_t *method, jl_tupletype_t *simpletype)
JL_DLLEXPORT void jl_method_table_insert(jl_methtable_t *mt, jl_method_t *method, jl_tupletype_t *simpletype, int nowarn_overwrite)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JL_DLLEXPORT void jl_method_table_insert(jl_methtable_t *mt, jl_method_t *method, jl_tupletype_t *simpletype, int nowarn_overwrite)
JL_DLLEXPORT void jl_method_table_insert(jl_methtable_t *mt, jl_method_t *method, jl_tupletype_t *simpletype, int warn_overwrite)

Keno added a commit that referenced this pull request Jul 19, 2023
In #48919, the tid selection logic inside `enq_task` gained a
`!GC.in_finalizer()` condition. However, this made it possible for
`workqueue_at` to be reached with `tid==0`, which would attempt and
out-of-bounds write under `@inbounds`, corrupting memory. This was not
caught in the test suite despite `--check-bounds=yes`, because our
`--check-bounds=yes` is currently best effort. That would be fixed by
#50239, which exposed this bug. This PR attempts to
fix this by marking any tasks launched inside a finalizer as not sticky.
Finalizers don't have any thread they run on
semantically, so i don't think there's a meaningful sense in which tasks
launched inside finalizers could be sticky.
Keno and others added 4 commits July 19, 2023 20:28
In 1.9, `--check-bounds=no` has started causing significant performance
regressions (e.g. #50110). This is because we switched a number of functions that
used to be `@pure` to new effects-based infrastructure, which very closely tracks
the the legality conditions for concrete evaluation. Unfortunately, disabling
bounds checking completely invalidates all prior legality analysis, so the only
realistic path we have is to completely disable it.

In general, we are learning that these kinds of global make-things-faster-but-unsafe
flags are highly problematic for a language for several reasons:

- Code is written with the assumption of a particular mode being chosen, so
  it is in general not possible or unsafe to compose libraries (which in a language
  like julia is a huge problem).

- Unsafe semantics are often harder for the compiler to reason about, causing
  unexpected performance issues (although the 1.9 --check-bounds=no issues are
  worse than just disabling concrete eval for things that use bounds checking)

In general, I'd like to remove the `--check-bounds=` option entirely (#48245),
but that proposal has encountered major opposition.

This PR implements an alternative proposal: We introduce a new function
`Core.should_check_bounds(boundscheck::Bool) = boundscheck`. This function is
passed the result of `Expr(:boundscheck)` (which is now purely determined by
the inliner based on `@inbounds`, without regard for the command line flag).

In this proposal, what the command line flag does is simply redefine this
function to either `true` or `false` (unconditionally) depending on the
value of the flag.

Of course, this causes massive amounts of recompilation, but I think this can
be addressed by adding logic to loading that loads a pkgimage with appropriate
definitions to cure the invalidations. The special logic we have now now
to take account of the --check-bounds flag in .ji selection, would be replaced
by automatically injecting the special pkgimage as a dependency to every
loaded image. This part isn't implemented in this PR, but I think it's reasonable
to do.

I think with that, the `--check-bounds` flag remains functional, while having
much more well defined behavior, as it relies on the standard world age
mechanisms.

A major benefit of this approach is that it can be scoped appropriately using
overlay tables. For exmaple:

```
julia> using CassetteOverlay

julia> @MethodTable AssumeInboundsTable;

julia> @overlay AssumeInboundsTable Core.should_check_bounds(b::Bool) = false;

julia> assume_inbounds = @overlaypass AssumeInboundsTable

julia> assume_inbounds(f, args...) # f(args...) with bounds checking disabled dynamically
```

Similar logic applies to GPUCompiler, which already supports overlay tables.
@Keno Keno force-pushed the kf/checkboundsrfc branch from 26c14f8 to d33cece Compare July 19, 2023 20:28
end

Recompile any invalidations that occur within the given expression. This is generally intended to be used
by users in creating "Startup" packages to ensure that the code compiled by package authors is not invalidated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
by users in creating "Startup" packages to ensure that the code compiled by package authors is not invalidated.
by users in creating "Startup" packages to ensure that the code compiled by package authors is not invalidated.
!!! note
Users should use `PrecompileTools.jl`. This macro is solely for internal usage in the Julia bootstrapping process.

@@ -1653,6 +1656,7 @@ const include_callbacks = Any[]

# used to optionally track dependencies when requiring a module:
const _concrete_dependencies = Pair{PkgId,UInt128}[] # these dependency versions are "set in stone", and the process should try to avoid invalidating them
const _mandatory_dependencies = Pair{PkgId,UInt128}[] # Like `_concrete_dependencies`, but required to actually be loaded (in order) in every precompile process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed since the --check-bounds=no stdlib is a "virtual dependency"?
Could we avoid duplicating this logic by having the load happen in Julia proper?

I am not fully sure why the __precompile__(:mandatory) is needed, so expanding comments throughout the code would be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question.

@@ -2853,17 +2877,17 @@ struct CacheFlags
# OOICCDDP - see jl_cache_flags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# OOICCDDP - see jl_cache_flags
# OOIOCDDP - see jl_cache_flags

Comment on lines +2968 to +2977
# Check whether this is in the current list of mandatory packages
for (mandatory_modkey, mandatory_build_id) in _mandatory_dependencies
if mandatory_modkey == modkey && mandatory_build_id == build_id
@goto mandatory_ok
end
end
@debug "Rejecting cache file $cachefile for $modkey since it is a mandatory package that we have not loaded"
return true
@label mandatory_ok
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Check whether this is in the current list of mandatory packages
for (mandatory_modkey, mandatory_build_id) in _mandatory_dependencies
if mandatory_modkey == modkey && mandatory_build_id == build_id
@goto mandatory_ok
end
end
@debug "Rejecting cache file $cachefile for $modkey since it is a mandatory package that we have not loaded"
return true
@label mandatory_ok
end
# Check whether this is in the current list of mandatory packages
found = false
for (mandatory_modkey, mandatory_build_id) in _mandatory_dependencies
if mandatory_modkey == modkey && mandatory_build_id == build_id
found = true
break
end
end
if !found
@debug "Rejecting cache file $cachefile for $modkey since it is a mandatory package that we have not loaded"
return true
end
end

Since no other piece of this code uses gotos

@@ -597,18 +597,25 @@ static void write_mod_list(ios_t *s, jl_array_t *a)
// OPT_LEVEL should always be the upper bits
#define OPT_LEVEL 6

int8_t cache_is_mandatory = 0;
JL_DLLEXPORT uint8_t jl_cache_flags(void)
{
// OOICCDDP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// OOICCDDP
// OOIOCDDP

@@ -0,0 +1,2 @@
name = "--check-bounds=no"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is almost to cute. Maybe simply: CheckBoundsNo?

One other thought I just had is that it is kinda weird to use "distinct" packages for this kind of options settings.
Could we instead use Preferences? That already interacts with the caching subsystem.

cc: @staticfloat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a cache file that stores the curing definitions. Preferences don't (currently) have any mechanism that has identity for a cache file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was thinking that we would have one stdlib instead of two.

module CheckBounds

const my_uuid = #
Base.record_compiletime_preference(my_uuid, "check-bounds")
let d = Base.get_preferences(my_uuid)
    global const check_bounds = d["check-bounds"]
end

@Base.recompile_invalidations begin
    Core.should_check_bounds(boundscheck::Bool) = check_bounds
end

Compiletime preferences split the cache and you get the same curing effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be fine, but does seem somewhat more complicated. Are there any other advantage of using Preferences that you're imagining that I'm not seeing. Also the precompile(:mandatory) is still required even with this setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elliot and I were discussing that we might want to add Preferences usage for Julia instead of pushing more and more stuff into command line flags.

So while the user could use julia --check-bounds=no, they could also use their local LocalPreferences.toml to set

[CheckBounds]
check-bounds=no

I kinda like that from a user interface perspective.

Can you explain why the precompile(:mandatory) a bit more? I am not sure what problem it solves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the precompile(:mandatory) a bit more? I am not sure what problem it solves.

Overwriting methods is disallowed in packages because we don't semantically impose an order on them. The :mandatory annotation requires that all future package loads implicitly gain the package as a dependency, so that there is a well-defined ordering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what #50578 turned into an explicit error. Gotcha


@testset "CacheFlags" begin
cf = Base.CacheFlags()
opts = Base.JLOptions()
@test cf.use_pkgimages == opts.use_pkgimages
@test cf.debug_level == opts.debug_level
@test cf.check_bounds == opts.check_bounds
@test cf.inline == opts.can_inline
@test cf.opt_level == opts.opt_level

# OOICCDDP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# OOICCDDP
# OOIOCDDP

Keno added a commit that referenced this pull request Jul 23, 2023
This proposal is an attempt to tie together some of the recent discussion around
the future of `@inbounds`, formal interface checking (a long the lines of
my proposal in https://github.com/Keno/InterfaceSpecs.jl), and `--check-bounds`.

# Reviewing `@inbounds`

## History
The `@inbounds` annotation has been in the langauge for a decade at this point [1].
It is a crictical tool to achieve machine performance in high-performance applications
where the overhead of boundschecking is prohibitive [2]. At the same time, as Julia
has matured, the dangers of this macro have become more clear. It is essentially
impossible to use correctly in generic code and instances where code (including
code in base from before we were culturally more careful about inbounds) are
using `@inbounds` have become a not-uncommon complaint-point about the language
as a whole [3]. In current practice, at least in Base, we have become extremely
hesitant about the introduction of new `@inbounds` annotations (see e.g.
discussion in #41200). At the same time, the ability to disable bounds checking
remains a critical capability for several of our user communities (c.f. #50239
and related discussions). So how do we move forward here? I'm hoping we can find
a way forward that achieves the same benefits of `@inbounds` but in a way where
it is possible to re-establish soundness if desired.

## When is inbounds still required?

### LLVM's current capabilities

Let's look at a couple of examples. First:
```
function sum1(A::Vector{Int64})
	a = zero(eltype(A))
	for i in eachindex(A)
		a += A[i]
	end
	return a
end
```

Is inbounds required here in modern julia? The answer is no - LLVM's range analysis
proves that the bounds check is dead and eliminates it.
(The same holds true for the `1:length(A)` version, although this was not always true).

What about the following:
```
function sum2(A::Vector{Int64}, B::Vector{Int64})
	a = zero(eltype(A))
	for i in eachindex(A, B)
		a += A[i] + B[i]
	end
	return a
end
```

Here, LLVM is again able to eliminate the boundscheck, though of course the
`eachindex` introduces an additional check that the array shapes are compatible.
Even this is still ok-ish:

```
# Noinline here just for illustration, think some big internal function that is
# not inlineable.
@noinline function sum_partial(A::Vector{Int64}, upto::Int)
	a = zero(eltype(A))
	for i in 1:upto
		a += A[i]
	end
	return a
end
sum3(A::Vector{Int64}) = sum_partial(A, length(A))
```

The generated code here runs through the loop. LLVM's vectorizer has support for bounds-like
checks and can remove them from the vector body. However, in scalar code (e.g. Float64
without fastmath), LLVM does still perform the bounds check in every loop iteration rather
than hoisting it outside. This is probably a bug somewhere. The IRCE pass is supposed to
take care of this, so for the present purposes, let's assume that this will also go through
eventually.

That said, the takeaway here is that, for simple cases, where everything is inlined, LLVM
is reasonably powerful at eliminating these checks.

### The effect-separation trick

Let's consider a case like this
```
function repeat_outer(a::AbstractMatrix, (m,n)::NTuple{2, Any})
    o, p = size(a,1), size(a,2)
    b = similar(a, o*m, p*n)
    for j=1:n
        d = (j-1)*p+1
        R = d:d+p-1
        for i=1:m
            c = (i-1)*o+1
            b[c:c+o-1, R] = a
        end
    end
    return b
end
```

The setindex! call in here goes through a few layers of dispatch, but eventually
ends up at:

```
function _setindex!(l::IndexStyle, A::AbstractArray, x, I::Union{Real, AbstractArray}...)
    @inline
    @BoundsCheck checkbounds(A, I...)
    _unsafe_setindex!(l, _maybe_reshape(l, A, I...), x, I...)
    A
end
```

This is a very common pattern in our code, where we have an `@inline` function that
first checks for the in-boundedness and then performs an unsafe operation that assumes
everything is inbounds.

This pattern is quite good. By allowing the boundscheck itself to be inlined, LLVM can
eliminate the boundscheck using local context from the calling function (in theory
at least - in practice we still have an `@inbounds` there because LLVM isn't strong
enough currently).

However, the pattern is somewhat hard to use consistently and correctly, so we generally
only use it in the low level indexing code.

# Effect-precondition synthesis

## `@inbounds` motivations

In the previous section I said that the effect-separation was good but hard
to do consistently and correctly. So can we have the compiler implement this
for us. Suppose we had a macro `@hoist_boundschecks` that worked something like
the following:

```
function _setindex!(l::IndexStyle, A::AbstractArray, x, I::Union{Real, AbstractArray}...)
    @inline
    @hoist_boundscheck _safe_setindex!(l, _maybe_reshape(l, A, I...), x, I...)
    A
end
# = Expands to =#
function _setindex!(l::IndexStyle, A::AbstractArray, x, I::Union{Real, AbstractArray}...)
    @inline
    #= All the boundschecks from _safe_setindex! hoisted =# checkbounds(A, I...)
    #= Switched to unsafe_setindex, but now proven correct =#
    _unsafe_setindex!(l, _maybe_reshape(l, A, I...), x, I...)
    A
end
```

Where essentially what the macro would do is go through the `_safe_setindex!` call
and synthesize a check that ensures that all calls are inbounds. Of course
such a synthesis is not always possible in general, but it's not that hard for simple
cases (a trivial way to do it is to duplicate the function, delete all the side
effects other than throwing boundserrors and then let LLVM go to town on it).

## Generalizing

There is a few problems with the macro as suggested in the previous section. In particular,
it changes the ordering of side effects and errors in the function, which may be desirable,
but I would like have explicit control over. My proposal is to have a slight generalization
of it that works roughly as follows:

```
function foo(A, I)
	@split_effects :nothrow bar(A, I)
end
#= Expands to =#
function foo(A, I)
	if precondition(Val{:nothrow}(), bar, A, I)
		@assume_effects :nothrow bar(A, I)
	else
		bar(A, I)
	end
end
```

Where `precondition` like the proposed macro above is some synthesized predicate
that ensures that the resulting function does not throw. Of course this brings up
many questions, such as the implementation of `precondition` and the statement-position
`@assume_effects`, which we currently don't have, but let's explore the implications
of this a bit further.

## Precondition inference and composition

The question of coming up with these kinds of preconditions is a well-known problem in the
compiler literature. The term to look for is "weakest precondition synthesis". How exactly
do this best is somewhat outside the scope of this writeup, but I do want to highlight
a the primary property that suggests the idea: functional composition of functions is boolean
composition of preconditions:

```
function foo()
	bar()
	baz()
end

precondition(n::Val{:nothrow}, foo) = precondition(n, bar) && precondition(n, baz)
```

Also, if a function uses the `@split_effects` macro internally, then an outside
`@assume_effects` can cause the precondition to be assumed to be true. This mirrors
the `@inbounds/@boundscheck` interaction we have right now, but with an alternative
that is provably safe.

# Extensions to `@assume_effects`

So far, we have assumed that these preconditions are synthesized automatically, but
getting this to work well, of course depends on the synthesis strength of the compiler.
To still allow users to take advantage of this mechanism, even if the compiler's synthesis
is not strong enough, we can extend @assume_effects to allow conditional effects:

```
@assume_effects (checkbounds(false, A, I) && :nothrow) setindex!(A, I)
	_safe_setindex!(A, I)
end
```

The idea here is that precondition is overriden by `checkbounds(false, A, I)`, so
any `@split_effects` of this function would use the `checkbounds` function for its
check and if this returned true, could `@assume_effects :nothrow` this function,
which as described above would allow the use of the unsafe function in the interior.

## Call-site `@assume_effects` and effects-assumption specialization

In the foregoing we have, in a few places, used a call-site `@assume_effects`, without
defining what it does. The idea here is pretty simple: We add a new specializaiton
axis based on effects assumption. For example, if a function is `@assume_effects :nothrow`,
then at codegen time, any path that throws an error (in particular bounds checks)
becomes undefined behavior and LLVM is allowed to assume that it does not happen.
Of course, this is macro is extremely dangerous (as the existing @assume_effects
and @inbounds are).

However, one significant benefit of this design is that there is a very clear notion of
what the assertion promises. This is not necessarily clear of `@inbounds`, which we
did not give semantics beyond it's effect on the `@boundscheck` macro. As a result,
even an infinitely-powerful prover could not check the correctness or non-correctness
of `@inbounds` (as currenlty used - we could of course consider an alternative @inbounds
design with stronger semantics). In contrast, the formal meaning of a conditional
`@assume_effects` is well defined and could in principle be checked by a tool (#49273).

# Implications for `--check-bounds`

In the whole, `--check-bounds` removal discussion, we had assumed that we did not want
to keep two copies of all code just for the purposes of `--check-bounds` which thus
required us disable constant propagation. However, in this proposal, the `@assume_effects`
specialization is exactly such a copy set and could thus be used for this purpose.
That said, this proposal would also hopefully provide a much stronger system for boundscheck
removal that would allow us to make `--check-bounds` much less necessary.

# Other uses

There are a few other places where domain checks can interfere with optimization.
For exmaple, consider the following situation:

```
function sin_loop(n)
	for i = 1:n
		# Imagine there was a whole bunch of other code here that used this value,
		# but all that got DCE'd, but we can't in general DCE `sin`, because it
		# may throw.
		sin(Float64(i))
	end
end
```
```
julia> @time sin_loop(1_000_000_000)
 20.383584 seconds
```

With the support in this PR, we can:
```
# The actual condition here is `!isinf`, but we're allowed to overapproximate and
# LLVM can handle `isfinite` better.
# TODO: In a more complete version of this PR, this precondition would be synthesized
@Base.assume_effects (isfinite(x) && :nothrow) @noinline function mysin(x::Float64)
	sin(x)
end

function sin_loop_split(n)
	for i = 1:n
		Core.invoke_split_effects(:nothrow, mysin, Float64(i))
	end
end
```
```
julia> @code_llvm sin_loop_split(1_000_000_000)
;  @ REPL[19]:1 within `sin_loop_split`
define void @julia_sin_loop_split_428(i64 signext %"n::Int64") #0 {
top:
;  @ REPL[19]:4 within `sin_loop_split`
  ret void
}

julia> @time sin_loop_split(1_000_000_000)
  0.000000 seconds
```

# Current status of this PR
This PR contains a complete sketch of this mechanism, including inference support
for the new intrinsic, as well as codegen and runtime support for effect-assumption
specialization. It also includes an (extremely incomplete) sketch of the supporting
macros. It does not implement any precondition synthesis logic. My plan is to support
synthesis for some of the simple `@inbounds` cases in Base, but then delagate fancier
synthesis support for packages, since that can get arbitrarily complicated.

# Implementation Plan

This PR itself is not suitable for merging, but if people like this direction, I would
like to get the basic pieces in in relatively short order. To that end, I would suggest
the following order of implementation as separate PRs once we've agreed on the overall plan:

1. New intrinsics, Method(Instance) fields, inference support
2. @assume_effects extensions
3. Codegen and specialization support, Code(Instance) fields
4. Basic Synthesis and `@inbounds` removal

[1] Introduced in 66ab577 for julia 0.2
[2] Note that it's usually not the boundschecking itself that is the problem,
    but rather that the presence of the boundschecking inhibits other optimizations.
[3] E.g. https://yuri.is/not-julia/
KristofferC pushed a commit that referenced this pull request Jul 24, 2023
In #48919, the tid selection logic inside `enq_task` gained a
`!GC.in_finalizer()` condition. However, this made it possible for
`workqueue_at` to be reached with `tid==0`, which would attempt and
out-of-bounds write under `@inbounds`, corrupting memory. This was not
caught in the test suite despite `--check-bounds=yes`, because our
`--check-bounds=yes` is currently best effort. That would be fixed by
#50239, which exposed this bug. This PR attempts to
fix this by marking any tasks launched inside a finalizer as not sticky.
Finalizers don't have any thread they run on
semantically, so i don't think there's a meaningful sense in which tasks
launched inside finalizers could be sticky.

(cherry picked from commit bd8350b)
@Keno
Copy link
Member Author

Keno commented Jul 28, 2023

I would propose sitting down at the hackathon tomorrow and talking this through. I think we can just knock this out and get this all behind us.

@Keno
Copy link
Member Author

Keno commented Jul 29, 2023

As discussed during the hackathon.

  1. We are abandoning this PR as is for --check-bounds purposes. The method overwriting implementation will still be used for checked arithmetic, which may re-use pieces of this PR, but probably not the PR as is.
  2. We will be accelerating work on RFC: Effect Preconditions - or - the future of @inbounds #50641 and using the effect assumptions infrastructure to implement --check-bounds.
  3. We are fine with using different mechanisms for bounds checking and checked arithmetic. We have some partial justification for this from out of bounds indexing being undefined behavior, but checked arithmetic violations not being undefined behavior.
  4. Checked arithmetic could potentially also use effect assumptions bits, but we don't think it's worth it.
  5. For system images and pkgimages, we will be caching both --check-bounds=yes and default options to address the issue we have right now where --check-bounds=yes requires a full recompilation.
  6. We did not reach a conclusion for caching --check-bounds=no, but it's possible to build a custom system image to do this.

@Keno Keno closed this Jul 29, 2023
Keno added a commit that referenced this pull request Aug 3, 2023
This adds operators `+%`, `-%`, `*%`, which are equivalent to the
non-`%` versions, but indicate an explicit semantic expectation that
twos completement wrapping behavior is expected and correct. As discussed
at JuliaCon 2014 and every year since, users have often requested
a way to opt into explicit overflow checking of arithmetic, whether
for debugging or because they have regulatory or procedural requirements
that expect to be able to do this. Having explicit operators for
overflowing semantics allows use cases that depend on overflow behavior
for correct functioning to explicitly opt-out of any such checking.

I want to explicitly emphasize that there are no plans to change
the default behavior of arithmetic in Julia, neither by introducing
error checking nor by making it undefined behavior (as in C). The
general consensus here is that while overflow checking can be useful,
and would be a fine default, even if hardware supported it efficiently
(which it doesn't), the performance costs of performing the check
(through inhibition of other optimization) is too high. In our experience
it also tends to be relatively harmless, even if it can be a very
rude awakeing to users coming from Python or other languages with
big-default integers.

The idea here is simply to give users another tool in their arsenal
for checking correctness. Think sanitizers, not language change.
This PR includes a macro `@Base.Experimental.make_all_arithmetic_checked`,
that will define overrides to make arithmetic checked, but does not
include any mechanism (e.g. #50239) to make this fast.

What is included in this PR:
 - Flisp parser changes to parse the new operators
 - Definitions of the new operators
 - Some basic replacements in base to give a flavor for using the
   new operator and make sure it works

Still to be done:
 - [] Parser changes in JuliaSyntax
 - [] Correct parsing for `+%` by itself, which currently parses as `+(%)`

The places to change in base were found by using the above-mentioned
macro and running the test suite. I did not work through the tests
exhaustively. We have many tests that explicitly expect overflow and
many others that we should go through on a case by case basis. The
idea here is merely to give an idea of the kind of changes that
may be required if overflow checking is enabled. I think they can
broadly be classed into:

- Crypto and hashing code that explicitly want modular arithmetic
- Bit twidelling code for arithmetic tricks (though many of these,
  particularly in Ryu, could probably be replaced with better
  abstractions).
- UInt8 range checks written by Stefan
- Misc
Keno added a commit that referenced this pull request Aug 3, 2023
This adds operators `+%`, `-%`, `*%`, which are equivalent to the
non-`%` versions, but indicate an explicit semantic expectation that
twos completement wrapping behavior is expected and correct. As discussed
at JuliaCon 2014 and every year since, users have often requested
a way to opt into explicit overflow checking of arithmetic, whether
for debugging or because they have regulatory or procedural requirements
that expect to be able to do this. Having explicit operators for
overflowing semantics allows use cases that depend on overflow behavior
for correct functioning to explicitly opt-out of any such checking.

I want to explicitly emphasize that there are no plans to change
the default behavior of arithmetic in Julia, neither by introducing
error checking nor by making it undefined behavior (as in C). The
general consensus here is that while overflow checking can be useful,
and would be a fine default, even if hardware supported it efficiently
(which it doesn't), the performance costs of performing the check
(through inhibition of other optimization) is too high. In our experience
it also tends to be relatively harmless, even if it can be a very
rude awakeing to users coming from Python or other languages with
big-default integers.

The idea here is simply to give users another tool in their arsenal
for checking correctness. Think sanitizers, not language change.
This PR includes a macro `@Base.Experimental.make_all_arithmetic_checked`,
that will define overrides to make arithmetic checked, but does not
include any mechanism (e.g. #50239) to make this fast.

What is included in this PR:
 - Flisp parser changes to parse the new operators
 - Definitions of the new operators
 - Some basic replacements in base to give a flavor for using the
   new operator and make sure it works

Still to be done:
 - [] Parser changes in JuliaSyntax
 - [] Correct parsing for `+%` by itself, which currently parses as `+(%)`

The places to change in base were found by using the above-mentioned
macro and running the test suite. I did not work through the tests
exhaustively. We have many tests that explicitly expect overflow and
many others that we should go through on a case by case basis. The
idea here is merely to give an idea of the kind of changes that
may be required if overflow checking is enabled. I think they can
broadly be classed into:

- Crypto and hashing code that explicitly want modular arithmetic
- Bit twidelling code for arithmetic tricks (though many of these,
  particularly in Ryu, could probably be replaced with better
  abstractions).
- UInt8 range checks written by Stefan
- Misc
@giordano giordano deleted the kf/checkboundsrfc branch February 25, 2024 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants