
Possibly interesting features of Julia v1.8 #1075

Open
ranocha opened this issue Feb 28, 2022 · 9 comments

@ranocha
Member

ranocha commented Feb 28, 2022

From https://github.com/JuliaLang/julia/blob/v1.8.0-beta1/NEWS.md:

  • Mutable struct fields may now be annotated as const to prevent changing them after construction, providing for greater clarity and optimization ability of these objects (#43305).

    May be interesting for stuff like MHD (see the sketches at the end of this comment).

  • Type annotations can now be added to global variables to make accessing them type stable (#43671).

    May be interesting to avoid global constants using Refs.

  • @inline and @noinline annotations can now be applied to a function call site or block to enforce the involved function calls to be (or not to be) inlined (#41312).

    See #836 (Think about callsite inlining when it's shipped officially); sketch below.

  • Base.ifelse is now defined as a generic function rather than a builtin one, allowing packages to extend its definition (#37343).

    Should allow us to get rid of our dependency on IfElse.jl (sketch below).

  • Inference now tracks various effects such as side-effectful-ness and nothrow-ness on a per-specialization basis. Code heavily dependent on constant propagation should see significant compile-time performance improvements and certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance improvements. Effects may be overwritten manually with the @Base.assume_effects macro (#43852).

  • The LazyString and the lazy"str" macro were added to support delayed construction of error messages in error paths (#33711).

  • New macro @time_imports for reporting any time spent importing packages and their dependencies (#41612).

    Mostly for development (usage sketch below).

  • The standard library LinearAlgebra.jl is now completely independent of SparseArrays.jl, both in terms of the source code as well as unit testing (#43127). As a consequence, sparse arrays are no longer (silently) returned by methods from LinearAlgebra applied to Base or LinearAlgebra objects. Specifically, this results in the following breaking changes
    ...
    New sparse concatenation functions sparse_hcat, sparse_vcat, and sparse_hvcat return SparseMatrixCSC output independent from the types of the input arguments. They make concatenation behavior available, in which the presence of some special "sparse" matrix argument resulted in sparse output by multiple dispatch. This is no longer possible after making LinearAlgebra.jl independent from SparseArrays.jl (#43127).

    We should check whether we rely on this changed behavior somewhere (DGMulti, I'm looking at you). Sketch below.

  • CPU profiling now records sample metadata including thread and task. Profile.print() has a new groupby kwarg that allows grouping by thread, task, or nested thread/task, task/thread, and threads and tasks kwargs to allow filtering. Further, percent utilization is now reported as a total or per-thread, based on whether the thread is idle or not at each sample. Profile.fetch() includes the new metadata by default. For backwards compatibility with external profiling data consumers, it can be excluded by passing include_meta=false (#41742).

    Should be helpful for investigating multithreaded performance (sketch below).
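
Some quick, untested sketches of how these could look; all type, function, and variable names below are made up just to illustrate the syntax. For the const fields in mutable structs:

mutable struct SimulationState
    const equations::String   # fixed after construction (new in v1.8)
    time::Float64             # still mutable
end

state = SimulationState("ideal GLM-MHD", 0.0)
state.time = 1.0              # ok
# state.equations = "Euler"   # would now throw: const field cannot be changed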
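
For callsite inlining (with a stand-in flux function):

flux(u, orientation) = 0.5 * u^2   # hypothetical stand-in for a real flux

function rhs(u)
    # force inlining of this specific call (v1.8 callsite annotation),
    # regardless of the compiler's usual cost model for `flux`
    f = @inline flux(u, 1)
    return f
end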
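
For the generic Base.ifelse, extending it for a custom type could look like this:

struct DummyCondition
    flag::Bool
end

# only possible now that Base.ifelse is a generic function
Base.ifelse(c::DummyCondition, x, y) = c.flag ? x : y

ifelse(DummyCondition(true), 1.0, 2.0)   # returns 1.0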
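
For the effects overrides, usage looks roughly like this (adapted from the example in the Julia docs):

# :terminates_locally asserts that the loop below always terminates, which
# allows the compiler to constant-fold calls such as mypow(5)
Base.@assume_effects :terminates_locally function mypow(x::Int)
    res = 1
    while x > 1
        res *= x
        x -= 1
    end
    return res
end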
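
For LazyString / lazy"str", the point is that the message is only materialized on the error path (hypothetical checker function):

function check_positive(x)
    # the string interpolation only happens if the error is actually thrown,
    # keeping the happy path free of string construction
    x > 0 || throw(ArgumentError(lazy"expected a positive value, got x = $x"))
    return x
end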
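
The @time_imports invocation itself is trivial, e.g. (using Trixi just as an example):

julia> @time_imports using Trixi

which prints the time spent loading each dependency.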
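
For the sparse concatenation change, requesting sparse output explicitly would look like this:

using SparseArrays, LinearAlgebra

D = Diagonal([1.0, 2.0])
B = ones(2, 2)

# a special matrix argument no longer silently makes the result sparse;
# sparse output now has to be requested explicitly
C = sparse_hcat(D, B)   # always a SparseMatrixCSC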
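
And for the new profiling metadata (toy workload):

using Profile

# profile a toy multithreaded workload
@profile Threads.@threads for i in 1:1000
    sum(rand(10_000))
end

Profile.print(groupby = :thread)   # one section per thread in the report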

@sloede
Member

sloede commented Feb 28, 2022

Thanks a lot for this summary! You beat me to it; I was going to add this as a topic to tomorrow's agenda ;-)

  • Type annotations can now be added to global variables to make accessing them type stable (#43671).

    May be interesting to avoid global constants using `Ref`s.
    

How would that work? I understand that adding type annotations may make the code faster due to type stability, but what's the role of Refs here?

@ranocha
Member Author

ranocha commented Feb 28, 2022

what's the role of Refs here?

This is how we currently do it, e.g.:

const MPI_INITIALIZED = Ref(false)
const MPI_RANK = Ref(-1)
const MPI_SIZE = Ref(-1)
const MPI_IS_PARALLEL = Ref(false)
const MPI_IS_SERIAL = Ref(true)
const MPI_IS_ROOT = Ref(true)

@sloede
Member

sloede commented Feb 28, 2022

Ah, you mean with the upcoming change we can get rid of the const VAR = Ref(something) for global variables whose values are not constant, changing them to something like VAR::TYPE = something, e.g., MPI_INITIALIZED::Bool = false?

@ranocha
Member Author

ranocha commented Feb 28, 2022

Yes, that's how I understand it.
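
In that case, the snippet above could eventually become something like this (an untested sketch; init_mpi! is a hypothetical helper just for illustration):

MPI_INITIALIZED::Bool = false
MPI_RANK::Int = -1
MPI_SIZE::Int = -1
MPI_IS_PARALLEL::Bool = false
MPI_IS_SERIAL::Bool = true
MPI_IS_ROOT::Bool = true

# updates then assign the globals directly instead of mutating a Ref
function init_mpi!(rank, size)
    global MPI_INITIALIZED = true
    global MPI_RANK = rank
    global MPI_SIZE = size
    global MPI_IS_PARALLEL = size > 1
    global MPI_IS_SERIAL = !MPI_IS_PARALLEL
    global MPI_IS_ROOT = rank == 0
    return nothing
end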

@jlchan
Contributor

jlchan commented Feb 28, 2022

Re: sparse behavior. I don't think we use that, but I'll check.

@ranocha
Member Author

ranocha commented Feb 28, 2022

I also think that we sparsify the arrays explicitly, but checking is always better

@ranocha
Member Author

ranocha commented Mar 3, 2022

If JuliaLang/julia#44359 gets backported to Julia v1.8, we should check our new CI times with code coverage.

@giordano

giordano commented Mar 3, 2022

As far as I understand, truly constant globals are still the best option. The point of type annotations on globals is not to replace const but to make non-constant globals less horribly slow; if you care a lot about performance, you still shouldn't get rid of const. See for example

julia> const A = 3.14
3.14

julia> f_A() = A + 1.0
f_A (generic function with 1 method)

julia> B::Float64 = 3.14
3.14

julia> f_B() = B + 1.0
f_B (generic function with 1 method)

julia> C = 3.14
3.14

julia> f_C() = C + 1.0
f_C (generic function with 1 method)

julia> @code_llvm debuginfo=:none f_A()
define double @julia_f_A_149() #0 {
top:
  ret double 0x40108F5C28F5C290
}

julia> @code_llvm debuginfo=:none f_B()
define double @julia_f_B_167() #0 {
top:
  %0 = load atomic double*, double** inttoptr (i64 139850997151064 to double**) unordered, align 8
  %1 = load double, double* %0, align 8
  %2 = fadd double %1, 1.000000e+00
  ret double %2
}

julia> @code_llvm debuginfo=:none f_C()
define nonnull {}* @julia_f_C_171() #0 {
top:
  %0 = alloca [2 x {}*], align 8
  %gcframe2 = alloca [3 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 0
  %1 = bitcast [3 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %1, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #3
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %2 = bitcast [3 x {}*]* %gcframe2 to i64*
  store i64 4, i64* %2, align 16
  %3 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
  %4 = bitcast {}** %3 to {}***
  %5 = load {}**, {}*** %pgcstack, align 8
  store {}** %5, {}*** %4, align 8
  %6 = bitcast {}*** %pgcstack to {}***
  store {}** %gcframe2.sub, {}*** %6, align 8
  %7 = load atomic {}*, {}** inttoptr (i64 139850997203800 to {}**) unordered, align 8
  %8 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
  store {}* %7, {}** %8, align 16
  store {}* %7, {}** %.sub, align 8
  %9 = getelementptr inbounds [2 x {}*], [2 x {}*]* %0, i64 0, i64 1
  store {}* inttoptr (i64 139850977822832 to {}*), {}** %9, align 8
  %10 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 139850795186000 to {}*), {}** nonnull %.sub, i32 2)
  %11 = load {}*, {}** %3, align 8
  %12 = bitcast {}*** %pgcstack to {}**
  store {}* %11, {}** %12, align 8
  ret {}* %10
}

julia> @benchmark f_A()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.370 ns … 15.942 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.387 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.438 ns ±  0.607 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █ ▆                                                        
  ██▄█▃▂▁▂▆▃▅▂▁▁▁▁▂▃▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂ ▂
  1.37 ns        Histogram: frequency by time        1.66 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_B()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.734 ns … 21.955 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.815 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.933 ns ±  0.980 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █                                                        
  ▄▁▄█▂▃▄▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▁▂▁▂▁▂ ▂
  2.73 ns        Histogram: frequency by time        4.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f_C()
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  19.486 ns …  1.241 μs  ┊ GC (min … max): 0.00% … 97.75%
 Time  (median):     20.803 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.922 ns ± 17.157 ns  ┊ GC (mean ± σ):  1.01% ±  1.38%

  ▄█▇▇▅   ▁▁▃▅▅▄▄▂▁       ▁▁▁▁                                ▂
  █████▆▅▅█████████▇▆▆▆▆▇▇████▇▅▄▄▅▅▇▇▇▇▇▆▅▄▂▄▄▄▅▃▅▆▆▅▃▄▄▄▄▄▅ █
  19.5 ns      Histogram: log(frequency) by time      45.1 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

Edit: however, a constant Ref is probably not much different from a type-annotated global:

julia> const D = Ref(3.14)
Base.RefValue{Float64}(3.14)

julia> f_D() = D[] + 1.0
f_D (generic function with 1 method)

julia> @code_llvm debuginfo=:none f_D()
define double @julia_f_D_743() #0 {
top:
  %0 = load double, double* inttoptr (i64 140703666509008 to double*), align 16
  %1 = fadd double %0, 1.000000e+00
  ret double %1
}

julia> @benchmark f_D()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.727 ns … 22.539 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.903 ns ±  0.897 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   ▅    ▁                                                  
  █▄▂▁█▂▄▁▁█▆▃▂▁▁▂▃▂▂▁▂▂▁▂▂▁▂▃▂▁▂▂▁▁▁▂▁▂▁▁▁▂▂▁▂▂▁▁▁▁▂▂▂▂▁▂▂▂ ▂
  2.73 ns        Histogram: frequency by time        3.64 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The LLVM IR is similar (but not identical, access to the non-constant global is atomic).

@ranocha
Copy link
Member Author

ranocha commented Mar 4, 2022

Yeah, the few global variables that we use are constant global Refs. So the basic difference is whether we have a plain load or an atomic load of them, but these first benchmarks seem to indicate that this does not matter a lot, does it?
