Inline callsite of invoke when possible (invoke improvement No. 1) #10964

Conversation
A slightly more complete test (doesn't include the …):

@noinline f1(a, b) = a + b
g1_1() = f1(1, 2)
g1_2() = invoke(f1, Tuple{Any, Any}, 1, 2)
@inline t1() = Tuple{Any, Any}
@noinline t2() = Tuple{Any, Any}
g1_3() = invoke(f1, t1(), 1, 2)
g1_4() = invoke(f1, t2(), 1, 2)
@assert g1_1() == g1_2() == g1_3() == g1_4()
@code_llvm g1_1()
@code_llvm g1_2()
@code_llvm g1_3()
@code_llvm g1_4()
function timing(f, args...)
    println(f, args)
    f(args...)
    gc()
    @time for i in 1:10000000
        f(args...)
    end
    gc()
end
timing(g1_1)
timing(g1_2)
timing(g1_3)
timing(g1_4)

Output on current master (without this patch):

g1_1()
elapsed time: 0.308610973 seconds (0 bytes allocated)
g1_2()
elapsed time: 1.89693006 seconds (1449 MB allocated, 1.36% gc time in 66 pauses with 0 full sweep)
g1_3()
elapsed time: 1.892066647 seconds (1449 MB allocated, 1.41% gc time in 66 pauses with 0 full sweep)
g1_4()
elapsed time: 1.883954768 seconds (1449 MB allocated, 1.41% gc time in 66 pauses with 0 full sweep)

With this PR:

define i64 @julia_g1_1_43914() {
top:
%0 = call i64 @julia_f1_43903(i64 1, i64 2)
ret i64 %0
}
define i64 @julia_g1_2_43917() {
top:
%0 = call i64 @julia_f1_43903(i64 1, i64 2)
ret i64 %0
}
define i64 @julia_g1_3_43918() {
top:
%0 = call i64 @julia_f1_43903(i64 1, i64 2)
ret i64 %0
}
define i64 @julia_g1_4_43919() {
top:
%0 = call %jl_value_t* @julia_t2_43907()
%1 = call i64 @julia_f1_43903(i64 1, i64 2)
ret i64 %1
}
g1_1()
elapsed time: 0.319919904 seconds (0 bytes allocated)
g1_2()
elapsed time: 0.372261468 seconds (0 bytes allocated)
g1_3()
elapsed time: 0.323118056 seconds (0 bytes allocated)
g1_4()
elapsed time: 1.10516864 seconds (1449 MB allocated, 2.79% gc time in 66 pauses with 0 full sweep)
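For comparison, here is a variant where the tuple type is a plain global constant. This is a minimal sketch, not part of the PR's test script: g1_5 and T are hypothetical names, and the expectation that it lowers to the same direct call as g1_2/g1_3 is an assumption about the optimization, not something shown above.

```julia
# Hypothetical follow-up to the script above (not from the PR).
@noinline f1(a, b) = a + b
const T = Tuple{Any, Any}      # tuple type as a global constant, visible to the compiler
g1_5() = invoke(f1, T, 1, 2)   # assumption: should inline to a direct call, like g1_2/g1_3
@assert g1_5() == 3
@code_llvm g1_5()              # inspect whether the invoke callsite was inlined
```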
Force-pushed from 5e5eed3 to a423bea.
I don't know what's up with the error on Travis CI (which seems to be popping up randomly on almost all commits), and is AppVeyor having some connection issue?
The 32-bit error on Travis is indeed tupocalypse fallout and it's being worked on.
I restarted the builds, though https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.4069/job/wwdnwkpxha87m2me looked like a timeout when trying to execute a trivial command with win64 Julia without using sys.dll. We've had subtle issues there before; we'll see if it happens again.
Yes, and that's why we added a new test for it. This second link https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.4066/job/o3777iciq624y3jd looks like something intermittent that we have seen before (possibly #7942), where the freeze occurs during one of the first few linalg tests. The first link (build 4069) froze during the newly added test for executing a trivial command without sys.dll present. If you didn't change any substantial code between the two builds, maybe build 4069 was a fluke.
@tkelman Well, I rebased onto latest master between the two builds I referred to. I haven't changed anything since the second one, and it seems the build has already passed the point where it got stuck before.
I'm fine with calling it a probably-#7942-related fluke then.
@@ -1730,6 +1730,78 @@ DLLEXPORT jl_value_t *jl_gf_invoke_lookup(jl_function_t *gf, jl_datatype_t *type
    return (jl_value_t*)m;
}

jl_function_t*
jl_gf_invoke_get_specialization(jl_function_t *gf, jl_tupletype_t *types,
There seems to be a lot of code duplication between this method and jl_gf_invoke. Can you refactor the two so that the common bits are shared?
I've thought about just converting the arguments to jl_gf_invoke into a tuple type and using this function to get the specialized function. I didn't bother too much (not that I like this kind of code duplication either) because IMHO it is basically the same duplication as between jl_method_table_assoc_exact and jl_method_table_assoc_exact_by_type.
Another way I've thought about is to use a (more C-friendly) pointer to an array of types, but that would more or less mean rewriting/merging jl_method_table_assoc_exact and jl_method_table_assoc_exact_by_type, and I hoped to keep this patch as small as possible.
Which one do you prefer?
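For illustration, a Julia-level sketch of the first option (the actual change would live in the C runtime; args_to_tupletype is a hypothetical helper, not something added by this PR):

```julia
# Illustration only (hypothetical helper, not part of this PR): build a tuple
# type from a call's argument values so that a single by-type lookup could
# serve both code paths.
args_to_tupletype(args...) = Tuple{map(typeof, args)...}

args_to_tupletype(1, 2.0)   # Tuple{Int64, Float64} on a 64-bit build
```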
done.
Force-pushed from a423bea to 675d4dc.
@vtjnash So I fixed two of the three issues you brought up. For the third one, it seems that I've tried to replace the …
And somehow the …
Force-pushed from c8c79a9 to cb78497.
I think I've finally understood how this code works (thanks to #11439) and why it was crashing before. As @carnaval guessed, it was related to GC: as I noted in commit 8dce033#diff-6d4d21428a67320600faf5a1a9f3a16aR2437, the function object can be collected while it is being passed to the jlcall function. This shouldn't be an issue now, not even with #11439, since a non-wrapper jlcall specialization never accesses the function object (first) argument. I'm still not entirely happy that there's a useless and possibly invalid pointer in the generated code, though. And I have changed it to use …
Replaced by #18444
This is a refactor of my previous attempt to improve the performance of invoke (#9642).
As I mentioned on the mailing list, the previous PR needed to be refactored, and I decided to start with something relatively self-contained and (hopefully) decoupled from other parts. Hopefully this also makes it easier to review.
The patch only deals with the new signature of invoke (i.e. the second parameter is a tuple type rather than a tuple of types), mainly because it's much easier to handle. It should not do anything for the old signature. Thanks @vtjnash for very helpful suggestions.
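For concreteness, the two call forms look like this (an illustration with a throwaway example function, not code from the PR):

```julia
# Illustration of the two invoke signatures (example methods, not from the PR).
f(x::Integer) = "Integer method"
f(x::Int)     = "Int method"

invoke(f, Tuple{Integer}, 1)   # new signature: a tuple type — the case this patch optimizes
# invoke(f, (Integer,), 1)     # old signature: a tuple of types — unchanged by this patch
```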
Tested with the script shown earlier in the thread; the output on current master and the output with this patch are also included above.