-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Vectorization implementation for GPU #223
Conversation
Codecov Report
@@ Coverage Diff @@
## master #223 +/- ##
==========================================
- Coverage 98.26% 97.02% -1.24%
==========================================
Files 8 8
Lines 690 705 +15
==========================================
+ Hits 678 684 +6
- Misses 12 21 +9
Continue to review full report at Codecov.
|
for M in (metrics..., weightedmetrics...) | ||
@eval @inline (dist::$M)(a, b) = _evaluate(dist, a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for loop is moved from L327-L329.
Base.@propagate_inbounds function eval_start(d::Chebyshev, a, b) | ||
T = result_type(d, a, b) | ||
if any(isnan, a) || any(isnan, b) | ||
return convert(T, NaN) | ||
else | ||
zero(T) # lower bound of chebyshev distance | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rewritten to support CuArray; scalar indexing first
is slow for CuArray
.
A quite significant effort has been made to make adding this package as a dependency a no brainer by making it have few dependencies and thus a fast load time. Unfortunately (as you mentioned), looking at the dependency here, a 10x load time increase is observed:
Some possible suggestions:
|
No, as far as I've understood. ArrayInterface heavily uses Requires.jl
Yes, if you are referring to the vectorization version I just added here. But it's still slower in CPU due to the additional memory allocation for intermediate results.
The only place that requires But I really don't think there's a need to be so conservative. For example, if we introduce LoopVectorization here and bring CPU runtime say >1.2x faster, does the overhead in loading time really matters? (Note that |
Looking a bit more at ArrayInterface it feels like it is way out of scope to add it to something so stable and lean as Distances.jl. For example, it does a lot of type piracy. I think it makes sense to add this to one of the packages that are in the "ArrayInterface" ecosystem (with possible modifications to Distance.jl to allow that to happen). |
It's too bad that ArrayInterface uses Requires.jl. I wish it would be a very lightweight package that others depend on, rather than the contrary. Or move the basic definitions to ArrayInterfaceBase so that packages can depend on that. |
I prefer to have something like Or maybe we can just work in the same Distances repo |
If the main goal is GPU support, maybe one could also just create a GPU-specific DistancesCUDA package, similar to https://github.com/FluxML/NNlibCUDA.jl, with implementations of |
Close it as ArrayInterface is too heavy a dependency for Distances. Without using |
One might even come up with a threaded
reduce
version for this, but that definitely belongs to future work.I did not expose this trait dispatching to the API level because I'm not sure if the user really wants this. Users might only want to control the computational device(e.g., GPU, CPU, CPU threads, CPU distributed) and may not want to control the under the hood implementation details.
I still feel the runtime benefit worths it; not to mention that GPU support is basically unavailable before. Given that a lot of packages depend on
ArrayInterface
, adding this dependency won't add much of the loading time in any real project. (From juliahub results:ArrayInterface
is depended by 692 packages whileDistances
is defended by 474 packages.)Edit:
ArrayInterface requires Julia 1.2. I'll let you guys decide whether 1) we want to instead depend on Requires, or 2) directly bump the Julia lower bound here and drop Julia 1.0 and 1.1 support, or even 3) make a new package DistancesCUDA :(
closes #137