Performance of Matrix Chain Multiplication in Python #669
Comments
Important note: using …
Ah right, I updated the snippet and logs with …
One note on using Galley: it optimizes for the first inputs you give it. Are you testing it with multiple inputs?
I'm benchmarking it for one set of input tensors. I have a given parametrization grid, e.g.:

```python
configs = [
    {"LEN": 5000, "DENSITY": 0.00001},
    {"LEN": 10000, "DENSITY": 0.00001},
    {"LEN": 20000, "DENSITY": 0.00001},
    {"LEN": 25000, "DENSITY": 0.00001},
    {"LEN": 30000, "DENSITY": 0.00001},
]
```

and for each one of them I construct input tensors, create the benchmarked functions (Finch, Galley, etc.), precompile them, and benchmark them.
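For concreteness, a minimal sketch of what such a driver loop might look like; `make_inputs` and the warm-up step are hypothetical stand-ins, not the actual script (see the gist linked below):

```python
# Hypothetical benchmark driver; the real construction and timing
# helpers are not shown in this thread.
import scipy.sparse as sp

configs = [{"LEN": 5000, "DENSITY": 1e-5}]  # or the full grid above

def make_inputs(length, density, seed=0):
    # Three random CSR operands for the chain A @ B @ C.
    return [
        sp.random(length, length, density=density,
                  format="csr", random_state=seed + j)
        for j in range(3)
    ]

for cfg in configs:
    a, b, c = make_inputs(cfg["LEN"], cfg["DENSITY"])
    # Warm up / precompile each backend here, then time the chain.
    reference = (a @ b) @ c
```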
I see. Could you try giving Galley a new tag for each of them? I think you can set the tag parameter as a kwarg.
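On the Python side, that might look roughly like the following; note that `GalleyScheduler` and the `tag` kwarg here are assumptions taken from the comment above, not verified finch-tensor API:

```python
import scipy.sparse as sp
import finch  # assumes the finch-tensor package

configs = [{"LEN": 5000, "DENSITY": 1e-5}, {"LEN": 10000, "DENSITY": 1e-5}]

for i, cfg in enumerate(configs):
    mats = [sp.random(cfg["LEN"], cfg["LEN"], density=cfg["DENSITY"],
                      format="csr", random_state=3 * i + j)
            for j in range(3)]
    a, b, c = (finch.lazy(finch.asarray(m)) for m in mats)
    # Assumed kwarg: a fresh tag per configuration, so Galley re-optimizes
    # for each input set instead of reusing the plan from the first call.
    result = finch.compute(a @ b @ c, opt=finch.GalleyScheduler(), tag=i)
```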
Could you paste the benchmark code somewhere?
Here's a self-contained script: https://gist.github.com/mtsokol/5718e1e57b101b93c2b83d83112f56c0
Thanks for the script. This is super helpful. I'm taking a look now.
For what it's worth, on my machine, from the Julia interface I can get Galley to use a sparse format:
Another thing to discuss tomorrow: we can revisit …
They should be more similar now. Both the default Julia scheduler and the Galley scheduler have been adjusted to work better with `tensordot`.
I'm a bit lost; what did you change?
It used to be that we needed to do lazy indexing in order to fuse SDDMM in the default scheduler. Now that I have implemented backwards fusion, this is no longer necessary. This was never necessary with Galley.
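As a reference point, the eager SDDMM formulation this enables might be written like so; this is a sketch assuming the finch-tensor array API accepts these constructions, not code from the repo:

```python
import numpy as np
import scipy.sparse as sp
import finch  # assumes the finch-tensor package

def sddmm(s, a, b):
    # Sampled dense-dense matmul: s is a sparse mask, a and b are dense.
    # Written eagerly; backwards fusion lets the scheduler push the mask
    # into the matmul so a @ b is never fully materialized.
    return s * (a @ b)

s = finch.asarray(sp.random(200, 200, density=0.01, format="csr"))
a = finch.asarray(np.random.rand(200, 32))
b = finch.asarray(np.random.rand(32, 200))
out = sddmm(s, a, b)
```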
Okay, I spent maybe too much time thinking about this today. But I have a few findings.
This could be solved with a foreign function interface, where Finch would be allowed to send kernels or subkernels to library implementations.
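To illustrate the idea only; none of these names exist in Finch today, and the dispatch rule is purely hypothetical:

```python
import scipy.sparse as sp

# Hypothetical registry: when the compiler recognizes a kernel pattern,
# it calls out to a tuned library routine instead of generating code.
KERNELS = {
    # sparse-sparse matmul handed off to SciPy's SpGEMM
    "spgemm": lambda a, b: a @ b,
}

def dispatch(pattern, *args):
    impl = KERNELS.get(pattern)
    if impl is None:
        raise NotImplementedError(f"no foreign kernel for {pattern!r}")
    return impl(*args)

a = sp.random(1000, 1000, density=1e-3, format="csr", random_state=0)
b = sp.random(1000, 1000, density=1e-3, format="csr", random_state=1)
c = dispatch("spgemm", a, b)
```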
One way to make this a more compelling plot in the short term would be to use 10% sparse matrices for A and B. This would allow us to compete on a more even playing field.
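Generating those inputs is straightforward with SciPy, assuming "10% sparse" means density 0.1:

```python
import scipy.sparse as sp

LEN = 5000
# 10%-density operands instead of the 1e-5 densities benchmarked above.
A = sp.random(LEN, LEN, density=0.1, format="csr", random_state=0)
B = sp.random(LEN, LEN, density=0.1, format="csr", random_state=1)
```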
Given this, are we comfortable closing this issue?
I don't think this is pressing. We should add an FFI.
Hi @willow-ahrens @kylebd99,
I wanted to discuss the Matrix Chain Multiplication Python example that I'm working on.
The Python implementation relies on lazy indexing, as `tensordot` is slow for other examples, like SDDMM. Here's the Python implementation:
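(The snippet itself isn't preserved in this copy of the thread; purely as an illustrative stand-in, a lazy formulation with finch-tensor might look roughly like this, with API details assumed:)

```python
import finch  # assumes the finch-tensor package

def matrix_chain(a, b, c):
    # Keep the whole chain in the lazy IR so the scheduler can choose
    # the multiplication order, rather than eager pairwise tensordot.
    out = finch.lazy(a) @ finch.lazy(b) @ finch.lazy(c)
    return finch.compute(out)
```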
But it looks like for chain multiplication it's also much slower than SciPy or Numba.
Here's a Julia snippet that reproduces it, give or take:
It looks like in each case the output is densified instead of staying sparse.
With `verbose=true` and the default scheduler I get the following logs:

With `verbose=true` and the Galley scheduler I get the following logs: