Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse matrix testing and added features #3745

Merged
merged 10 commits into from
Sep 18, 2024

Conversation

ShreyasKhandekar
Copy link
Contributor

@ShreyasKhandekar ShreyasKhandekar commented Aug 30, 2024

This PR adds more features to the SparseMatrix Module:

  1. fill_vals : Populates the sparse matrix with a 1D array which has an equal number of elements as the number of non zero values in the sparse matrix. The order in which the values are assigned is the natural iteration order of the sparse matrix (CSC vs CSR)
  2. to_pdarray: Converts a sparse matrix to 3 pdarrays: row indices, col indices, and the values at the indices. The size of each array is equal and is equal to the nnz in the sparse array.

This PR also adds a few simple tests for the above utilities, as well as for sparse_matrix_matrix_mult

We also add a new compatibility shim ArkoudaSparseMatrixCompat.chpl, to make this work with 2.0. Note that even though this works with 2.0, performance will be worse than later releases. This is because chapel-lang/chapel#24391 and chapel-lang/chapel#25243 were two PRs that improved the performance of sparse computations which are not a part of 2.0

We also change the SparseMatrix module to be compiled by default in Arkouda.

Tested single and multilocale

@ShreyasKhandekar
Copy link
Contributor Author

I see the chapel portability tests failing, given that a lot of the new work done by Brad in order to support these sparse matrix computations in Arkouda came after 2.0, I'm not sure what we can do to resolve this because it's not compatible AFAIK. I may be mistaken here and @bradcray might have a better idea.

Copy link
Contributor

@bmcdonald3 bmcdonald3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good assuming you've tested in multilocale, thanks Shreyas

@ShreyasKhandekar
Copy link
Contributor Author

Yup, testing passes with multilocale. Although if we scale it up to more than 8 locales, it will run out of memory due to the inefficient implementations used for now.

Also, I think there is something up with the CI:

tests/alignment_test.py ...FFF.....                                      [  4%]
tests/bigint_agg_test.py F....                                           [  4%]
tests/bitops_test.py FFFFFFFF..                                          [  5%]
tests/categorical_test.py ...........................                    [  6%]
tests/client_dtypes_test.py ..........F                                  [  7%]
tests/client_test.py ...........                                         [  7%]

It's been stuck here for over 1.5 hours, and many tests are failing that I didn't modify in any way.

Could we restart it?

@bmcdonald3
Copy link
Contributor

Although if we scale it up to more than 8 locales, it will run out of memory

Is this on your laptop or in all environments? If we run out of memory at 8 locales, this will fail in nightly testing, which does 16 nodes on an XC

@ShreyasKhandekar
Copy link
Contributor Author

ShreyasKhandekar commented Sep 11, 2024

This was on ChapDL. I thought only the benchmarks did 16 locale runs, not the correctness tests?

@bmcdonald3
Copy link
Contributor

No, the correctness tests run at 16 locales as well

@ShreyasKhandekar
Copy link
Contributor Author

Ah, okay. I will try to reduce the problem size or the memory requirement (or both) to see if I can make it work on 16 locales then. Thanks for the catch!

@ShreyasKhandekar
Copy link
Contributor Author

Okay I think I've convinced myself that my previous tests where I was running out of memory and getting a segfault were because somehow the wrong (non-distributed) version of the sparse matrix matrix multiplication overload was being chosen. I've commented out that version so it doesn't get accidentally chosen. I've reduced the problem size anyway, just to be safe.

If the CI passes, I think we should be good to merge this.

We convert sparse matrices to a 3 tuple of pdarrays with row, cols, and
vals

Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
We use the MsgTuple.fromResponses function to return a list of multiple
pdarrays instead of a tupl, this minimzes the comm between the server
and client.

Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Comment out the non ditributed version of the sparse matrix
multiplication and also reduce the problem size in the test to make sure
we don't run out of memory

Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Copy link
Contributor

@jeremiah-corrado jeremiah-corrado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, glad to see this going in!

Overall, I think it would be nice to get the server-side code converted over to the new function-registration framework at some point, but I definitely don't think that needs to happen for this PR.

Other than that, I just had a few minor notes:

src/SparseMatrixMsg.chpl Show resolved Hide resolved
arkouda/sparrayclass.py Outdated Show resolved Hide resolved
src/SparseMatrix.chpl Show resolved Hide resolved
Signed-off-by: Shreyas Khandekar <60454060+ShreyasKhandekar@users.noreply.github.com>
Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me! thanks sheryas!

@stress-tess stress-tess added this pull request to the merge queue Sep 18, 2024
Merged via the queue into Bears-R-Us:master with commit b47e1fa Sep 18, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants