
Use index_map in distributed::matrix #1544

Merged: 6 commits into develop, May 20, 2024

Conversation

@MarcelKoch (Member) commented Feb 14, 2024:

Application of #1543. The index_map is now used to create the numbering for the non-local columns. The index_map is created during distributed::matrix::read_distributed and also returned from that function. Further down the line, this allows users to provide already separated local and non-local matrices to read_distributed. It also makes the non-local -> global mapping more easily available.
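
A minimal sketch of the flow described above (the setup of exec, comm, data, and partition is assumed, the type parameters are illustrative, and note that the discussion below ends with the return type being dropped again):

    // Sketch: read a distributed matrix; per this PR, read_distributed also
    // builds and returns the index_map used for the non-local column numbering.
    using Mtx = gko::experimental::distributed::Matrix<double, int, gko::int64>;
    auto mat = Mtx::create(exec, comm);
    auto imap = mat->read_distributed(data, partition);
    // imap now provides the mapping non-local -> global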

This PR also introduces:

  • making run easier to use
  • specialization of run for distributed matrices -> removes the need for changes to the distributed base (see the dispatch sketch below)
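
For context, run dispatches a polymorphic object to a list of concrete types. The following is a generic sketch of that pattern, not Ginkgo's actual run signature:

    #include <stdexcept>

    // Try to dynamic_cast obj to each listed type in turn and invoke the
    // functor on the first match; Base must be polymorphic, and fn must be
    // callable with a pointer to each listed type (e.g. a generic lambda).
    template <typename... Types, typename Base, typename Func>
    void run_dispatch(Base* obj, Func&& fn)
    {
        bool matched = false;
        auto try_one = [&](auto* ptr) {
            if (!matched && ptr) {
                matched = true;
                fn(ptr);
            }
        };
        (try_one(dynamic_cast<Types*>(obj)), ...);
        if (!matched) {
            throw std::runtime_error("no type in the dispatch list matched");
        }
    }

    // usage sketch: run_dispatch<DistMatrix, DenseMatrix>(linop, callback);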

PR Stack:

@MarcelKoch self-assigned this on Feb 14, 2024
@ginkgo-bot added the labels reg:testing (This is related to testing.) and mod:all (This touches all Ginkgo modules.) on Feb 14, 2024
@MarcelKoch (Member, Author) commented:

There is some interface-breaking stuff in there, but it's minor and I can easily add back the old version if necessary.

@pratikvn (Member) left a comment:

LGTM! The interface issue needs to be resolved. It is in the experimental namespace, but I think we said we would still not break the interface even in experimental?

     */
    -void read_distributed(
    +index_map<local_index_type, global_index_type> read_distributed(
Member:

this breaks the interface

@upsj (Member) commented Apr 4, 2024:

We made no promises about interface stability in experimental beyond "we'll try to keep changes to a minimum".

@MarcelKoch (Member, Author):

TBH, I don't consider that really an interface break. It only affects very serious template metaprogramming, which I don't expect users to do. Other than that, current code will still compile; it just discards the return value.

@MarcelKoch (Member, Author):

There is actually an interface break here, but not due to the return value: this function now expects a shared_ptr<Partition> instead of a raw pointer. That change is made in #1543, though, if you want to discuss it there.
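
For illustration, the two breaks side by side, reconstructed from the diff above (the data parameter shown here is an assumption):

    // before: raw partition pointer, no return value
    void read_distributed(
        const device_matrix_data<value_type, global_index_type>& data,
        const Partition<local_index_type, global_index_type>* partition);

    // after: shared ownership of the partition, and the index_map built
    // while reading is handed back to the caller
    index_map<local_index_type, global_index_type> read_distributed(
        const device_matrix_data<value_type, global_index_type>& data,
        std::shared_ptr<const Partition<local_index_type, global_index_type>>
            partition);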

Member:

Somehow the old comments I made don't appear anymore, so it is a bit annoying.

Member:

But this is indeed an unfortunate interface break. The version with shared_ptr should be okay, but users actually depend on read_distributed, so it will break existing code.

@MarcelKoch (Member, Author):

I think the index map needs to store the partition. I've looked again into the map_to_local kernels, and there I actually need all the arrays a partition stores. So it wouldn't make sense to copy only the needed data out of the partition and into the index map.
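
A rough sketch of that design (the member names are hypothetical; Partition and array are the Ginkgo types):

    // Hypothetical layout: the index map shares ownership of the partition
    // instead of copying pieces out of it, because the map_to_local kernels
    // need all of the partition's internal arrays.
    template <typename LocalIndexType, typename GlobalIndexType>
    class index_map {
        // ... mapping interface ...
    private:
        std::shared_ptr<const Partition<LocalIndexType, GlobalIndexType>>
            partition_;
        array<GlobalIndexType> remote_global_idxs_;  // non-local -> global
    };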

@MarcelKoch force-pushed the read-distributed-with-index-map branch from c12bac1 to a1d45ac on April 2, 2024 07:37
@MarcelKoch force-pushed the read-distributed-with-index-map branch from a1d45ac to 07f6d02 on April 2, 2024 17:31
@MarcelKoch force-pushed the index-map-device-kernels branch 2 times, most recently from 8206bc3 to c700b1c on April 4, 2024 10:29
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 07f6d02 to b42ab92 on April 4, 2024 10:29
@MarcelKoch force-pushed the read-distributed-with-index-map branch from b42ab92 to 8f104fd on April 4, 2024 11:00
@MarcelKoch added this to the Ginkgo 1.8.0 milestone on April 5, 2024
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 8f104fd to a0824a8 on April 19, 2024 14:39
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 2072219 to 20e3d96 on May 9, 2024 07:55
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 20e3d96 to 2812123 on May 9, 2024 08:56
@MarcelKoch requested a review from upsj on May 10, 2024 09:24
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 2812123 to 52dcfc8 on May 13, 2024 09:41
@MarcelKoch force-pushed the read-distributed-with-index-map branch 2 times, most recently from 1c0bd5f to 355e776 on May 13, 2024 20:50
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 355e776 to 204d7e8 on May 14, 2024 08:20
Base automatically changed from index-map-device-kernels to develop on May 14, 2024 13:13
@MarcelKoch force-pushed the read-distributed-with-index-map branch from 204d7e8 to 3be79ab on May 14, 2024 13:25
@upsj (Member) left a comment:

LGTM; one minor gripe: the memory allocation tracking is incomplete in index_map. Maybe a fix could be included here?

    @@ -5,7 +5,7 @@
         "comm_pattern": "stencil",
         "spmv": {
             "csr-csr": {
    -            "storage": 6564,
    +            "storage": 6420,
Member:

Maybe not directly related to this, but I went back to check: there are some places left over from #1543 that still use std::vector instead of gko::vector or gko::array.

@MarcelKoch (Member, Author):

I've only checked (reference|omp)/distributed for std::vector and found only 4 instances. Did you find more? (I'm updating them btw)

Member:

No, I checked the patch; that should be all. The rest happens inside tests.

@MarcelKoch (Member, Author):

I switched to gko::vector and it didn't change the benchmark output (because the missing changes were only in omp/).

Member:

I know, this only reports the resting storage size, not temporary memory. But it did remind me to check for missing updates :)

@MarcelKoch (Member, Author):

I see. Providing the maximal storage as well might be interesting to add in the future.

Member:

Yes, good point!

Comment on lines 320 to 324
    for (size_type i = 0; i < host_recv_targets->get_size(); ++i) {
        recv_sizes_[host_recv_targets->get_const_data()[i]] =
            host_offsets->get_const_data()[i + 1] -
            host_offsets->get_const_data()[i];
    }
Member:

It feels pedantic, but I think having a raw loop next to a bunch of algorithm calls is a mismatch of abstraction levels. This might be better extracted into a separate function, in the spirit of Sean Parent's "no raw loops" suggestion (https://www.youtube.com/watch?v=W2tWOdzgXHA).

Maybe we can call it compute_recv_sizes? Then the high-level control flow is easy to follow, and if you need to know how those recv_sizes get determined, you look at the function call.
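
A sketch of that extraction (the helper name follows the suggestion above; the pointer-based parameter types are assumptions based on the snippet):

    #include <vector>

    // Hypothetical free function: scatter the receive size for each target
    // rank, computed as the difference of consecutive offsets.
    void compute_recv_sizes(const comm_index_type* recv_targets,
                            size_type num_targets,
                            const global_index_type* offsets,
                            std::vector<comm_index_type>& recv_sizes)
    {
        for (size_type i = 0; i < num_targets; ++i) {
            recv_sizes[recv_targets[i]] =
                static_cast<comm_index_type>(offsets[i + 1] - offsets[i]);
        }
    }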

@MarcelKoch (Member, Author):

IMO, the issue here is that our arrays can't be used with the STL. If we had begin and end for them, this would just be std::adjacent_difference.

Member:

You can use pointers here, but adjacent_difference is not 100% sufficient, since it leaves the first element unchanged and doesn't scatter the results. I just meant: extract these 5 lines into a function.
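
A standalone example of why it falls short (not code from this PR):

    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<int> offsets{0, 3, 5, 9};
        std::vector<int> sizes(offsets.size());
        // adjacent_difference copies the first element through unchanged and
        // writes results in input order; it cannot place each size at
        // position recv_targets[i].
        std::adjacent_difference(offsets.begin(), offsets.end(), sizes.begin());
        // sizes == {0, 3, 2, 4}; the loop above instead needs {3, 2, 4},
        // scattered to recv_sizes_[recv_targets[i]].
    }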

@MarcelKoch (Member, Author):

You're right, adjacent_difference is weird, same as partial_sum. I would still argue that the issue here is just that our arrays are not easy to access. If it read

    for (size_type i = 0; i < recv_targets.size(); ++i) {
        recv_sizes_[recv_targets[i]] = offsets[i + 1] - offsets[i];
    }

I don't think you would have the same issue. It only becomes an issue because we have to clutter it with host_ prefixes and use very verbose function names.

@upsj (Member) commented May 17, 2024:

My comment is mostly not about the complexity of the code (.get_const_data()[] vs []), but about mixing low-level code with high-level code. I do agree that it would be nice to have cleaner access to arrays. If your extracted function takes pointers, you would arrive at that code anyway.

Maybe we could add a host_array type derived from array, with a custom constructor that always uses get_master(), plus additional operator[] and begin()/end() functions?
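
A rough sketch of that idea (the type and its details are hypothetical, not part of this PR; array and Executor are the Ginkgo types):

    // Hypothetical host_array: an array pinned to the master (host) executor,
    // so that operator[] and iteration are always safe to use directly.
    template <typename ValueType>
    class host_array : public array<ValueType> {
    public:
        host_array(std::shared_ptr<const Executor> exec, size_type size)
            : array<ValueType>(exec->get_master(), size)
        {}

        ValueType& operator[](size_type i) { return this->get_data()[i]; }
        const ValueType& operator[](size_type i) const
        {
            return this->get_const_data()[i];
        }

        ValueType* begin() { return this->get_data(); }
        ValueType* end() { return this->get_data() + this->get_size(); }
        const ValueType* begin() const { return this->get_const_data(); }
        const ValueType* end() const
        {
            return this->get_const_data() + this->get_size();
        }
    };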

@MarcelKoch (Member, Author):

I've changed this, but maybe not exactly how you imagined it. I'm using a lambda for this, since I don't want to put it outside the function, due to its limited use.

    -{std::make_tuple(gko::dim<2>{2, 1}, I<git>{1}, I<git>{0}, I<vt>{6}),
    - std::make_tuple(gko::dim<2>{2, 3}, I<git>{0, 0, 1}, I<git>{2, 1, 0},
    +{std::make_tuple(gko::dim<2>{2, 1}, I<git>{1}, I<git>{1}, I<vt>{6}),
    + std::make_tuple(gko::dim<2>{2, 3}, I<git>{0, 0, 1}, I<git>{1, 3, 2},
Member:

These changes are a bit hard to review. Can you add a comment to the function explaining what the tuple members represent?

Comment on lines +286 to +283
    auto imap = index_map<local_index_type, global_index_type>(
        exec, col_partition, comm.rank(), global_non_local_col_idxs);
@MarcelKoch (Member, Author):

I'm quite confident that it will become necessary to store this map in the matrix class in the future. I'm not doing that for now, because with the distributed PGM there are now ways to create a non-empty matrix without this read function. I don't know how to create the mapping in the PGM case, so there would be non-empty matrices with empty mappings.
I didn't think this was suitable, so I don't store the mapping. But making it a member in the future would require either unnecessary copies or an interface break.
Any thoughts on this @upsj @pratikvn?

Member:

I think it's a fundamental choice: do we want the matrix to store the implicitly block-reordered matrix (every rank has a consecutive range of indices), or the original matrix? I'm fine with the latter, but it may complicate things, as global <--> local mappings are more costly. At the same time, I'm not sure that would ever be a performance bottleneck.

Member:

Can you elaborate on the interface break?

@MarcelKoch (Member, Author):

The interface break:

    auto imap = mat->read_distributed(...);

will result in an error if read_distributed returns void again. The return type could be changed to const index_map&, but that would result in copies.

@MarcelKoch (Member, Author):

Without the original matrix, we can't do anything that involves more than the local block. So the distributed PGM would not work, and RAS would not be possible either. I also think the mapping is necessary to give users the option to use pre-existing data, since it allows mapping global indices, which the user knows, to the Ginkgo non-local indices, which the user doesn't know.

Member:

That's not entirely true: you can operate on the block-reordered indexing just fine; imagine computing an index_map after remapping each rank's indices to be consecutive. But that is still doing 90% of the work towards having a full global <--> local index map, so it's probably better to do it fully.

Member:

IMO, just from a properties perspective, it makes sense for the matrix class to have an index map. Any function that creates a distributed matrix should then also be required to update its index_map. I understand that we don't do that for PGM right now, but I think the interface choices should not depend on that.

So, I think it should be okay to store an index_map in the matrix class.

TBH, I am not sure why read_distributed should return an index_map. I think we should just store it inside the class and have a public getter to access it where necessary.

@MarcelKoch (Member, Author):

I think the reordering question is somewhat separate from the index-map question. As you said, you could still have an index map after reordering. In general, I think reordering can become quite complicated; see the fun image in PETSc's user guide: https://petsc.org/release/manual/mat/#block-matrices

@MarcelKoch (Member, Author):

I think for now I will just remove the return type. The index map is still helpful, since it simplifies the matrix kernels, while the information necessary for PGM is still available. Thus we could add the index map as a member at a later point if we want to.

@MarcelKoch force-pushed the read-distributed-with-index-map branch from 3be79ab to b519e56 on May 17, 2024 13:42
@MarcelKoch force-pushed the read-distributed-with-index-map branch from b519e56 to 0d3b710 on May 17, 2024 15:07
@MarcelKoch added the 1:ST:ready-to-merge label (This PR is ready to merge.) and removed the 1:ST:ready-for-review label on May 17, 2024
@MarcelKoch merged commit a61989f into develop on May 20, 2024
11 of 15 checks passed
@MarcelKoch deleted the read-distributed-with-index-map branch on May 20, 2024 13:07
Labels: 1:ST:ready-to-merge, mod:all, reg:testing, type:distributed-functionality
4 participants