
PML UCX: add SPC instrumentation for sent/received message sizes #8066

Merged: 1 commit, Oct 8, 2020

Conversation

@devreal (Contributor) commented Sep 28, 2020

Sent and received message sizes are currently only tracked for pml/ob1. This PR adds support for tracking message sizes to pml/ucx.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@bosilca (Member) left a comment


This is basically the same level of functionality as PMPI, but without the PMPI. The only benefit I can see is that it leaves the PMPI API available for other tools, but in exchange the counting has to be started manually and cannot be enabled automatically by the tool.

@devreal (Contributor, Author) commented Oct 6, 2020

> This is basically the same level of functionality of PMPI, but without the PMPI. The only interest I can see is that it leaves the PMPI API available for other tools

The question for me is: should the SPCs exposed through MPI_T report metrics on a best-effort basis, or should they only report ob1 internals? I would argue that any correct information is better than none, and having SPC report information on application behavior for one PML but not for another makes it unreliable and ultimately useless for tools. Yes, I can get the same information through PMPI, but one benefit of MPI_T is that it provides an easier interface to achieve the same result. As a user, I'd rather query MPI_T than write my own interposition library, and I want to see similar statistics with the ucx PML as with the ob1 PML.

> in exchange it has to be started manually and cannot be done automatically by the tool.

That is addressed in #8065

@bosilca (Member) commented Oct 6, 2020

Assuming we go for the lesser capability, the approach proposed here will force us to duplicate the code in all PMLs. At the same time, moving it up to the MPI API would double-count the data for all PMLs that natively support such a capability.

@devreal (Contributor, Author) commented Oct 7, 2020

There are good reasons to keep the accounting in the PML, probably most importantly being able to capture messages originating from collectives. Moving the accounting into the MPI layer would also mean that requests have to be tracked so that counting is not skewed by receives that never complete (because they are cancelled).

It might be desirable to streamline the accounting so that the PMLs provide similar high-level information for sent and received messages, potentially adding PML-specific SPCs that reflect implementation details (the details of how ob1 actually transfers messages using put/get and its control messages may not be interesting to most users and may cause confusion instead). Even then, the arguments for keeping the accounting inside the PMLs rather than moving it to the MPI layer still stand. The additional code required in each PML is small and should do no harm.

@bosilca (Member) commented Oct 8, 2020

Requests that are cancelled carry that information in their status object, so a correctly implemented OMPI layer can handle this case.

@bosilca bosilca merged commit a541ab9 into open-mpi:master Oct 8, 2020
@devreal devreal deleted the spc-pml-ucx branch October 3, 2022 15:53