Issue with overlapping vector datatype #5540

Closed
bosilca opened this issue Aug 14, 2018 · 14 comments · Fixed by #6695

Comments

@bosilca
Member

bosilca commented Aug 14, 2018

As reported by the DKRZ folks, creating a datatype whose blocks overlap in memory (legal for all send communications) leads to data corruption, because the packing and unpacking functions fail to compute the correct memory layout.

A simple example of such a datatype is:

MPI_Datatype btypes[2], dt;

/* Three blocks of one int with stride 0: all three blocks overlap */
MPI_Type_vector(3, 1, 0, MPI_INT, &btypes[0]);
MPI_Type_indexed(1, (int[]){1}, (int[]){0}, MPI_INT, &btypes[1]);

MPI_Type_create_struct(2, (int[]){1,1}, (MPI_Aint[]){0,12}, btypes, &dt);
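
For reference, the layout this datatype should describe can be worked out by hand. The following is a minimal sketch in plain C with no MPI calls; it is not Open MPI's pack/unpack code. The names typemap, append_vector, issue_typemap and reference_pack are hypothetical helpers made up for illustration, and the sketch assumes a 4-byte int, so that the struct displacement of 12 bytes lands on the fourth int of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Flattened type map: one byte offset per int element, in pack order. */
typedef struct {
    size_t count;                   /* number of int elements in the map */
    ptrdiff_t offset[MAX_ENTRIES];  /* byte offset of each element */
} typemap;

void typemap_push(typemap *tm, ptrdiff_t off)
{
    assert(tm->count < MAX_ENTRIES);
    tm->offset[tm->count++] = off;
}

/* Append a vector of ints (count blocks of blocklen ints, block starts
 * `stride` ints apart), placed at byte displacement `base`. */
void append_vector(typemap *tm, ptrdiff_t base,
                   int count, int blocklen, int stride)
{
    for (int i = 0; i < count; i++)
        for (int j = 0; j < blocklen; j++)
            typemap_push(tm, base + (ptrdiff_t)(i * stride + j)
                                    * (ptrdiff_t)sizeof(int));
}

/* Hand-built type map of the struct from the report. */
typemap issue_typemap(void)
{
    typemap tm = { 0 };
    /* btypes[0]: vector(3, 1, 0) at struct displacement 0 */
    append_vector(&tm, 0, 3, 1, 0);
    /* btypes[1]: indexed with a single 1-int block at offset 0,
     * modelled here as a 1-block vector, at struct displacement 12 */
    append_vector(&tm, 12, 1, 1, 0);
    return tm;
}

/* Naive reference pack: copy each mapped int into a contiguous buffer.
 * Returns the number of ints written to dst. */
size_t reference_pack(const typemap *tm, const void *src, int *dst)
{
    for (size_t k = 0; k < tm->count; k++)
        memcpy(&dst[k], (const char *)src + tm->offset[k], sizeof(int));
    return tm->count;
}
```

Under those assumptions the map holds byte offsets 0, 0, 0, 12, so packing a buffer holding {10, 11, 12, 13} should give {10, 10, 10, 13}: the first int three times (stride 0), then the int at byte offset 12. That is what sending one element of dt is expected to transfer.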

All Open MPI versions are affected. However, it was decided during the weekly call to fix it only in releases after 3.0 and, of course, in master.

@gpaulsen
Member

George has a patch that's coming.

@jsquyres
Member

Discussed on the 2018-09-11 Webex; @bosilca mentioned that he has a patch, but he hasn't yet tested it properly against all the other datatype tests. Stay tuned.

@gpaulsen
Member

Discussed at the RM meeting today; we will bring it up again on tomorrow's Webex.

@jsquyres
Member

Per discussion on today's Webex, we decided that this issue should be fixed in the v4.0.x series, but probably isn't worth bringing back to the v3.0.x and v3.1.x series (i.e., it's a pretty esoteric issue for highly complex datatypes).

@gpaulsen
Member

gpaulsen commented Apr 2, 2019

@bosilca Do you have a PR? If it's only partial, we might be able to get help working on it...

@gpaulsen
Member

@bosilca ?

@jsquyres
Member

@bosilca said on the Webex today that he would separate out the fix for this vector issue from a branch where he has other optimizations, and then push a PR with just the fix.

@jsquyres
Member

@bosilca Any update on this?

@bosilca
Member Author

bosilca commented May 22, 2019

#6695 addresses this and much more.

@gpaulsen
Member

Added the blocker label, as this is affecting a real app and the fix in PR #6695 can be cherry-picked to v4.0.2.

@jsquyres
Member

@bosilca notes that this does not easily cherry-pick to the v3.1.x or v3.0.x branches. As such, @hppritcha and I agree that this will be a Known Issue on the v3.0.x / v3.1.x series: the solution will be to upgrade to >= v4.0.2.

@gpaulsen
Member

gpaulsen commented Aug 5, 2019

Re-opening this issue until it's been cherry-picked to v4.0.x (based on the Target:v4.0.x label, which is to fix issue 5540)

@gpaulsen reopened this Aug 5, 2019
@gpaulsen
Member

gpaulsen commented Aug 5, 2019

v4.0.x backport: #6863

@gpaulsen
Member

The fix has been merged to both master and v4.0.x. Closing this Issue.
