Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.0.x vader on PPC (wmb moved to end of set_header) #4937

Closed
markalle opened this issue Mar 21, 2018 · 4 comments
Closed

v3.0.x vader on PPC (wmb moved to end of set_header) #4937

markalle opened this issue Mar 21, 2018 · 4 comments
Assignees

Comments

@markalle
Copy link
Contributor

markalle commented Mar 21, 2018

This is for the v3.0.x branch.

I have some tests that fail with vader on PPC (pass on x86 due to more generous memory ordering rules there). It looks to me like one of the wmb calls has been moved. I don't have much knowledge of what vader's doing, but I'm guessing the use of the function mca_btl_vader_fbox_set_header() should boil down to

    set data
    wmb
    set header that says the data is there

but the fbox_set_header function has its wmb() call at the bottom so I think it's probably ending up as

    set data
    set header that says the data is there
    wmb

which wouldn't ensure the data is visible to the reader.

I can hit the problem using the below "maxsoak.c" testcase as
mpicc -o x maxsoak.c
mpirun -np 6 -mca pml ob1 -mca btl vader,self ./x
and the testcase will detect corruption.

For me the failure message from the testcase ends up something like

4: Invalid data: Act:525138 Exp:850 Peer:2 Datasize:32 Mult:50
I don't know the maxsoak.c testcase well, it's just something I know we didn't write so I don't have to go through any special approval process to share that code:
https://gist.github.com/markalle/a1c203297cb6af22a3fb5c24e62b2ba3

@markalle
Copy link
Contributor Author

@hjelmn This is the issue I mentioned on the phone

@markalle
Copy link
Contributor Author

ps: I tested with the wmb() call at the top of set_header() and it seemed to work

markalle added a commit to markalle/ompi that referenced this issue Mar 22, 2018
I have some tests that failed on PPC with the recent vader changes.
I'm suspicious of the set_header() function where there's a wmb()
call that looks like it boils down to
    set some data
    set the header to indicate the data is available
    wmb
and I think the wmb needs to go up a line.

More details here:
    open-mpi#4937
with a copy of the "maxsoak.c" testcase at
    https://gist.github.com/markalle/a1c203297cb6af22a3fb5c24e62b2ba3
@markalle
Copy link
Contributor Author

I made a PR with my proposed fix:
#4955

markalle added a commit to markalle/ompi that referenced this issue Mar 22, 2018
I have some tests that failed on PPC with the recent vader changes.
I'm suspicious of the set_header() function where there's a wmb()
call that looks like it boils down to
    set some data
    set the header to indicate the data is available
    wmb
and I think the wmb needs to go up a line.

More details here:
    open-mpi#4937
with a copy of the "maxsoak.c" testcase at
    https://gist.github.com/markalle/a1c203297cb6af22a3fb5c24e62b2ba3

Signed-off-by: Mark Allen <markalle@us.ibm.com>
@hjelmn
Copy link
Member

hjelmn commented Mar 29, 2018

This is fixed from what I understand.

@hjelmn hjelmn closed this as completed Mar 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants