Try to prevent the compiler from optimizing out MPIR_Breakpoint(). #6828

Merged 1 commit on Jul 24, 2019


awlauria
Contributor

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>

@awlauria
Contributor Author

See issue #5501

@ggouaillardet
Contributor

Would it be easier/more consistent to use CFLAGS, similar to what was done in #6527?

@rhc54
Contributor

rhc54 commented Jul 19, 2019

I guess this goes back to what I was saying - we already merged in a change that was supposed to fix this problem, but I gather it didn't resolve it for some compilers? Is it because the configure check missed those compilers for some reason? I'd hate to wind up back in the whack-a-mole game again.

@ggouaillardet
Contributor

@rhc54 I read your comment after I posted mine ...
That being said, it seems #5501 does not impact orte/orted/orted_submit so it might be worth giving it a try

@rhc54
Contributor

rhc54 commented Jul 19, 2019

@ggouaillardet My apologies for not being clear. What I was trying to convey (but so poorly did) was that we had played games like this PR with compilers many times before without lasting success. Either some compilers already knew how to detect it and optimized it out, or they eventually figured it out in a future generation.

The method used in #5501 seems to have worked for a more general case. I didn't realize it doesn't cover orted_submit as I thought that was the original target, but that should be easily remedied here and would be far preferable to a re-try of the "trick the compiler" approach.

@awlauria
Contributor Author

The issue here is the same that was addressed in #4624 for clang. But as I explained in #5501, extending out that solution for clang is just a big can of worms.

@rhc54
Contributor

rhc54 commented Jul 19, 2019

I believe @ggouaillardet was referring to #6527 and wondering why that wouldn't resolve the case by simply adding the "unwind" flags to the orted_submit Makefile.am. Did you try that?

@awlauria
Contributor Author

-fasynchronous-unwind-tables is a GCC-specific option; using it with xlc (for example) has no effect:

warning: 1540-5200 The option "-fasynchronous-unwind-tables" is not supported.

@awlauria
Contributor Author

I'll hunt to see if there is an xlc equivalent we can add when -fasynchronous-unwind-tables is not supported. But I fear every compiler will have its own flag that will need to be factored in.

@jsquyres
Member

bot:ompi:retest

@jsquyres
Member

jsquyres commented Jul 19, 2019

@rhc54 and I talked about this on the phone. I actually kinda like the pure-C way in this PR (i.e., return something that is volatile) because there's no compile-time way to optimize that out. You have to run it, because some other thread may have changed the value. I also like this because it doesn't introduce yet-another compiler/linker-specific flag.

It's also a little different than the fix introduced in #6527 -- that fix was about providing unwind information rather than ensuring that the breakpoint was actually invoked (I think @ggouaillardet was actually asking if the same fix as #6527 would have a side-effect of also ensuring that the breakpoint was actually invoked).

I think there's no harm in merging this PR, but since this is a complicated, subtle issue, I'd first like to see the comment explaining a bit more about why this volatile variable exists, etc. (more than just "not let the compiler optimize out ...").
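
A minimal sketch of the "return something that is volatile" idea described above. The names `sketch_noop_ptr`, `sketch_breakpoint`, and `sketch_launch` are hypothetical and not taken from orted_submit.c; the declaration actually used by this PR is not shown in the thread. Because the pointer object is volatile-qualified, each read of it counts as an observable access, so the compiler has to emit the load rather than treat the function body as empty.

```c
#include <stddef.h>

/* Hypothetical names; a sketch, not the code from this PR. */

/* The pointer object itself is volatile, so every read of it must
 * actually be performed at run time. */
static void * volatile sketch_noop_ptr = NULL;

/* A debugger would set a breakpoint on this symbol. The volatile read
 * gives the body an observable effect, so it is not an empty function. */
void *sketch_breakpoint(void)
{
    return sketch_noop_ptr;
}

/* A caller in the same translation unit; the result is ignored on purpose. */
int sketch_launch(void)
{
    (void) sketch_breakpoint();
    return 0;
}
```

This only guarantees that the read happens. It says nothing about whether the call is inlined, or about unwind information, which is what #6527 dealt with.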

@awlauria
Contributor Author

Another option I looked into was moving MPIR_Breakpoint() out into its own file and compiling that one file without optimizations (-O0). But that seems easier said than done. Adding the -O0 is easy enough, but removing the -O3 that is tacked onto every file at configure time (?) is another story, and I'm not sure of a good approach to that.

@jsquyres
Member

Lucky for you, we already have $(CFLAGS_WITHOUT_OPTFLAGS). 😄

@jsquyres
Member

Something went wrong in the AWS CI. See if it was a transient error...

bot:ompi:retest

@awlauria
Contributor Author

awlauria commented Jul 19, 2019

Hmm, maybe I am not doing something right then. The file is being compiled with $(CFLAGS_WITHOUT_OPTFLAGS), but the -O3 is being tacked on later.

diff --git a/orte/orted/Makefile.am b/orte/orted/Makefile.am
index 1235e51..cb57243 100644
--- a/orte/orted/Makefile.am
+++ b/orte/orted/Makefile.am
@@ -35,10 +35,12 @@ lib@ORTE_LIB_PREFIX@open_rte_la_SOURCES += \
 # The MPIR portion of the library must be built with -g, even if
 # the rest of the library has other optimization flags.
 # Use an intermediate library to isolate the debug object.
+OPTFLAGS = ""
 noinst_LTLIBRARIES += liborted_mpir.la
 liborted_mpir_la_SOURCES = \
-       orted/orted_submit.c
-liborted_mpir_la_CFLAGS = $(CFLAGS_WITHOUT_OPTFLAGS) $(DEBUGGER_CFLAGS)
+       orted/orted_submit.c \
+       orted/orted_mpir.c
+liborted_mpir_la_CFLAGS = $(CFLAGS_WITHOUT_OPTFLAGS) $(DEBUGGER_CFLAGS) -O0
 
 lib@ORTE_LIB_PREFIX@open_rte_la_LIBADD += liborted_mpir.la

And the full make line:

/bin/sh ../libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../opal/include -I../ompi/include -I../oshmem/include -I../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../ompi/mpiext/cuda/c   -I.. -I../orte/include -I/smpi_dev/awlauria/ompi_builds/3.1.x/ompi-3.1.x/opal/mca/event/libevent2022/libevent -I/smpi_dev/awlauria/ompi_builds/3.1.x/ompi-3.1.x/opal/mca/event/libevent2022/libevent/include -I/smpi_dev/awlauria/ompi_builds/3.1.x/ompi-3.1.x/opal/mca/hwloc/hwloc1117/hwloc/include   -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -pthread -g -O0 -O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -MT orted/liborted_mpir_la-orted_mpir.lo -MD -MP -MF orted/.deps/liborted_mpir_la-orted_mpir.Tpo -c -o orted/liborted_mpir_la-orted_mpir.lo `test -f 'orted/orted_mpir.c' || echo './'`orted/orted_mpir.c

The -O3 comes right after the -O0 I added in the Makefile.am above... I'm not sure where it comes from.
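
For context on the `-g -O0 -O3` sequence in the command line above: GCC documents that when several -O options are given, the last one is the one that takes effect, so this file would still be built at -O3. One way to confirm what a translation unit was actually built with is a small probe like the following, compiled with the same flags. It is only a sketch: `sketch_built_optimized` is a hypothetical helper, and `__OPTIMIZE__` is a predefined macro in GCC (clang defines it too) that other compilers such as xlc may not provide.

```c
/* Hypothetical probe; not part of Open MPI.
 * Returns 1 if this translation unit was compiled with optimization
 * enabled (GCC/clang define __OPTIMIZE__ in that case), 0 otherwise. */
int sketch_built_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```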

@jsquyres
Member

Check the generated Makefile, and/or throw in some echo statements in the generated rules so that you can see to which Makefile macro the -O3 belongs.

@jjhursey
Member

A while back I added the CFLAGS_WITHOUT_OPTFLAGS option to compile just that source file differently for MPIR (with the option that Jeff suggested).

# The MPIR portion of the library must be built with -g, even if
# the rest of the library has other optimization flags.
# Use an intermediate library to isolate the debug object.
noinst_LTLIBRARIES += liborted_mpir.la
liborted_mpir_la_SOURCES = \
        orted/orted_submit.c
liborted_mpir_la_CFLAGS = $(CFLAGS_WITHOUT_OPTFLAGS) $(DEBUGGER_CFLAGS)

Maybe the optimization flag still makes it into that macro or around it somewhere?

@awlauria
Contributor Author

awlauria commented Jul 22, 2019

It seems to be added to the CFLAGS variable, which to my understanding is applied to every .c file compiled with gcc.

Short of removing -O3 from CFLAGS and adding it separately to every Makefile, I am not sure of the best way to remove it for these files. I am by no means a make expert, so maybe there is another way, but it's non-obvious to me.

@awlauria awlauria force-pushed the mpir_breakpoint_noop_fix branch from afcbc0c to a6501c6 Compare July 22, 2019 14:15
@awlauria
Contributor Author

@jsquyres I updated the commit with some additional remarks. If that is good should we just merge it as a work-around for now?

@jsquyres
Member

@awlauria I am actually unable to reproduce this behavior. Meaning: I do not see -O3 in the command line when compiling orted_submit.c. I'd like to confirm that we are actually "fixing" the right problem before merging this PR.

Can you please do the following:

$ cd top_of_ompi_build_tree
...do a full build..
$ cd orte
$ rm -f orted/liborted_mpir_la-orted_submit.lo
$ make V=1 orted/liborted_mpir_la-orted_submit.lo

and post the results here?

@awlauria
Contributor Author

awlauria commented Jul 22, 2019

Following your steps outlined above, I see the -O3. I am on the v3.0.x branch.

/bin/sh ../libtool  --tag=CC   --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../opal/include -I../ompi/include -I../oshmem/include -I../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../ompi/mpiext/cuda/c   -I.. -I../orte/include -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/event/libevent2022/libevent -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/event/libevent2022/libevent/include -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/hwloc/hwloc1117/hwloc/include   -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -pthread -g -O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -MT orted/liborted_mpir_la-orted_submit.lo -MD -MP -MF orted/.deps/liborted_mpir_la-orted_submit.Tpo -c -o orted/liborted_mpir_la-orted_submit.lo `test -f 'orted/orted_submit.c' || echo './'`orted/orted_submit.c
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../opal/include -I../ompi/include -I../oshmem/include -I../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../ompi/mpiext/cuda/c -I.. -I../orte/include -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/event/libevent2022/libevent -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/event/libevent2022/libevent/include -I/smpi_dev/awlauria/ompi_builds/3.0.x/opal/mca/hwloc/hwloc1117/hwloc/include -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -pthread -g -O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -MT orted/liborted_mpir_la-orted_submit.lo -MD -MP -MF orted/.deps/liborted_mpir_la-orted_submit.Tpo -c orted/orted_submit.c  -fPIC -DPIC -o orted/.libs/liborted_mpir_la-orted_submit.o
mv -f orted/.deps/liborted_mpir_la-orted_submit.Tpo orted/.deps/liborted_mpir_la-orted_submit.Plo

@jsquyres
Member

@awlauria Odd. I am totally unable to reproduce this, even on the v3.0.x branch.

How exactly are you configuring Open MPI?

Here's what I see when I rebuild that file:

$ make orted/liborted_mpir_la-orted_submit.lo V=1
/bin/sh ../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../opal/include -I../ompi/include -I../oshmem/include -I../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../ompi/mpiext/cuda/c   -I.. -I../orte/include -I/home/jsquyres/git/ompi-crossover/opal/mca/event/libevent2022/libevent -I/home/jsquyres/git/ompi-crossover/opal/mca/event/libevent2022/libevent/include -I/home/jsquyres/git/ompi-crossover/opal/mca/hwloc/hwloc1117/hwloc/include   -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -pthread -g -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -MT orted/liborted_mpir_la-orted_submit.lo -MD -MP -MF orted/.deps/liborted_mpir_la-orted_submit.Tpo -c -o orted/liborted_mpir_la-orted_submit.lo `test -f 'orted/orted_submit.c' || echo './'`orted/orted_submit.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../opal/include -I../ompi/include -I../oshmem/include -I../opal/mca/hwloc/hwloc1117/hwloc/include/private/autogen -I../opal/mca/hwloc/hwloc1117/hwloc/include/hwloc/autogen -I../ompi/mpiext/cuda/c -I.. -I../orte/include -I/home/jsquyres/git/ompi-crossover/opal/mca/event/libevent2022/libevent -I/home/jsquyres/git/ompi-crossover/opal/mca/event/libevent2022/libevent/include -I/home/jsquyres/git/ompi-crossover/opal/mca/hwloc/hwloc1117/hwloc/include -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -pthread -g -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread -MT orted/liborted_mpir_la-orted_submit.lo -MD -MP -MF orted/.deps/liborted_mpir_la-orted_submit.Tpo -c orted/orted_submit.c  -fPIC -DPIC -o orted/.libs/liborted_mpir_la-orted_submit.o
mv -f orted/.deps/liborted_mpir_la-orted_submit.Tpo orted/.deps/liborted_mpir_la-orted_submit.Plo

@awlauria
Contributor Author

@jsquyres I configured without --enable-debug, so maybe that is it?

@jsquyres
Member

No, that shouldn't be it.

Send the first several lines of your config.log -- it will have the full CLI from how you invoked configure.

@awlauria
Contributor Author

awlauria commented Jul 22, 2019

$ ./configure --with-ucx=no --prefix=/smpi_dev/awlauria/ompi_builds/3.0.x/exports

## --------- ##
## Platform. ##
## --------- ##

hostname = f10n17
uname -m = ppc64le
uname -r = 4.14.0-115.8.1.el7a.ppc64le
uname -s = Linux
uname -v = #1 SMP Thu May 9 14:45:13 UTC 2019

/usr/bin/uname -p = ppc64le
/bin/uname -X     = unknown

/bin/arch              = ppc64le

@jsquyres
Member

Are there any env variables set when configure is invoked? (e.g., CFLAGS and the like)

@awlauria
Contributor Author

@jsquyres is this ok now? I updated the comment to have more info.

@jsquyres
Member

I'm sorry to be so picky, but the reason I was asking about -O3 yesterday is because you said the following in the comment:

/* MPIR_Breakpoint(). Unfortunately since -O3 is added
 * to every file via CFLAGS, there is a possibility
 * that the compiler will see this function as a NOOP and
 * optimize it out. To prevent this, add the volatile keyword
 */

But that's not actually what should be happening (i.e., -O3 should not be added to every file -- specifically, it should not be added to the orted_submit.c file). So I'd like to understand what's happening here and/or update that comment.

@gpaulsen said today on the Webex that he was just starting a build to see what's happening with env variables, etc.

@gpaulsen
Member

So I was able to reproduce the -O3 being added in a clean checkout / build of v3.0.x on ppc64le with gcc 4.8.5 without any relevant env variable set.
I configured like this:
> nohup ./configure --prefix=/mnt/ram/gpaulsen/ompi-install --with-ucx=no &> configure.out &
I then ran make like this:
> nohup make V=1 &> make.out &
This is installed:

 > rpm -qa |grep -e gcc -e auto
autoconf-2.69-11.el7.noarch
autofs-5.0.7-99.el7.ppc64le
gcc-c++-4.8.5-36.el7_6.2.ppc64le
gcc-4.8.5-36.el7_6.2.ppc64le
autogen-libopts-5.18-5.el7.ppc64le
gcc-gfortran-4.8.5-36.el7_6.2.ppc64le
libgcc-4.8.5-36.el7_6.2.ppc64le
automake-1.13.4-3.el7.noarch

I pushed the logs to my private fork here: https://github.com/gpaulsen/ompi/tree/data/pr6828_logs/pr6828

I know I should have used a Gist, but I don't think Gists work well with such large files.

@jsquyres
Member

With this, I tracked down your issue.

@gpaulsen's logs confirm that -O3 is, indeed, added to their build with no additional env vars, etc.

But @gpaulsen also confirms that they're using Automake v1.13.4. According to https://www.open-mpi.org/source/building.php, Automake >= v1.15 should be used.

When compiling with Automake v1.13.3, I can replicate the -O3 behavior. Meaning: this is just behavior from older Automake. You should upgrade. (sidenote: there may be a way to make AM v1.13.x behave correctly; feel free to investigate that 😄)

That being said, I'm still amenable to the code update in this PR. But please update the comment to be a little more clear (possibly even make reference to older versions of Automake...? However you want to say it, but just be clear that when using the recommended version of Automake, the Right Things happen, but with some older versions of Automake -- e.g., v1.13.4 -- -O3 can still be used to compile orted_submit.c, and that may still cause some problems yadda yadda yadda).

@awlauria awlauria force-pushed the mpir_breakpoint_noop_fix branch from a6501c6 to f6b4351 Compare July 24, 2019 13:16
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
@awlauria awlauria force-pushed the mpir_breakpoint_noop_fix branch from f6b4351 to 00106f5 Compare July 24, 2019 13:17
@awlauria
Contributor Author

@jsquyres I updated the comment to be more clear on what's going on here. Thank you for root-causing.

Member

@jsquyres jsquyres left a comment

Good to go -- thanks for your patience, @awlauria!

@jjhursey
Member

Thanks @awlauria for tracking that one down. We'll want to make sure this gets into the release branches once merged.

@jsquyres jsquyres merged commit 888f3ec into open-mpi:master Jul 24, 2019
@jsquyres
Member

CI completed, so I merged.

Can someone make PRs for v3.0.x, v3.1.x, and v4.0.x?

@awlauria
Contributor Author

Sure.

v3.0.x: #6841
v3.1.x: #6842
v4.0.x: #6843

@ggouaillardet
Contributor

FWIW, GCC 9.1.0 reports the following warning

orted/orted_submit.c: In function 'MPIR_Breakpoint':
orted/orted_submit.c:202:12: warning: return discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
  202 |     return noop_mpir_breakpoint_ptr;
      |            ^~~~~~~~~~~~~~~~~~~~~~~~
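
The warning comes down to where `volatile` sits in the declaration. The thread does not show the declaration at orted_submit.c:202 or the change made in #6904, so the following is only an illustration with hypothetical names: it contrasts a pointer to volatile data with a volatile pointer to ordinary data.

```c
#include <stddef.h>

/* Hypothetical names; an illustration, not the Open MPI source. */

/* Pointer to volatile data: the pointed-to type carries the qualifier. */
static volatile void *sketch_ptr_to_volatile = NULL;

/* Volatile pointer to ordinary data: the pointer object carries the
 * qualifier, and the value read from it is a plain void *. */
static void * volatile sketch_volatile_ptr = NULL;

/* The return type keeps the qualifier, so nothing is discarded here. */
volatile void *sketch_get_ptr_to_volatile(void)
{
    return sketch_ptr_to_volatile;
}

/* The read is a volatile access, yet the result converts to void *
 * without dropping any qualifier from the pointed-to type. */
void *sketch_get_volatile_ptr(void)
{
    return sketch_volatile_ptr;
}
```

Changing the return type of `sketch_get_ptr_to_volatile` to plain `void *` would reproduce the -Wdiscarded-qualifiers diagnostic quoted above.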

@jsquyres
Member

jsquyres commented Aug 5, 2019

@ggouaillardet Good catch; thanks.

@awlauria Can you make another master PR to fix this, and then add that commit to your 3 PRs?

@jsquyres
Member

jsquyres commented Aug 5, 2019

Actually, it looks like the 3 PRs have all been merged already. I guess we'll need 4 new PRs then (one for master, one for each of the release branches).

@awlauria
Contributor Author

@ggouaillardet thanks for reporting that. I posted #6904 to fix it.

@jsquyres If that's fine, I'll post to the other 3 branches.
