Timer Initialize pair issue with MAPL 2.6.4 using ifort19 in GCHP #779
Yes this is reproducible. I rebuilt from scratch with the same libraries and got exactly the same error. I then tried with ifort 18.0.5. That gives me a different error, specifically when loading the logging yaml file here. Are the libraries I am using all expected to be compatible with ifort18?
For the ifort18 run I then used the logger work-around I used for pFlogger issue 54 (comment out setting logging file in cap_options). Doing that fixes the new problem, but the run then crashes. The traceback points to the same source as the issue with ifort19. For ifort 18 the symptom is a seg fault rather than graceful fail in
I have been doing |
Returning to this issue with a bit more clarity. In my first discussion with @mathom4, I was trying to see how the start() and stop() operations on the The usual culprit at this point would then be a condition in some lower layer that causes an early return before some timer is stopped. Unfortunately the call stack implies that this early return is not reported and is thus possibly a "normal" situation. OTOH, since this has not been reported for GEOS and seems to work for GCHP with GFortran, it's a fairly unusual thing. I recommend editing line 168 of profiler/BaseProfiler.F90: MAPL/profiler/BaseProfiler.F90 Line 168 in 3e272aa
To instead have:
This should very quickly narrow down where the real problem is happening. And if this works, we'll fix this and the other messages in that layer more permanently. |
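To illustrate the failure mode described above (an early return in a lower layer that skips a timer stop), here is a minimal Python sketch. It is not MAPL's BaseProfiler; the class, function, and timer names are all hypothetical. It only shows why including both the requested and the currently active timer name in the error message narrows down the unbalanced pair quickly.

```python
class TimerPairingError(RuntimeError):
    """Raised when stop() does not match the most recent start()."""


class Profiler:
    """Hypothetical stack-based profiler; not the MAPL implementation."""

    def __init__(self):
        self._active = []  # names of timers currently running, innermost last

    def start(self, name):
        self._active.append(name)

    def stop(self, name):
        if not self._active:
            raise TimerPairingError(f"stop <{name}> called with no active timer")
        innermost = self._active[-1]
        if innermost != name:
            # Report both names so the unbalanced pair is obvious.
            raise TimerPairingError(
                f"stop <{name}> does not match active timer <{innermost}>"
            )
        self._active.pop()


def lower_layer(profiler, return_early):
    profiler.start("inner")
    if return_early:
        return  # the bug: "inner" is never stopped
    profiler.stop("inner")


def run(return_early):
    profiler = Profiler()
    profiler.start("outer")
    lower_layer(profiler, return_early)
    try:
        profiler.stop("outer")
    except TimerPairingError as err:
        return str(err)
    return "ok"
```

Calling `run(True)` returns a message that names both `outer` and `inner`, which points at the layer that returned without stopping its timer. `run(False)` returns `"ok"`.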
Thanks @tclune. I'll try that and see if it fixes it. In the meantime yesterday I switched MAPL to v2.6.3 and that resolved the problem. I still get the |
I have not used 18 in a while (Intel makes it hard to sustain older compilers across the mandatory OS X upgrades on my laptop). That particular error sounds familiar though, and I don't think I ever came up with a workaround. If you can remind me where you are getting that error message, I can take a fresh look and suggest some possible variants. |
I should add - we're not doing anything these days that should make Intel 18 obsolete. We're just fighting against random compiler defects. |
Here is the traceback for the ifort18 issue in MAPL v2.6.3 and v2.6.4. The last version used in GCHP was v2.2.7 so this could have come in any of the versions since.
|
Here's a quick link to the location in |
And this is with the default logging.yaml file? (Sounding more and more familiar.) |
Not quite default. It is the file you gave to Seb a while back, although I had to change My understanding is the only difference from the one you use in GEOSgcm is this. |
I was able to parse your yaml file with a standalone driver using the latest yaFyaml (main branch) and ifort 18.0.5 on Linux. There is a separate problem with this yaFyaml because I accidentally made it require pFUnit, which was not my intent. I'll do a hotfix for this shortly. The conclusion is that I either entered a workaround for the problem you described, or it is a bit harder to reproduce. |
Could not find anything in the commit history. Indeed the relevant file has not been touched in 7 months. But Liam already has an unrelated PR that fixes the pFUnit issue. So I'll roll out a new release and request that you try it in your code. |
Thanks @tclune. I'll look for that release. |
Ah, just saw it's already released. I'll try it and let you know how it goes. |
Unfortunately that doesn't fix it. Would changing anything in |
I only use I'm going to attach |
I haven't gotten that to work yet. We do not use yaFyaml as an external library and I'm taking a break from messing around with cmake. But I did isolate the issue to this line: https://github.com/geoschem/yaFyaml/blob/7f16059ebc95083dd1e77954296798c745f9a287/src/Lexer.F90#L316 |
Yes - definitely a compiler bug, but I can't attempt a workaround unless I can reproduce it. You don't have to use cmake. You could just add my small program within the project that is using yafyaml or even make it a subroutine that gets called from the top of an existing program. Just want to see if it is a difference in your environment vs something to do with the state of the code when it gets down in there. |
Gotcha. I added this to the very top of main
This also trips the error, but with better traceback, although to the line I already figured out was the issue:
I don't think I mentioned what libraries I'm using yet. Here they are in case it makes a difference.
I found an Intel forum discussion on this error being encountered in ifort18 here. |
Unfortunately, I cannot follow your ifort18 link. I'm in a spiral where it keeps making me register. The only versions of libraries that can possibly matter are gFTL and gFTL-shared. I'll try again with the tags you mention for those when I get onto discover. |
OK, I rebuilt using the specified versions of gFTL, gFTL-Shared and the latest yaFyaml with 18.0.5. I was able to process your yaml file. For now, the one obvious workaround to try is to replace the failing line with
And if that does not work, let's try to limit the advanced syntax to see if that helps. Replace the entire procedure with:
|
No luck with either. I'm fine with requiring ifort19+ for Intel compilers unless you want to keep pursuing this. The link should work unless the VPN you have is interfering. |
OK - if you can move to 19 that would be best. I hate inflicting compiler version creep on others. These days I can more often work around compiler defects, but still. Currently the latest GFE stack works with both 19 and 21 (don't ask about Intel's numbering here). GEOS uses 19 and I develop using 21. |
Sounds good. At this point ifort18 is more than four years old (release Jan 2017) so it seems reasonable to discontinue support. Especially when gfortran is an option. |
I am running GCHP using MAPL v2.6.4 and am getting a run-time error when using ifort 19.0.5 with OpenMPI 4.0.2. Strangely, I have not encountered any run issues when using GNU Fortran compilers.
The traceback is as follows:
The error is caught in subroutine stop_name (see code here). Have you seen this type of error before and do you know a fix?