New vader SEGV possibly PR 5829 #5842
@amckinstry, can you please confirm that you have commit dfa8d3a in your master build that you saw this with? There were a few other vader related fixes that went in around #5829
@amckinstry - @hjelmn asked what platforms this affects. If it only affects i386, we may not have a developer to fix this.
I don't agree with that sentiment. If we know it's broken on a platform, we need to make it right. We either fix it on that platform or we disable the functionality (i.e., vader fastboxes) on that platform. For core functionality like this, we can't just knowingly have a bad bug like this.
@jsquyres My time is at a premium. We don't run i386 so unless I am told to fix this at work I will not spend any time on it. Just the reality right now. Easy enough to disable fast boxes on i386. I can make a PR for that.
What evidence do we have the bug is i386? The last two have been bugs in general, but are more likely to happen on i386. Really think we should consider disabling fastboxes in general until we can go through it in more detail. I’m happy to do a review, but would love a high level description of the fastbox design first; there are some assumptions in the code I didn’t quite follow last week.
@bwbarrett Read my comment. If this is only happening on i386 I won't have time for it. If it happens elsewhere (x86_64, power, arm64) then I can justify the time.
@hjelmn I don't really disagree with you. Indeed, I don't care about i386, either. We're all busy -- who's got time to fix i386? Not me. But regardless, as a community, we need to decide which of these to do:
In short: it would be super lame for us to keep shipping code that configures/builds/installs/runs on these problematic platforms when we seem to have indications that there's a fairly big bug in core functionality. And I think we all agree: we don't know that it's (only) i386. We need to hear two things back from @amckinstry:
@jsquyres Option 2 would be my preference, but I don't know if we are ready to kill support for a platform that was effectively killed in the early 2000s (when x86_64 came about). So, option 3 is the way to go. Once we hear back from the user we will see what the path forward is.
@bwbarrett The original bug(s) was on multiple platforms: (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=907407) arm, i386, powerpc, including mips64el and ppc64el so not just 32-bit; i386 was just the easiest to debug. It could be seen in test case issue 46 on "arpack", but we also had hangs in "liggghts" and "elpa". This is very definitely a release blocker for Debian/Ubuntu; we need to either fix or disable functionality somehow. Whatever I can do to help, I will.
Will look today.
@amckinstry Which LAMMPS problem shows the issue? Is OpenMP in use?
Version lammps-0~20180822.gitb47e49223
OpenMP not enabled I think; it appears to be stochastic, hanging about half the time, crashing the rest (with trace above).
@amckinstry Thanks. That will help me narrow down where the problem may be.
For the record, in case of a mistake, the patch I am running on openmpi-3.1.2 is attached.
I've tested the fix in #5852
This has been merged to v3.0.x, v3.1.x, and v4.0.x. It looks like this functionality never existed on v2.1.x or earlier.
@amckinstry commented on #5829
Original issue was: #5638
Since that was merged to all release branches, this is a blocker on all release branches.