sharness: update flux-sharness.sh script #316
Problem: newer flux-sharness.sh script fails a test if it exits with extra modules loaded. Amend tests to ensure that any modules loaded in the test are unloaded at the end of the test.
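The exit-time check described above compares the set of modules loaded when the test instance starts against the set still loaded when the script exits. The Python sketch below shows only that comparison logic, under stated assumptions: the real check lives in flux-sharness.sh, which is a shell script. The function names `leftover_modules` and `check_module_cleanup` and the sample module lists are hypothetical.

```python
def leftover_modules(baseline, at_exit):
    """Return modules loaded at exit that were not loaded at startup, sorted."""
    return sorted(set(at_exit) - set(baseline))


def check_module_cleanup(baseline, at_exit):
    """Hypothetical stand-in for the exit-time check: fail if extras remain."""
    extras = leftover_modules(baseline, at_exit)
    if extras:
        raise AssertionError("modules still loaded at exit: " + ", ".join(extras))


if __name__ == "__main__":
    baseline = ["kvs", "resource-hwloc"]           # hypothetical startup set
    ok_exit = ["kvs", "resource-hwloc"]            # test removed what it loaded
    bad_exit = ["kvs", "resource-hwloc", "sched"]  # test forgot to remove sched

    check_module_cleanup(baseline, ok_exit)        # passes silently
    print(leftover_modules(baseline, bad_exit))    # ['sched']
```

Under this model, the "trivial additions" to the tests are simply a matching removal step for every module a test loads, so the at-exit set equals the baseline.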
Problem: test_under_flux() conditionally starts the broker with the -q option, but that option is no longer valid. N.B. flux-broker -q, --quiet was dropped in PR flux-framework/flux-core#1464. sched's flux-sharness.sh has lagged behind the one in flux-core. Copy over the flux-core version.
LGTM. I will merge it when CI reports back. Thanks @garlick!
Not immediately obvious why these failed on Travis.
Kind of looks like one or more brokers died while reloading the hwloc file. The last entry in the broker logfile is
Unfortunately, there's no way to gather core dumps or other fatal information on Travis right now, so this is all we have to go on...
Another builder didn't make it that far (only as far as unloading the sched module). I also noticed this:
Maybe hit by the oom-killer?
Hmmm. That's a new test I added for Sierra, but it was okay when it was committed. What are some of the recent changes in flux-core?
There is this one in resource-hwloc:
Yeah, the flux-sched nightly build started failing about 6 days ago in the same way described here, and the commit you reference @garlick went into flux-core@master ... 6 days ago.
Travis has libhwloc5_1.8-1, which is listed as "ancient" on the hwloc site. When I look at the changelog between there and, say, libhwloc5_1.11.2-3, which appears to work, waugh.... That software has a lot of churn! Should we build a newer one in Travis perhaps? I could tack that onto this PR...
Are we using a newer libhwloc in flux-core? It makes sense to use a relatively recent libhwloc, but I'm wondering what changed as of 6 days ago that made the test start failing.
Oh, we are building 1.11.0 in Travis for flux-core. We are just installing the default package in flux-sched. My thought was that maybe the flag changes in that commit somehow made the old hwloc sad in combination with the test inputs that are failing...
Yeah, we should match what we're doing in flux-core. That just makes sense, and your intuition seems good to me!
Oh hold on, sched's Maybe I just need to remove the hwloc package from
Problem: sched's .travis.yml is both installing the hwloc package and building it from source via flux-core's travis-dep-builder.sh script. Drop the packaged hwloc from .travis.yml.
Codecov Report
@@ Coverage Diff @@
## master #316 +/- ##
==========================================
+ Coverage 73.98% 74.06% +0.07%
==========================================
Files 49 49
Lines 9511 9511
==========================================
+ Hits 7037 7044 +7
+ Misses 2474 2467 -7
I hit this in c9.io; here's some extra output I saw on the console:
This is with hwloc-1.11.0. I updated to hwloc-1.11.1 and the problem went away. *shrug* Probably
See the hwloc 1.11.1 NEWS.
Yeah, HWLOC_OBJ_GROUP is a new type that Sierra hwloc XML brings to the table.
We can update the flux-core Travis build script to use hwloc-1.11.1, and check for it as the minimum version in configure in both projects to avoid this bug.
On Wed, Apr 18, 2018, 9:21 PM Dong H. Ahn wrote:
Yeah, HWLOC_OBJ_GROUP is a new type that Sierra hwloc xml brings to the table.
That makes sense to me (https://github.com/flux-framework/flux-sched/blob/master/configure.ac#L45). Also, sched's README.md somehow omits the hwloc dependency; this should probably be documented here.
Problem: hwloc-1.11.0 asserts when ingesting Sierra hwloc XML. hwloc-1.11.1 contains a fix for this, so bump the minimum version in configure.ac. This was discussed in flux-framework/flux-sched#316.
Problem: hwloc-1.11.0 asserts when ingesting Sierra hwloc XML. hwloc-1.11.1 contains a fix for this, so set it as the minimum version in configure.ac.
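A minimum-version check like the one described compares dotted version strings numerically, component by component. The Python sketch below is only an illustration of that idea, not the mechanism configure.ac actually uses; the names `parse_version`, `meets_minimum`, and `MIN_HWLOC` are hypothetical, and the parser assumes plain numeric versions (no distro suffixes such as "-1"). It shows why the Travis package (1.8) and 1.11.0 are rejected while 1.11.1 is accepted, and why naive string comparison would get 1.8 versus 1.11 wrong.

```python
MIN_HWLOC = "1.11.1"  # minimum version named in the commit messages above


def parse_version(text):
    """Turn a dotted version like '1.11.0' into a tuple of ints (1, 11, 0)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(version, minimum=MIN_HWLOC):
    """Compare numerically, component by component, via tuple comparison."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for v in ["1.8", "1.11.0", "1.11.1", "1.11.2"]:
        print(v, meets_minimum(v))
    # Naive string comparison is wrong here: "1.8" sorts after "1.11.1"
    # because the character '8' is greater than '1'.
    print("1.8" >= MIN_HWLOC)
```

Numeric comparison rejects 1.8 and 1.11.0 and accepts 1.11.1 and later, while the final string comparison prints True, which is exactly the mistake a numeric check avoids.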
Nice work guys! I like it when I leave work with a mystery, and it's solved by the time I check back in again!
Restarted build after merging flux-framework/flux-core#1478
Looks like
@dongahn, one of the builders failed here
It is difficult to tell exactly why it failed:
Is this just a race condition when jobs complete before the test script reaches
Hmm, same failure after I tacked on that commit.
I think there is a race in the
Hm, maybe not a race because I couldn't trigger it with an introduced
I opened #317 for the test failure and restarted that builder. Maybe we can merge this since the failure is (likely) unrelated to this PR?
Works for me. Thanks!
Yes, I believe there is a race. I know how to fix this. Why don't you disable this case and merge this in? I will get to it either tonight or tomorrow.
We went ahead and merged since the test only fails intermittently.
Thanks.
This PR syncs t/sharness.d/flux-sharness.sh with the version in flux-core. The flux-broker --quiet option was recently removed from flux-core, but sched's version of flux-sharness.sh was still using it, resulting in sadness. Rather than making a small change to sched's version, I copied over core's version, including some other changes.
One such change is a test that any modules loaded within a sharness script are unloaded when the script exits. This necessitated trivial additions to some tests.