-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simulator: use future-based RPC API #246
Conversation
LGTM. @SteVwonder? |
Yep. LGTM too once travis gives the ok. |
Restarted build since flux-core updates required here are now merged. |
Apologies! Missed one. Pushing update. |
Sorry for the false starts here! I'm debugging these on my desktop right now but need to head out to lunch so will pick it up when I get back:
|
Yeah, unfortunately, the simulator doesn't have any unit tests. So my apologies for the not helpful at all error/testing messages. |
Codecov Report
@@ Coverage Diff @@
## master #246 +/- ##
===========================================
- Coverage 69.63% 52.32% -17.31%
===========================================
Files 25 25
Lines 5246 5248 +2
===========================================
- Hits 3653 2746 -907
- Misses 1593 2502 +909
Continue to review full report at Codecov.
|
I pushed a couple more commits: one to work with newer czmq, and one bug where an incorrect key was used to pull the start time out of the KVS (please review cfa0bc6 @SteVwonder) , but didn't get to the bottom of why jobs aren't running. We're hitting the 60s timeout on Will keep looking. I'll revert the RPC change in flux-core if I can't find this by tomorrow. |
@garlick, this key should probably have a better, less ambiguous name, but when running with the simexec module loaded, job start times (w.r.t. simulation time) are stored in "starting_time" and the wallclock time is stored in "starting-time". For this test, I believe we want the simulation time, so "starting_time" should be correct. Let me grab this PR and give it a whirl before you revert changes in flux-core. |
Ah sorry about that. |
On my desktop I backed up flux core to just before RPC changes in PR #1089 and the same tests are failing on flux-sched master. Do we have a known good flux-core? If not I'll bisect. |
Checkpointing some results:
So it seems this goes back a ways. Not sure whether the Apr 4 - May 25 failure is the same root cause, or maybe something that didn't show up in travis since the last change merged to flux-sched (Apr 13) presumably passed with flux-core master at the time. |
Presumably the flux-core that built in travis against current flux-sched master would have been at PR 1032 (Apr 13). This fails for me as well using flux-sched master (Apr 13) in the same way as the May 25 failure described above.
I do get a core file
|
Loading the modules by hand, it seems that the |
I'm running out of time, but the stack trace I posted above points to the code I changed for newer czmq. Nothing jumps out at me in that interval, though there is some reactor work in there. |
Failing tests disabled pending resolution of issue flux-framework#249.
OK, I dropped the incorrect KVS key name change, and added a commit to temporarily disable the failing tests, referencing #249, which I opened to track the root cause of the failure which is unrelated to this PR. |
CI passed, although coverage took a ding with the tests disabled, hence the red x. I think this is ready to be considered for a merge. |
Thanks! I didn't mean to dump it all on you so please let me know how I can help. |
This just updates the single rpc in flux sched for api changes proposed in flux-framework/flux-core#1089
Travis should be re-run after that gets merged.