FAIL: t2000-fcfs.t 2 - sim: scheduled and ran all jobs #249

Closed
garlick opened this issue Jun 26, 2017 · 2 comments · Fixed by #250

Comments

@garlick
Member

garlick commented Jun 26, 2017

When PR #246 was submitted, Travis CI failed against the current flux-core master:

```
PASS: t2000-fcfs.t 1 - sim: started successfully
FAIL: t2000-fcfs.t 2 - sim: scheduled and ran all jobs
PASS: t2000-fcfs.t 3 - jobs scheduled in correct order
ERROR: t2000-fcfs.t - exited with status 1
PASS: t2001-fcfs-aware.t 1 - sim: started successfully
FAIL: t2001-fcfs-aware.t 2 - sim: scheduled and ran all jobs
PASS: t2001-fcfs-aware.t 3 - jobs scheduled in correct order
ERROR: t2001-fcfs-aware.t - exited with status 1
PASS: t2002-easy.t 1 - sim: started successfully
FAIL: t2002-easy.t 2 - sim: scheduled and ran all jobs
FAIL: t2002-easy.t 3 - jobs scheduled in correct order
ERROR: t2002-easy.t - exited with status 1
```

While debugging this, I found that the problem appears to have been introduced in flux-core between PR 1079 (May 25) and PR 1082 (May 31), not by PR #246 (see PR #246 for more details).

As discussed in our meeting, I'll disable the failing tests in PR #246. Once that is merged and flux-sched compiles against flux-core master again, we can debug the other problem.

garlick added a commit to garlick/flux-sched that referenced this issue Jun 26, 2017
Failing tests disabled pending resolution of issue flux-framework#249.
@SteVwonder
Member

SteVwonder commented Jun 27, 2017

Just a checkpoint on my progress: I tracked down why the simulator modules were not properly sending and receiving requests. Most of the requests are created like this:

```c
msg = flux_msg_create (FLUX_MSGTYPE_REQUEST);
flux_msg_set_topic (msg, topic);
flux_msg_set_json (msg, Jtostr (o));
```

Switching that to:

```c
msg = flux_request_encode (topic, Jtostr (o));
```

solves the problem (most likely because the enable_route function is actually being called now).

Now I'm getting a segfault once the simulator starts, which I am currently investigating.

@garlick
Member Author

garlick commented Jun 27, 2017

If this is a fire-and-forget request, you could encode and send it in one go with:

```c
flux_future_t *f;
f = flux_rpc (h, topic, Jtostr (o), FLUX_NODEID_ANY, FLUX_RPC_NORESPONSE);
if (!f) {
    // handle error
}
flux_future_destroy (f);
```
