Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xtriggers: only first broadcast results recorded #3275

Closed
sadielbartholomew opened this issue Aug 6, 2019 · 8 comments · Fixed by #3276
Closed

xtriggers: only first broadcast results recorded #3275

sadielbartholomew opened this issue Aug 6, 2019 · 8 comments · Fixed by #3276
Assignees
Labels
bug Something is wrong :(
Milestone

Comments

@sadielbartholomew
Copy link
Collaborator

sadielbartholomew commented Aug 6, 2019

Describe the bug

Only the first (by ordering) key-value item from the dictionary of results in the success return signature of an xtrigger gets broadcast actually gets recorded (logs, DB) as being passed on to dependent tasks, when all items are meant to be recorded as passed on. For example, if an xtrigger call returns (True, {"ONE": "one", "TWO": "two", "THREE": "three"}), only "ONE": "one" is logged as passed via an environment variable <xtrigger_name>_ONE=one.

See below for a concrete example.

NB: I have checked & observed that the problem is not present for regular (non-xtrigger) broadcasts, just for broadcasts made by any xtrigger.

Release version(s) and/or repository branch(es) affected?
This bug had been tested for, & found to present, for both 8.0a0 & 7.8.3.

Steps to reproduce the bug

  1. Set up an xtrigger which always succeeds to return a results dict with multiple items, e.g:
def succeed(*args):
    return (True, {"ONE": "one", "TWO": "two", "THREE": "three"})
  1. Apply it in a simple dummy suite, e.g the one in the docs here, with echo_1 = echo(...) replaced by succeeding = succeed(*args), where succeed.py is under <suite-dir>/lib/python.

  2. Observe the suite log to only show that the environment variable for the first key, "ONE", has been broadcast:

2019-08-06T19:43:10+01:00 INFO - [foo.20190806T1943+01] -health check settings: submission timeout=None
2019-08-06T19:43:11+01:00 INFO - Broadcast set:
	+ [foo.20190806T2043+01] [environment]succeeding_ONE=one
2019-08-06T19:43:11+01:00 INFO - [foo.20190806T2043+01] -submit-num=01, owner@host= ...
  1. Switch the ordering of the results dict & observe that the suite log display changes accordingly, e.g. after moving "TWO:"two" to the front (to be the zeroth item):
2019-08-06T19:43:57+01:00 INFO - [foo.20190806T1943+01] -health check settings: submission timeout=None
2019-08-06T19:43:58+01:00 INFO - Broadcast set:
	+ [foo.20190806T2043+01] [environment]succeeding_TWO=two
2019-08-06T19:43:58+01:00 INFO - [foo.20190806T2043+01] -submit-num=01, owner@host=...

Expected behavior
All defined key-value items specified in a (valid) results dict get logged & stored in the DB when they are passed to tasks depending on the xtrigger.

Screenshots

The 'broadcasts list' page on Cylc Review also supports this, e.g. at step 3 above it shows:

broadcasts_scr

@hjoliver
Copy link
Member

hjoliver commented Aug 6, 2019

Ouch, that's bad. I'm pretty sure existing xtrigger tests should be testing this...

@hjoliver
Copy link
Member

hjoliver commented Aug 6, 2019

Yep, tests/xtriggers/01-suite_state.t checks the job log of an xtrigger-dependent job for the results of a multi-item broadcast. So, this is mysterious...

@hjoliver
Copy link
Member

hjoliver commented Aug 6, 2019

Good news, it is only the logging that is truncated, not the broadcast itself. Running your example as described above (but with x as the xtrigger label), the job log of foo.1 shows:

$ (venv) [oliverh@drugs-and-money]$ cylc cat-log -f j sadie foo.1
...
cylc__job__inst__user_env() {
    # TASK RUNTIME ENVIRONMENT:
    export x_three x_two x_one
    x_three="THREE"
    x_two="TWO"
    x_one="ONE"
}
...

So, all of the xtrigger return values do get broadcast to the task. But the suite log does indeed only show the first.

@hjoliver
Copy link
Member

hjoliver commented Aug 6, 2019

The problem seems to be here in cylc/flow/broadcast_report.py:

98     for broadcast_change in get_broadcast_change_iter(
99             modified_settings, is_cancel):
100         msg += CHANGE_FMT % broadcast_change

Iteration stops at the first item.

But I'll leave this to you @sadielbartholomew, as you've assigned yourself and might have worked on it already!

@hjoliver
Copy link
Member

hjoliver commented Aug 6, 2019

Doh, somehow I missed that you had already put up a PR for this! (Was going through my GitHub notification emails sequentially). 😬

And you seem to have come to a different conclusion to me. Moving over to #3276 ...

@hjoliver hjoliver changed the title xtriggers: only first results key-value broadcast xtriggers: only first broadcast results logged Aug 6, 2019
@sadielbartholomew
Copy link
Collaborator Author

Thanks for looking into it @hjoliver, it seems like it is a subtle bug.

Ouch, that's bad. I'm pretty sure existing xtrigger tests should be testing this...

Good news, it is only the logging that is truncated, not the broadcast itself

Sorry for the partial misdiagnosis, I had to stop work before I got onto look at the testing aspect in #3276, at which point I will have realised it was just the logging, though to defend my conclusions before I called it a day:

  • I inspected some non-xtrigger broadcasts via the suite logs of basic suites with broadcasts, & the broadcasts there were all fine, notably I definitely saw all expected cases having >1 broadcast detailed under a INFO - Broadcast set: ... line, suggesting there was nothing wrong with the logging (hence it is puzzling if it is a problem in cylc/flow/broadcast_report.py as you suggested, but I'll look into it).
  • While investigating I had found a way (as put up in Record all broadcast values in xtrigger return signature #3276) to fix the problem in xtrigger- rather than broadcast- related code, which to me confirmed it was on the former not the latter side.

Anyway, I want to ensure I'm fixing the buggy code rather than inadvertently bodging perfectly fine code to produce the right behaviour, as based on your investigations & reporting comments seems I have likely one in #3276! I'll look into it further & talk to Matt about the logging & then make amendments there accordingly.

@hjoliver
Copy link
Member

hjoliver commented Aug 7, 2019

though to defend my conclusions before I called it a day

👍 - yes I can absolutely see how you would think, from the evidence, that this must be an xtrigger broadcast bug (not just logging)!

@sadielbartholomew
Copy link
Collaborator Author

Don't (blindly) trust the logs is probably the main take-away lesson for me here! 😇 -> skeptical 🎩

@matthewrmshin matthewrmshin changed the title xtriggers: only first broadcast results logged xtriggers: only first broadcast results recorded Aug 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants