job-info: stream events even if job is inactive #6518
Naive question: could the client just do a FLUX_KVS_WATCH_APPEND instead of the lookup and then just cancel it when the |
oh i forgot, doing
on master the |
Perhaps my description of what I'm trying to solve isn't clear. We already know for a fact that the 'clean' event has already happened. The job is already done / inactive. So the issue is: if you do a FLUX_KVS_WATCH_APPEND instead of a lookup on a job that is inactive/done, how do you know when to issue the cancel? You don't really know. So that's why I wanted to put the "WATCH_APPEND_ONCE" flag in on the |
Oh duh, sorry, I was thinking about watching the primary eventlog which always ends in Since the user is just passing this to
|
Ahhh, I like this idea. Lemme see how to go about this. |
Force-pushed: 215ba81 to f6d916b
Went with the proposed FLUX_KVS_STREAM (still need to add tests & documentation) |
Nice! Just a thought, but FLUX_KVS_STREAM is a pretty big feature on its own so maybe it should be proposed in a standalone PR with its own docs and tests? Then this one could make the job-info changes? |
Force-pushed: f6d916b to 9d02627
Problem: If a job is inactive, all data in an eventlog will be retrieved from the KVS in a single lookup. This is because we know the data should never change after the job is inactive. If this data is very large, this lookup can be slow. In some cases, the use of something like `flux job attach` can have the appearance of a hang because the standard output response is taking so long to look up and return.

Solution: When a job is inactive and the user wants to watch a job eventlog, do not retrieve all of the data from the KVS in a single lookup. Instead, use the FLUX_KVS_STREAM flag to retrieve the data in smaller chunks. This data will be internally read and parsed no differently than when the job is active.

Fixes flux-framework#6516
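To make the "read and parsed no differently" point concrete, here is a minimal sketch of how a consumer might handle eventlog data that arrives in several chunks rather than in one large response. It does not use the Flux API at all: `struct entry_splitter`, `splitter_init()`, and `splitter_feed()` are hypothetical names invented for this illustration, and the sample entries in the usage note are simplified. It assumes eventlog entries are newline-terminated, and it defensively tolerates a chunk boundary falling in the middle of an entry, which the real KVS may never produce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for eventlog data that arrives in chunks.
 * Not part of flux-core; for illustration only. */
struct entry_splitter {
    char buf[4096];   /* bytes of an entry that is not yet complete */
    size_t len;       /* number of bytes currently held in buf */
    int count;        /* complete entries seen so far */
};

typedef void (*entry_cb)(const char *entry, size_t len, void *arg);

void splitter_init(struct entry_splitter *s)
{
    assert(s != NULL);
    s->len = 0;
    s->count = 0;
}

/* Feed one chunk.  The callback (if any) is invoked once per
 * newline-terminated entry, with the newline removed.  A trailing
 * partial entry is kept until a later chunk completes it.
 * Returns 0 on success, -1 if an entry does not fit in the buffer. */
int splitter_feed(struct entry_splitter *s, const char *chunk, size_t n,
                  entry_cb cb, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        if (chunk[i] == '\n') {
            s->buf[s->len] = '\0';
            if (cb)
                cb(s->buf, s->len, arg);
            s->count++;
            s->len = 0;
        } else {
            if (s->len + 1 >= sizeof(s->buf))
                return -1;
            s->buf[s->len++] = chunk[i];
        }
    }
    return 0;
}
```

Because the splitter only ever sees "some bytes", the same code path serves both an active job (chunks arrive as events are appended) and an inactive one (chunks arrive back to back). For example, feeding `{"name":"submit"}\n{"name":` followed by `"clean"}\n` yields two complete entries.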
Problem: Internally a "job-info.eventlog-watch" is done where it used to do a "job-info.lookup". Some older variables, functions, etc. now seem to be misnamed.

Rename "main_namespace_lookup" variables, functions, etc. to "main_namespace_watch".
Force-pushed: 9d02627 to 0118856
rebased and re-pushed now that #6523 is in, just tweaked some commit messages. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6518      +/-   ##
==========================================
- Coverage   83.63%   83.62%   -0.02%
==========================================
  Files         522      522
  Lines       87805    87809       +4
==========================================
- Hits        73434    73427       -7
- Misses      14371    14382      +11
|
- * main namespace (main_namespace_lookup()). This is the "easy" case
+ * main namespace (main_namespace_watch()). This is the "easy" case
Wondering about this name change - once the guest info is copied to the primary namespace, it is static, but "watch" implies that it is changing.
yeah, i did go back and forth on it. I went with `watch` b/c I go through `job-info.eventlog-watch` instead of calling `kvs.lookup` (similarly `guest_namespace_watch` also goes through `job-info.eventlog-watch`).
I did consider going with `stream` instead, but that sort of describes internal knowledge of what `job-info.eventlog-watch` is doing.
Problem: If a job is inactive, all data in an eventlog will be returned
as a single response during an eventlog watch. This is because we know for
a fact that the data should never change after the job is inactive.
If this data, such as job standard output, is very large, this lookup
can be very slow. In some cases, the use of something like `flux job attach`
can have the appearance of a hang because the standard output response is
taking so long to look up and return.
Solution:
When a job is inactive and the user wants to watch a job eventlog,
do not respond with all of the data in a single response. Instead stream
the response back just as if the job were active.
Utilize the FLUX_KVS_WATCH_APPEND_ONCE flag to ensure the stream ends once all
data in the KVS is streamed. Update all variables, functions, etc. from
"lookup" to "watch".
Fixes #6516
I probably need to add some new tests but wanted to throw this up here for now.
Side note on the "WATCH_APPEND_ONCE" flag, I could make this generic and apply it to all watch types (i.e. even non-appends) but didn't feel it was needed elsewhere so didn't do that.
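The termination question behind the flag (how does a consumer of an inactive job's eventlog know when to stop?) can be illustrated with a small simulation. This is only a sketch and does not call into flux-core: `struct sim_stream`, `sim_next()`, and `sim_drain()` are hypothetical names, and the `once` field merely stands in for the behavior the flag is meant to provide, namely that the stream reports an explicit end after the existing data has been delivered.

```c
#include <assert.h>
#include <stddef.h>

/* What a (simulated) stream can report to its consumer. */
enum stream_status {
    STREAM_DATA,     /* a chunk was delivered */
    STREAM_PENDING,  /* nothing now; more may or may not arrive later */
    STREAM_END       /* explicit end of stream */
};

/* Hypothetical stand-in for a stream over data that already exists. */
struct sim_stream {
    const char **chunks;  /* existing data, one element per append */
    size_t count;         /* number of chunks */
    size_t next;          /* index of the next chunk to deliver */
    int once;             /* nonzero: end the stream after existing data */
};

enum stream_status sim_next(struct sim_stream *s, const char **chunk)
{
    assert(s != NULL && chunk != NULL);
    if (s->next < s->count) {
        *chunk = s->chunks[s->next++];
        return STREAM_DATA;
    }
    /* A plain watch has no way to say "that was everything". */
    return s->once ? STREAM_END : STREAM_PENDING;
}

/* Consume chunks until the stream stops delivering data.  Returns the
 * final status so the caller can see whether it was told to stop. */
enum stream_status sim_drain(struct sim_stream *s, size_t *delivered)
{
    const char *chunk;
    enum stream_status st;

    *delivered = 0;
    while ((st = sim_next(s, &chunk)) == STREAM_DATA)
        (*delivered)++;
    return st;
}
```

With `once` unset, draining three existing chunks ends in `STREAM_PENDING`, so the consumer cannot tell a finished log from a quiet one and has no obvious point at which to cancel. With `once` set, the same drain ends in `STREAM_END` and the consumer can simply stop.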