job-info: stream appended data, do not return all data in 1 giant blob #6516
chu11 added a commit to chu11/flux-core that referenced this issue on Dec 16, 2024:

Problem: If a job is inactive, all data in an eventlog will be returned as a single response during an eventlog watch. This is because we know for a fact that the data should never change after the job is inactive. If this data, such as job standard output, is very large, this lookup can be very slow. In some cases, the use of something like `flux job attach` can have the appearance of a hang because the standard output response is taking so long to look up and return.

Solution: When a job is inactive and the user wants to watch a job eventlog, do not respond with all of the data in a single response. Instead, stream the response back just as if the job were active. Utilize FLUX_KVS_WATCH_APPEND_ONCE to ensure the stream ends once all data in the KVS is streamed. Update all variables, functions, etc. from "lookup" to "watch".

Fixes flux-framework#6516
chu11 added a commit to chu11/flux-core that referenced this issue on Dec 17, 2024:

Problem: If a job is inactive, all data in an eventlog will be retrieved as a single response during an eventlog watch. This is because we know for a fact that the data should never change after the job is inactive. If this data, such as job standard output, is very large, this lookup can be very slow. In some cases, the use of something like `flux job attach` can have the appearance of a hang because the standard output response is taking so long to look up and return.

Solution: When a job is inactive and the user wants to watch a job eventlog, do not retrieve all of the data. Instead, retrieve the data via an internal eventlog watch, but have the eventlog watch use the new FLUX_KVS_STREAM flag. Update all variables, functions, etc. from "lookup" to "watch".

Fixes flux-framework#6516
chu11 added a commit to chu11/flux-core that referenced this issue on Dec 18, 2024:

Problem: If a job is inactive, all data in an eventlog will be retrieved as a single response during an eventlog watch. This is because we know for a fact that the data should never change after the job is inactive. If this data, such as job standard output, is very large, this lookup can be very slow. In some cases, the use of something like `flux job attach` can have the appearance of a hang because the standard output response is taking so long to look up and return.

Solution: When a job is inactive and the user wants to watch a job eventlog, do not retrieve all of the data. Instead, retrieve the data via an internal eventlog watch, but have the eventlog watch use the new FLUX_KVS_STREAM flag.

Fixes flux-framework#6516
chu11 added a commit to chu11/flux-core that referenced this issue on Dec 20, 2024:

Problem: If a job is inactive, all data in an eventlog will be retrieved from the KVS in a single lookup. This is because we know the data should never change after the job is inactive. If this data is very large, this lookup can be slow. In some cases, the use of something like `flux job attach` can have the appearance of a hang because the standard output response is taking so long to look up and return.

Solution: When a job is inactive and the user wants to watch a job eventlog, do not retrieve all of the data from the KVS in a single lookup. Instead, use the FLUX_KVS_STREAM flag to retrieve the data in smaller chunks. This data will be internally read and parsed no differently than when the job is active.

Fixes flux-framework#6516
(Similar to the work in #6444)
When a job has completed, there are presumably no more updates to its eventlogs, so watching any of that data is done with a single kvs-lookup. This can be very slow if there is a large amount of data. For example, think of running `flux job attach JOBID` on a completed job that has a bajillion lines of output. The kvs-lookup will return all of the standard output in one big reply.

Instead, we should stream this data no differently than if the job were live, to improve turnaround time. Right now, `flux job attach JOBID` has the appearance of a hang in this scenario.

Note that this started with work in #6456, but I decided it should be split off into a separate issue.
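To illustrate the difference, here is a minimal Python sketch. It is not flux-core code, and it assumes an eventlog is stored as newline-delimited JSON, one object per line. `lookup_all()` models the current single kvs-lookup, where nothing can be parsed until the whole value has arrived. `stream_chunks()` is a hypothetical stand-in for a chunked KVS read (it is not the FLUX_KVS_STREAM API), and `watch_eventlog()` parses entries as chunks arrive, buffering any partial line. That buffering is the same handling an active, still-growing eventlog already needs.

```python
import json
from typing import Iterator, List


def lookup_all(kvs_value: bytes) -> List[dict]:
    """Single-lookup model: the whole eventlog arrives as one reply,
    so no entry is available until every byte has been fetched."""
    return [json.loads(line) for line in kvs_value.decode().splitlines() if line]


def stream_chunks(kvs_value: bytes, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed KVS read: yields fixed-size
    pieces that may end in the middle of an eventlog entry."""
    for off in range(0, len(kvs_value), chunk_size):
        yield kvs_value[off:off + chunk_size]


def watch_eventlog(chunks: Iterator[bytes]) -> Iterator[dict]:
    """Parse entries as chunks arrive, the same way an active log is parsed.

    Bytes after the last newline are kept in `buf` until a later chunk
    supplies the rest of the entry.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Everything before the final newline is complete; the tail is partial.
        *complete, buf = buf.split(b"\n")
        for line in complete:
            if line:
                yield json.loads(line)
    if buf.strip():
        raise ValueError("eventlog ended with a partial entry")


if __name__ == "__main__":
    # Synthetic eventlog with 1000 entries (illustrative data only).
    log = b"".join(
        json.dumps({"timestamp": float(i), "name": "data", "context": {"seq": i}}).encode()
        + b"\n"
        for i in range(1000)
    )
    streamed = list(watch_eventlog(stream_chunks(log, chunk_size=64)))
    print(len(streamed), streamed == lookup_all(log))  # prints: 1000 True
```

Both paths yield the same entries. Because `watch_eventlog()` is a generator, though, the first entry is available after only the first few chunks have been read, rather than after the entire output has been fetched.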