
job-info: stream appended data, do not return all data in 1 giant blob #6516

Open
chu11 opened this issue Dec 16, 2024 · 0 comments · May be fixed by #6518

chu11 commented Dec 16, 2024

(Similar to the work in #6444)

When a job is completed, there are presumably no more updates to its eventlogs, so any watch request simply does a kvs-lookup on the data. This can be very slow if there is a large amount of data. For example, think of flux job attach JOBID on a job that is completed and has a bajillion lines of output. The kvs-lookup will return all of the standard output in 1 big reply.

Instead, we should stream this data no differently than if the job were live, to improve turnaround time. Right now flux job attach JOBID has the appearance of a hang in this scenario.

Note that this started with the work in #6456, but I decided it should be split off into a separate issue.
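For reference, the behavior we want to mimic for inactive jobs is the streaming watch already used for live jobs. Below is a minimal, illustrative sketch (not flux-core source) of a caller watching an output eventlog with the existing FLUX_KVS_WATCH | FLUX_KVS_WATCH_APPEND flags; the KVS key path and error handling are simplified for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <flux/core.h>

/* Called each time the broker sends a response for the watch. */
static void watch_cb (flux_future_t *f, void *arg)
{
    const char *chunk;

    if (flux_kvs_lookup_get (f, &chunk) < 0) {
        if (errno == ENODATA) {              /* stream has ended */
            flux_future_destroy (f);
            return;
        }
        fprintf (stderr, "lookup: %s\n", strerror (errno));
        exit (1);
    }
    /* With FLUX_KVS_WATCH_APPEND, responses carry appended data rather
     * than re-sending the whole (potentially huge) value every time. */
    fputs (chunk, stdout);
    flux_future_reset (f);                   /* arm for the next response */
}

int main (void)
{
    flux_t *h;
    flux_future_t *f;
    /* hypothetical eventlog key, for illustration only */
    const char *key = "job.0000.0123.guest.output";
    int flags = FLUX_KVS_WATCH | FLUX_KVS_WATCH_APPEND;

    if (!(h = flux_open (NULL, 0)))
        exit (1);
    if (!(f = flux_kvs_lookup (h, NULL, flags, key)))
        exit (1);
    if (flux_future_then (f, -1., watch_cb, NULL) < 0)
        exit (1);
    if (flux_reactor_run (flux_get_reactor (h), 0) < 0)
        exit (1);
    flux_close (h);
    return 0;
}
```

A plain watch like this only ends (with ENODATA) after the caller cancels it via flux_kvs_lookup_cancel(); the point of this issue is to get the same chunked delivery for an inactive job's data, with the stream ending automatically once all existing data has been sent.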

@chu11 chu11 self-assigned this Dec 16, 2024
chu11 added a commit to chu11/flux-core that referenced this issue Dec 16, 2024
Problem: If a job is inactive, all data in an eventlog will be returned
as a single response during an eventlog watch.  This is because we know for
a fact that the data should never change after the job is inactive.

If this data, such as job standard output, is very large, this lookup
can be very slow.  In some cases, the use of something like `flux job attach`
can have the appearance of a hang because the standard output response is
taking so long to look up and return.

Solution:

When a job is inactive and the user wants to watch a job eventlog,
do not respond with all of the data in a single response.  Instead stream
the response back just as if the job were active.

Utilize the FLUX_KVS_WATCH_APPEND_ONCE flag to ensure the stream ends once
all data in the KVS has been streamed.  Update all variables, functions, etc.
from "lookup" to "watch".

Fixes flux-framework#6516
@chu11 chu11 linked a pull request Dec 16, 2024 that will close this issue
chu11 added a commit to chu11/flux-core that referenced this issue Dec 17, 2024
Problem: If a job is inactive, all data in an eventlog will be retrieved
as a single response during an eventlog watch.  This is because we know for
a fact that the data should never change after the job is inactive.

If this data, such as job standard output, is very large, this lookup
can be very slow.  In some cases, the use of something like `flux job attach`
can have the appearance of a hang because the standard output response is
taking so long to look up and return.

Solution:

When a job is inactive and the user wants to watch a job eventlog,
do not retrieve all of the data.  Instead, retrieve the data via an internal
eventlog watch, but have the eventlog watch use the new FLUX_KVS_STREAM
flag.

Update all variables, functions, etc. from "lookup" to "watch".

Fixes flux-framework#6516
chu11 added a commit to chu11/flux-core that referenced this issue Dec 18, 2024
Problem: If a job is inactive, all data in an eventlog will be retrieved
as a single response during an eventlog watch.  This is because we know for
a fact that the data should never change after the job is inactive.

If this data, such as job standard output, is very large, this lookup
can be very slow.  In some cases, the use of something like `flux job attach`
can have the appearance of a hang because the standard output response is
taking so long to look up and return.

Solution:

When a job is inactive and the user wants to watch a job eventlog,
do not retrieve all of the data.  Instead, retrieve the data via an internal
eventlog watch, but have the eventlog watch use the new FLUX_KVS_STREAM
flag.

Fixes flux-framework#6516
chu11 added a commit to chu11/flux-core that referenced this issue Dec 20, 2024
Problem: If a job is inactive, all data in an eventlog will be retrieved
from the KVS in a single lookup.  This is because we know the data should never
change after the job is inactive.

If this data is very large, this lookup can be slow.  In some cases, the
use of something like `flux job attach` can have the appearance of a hang
because the standard output response is taking so long to look up and return.

Solution:

When a job is inactive and the user wants to watch a job eventlog,
do not retrieve all of the data from the KVS in a single lookup.  Instead,
use the FLUX_KVS_STREAM flag to retrieve the data in smaller chunks.  This data
will be internally read and parsed no differently than when the job is active.

Fixes flux-framework#6516
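
Assuming the new FLUX_KVS_STREAM flag behaves as described in the commit message above (the current value is delivered in chunks and the stream terminates with ENODATA once everything has been sent), a caller-side sketch might look like the following; the helper name, key handling, and error handling are illustrative, not part of the actual change.

```c
#include <errno.h>
#include <stdio.h>
#include <flux/core.h>

/* Stream an inactive job's eventlog in chunks instead of one giant
 * lookup.  Assumes FLUX_KVS_STREAM delivers the current value in
 * chunks and terminates the stream with ENODATA, per the commit
 * message above.  Returns 0 on success, -1 on error. */
static int dump_eventlog (flux_t *h, const char *key)
{
    flux_future_t *f;
    const char *chunk;
    int rc = -1;

    if (!(f = flux_kvs_lookup (h, NULL, FLUX_KVS_STREAM, key)))
        return -1;
    while (flux_kvs_lookup_get (f, &chunk) == 0) {
        fputs (chunk, stdout);    /* process one chunk of the eventlog */
        flux_future_reset (f);    /* wait for the next chunk */
    }
    if (errno == ENODATA)         /* normal end of stream */
        rc = 0;
    flux_future_destroy (f);
    return rc;
}
```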