[splash] need workaround to efficiently read job stdio output from large jobs #1420
Comments
I'm working on a PR to libkz that defers installation of the KVS watch until all the data already in the KVS has been read (with continuations). If EOF is encountered, then the watch is avoided entirely. This should allow |
I ran into something similar with the job error log streams, which aren't typically terminated with EOF. |
I was thinking of something like a KZ_FLAGS_NOFOLLOW or similar flag for kz_open that would cause kz_read to return EOF once the last "block" is consumed. |
Ok, that sounds reasonable, I was initially not understanding how the latest |
Add -n, --no-follow option to flux-wreck attach to set the "nofollow" option on all kz io streams. This will result in attach exiting after all existing output blocks have been read, so that attach will not block, even if some or all kz streams do not yet have EOF. Fixes flux-framework#1420
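To make the "nofollow" behaviour concrete, here is a minimal sketch using an in-memory toy model rather than libkz itself. `ToyStream`, `ToyReader`, `WouldBlock`, and `drain` are hypothetical names invented for illustration, and the block layout is an assumption. The only point is that a nofollow reader reports EOF once the last existing block is consumed, while a following reader would have to wait on a stream that has no EOF yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyStream:
    """Hypothetical stand-in for one kz stream: ordered blocks plus an EOF marker."""
    blocks: List[bytes] = field(default_factory=list)
    eof_written: bool = False


class WouldBlock(Exception):
    """Raised where a following reader would have to wait for more data."""


class ToyReader:
    """Hypothetical reader; not the libkz API."""

    def __init__(self, stream: ToyStream, nofollow: bool = False):
        self.stream = stream
        self.nofollow = nofollow
        self.next_block = 0

    def read(self) -> bytes:
        """Return the next block, or b"" to signal EOF (blocks are assumed non-empty)."""
        if self.next_block < len(self.stream.blocks):
            data = self.stream.blocks[self.next_block]
            self.next_block += 1
            return data
        # No more existing blocks: EOF if the writer finished, or if we
        # were asked not to follow the stream.
        if self.stream.eof_written or self.nofollow:
            return b""
        raise WouldBlock()


def drain(streams: Dict[str, ToyStream], nofollow: bool) -> Dict[str, bytes]:
    """Read every stream until its reader reports EOF."""
    output = {}
    for name, stream in streams.items():
        reader = ToyReader(stream, nofollow=nofollow)
        chunks = []
        while True:
            data = reader.read()
            if not data:
                break
            chunks.append(data)
        output[name] = b"".join(chunks)
    return output
```

In this model, draining with `nofollow=True` always terminates, even when an error stream was never closed with EOF. With `nofollow=False` the same call hits `WouldBlock` on the unterminated stream.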
From conversation with @trws:
Scaling issues running large jobs with stdio redirected to a file (or interpreted by the front end) have been discussed in #1406.
A workaround for this poor scaling is to avoid stdio redirection during the job (e.g. do not use the `-O file` option to `flux-submit`), then copy the job output from the KVS to a file afterwards. Currently this would be done with something like `flux wreck attach`, with its output redirected to a file.
This has the advantage of not competing with the big job for KVS and messaging resources, but it still does not scale well for a large number of tasks because `flux wreck attach` sets up the same set of parallel kz watchers that are used for in-job redirection.
One thought is to modify `flux wreck attach` so that it walks all existing data first, and only sets up the iowatchers if the job has not reached a terminal state.
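The idea of walking existing data first and only then deciding whether to watch could be sketched roughly as follows. This is a toy model, not the `flux wreck attach` implementation. `ToyJob`, `attach`, `install_watch`, and the state names in `TERMINAL_STATES` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed names for terminal job states; the real state names may differ.
TERMINAL_STATES = {"complete", "failed"}


@dataclass
class ToyJob:
    """Hypothetical job record: a state plus per-task output blocks already stored."""
    state: str
    streams: Dict[str, List[bytes]] = field(default_factory=dict)


def attach(
    job: ToyJob, install_watch: Callable[[str], None]
) -> Tuple[Dict[str, bytes], List[str]]:
    """Read all existing output, then watch only if the job is still live."""
    # Pass 1: walk the data that is already stored, without any watchers.
    output = {name: b"".join(blocks) for name, blocks in job.streams.items()}

    # Pass 2: install one watcher per stream only when more output can arrive.
    watched: List[str] = []
    if job.state not in TERMINAL_STATES:
        for name in job.streams:
            install_watch(name)
            watched.append(name)
    return output, watched
```

For a job that has already finished, this sketch installs no watchers at all, no matter how many tasks produced output. That is the case the workaround above is concerned with.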