Feature request to keep track of memory, wall time and CPU associated with output files. #110
Comments
Comment by @knoepfel on 2021-07-27 21:23:58 Heidi, we should probably have a meeting to discuss this idea. Some of the metrics are already captured by art, but it's not clear to us what exactly you're after. I'll set up a meeting.
Comment by @knoepfel on 2021-08-17 15:51:52 Tom and I met this morning to discuss what is being asked of this proposal. After some discussion, it seemed that what is being asked for is just enough information persisted to the on-disk SAM metadata to identify a workflow/job that is problematic with respect to timing and memory usage. After identifying a problematic job from the SAM metadata, a user can then run the job interactively to debug or profile further. At this point, only the overall wall clock time and the maximum memory usage would need to be persisted to the metadata. Does that sound sensible?
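For illustration, a minimal sketch (Python, not art code; the field names `wall_clock_seconds` and `peak_rss_kb` are hypothetical) of capturing just those two quantities at the end of a job:

```python
import resource
import time

job_start = time.monotonic()

# ... the job runs ...

def end_of_job_metrics():
    """Collect the two quantities proposed above: overall wall clock time
    and peak memory usage (ru_maxrss is reported in kB on Linux)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall_clock_seconds": time.monotonic() - job_start,  # hypothetical field name
        "peak_rss_kb": usage.ru_maxrss,                       # hypothetical field name
    }
```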
Comment by @tomjunk on 2021-08-17 16:28:35 Yes, sounds good. Though the original request was for three numbers: memory, wall time, and CPU time. This doesn't capture all bottlenecks; for example, some jobs spend a lot of wall time waiting for files before art even starts, but it is a big help, and we cannot ask art to solve that problem. It may be possible to get the art wall time from sam_metadata_dumper's output by subtracting start_time from end_time, but a separate, pre-subtracted field may be even more convenient. Thanks!
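To illustrate the workaround mentioned above, a rough sketch that derives wall time by subtracting start_time from end_time in the JSON emitted by sam_metadata_dumper; the exact key names, nesting, and timestamp format vary by experiment, so ISO-8601 timestamp strings and a dump keyed by output file name are assumptions here:

```python
import json
import sys
from datetime import datetime

def wall_time_seconds(meta):
    """Wall time from start_time/end_time, assumed to be ISO-8601 strings."""
    start = datetime.fromisoformat(meta["start_time"])
    end = datetime.fromisoformat(meta["end_time"])
    return (end - start).total_seconds()

if __name__ == "__main__":
    # Assumed usage: sam_metadata_dumper file.root > meta.json && python this_script.py meta.json
    with open(sys.argv[1]) as f:
        dump = json.load(f)
    for filename, meta in dump.items():  # assumed: one entry per output file
        print(filename, wall_time_seconds(meta), "s")
```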
This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/26068 (FNAL account required)
Originally created by @hschellman on 2021-07-23 22:58:22
Is it possible to get the memory, wall time, and CPU utilization for a job written into the SAM (or successor) metadata for an output file? It sounds simple at first (just dump the numbers at the end of the job), but if you are writing multiple files to multiple streams it gets complicated, because one would need to maintain a separate stats struct for each file, initialized at file open and written to the metadata at file close. Some of this obviously exists already, since art does produce metadata for files.
(I wrote the D0 SAM output interface back in the days of the ancients, so I know you can do this if you can find the file open/close hooks. It may have used FORTRAN 2, for all I know.)
DUNE is hoping to really instrument our jobs, and this would be a great help.
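A minimal sketch of the per-file bookkeeping described in the request, assuming hypothetical open/close hooks rather than art's actual output-module API: each open output stream gets its own stats record, snapshotted at file open and turned into a metadata dictionary at file close.

```python
import resource
import time
from dataclasses import dataclass

def cpu_seconds():
    """User + system CPU time consumed so far by this process."""
    u = resource.getrusage(resource.RUSAGE_SELF)
    return u.ru_utime + u.ru_stime

@dataclass
class FileStats:
    wall_start: float
    cpu_start: float

class PerFileMetrics:
    """One stats record per open output file/stream. The hooks below are
    hypothetical stand-ins for whatever file open/close callbacks the
    framework actually provides."""

    def __init__(self):
        self._open = {}

    def on_file_open(self, filename):
        self._open[filename] = FileStats(wall_start=time.monotonic(),
                                         cpu_start=cpu_seconds())

    def on_file_close(self, filename):
        stats = self._open.pop(filename)
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return {  # hypothetical metadata field names
            "wall_clock_seconds": time.monotonic() - stats.wall_start,
            "cpu_seconds": cpu_seconds() - stats.cpu_start,
            "peak_rss_kb": usage.ru_maxrss,  # peak RSS is process-wide, not per file
        }
```

Note that peak memory is inherently a per-process quantity, so with several output streams open at once only the wall-clock and CPU deltas in this sketch are truly per file.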