dwalk: show where most space is used #249
More can be done, but a couple of existing options already work for some cases. Hopper can read mpiFileUtils output files, so dwalk can be used to capture the information and Hopper can then display it. On some systems, Hopper can invoke dwalk directly. We should document these steps.

dsh is handy on systems where it works. Given a file list, which it can either generate itself or read in via the --input option, it supports interactive viewing of a directory tree with "cd", "pwd", and "ls"-like commands. The "ls" command sums the bytes and item counts within each subdirectory and then prints the items in descending order by size. The behavior looks like the following, which uses made-up file and directory names:
The dsh tool also supports an "rm" command, which lets a user delete things as they work through the tree. dsh requires the job launcher to forward stdin to the MPI ranks, which does not work on all systems. One could take the logic in dsh and create a command-line tool that reads from a dwalk output file.
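A minimal sketch of the "ls" summary described above, as a non-interactive tool might compute it: group each flist entry below the current directory by its first path component, sum bytes and item counts per group, and sort largest first. This is not dsh's code. The `summarize` function and the in-memory list of `(path, size)` pairs are hypothetical stand-ins, and reading an actual dwalk output file is left out.

```python
import posixpath
from collections import defaultdict


def summarize(entries, cwd):
    """Hypothetical helper: group (path, size) entries under cwd by their
    first path component below cwd and return (name, total_bytes, item_count)
    tuples sorted by bytes, largest first (ties broken by name)."""
    cwd = posixpath.normpath(cwd)
    prefix = cwd.rstrip("/") + "/"          # "/" stays "/", "/p" becomes "/p/"
    totals = defaultdict(lambda: [0, 0])    # name -> [bytes, items]
    for path, size in entries:
        if not path.startswith(prefix):
            continue                        # outside the current directory tree
        child = path[len(prefix):].split("/", 1)[0]
        totals[child][0] += size
        totals[child][1] += 1
    rows = [(name, nbytes, count) for name, (nbytes, count) in totals.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    sample = [("/p/a/x", 100), ("/p/a/y", 50), ("/p/b", 300), ("/p/c/d/e", 10)]
    for name, nbytes, count in summarize(sample, "/p"):
        print(f"{nbytes:>12} bytes {count:>6} items  {name}")
```

Because the list is already in memory, each "ls" is a single linear pass over it; no file system access is needed after the initial walk.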
OMG this is amazing. I just tested it on a CORAL machine, and it works wonderfully. I’ll write up some documentation around this. Some questions:
Other notes:
Cool! I hadn't tried it on CORAL yet, so I'm glad to hear that it works.
The initial flist is either fed in via --input (for example, the output of a previous dwalk) or generated by walking a path the user lists on the command line. It does not walk again after it gets this list, and if the list is given via --input, it doesn't walk at all. With "ls", it filters that flist based on the "current working directory", summing the item counts and sizes of all files under the current working directory tree.
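To illustrate why no second walk is needed, here is a hypothetical sketch (not dsh's implementation) in which the list is read once and "cd" only moves a virtual working directory string; "ls"-style totals are then a prefix filter over the same in-memory list. `FlistSession` and its methods are illustrative names, and unlike a real shell, `cd` here does not check that the target directory exists.

```python
import posixpath


class FlistSession:
    """Hypothetical sketch: hold a file list in memory and let "cd" move a
    virtual working directory over it without touching the file system."""

    def __init__(self, entries, root="/"):
        self.entries = list(entries)            # (path, size) pairs, read once
        self.cwd = posixpath.normpath(root)

    def cd(self, path):
        # Resolve relative paths and ".." against the virtual cwd;
        # an absolute path replaces it outright.
        self.cwd = posixpath.normpath(posixpath.join(self.cwd, path))

    def under_cwd(self):
        # Every entry strictly below the current directory.
        prefix = self.cwd.rstrip("/") + "/"
        return [(p, s) for p, s in self.entries if p.startswith(prefix)]

    def totals(self):
        # (item count, total bytes) for the current directory tree.
        items = self.under_cwd()
        return len(items), sum(size for _, size in items)


if __name__ == "__main__":
    session = FlistSession([("/p/a/x", 100), ("/p/a/y", 50), ("/p/b", 300)])
    session.cd("p/a")
    print(session.cwd, session.totals())
```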
Yeah, that's a good idea. I would say none of the commands perfectly mimic a real shell. Tab completion and the up/down arrows don't work either, nor do shell wildcards, variables, or things like "cd ~/".
I added a regex to "ls" but not "rm". I forget why; perhaps I was just testing whether I could get it to work. In "ls", the regex can be used to filter a subset of items. It uses POSIX regular expressions rather than shell-style wildcard (glob) patterns. That could be changed.
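The difference between regular-expression filtering and shell-wildcard filtering is easy to trip over, so a small comparison may help. This sketch uses Python's `re` module, which is not a POSIX regex engine but treats simple patterns like these the same way, alongside `fnmatch` for glob semantics. The helper names are hypothetical.

```python
import fnmatch
import re


def filter_regex(names, pattern):
    # Regex semantics: "." matches any character, "*" repeats the previous
    # atom, and search() matches anywhere in the name unless anchored.
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


def filter_glob(names, pattern):
    # Shell-style wildcard semantics: "*" matches any run of characters,
    # and the pattern must match the whole name.
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


if __name__ == "__main__":
    names = ["run1.log", "run2.log", "data.h5", "catalog"]
    print(filter_regex(names, r"\.log$"))
    print(filter_glob(names, "*.log"))
```

With a regex, `log` matches any name containing that substring (including `catalog`), and a glob-style pattern such as `*.log` is rejected by `re` because the leading `*` has nothing to repeat.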
It waits for the file system operations to complete, then returns. The "rm" command invokes mfu_flist_unlink() behind the scenes, and its output is printed as it runs (I think).
Yes, I have dsh remove entries from the flist once the user has removed them. One can save this updated list to a file with the --output option; to write the file, one has to exit the session cleanly with "exit". I also found that a common use case is to take a file as input, remove some stuff, shut down, go to lunch, come back, and start up again to remove more stuff. To accommodate this, I created a --file option. It works like --input and --output combined: it reads its input from the named file and writes its output back to the same file.
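The --file workflow described above is essentially a load, modify, save round trip on one file. The sketch below shows that shape under loudly stated assumptions: the one-entry-per-line `<size>\t<path>` text format and the `load_flist`, `save_flist`, `remove_subtree`, and `session_with_file` helpers are all hypothetical, and dsh's real list format and code may differ.

```python
import os
import tempfile


def load_flist(path):
    """Hypothetical format: one "<size>\t<path>" entry per line."""
    entries = []
    with open(path) as f:
        for line in f:
            size, name = line.rstrip("\n").split("\t", 1)
            entries.append((name, int(size)))
    return entries


def save_flist(path, entries):
    """Write to a temp file in the same directory, then rename it over the
    original so an interrupted save leaves the old list intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        for name, size in entries:
            f.write(f"{size}\t{name}\n")
    os.replace(tmp, path)


def remove_subtree(entries, target):
    """Drop target and everything below it from the in-memory list."""
    prefix = target.rstrip("/") + "/"
    return [(p, s) for p, s in entries if p != target and not p.startswith(prefix)]


def session_with_file(path, targets):
    entries = load_flist(path)              # like --input
    for target in targets:                  # each "rm" the user issues
        entries = remove_subtree(entries, target)
    save_flist(path, entries)               # like --output, same file, on "exit"
    return entries
```

Renaming with `os.replace` is a design choice of this sketch, not necessarily what dsh does; it keeps a crash during the save from leaving a half-written list for the after-lunch session.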
Good idea. Right now, if you try something it doesn't understand, it just prints the commands you can run. It doesn't document the commands in any detail.
Yeah, that's because regex is missing in rm right now. You might get something interesting with
Related to #50.
When users run into file system quotas, they often don't understand where the bulk of their files are. Having a printout to show disk usage is just the sort of thing dwalk should be able to do (or a tool on top of dwalk).
Here is an example tool that does the same thing: https://ownyourbits.com/2018/03/25/analyze-disk-usage-with-dutree/
Hopper has this ability as well.