dwalk: show where most space is used #249

Open
gonsie opened this issue Apr 2, 2019 · 6 comments

Comments

@gonsie
Contributor

gonsie commented Apr 2, 2019

When users run into file system quotas, they often don't understand where the bulk of their files are. Having a printout to show disk usage is just the sort of thing dwalk should be able to do (or a tool on top of dwalk).

Here is an example tool that does the same thing: https://ownyourbits.com/2018/03/25/analyze-disk-usage-with-dutree/
Hopper has this ability as well.

@adammoody
Member

adammoody commented Apr 3, 2019

More can be done, but there are a couple of existing options that work for some.

Hopper can read mpiFileUtils output files, so that dwalk can be used to capture the info, and hopper can then display it. On some systems, Hopper can invoke dwalk directly. We should document these steps.

dsh is handy on systems where it works. Given a file list, which it can either generate or read in via the --input option, it supports interactive viewing of a directory tree with "cd", "pwd", and "ls" like commands. The "ls" command sums up the bytes and item counts within each subdirectory, and then it prints items in descending order by size.

The behavior looks like the following, which uses made-up file and directory names:

srun -n 24 dsh /some/path

/ >>:
cd /some/path

/some/path >>:
ls
--------------------------
    Bytes    Items Path
 12.64 GB  22.24 k /some/path
--------------------------
  4.12 GB   1.00   file1
  4.02 GB   1.00   file2
  4.02 GB   1.00   file3
 95.97 MB   1.00   file4
 74.06 MB   3.31 k dir1
 72.05 MB   2.99 k dir2
. . .

/some/path >>:
cd dir1

/some/path/dir1 >>:
ls
--------------------------
    Bytes    Items Path
 74.06 MB   3.31 k /some/path/dir1
--------------------------
 25.85 MB 700.00   dir10
 12.65 MB 507.00   dir20
 11.48 MB 290.00   dir30
  8.36 MB   1.58 k dir40
  7.54 MB 230.00   dir50
. . .

The dsh tool also supports "rm" commands, which let a user delete stuff as they work through the tree.

dsh requires the job launcher to forward stdin to MPI ranks, which does not work on all systems.

One could take the logic in dsh and create a command-line based tool that reads from a dwalk output file.
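
A rough sketch of that idea in Python, assuming the file list has already been parsed into (path, size) pairs. The parsing step, the function name `summarize`, and the sample entries are all hypothetical and not part of mpiFileUtils. It mimics the "ls" behavior described above: sum bytes and item counts under each child of the current directory, then list them in descending order by size.

```python
from collections import defaultdict


def summarize(entries, cwd):
    """Sum bytes and item counts for each immediate child of cwd.

    entries: iterable of (absolute_path, size_in_bytes) pairs.
    Returns a list of (child_name, total_bytes, item_count) tuples,
    sorted by total_bytes in descending order.
    """
    prefix = cwd.rstrip("/") + "/"
    totals = defaultdict(lambda: [0, 0])  # child name -> [bytes, items]
    for path, size in entries:
        if not path.startswith(prefix):
            continue  # item is outside the current working directory
        # First path component below cwd identifies the child.
        child = path[len(prefix):].split("/", 1)[0]
        totals[child][0] += size
        totals[child][1] += 1
    rows = [(name, b, n) for name, (b, n) in totals.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up entries standing in for a parsed dwalk file list.
entries = [
    ("/some/path/file1", 4000),
    ("/some/path/dir1", 0),
    ("/some/path/dir1/a", 300),
    ("/some/path/dir1/sub/b", 200),
    ("/some/path/dir2/c", 100),
]

for name, nbytes, nitems in summarize(entries, "/some/path"):
    print(f"{nbytes:>10} {nitems:>6} {name}")
```

Because the whole list is held in memory and only filtered by prefix, a "cd" in such a tool would just change the `cwd` argument without touching the file system again.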

@gonsie
Contributor Author

gonsie commented Apr 10, 2019

OMG this is amazing. I just tested it on a CORAL machine, and it works wonderfully. I’ll write up some documentation around this.

Some questions:

  • When does the initial dwalk get called, when I cd into some mount point? For each ls command?
  • Which commands work the same as their bash counterparts? For example, I don’t think ls can take a directory option. Maybe we should rename the operation if its syntax is different.
  • Would rm *.core work to remove things in my CWD?
  • When does each command return to the prompt?
  • Do ls listings get updated after an rm command?

@gonsie
Contributor Author

gonsie commented Apr 10, 2019

Other notes:

  • we should have a help command... or some documentation within this.
  • I could not get rm ./*.core to work. Maybe it’s because I’ve done something weird to my environment?

@adammoody
Member

adammoody commented Apr 10, 2019

> OMG this is amazing. I just tested it on a CORAL machine, and it works wonderfully. I’ll write up some documentation around this.

Cool! I hadn't tried on CORAL yet, so glad to hear that works.

> Some questions:
>
>   • When does the initial dwalk get called, when I cd into some mount point? For each ls command?

The initial flist that it uses is either fed in via --input, for example from the output of a previous dwalk, or built by walking a path that the user lists on the command line. It does not walk again after it gets this list, and if the list is given via --input, it doesn't walk at all. With "ls", it filters that flist based on the "current working directory", summing up all files and sizes under the current working directory tree.

>   • Which commands work the same as their bash counterparts? For example, I don’t think ls can take a directory option. Maybe we should rename the operation if its syntax is different.

Yeah, that's a good idea. I would say none of the commands perfectly mimic a real shell. Also things like tab-completion and the up/down arrows don't work. Nor does shell wildcarding, or variables, or things like "cd ~/".

>   • Would rm *.core work to remove things in my CWD?

I added a regex to "ls" but not "rm". I forget why; perhaps I was just testing whether I could get it to work. In "ls", the regex can be used to filter a subset of items. It uses POSIX regex rather than shell-style wildcards. That could be changed.

>   • When does each command return to the prompt?

It waits for the file system operations to complete, then it returns. The "rm" command invokes mfu_flist_unlink() behind the scenes, and this output will be printed as it runs (I think).

>   • Do ls listings get updated after an rm command?

Yes, I have dsh remove entries from the flist which the user has removed. One can save this updated list to a file with the --output option. To write the file, one has to exit the session cleanly with "exit".

I also found that a common use case is to take a file as input, remove some stuff, shut down, go to lunch, then come back, start up again, and remove more stuff. To accommodate this, I created a --file option. This option works like --input and --output combined: it reads its input from the named file and writes its output back to the same file.
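
To illustrate the read, prune, and write-back cycle behind --file, here is a minimal Python sketch. It assumes a made-up text format of one "size<TAB>path" line per item, which is not the real mpiFileUtils file format, and the helper names `load_list`, `save_list`, and `remove_under` are hypothetical. It only edits the list and does not delete anything from the file system.

```python
import os
import tempfile


def load_list(filename):
    """Read (path, size) pairs from a hypothetical 'size<TAB>path' text file."""
    entries = []
    with open(filename) as f:
        for line in f:
            size, path = line.rstrip("\n").split("\t", 1)
            entries.append((path, int(size)))
    return entries


def save_list(filename, entries):
    """Write (path, size) pairs back in the same hypothetical format."""
    with open(filename, "w") as f:
        for path, size in entries:
            f.write(f"{size}\t{path}\n")


def remove_under(entries, target):
    """Drop target and everything below it from the in-memory list."""
    prefix = target.rstrip("/") + "/"
    return [(p, s) for p, s in entries
            if p != target and not p.startswith(prefix)]


with tempfile.TemporaryDirectory() as tmp:
    list_file = os.path.join(tmp, "list.txt")
    save_list(list_file, [
        ("/some/path/file1", 4000),
        ("/some/path/dir1", 0),
        ("/some/path/dir1/a", 300),
        ("/some/path/dir2/c", 100),
    ])

    # One "session": read the list, prune it, write it back to the same file.
    entries = load_list(list_file)
    entries = remove_under(entries, "/some/path/dir1")
    save_list(list_file, entries)

    # A later session starts from the already-pruned list.
    reloaded = load_list(list_file)

print(reloaded)
```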

@adammoody
Member

> Other notes:
>
>   • we should have a help command... or some documentation within this.

Good idea. Right now, it just prints the commands you can run, if you try something it doesn't understand. It doesn't document the commands in any detail.

>   • I could not get rm ./*.core to work. Maybe it’s because I’ve done something weird to my environment?

Yeah, that's because regex is missing in rm right now. You might get something interesting with ls ./.*.core or maybe it's ls .*\.core? Note I've got .* instead of just * since it's using POSIX regex. We'll obviously want to change to shell-style wildcards here.
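
For anyone unsure about the difference between the two pattern styles, here is a small Python illustration. Python's `re` module is not a POSIX regex engine, so this only approximates what dsh does, and the file names are made up. The point is that `.*` in a regex plays the role that a bare `*` plays in a shell wildcard.

```python
import fnmatch
import re

names = ["a.core", "b.core", "core.txt", "notes"]

# Regex style: ".*" means "any run of characters", "\." is a literal dot.
# The trailing "$" anchors the match to the end of the name.
regex = re.compile(r".*\.core$")
by_regex = [n for n in names if regex.match(n)]

# Shell wildcard style: a bare "*" means "any run of characters".
by_glob = fnmatch.filter(names, "*.core")

# fnmatch.translate converts a shell pattern into an equivalent regex.
translated = re.compile(fnmatch.translate("*.core"))
by_translated = [n for n in names if translated.match(n)]

# Used directly as a regex, "*.core" is rejected by Python because the
# leading "*" has nothing to repeat. A POSIX engine may treat it differently.
try:
    re.compile("*.core")
    shell_pattern_is_valid_regex = True
except re.error:
    shell_pattern_is_valid_regex = False

print(by_regex, by_glob, by_translated, shell_pattern_is_valid_regex)
```

A translation step like `fnmatch.translate` is one way a tool could accept shell-style patterns from users while still matching with a regex engine internally.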

@gonsie
Contributor Author

gonsie commented Apr 15, 2019

related to #50
