Add docs for GPU saturation tool #241
base: main
To sync with ethz
preview available: https://docs.tds.cscs.ch/241
Thanks for the contribution!
I have some suggested changes, and I have tried to add some extra information that might have been missing in earlier reviews.
```dockerfile
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get install -y wget rsync rclone vim git htop nvtop nano \
```
Is `nano` needed here?
> As you can see from the above example, gssr can easily be installed with a `RUN pip install gssr` command.
>
> Once your `ContainerFile` is ready, you can build it on any Alps platform with the following commands to create a container with the label `mycontainer`.
The docs have a guide on how to build containers on Alps that you could link to:
For more information about building containers on Alps, see our [Podman guide][ref-building-containers].
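For reference, a minimal Containerfile in the spirit of the guide's example might look like the sketch below. It installs gssr with `pip`, as the guide describes. The NGC PyTorch base image and tag and the trimmed package list are assumptions for illustration only, not the file from this PR.

```dockerfile
# Assumption: any CUDA-enabled base image that ships Python and pip should work;
# this NGC PyTorch tag is only an illustrative choice.
FROM nvcr.io/nvidia/pytorch:24.01-py3

ENV DEBIAN_FRONTEND=noninteractive

# Package list trimmed for illustration; add only the tools your workflow needs.
# Cleaning the apt lists keeps the image smaller.
RUN apt-get update \
    && apt-get install -y wget git htop nvtop \
    && rm -rf /var/lib/apt/lists/*

# Install gssr, the GPU saturation tool documented in this PR
RUN pip install gssr
```

Keeping the apt package list short reduces image size and build time, which matters when the image is later imported for use on Alps.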
> ## Create CSCS configuration for Container
>
> The next step is to tell the CSCS Container Engine where your container is and how you would like to run it. To do so, you will have to create a `{label}.toml` file in your `$HOME/.edf` directory.
You can link to the existing documentation for the EDF file format to make your life easier.
Find sections to link to here: https://docs.cscs.ch/software/container-engine/
> * [Quickstart Guide][ref-gssr-quickstart]
> * [Container Guide][ref-gssr-containers]
>
> This tool produces time series and heatmaps of the profiled metric values. Here is an example of one set of plots generated by the tool for the Megatron-LLM application from EPFL.
The guidance on including images has been updated:
https://docs.cscs.ch/contributing/#screenshots
To follow up: the images are attractive and suggest that the tool can provide diverse feedback.
Maybe you could add brief documentation of the types of feedback the tool provides, and use the images to illustrate them?
@cerlane what's the status of this PR? Do you think you'll have time to implement some of the suggested changes, or would you like any help in getting it merged?
Co-authored-by: Ben Cumming <bcumming@cscs.ch>
Start again using a branch from #231