From 62357209ad49bf573a36729c6029d19a24dff1e2 Mon Sep 17 00:00:00 2001
From: Michael Zingale
Date: Sun, 8 Dec 2024 16:16:54 -0500
Subject: [PATCH] add kronos docs

---
 sphinx_docs/source/olcf-kronos.rst | 48 ++++++++++++++++++++++++++++++
 sphinx_docs/source/olcf.rst        |  1 +
 2 files changed, 49 insertions(+)
 create mode 100644 sphinx_docs/source/olcf-kronos.rst

diff --git a/sphinx_docs/source/olcf-kronos.rst b/sphinx_docs/source/olcf-kronos.rst
new file mode 100644
index 0000000..754b67a
--- /dev/null
+++ b/sphinx_docs/source/olcf-kronos.rst
@@ -0,0 +1,48 @@
+Archiving Data on Kronos
+========================
+
+`Kronos `_
+is the mass storage system at OLCF. Each user has a directory of the form:
+
+.. code:: bash
+
+   /nl/kronos/olcf//users/
+
+and data can be transferred there using standard Unix commands.
+
+.. note::
+
+   You need to be logged into ``dtn.olcf.ornl.gov`` to access Kronos. It is
+   not visible directly from Frontier or Andes.
+
+A submission / shell script pair that automates the transfer of data is available in
+`workflow/job_scripts/hpss `_ as:
+
+* ``olcf_kronos.submit`` : the Slurm submission script
+* ``kronos_process.sh`` : a Bash script that finds output and automates the archiving.
+
+Submit the job from the directory containing the plotfiles you wish to archive.
+It will then:
+
+* tar up the diagnostic files, inputs, and other metadata into a file with the
+  date-stamp in the file name and copy that to Kronos
+
+* find all of the plotfiles and tar them directly to Kronos. If the tar is successful,
+  it will move the plotfile into a ``plotfiles/`` subdirectory and add a ``.processed``
+  file so the script knows it was archived already.
+
+* find the checkpoint files matching a pattern (currently defaults to every 5000 steps)
+  and archive those in the same fashion, moving them to a ``checkfiles/`` subdirectory
+  once archived.
+
+* loop, looking for new output files
+
+By default, it will not transfer the last file, in case it is still being written.
+
+.. tip::
+
+   The ``olcf_kronos_once.submit`` script can be used to do a single transfer,
+   without the loop that waits for new files.
+
+
+
diff --git a/sphinx_docs/source/olcf.rst b/sphinx_docs/source/olcf.rst
index 9898da4..f7e8015 100644
--- a/sphinx_docs/source/olcf.rst
+++ b/sphinx_docs/source/olcf.rst
@@ -9,4 +9,5 @@ Working at OLCF
    olcf-compilers
    olcf-workflow
    olcf-jupyter
+   olcf-kronos
    olcf-andes