This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Commit 4fb83a1

started cleaning up text and adding glossary entries

1 parent 6344909

2 files changed: +43 −22 lines changed


_episodes/08-cluster.md

Lines changed: 25 additions & 18 deletions
@@ -13,15 +13,15 @@ keypoints:
 - "`nohup <command> &` prevents `<command>` from exiting when you log off."
 ---
 
-Right now we have a reasonably effective pipeline that scales nicely on our local computer.
-However, for the sake of this course,
-we'll pretend that our workflow actually takes significant computational resources
-and needs to be run on a cluster.
+Right now we have a reasonably effective pipeline that scales nicely on our
+local computer. However, for the sake of this course, we'll pretend that our
+workflow actually takes significant computational resources and needs to be
+run on an [HPC cluster][ref-hpc-cluster].
 
 > ## HPC cluster architecture
 >
-> Most HPC clusters are run using a scheduler.
-> The scheduler is a piece of software that handles which compute jobs are run on which compute nodes and where.
+> Most HPC clusters are run using a [scheduler][ref-scheduler].
+> The scheduler is a piece of software that decides when a job will run, and on which nodes.
 > It allows a set of users to share a computing system as efficiently as possible.
 > In order to use it, users typically must write their commands to be run into a shell script
 > and then "submit" it to the scheduler.
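The shell-script-plus-submit pattern described in the callout can be sketched as a minimal batch script. This is a hedged illustration for a SLURM site: the `#SBATCH` values and the `zipf_test.py` filename are placeholders, not part of the lesson.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=zipf      # name shown in the queue
#SBATCH --time=00:10:00      # wall-clock limit (placeholder value)
#SBATCH --mem=1G             # memory request (placeholder value)

# The command(s) the scheduler runs unattended on a compute node;
# zipf_test.py is a stand-in for your actual pipeline step.
python zipf_test.py > results.txt
```

On a SLURM cluster such a script would typically be handed to the scheduler with `sbatch script.sh`; other schedulers (PBS, SGE) use the same pattern with different directives.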
@@ -32,20 +32,24 @@ and needs to be run on a cluster.
 > (# of students, time allotted, etc.).
 {: .callout}
 
-Normally, moving a workflow to be run by a cluster scheduler requires a lot of work.
-Batch scripts need to be written, and you'll need to monitor and babysit the status of each of your jobs.
-This is especially difficult if one batch job depends on the output from another.
-Even moving from one cluster to another (especially ones using a different scheduler)
-requires a large investment of time and effort - all the batch scripts from before need to be rewritten.
+Normally, moving a workflow to be run by a cluster scheduler requires a lot
+of work. Batch scripts need to be written, and you'll need to monitor and
+babysit the status of each of your jobs. This is especially difficult if one
+batch job depends on the output from another. Even moving from one cluster to
+another (especially ones using a different scheduler) requires a large
+investment of time and effort - all the batch scripts from before need to be
+rewritten.
 
-Snakemake does all of this for you.
-All details of running the pipeline through the cluster scheduler are handled by Snakemake -
-this includes writing batch scripts, submitting, and monitoring jobs.
-In this scenario, the role of the scheduler is limited to ensuring each Snakemake rule
-is executed with the resources it needs.
+Snakemake does all of this for you. All details of running the pipeline
+through the cluster scheduler are handled by Snakemake - this includes
+writing batch scripts, submitting, and monitoring jobs. In this scenario, the
+role of the scheduler is limited to ensuring each Snakemake rule is executed
+with the resources it needs.
 
-We'll explore how to port our example Snakemake pipeline by example
-Our current Snakefile is shown below:
+We'll explore how to port our Snakemake pipeline to the cluster by example.
+Our current Snakefile is shown below:
+
+FIXME: update to match new sample code
 
 ```python
 # our zipf analysis pipeline
@@ -263,4 +267,7 @@ In the meantime, let's dissect the command we just ran.
 > You can unlock the directory with `snakemake --unlock`.
 {: .challenge}
 
+[ref-hpc-cluster]: {{ relative_root_path }}/reference#hpc-cluster
+[ref-scheduler]: {{ relative_root_path }}/reference#scheduler
+
 {% include links.md %}
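The scheduler hand-off this episode describes is driven from the command line. A minimal sketch, assuming a SLURM cluster and the classic `--cluster` flag of older Snakemake releases (newer versions replace it with executor plugins); the resource values are placeholders:

```shell
# Snakemake writes a batch script per rule, submits each with the given
# command, and monitors the jobs; --jobs caps concurrent submissions.
snakemake --cluster "sbatch --time=00:10:00 --mem=1G" --jobs 100
```

If a run is interrupted and leaves the working directory locked, `snakemake --unlock` releases the lock, as noted in the challenge.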

reference.md

Lines changed: 18 additions & 4 deletions
@@ -14,8 +14,10 @@ frequent word, etc.: the rank-frequency distribution is an inverse relation
 (source: [Wikipedia][zipf]).
 
 Build File
-: A build file describes all the steps required to execute or build your code or data.
-The format of the build file depends on the build system being used. Snakemake build files are called Snakefiles, and use Python 3 as the definition language.
+: A build file describes all the steps required to execute or build your code
+or data. The format of the build file depends on the build system being used.
+Snakemake build files are called Snakefiles, and use Python 3 as the
+definition language.
 
 Dependency
 : A file that is needed to build a target. In Snakemake, dependencies are
@@ -26,8 +28,10 @@ Target
 
 Rule
 : Describes how to create outputs from inputs. Dependencies between rules are handled
-implicitly by matching filenames of inputs to outputs. A rule can also contain no inputs or outputs, in which case it simply specifies a command that can be run manually.
-Snakemake rules are composed of inputs, outputs, and an action.
+implicitly by matching filenames of inputs to outputs. A rule can also
+contain no inputs or outputs, in which case it simply specifies a command
+that can be run manually. Snakemake rules are composed of inputs, outputs,
+and an action.
 
 Default Target
 : The first rule in a Snakefile defines the *default target*. This is the target
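The inputs/outputs/action anatomy in the Rule entry can be sketched as a minimal Snakefile rule; the filenames and the `wordcount.py` script are illustrative stand-ins, not taken from the lesson:

```python
# A Snakemake rule: inputs, outputs, and an action (here, a shell command).
rule count_words:
    input: 'books/abyss.txt'
    output: 'abyss.dat'
    shell: 'python wordcount.py {input} {output}'
```

Snakemake substitutes `{input}` and `{output}` into the action, and links this rule to others by matching these filenames.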
@@ -56,7 +60,17 @@ Wildcard
 Snakemake matches each wildcard to the regular expression `.+`, although additional
 constraints can be specified. See [the documentation][docs-wildcard] for details.
 
+HPC Cluster
+: An HPC cluster is a collection of many separate servers (computers), called
+nodes, which are connected via a fast interconnect.
+
+Scheduler
+: A job scheduler is a computer application for controlling unattended
+background program execution of jobs
+(source: [Wikipedia][wiki-scheduler]).
+
 {% include links.md %}
 
 [zipf]: https://en.wikipedia.org/wiki/Zipf%27s_law
+[wiki-scheduler]: https://en.wikipedia.org/wiki/Job_scheduler
 [docs-wildcard]: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards
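The `.+` behaviour in the Wildcard entry can be checked in plain Python, since Snakemake compiles each wildcard to a named regular-expression group; the pattern and filenames below are illustrative:

```python
import re

# A target pattern like "{sample}.dat" becomes roughly this regex,
# with the wildcard matching ".+" by default.
pattern = re.compile(r"(?P<sample>.+)\.dat")

match = pattern.fullmatch("abyss.dat")
print(match.group("sample"))  # abyss
```

Adding a wildcard constraint in the Snakefile simply tightens the `.+` part of the generated expression.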

0 commit comments