This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Commit 4fb83a1

started cleaning up text and adding glossary entries

1 parent 6344909

2 files changed: +43 −22 lines changed


_episodes/08-cluster.md

Lines changed: 25 additions & 18 deletions
@@ -13,15 +13,15 @@ keypoints:
 - "`nohup <command> &` prevents `<command>` from exiting when you log off."
 ---
 
-Right now we have a reasonably effective pipeline that scales nicely on our local computer.
-However, for the sake of this course,
-we'll pretend that our workflow actually takes significant computational resources
-and needs to be run on a cluster.
+Right now we have a reasonably effective pipeline that scales nicely on our
+local computer. However, for the sake of this course, we'll pretend that our
+workflow actually takes significant computational resources and needs to be
+run on an [HPC cluster][ref-hpc-cluster].
 
 > ## HPC cluster architecture
 >
-> Most HPC clusters are run using a scheduler.
-> The scheduler is a piece of software that handles which compute jobs are run on which compute nodes and where.
+> Most HPC clusters are run using a [scheduler][ref-scheduler].
+> The scheduler is a piece of software that decides when a job will run, and on which nodes.
 > It allows a set of users to share a computing system as efficiently as possible.
 > In order to use it, users typically must write their commands to be run into a shell script
 > and then "submit" it to the scheduler.
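The shell-script-plus-submit pattern described in the callout can be sketched as a minimal batch script. This is a hedged illustration for a SLURM site: the `#SBATCH` values and the `zipf_test.py` filename are placeholders, not part of the lesson.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=zipf      # name shown in the queue
#SBATCH --time=00:10:00      # wall-clock limit (placeholder value)
#SBATCH --mem=1G             # memory request (placeholder value)

# The command(s) the scheduler runs unattended on a compute node;
# zipf_test.py is a stand-in for your actual pipeline step.
python zipf_test.py > results.txt
```

On a SLURM cluster such a script would typically be handed to the scheduler with `sbatch script.sh`; other schedulers (PBS, SGE) use the same pattern with different directives.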
@@ -32,20 +32,24 @@ and needs to be run on a cluster.
 > (# of students, time allotted, etc.).
 {: .callout}
 
-Normally, moving a workflow to be run by a cluster scheduler requires a lot of work.
-Batch scripts need to be written, and you'll need to monitor and babysit the status of each of your jobs.
-This is especially difficult if one batch job depends on the output from another.
-Even moving from one cluster to another (especially ones using a different scheduler)
-requires a large investment of time and effort - all the batch scripts from before need to be rewritten.
+Normally, moving a workflow to be run by a cluster scheduler requires a lot
+of work. Batch scripts need to be written, and you'll need to monitor and
+babysit the status of each of your jobs. This is especially difficult if one
+batch job depends on the output from another. Even moving from one cluster to
+another (especially ones using a different scheduler) requires a large
+investment of time and effort - all the batch scripts from before need to be
+rewritten.
 
-Snakemake does all of this for you.
-All details of running the pipeline through the cluster scheduler are handled by Snakemake -
-this includes writing batch scripts, submitting, and monitoring jobs.
-In this scenario, the role of the scheduler is limited to ensuring each Snakemake rule
-is executed with the resources it needs.
+Snakemake does all of this for you. All details of running the pipeline
+through the cluster scheduler are handled by Snakemake - this includes
+writing batch scripts, submitting, and monitoring jobs. In this scenario, the
+role of the scheduler is limited to ensuring each Snakemake rule is executed
+with the resources it needs.
 
-We'll explore how to port our example Snakemake pipeline by example
-Our current Snakefile is shown below:
+We'll explore how to port our Snakemake pipeline to the cluster by example.
+Our current Snakefile is shown below:
+
+FIXME: update to match new sample code
 
 ```python
 # our zipf analysis pipeline
@@ -263,4 +267,7 @@ In the meantime, let's dissect the command we just ran.
 > You can unlock the directory with `snakemake --unlock`.
 {: .challenge}
 
+[ref-hpc-cluster]: {{ relative_root_path }}/reference#hpc-cluster
+[ref-scheduler]: {{ relative_root_path }}/reference#scheduler
+
 {% include links.md %}
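The scheduler hand-off this episode describes is driven from the command line. A minimal sketch, assuming a SLURM cluster and the classic `--cluster` flag of older Snakemake releases (newer versions replace it with executor plugins); the resource values are placeholders:

```shell
# Snakemake writes a batch script per rule, submits each with the given
# command, and monitors the jobs; --jobs caps concurrent submissions.
snakemake --cluster "sbatch --time=00:10:00 --mem=1G" --jobs 100
```

If a run is interrupted and leaves the working directory locked, `snakemake --unlock` releases the lock, as noted in the challenge.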

reference.md

Lines changed: 18 additions & 4 deletions
@@ -14,8 +14,10 @@ frequent word, etc.: the rank-frequency distribution is an inverse relation
 (source: [Wikipedia][zipf]).
 
 Build File
-: A build file describes all the steps required to execute or build your code or data.
-The format of the build file depends on the build system being used. Snakemake build files are called Snakefiles, and use Python 3 as the definition language.
+: A build file describes all the steps required to execute or build your code
+or data. The format of the build file depends on the build system being used.
+Snakemake build files are called Snakefiles, and use Python 3 as the
+definition language.
 
 Dependency
 : A file that is needed to build a target. In Snakemake, dependencies are
@@ -26,8 +28,10 @@ Target
 
 Rule
 : Describes how to create outputs from inputs. Dependencies between rules are handled
-implicitly by matching filenames of inputs to outputs. A rule can also contain no inputs or outputs, in which case it simply specifies a command that can be run manually.
-Snakemake rules are composed of inputs, outputs, and an action.
+implicitly by matching filenames of inputs to outputs. A rule can also
+contain no inputs or outputs, in which case it simply specifies a command
+that can be run manually. Snakemake rules are composed of inputs, outputs,
+and an action.
 
 Default Target
 : The first rule in a Snakefile defines the *default target*. This is the target
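The inputs/outputs/action anatomy in the Rule entry can be sketched as a minimal Snakefile rule; the filenames and the `wordcount.py` script are illustrative stand-ins, not taken from the lesson:

```python
# A Snakemake rule: inputs, outputs, and an action (here, a shell command).
rule count_words:
    input: 'books/abyss.txt'
    output: 'abyss.dat'
    shell: 'python wordcount.py {input} {output}'
```

Snakemake substitutes `{input}` and `{output}` into the action, and links this rule to others by matching these filenames.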
@@ -56,7 +60,17 @@ Wildcard
 Snakemake matches each wildcard to the regular expression `.+`, although additional
 constraints can be specified. See [the documentation][docs-wildcard] for details.
 
+HPC Cluster
+: An HPC cluster is a collection of many separate servers (computers), called
+nodes, which are connected via a fast interconnect.
+
+Scheduler
+: A job scheduler is a computer application for controlling unattended
+background program execution of jobs
+(source: [Wikipedia][wiki-scheduler]).
+
 {% include links.md %}
 
 [zipf]: https://en.wikipedia.org/wiki/Zipf%27s_law
+[wiki-scheduler]: https://en.wikipedia.org/wiki/Job_scheduler
 [docs-wildcard]: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards
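The `.+` behaviour in the Wildcard entry can be checked in plain Python, since Snakemake compiles each wildcard to a named regular-expression group; the pattern and filenames below are illustrative:

```python
import re

# A target pattern like "{sample}.dat" becomes roughly this regex,
# with the wildcard matching ".+" by default.
pattern = re.compile(r"(?P<sample>.+)\.dat")

match = pattern.fullmatch("abyss.dat")
print(match.group("sample"))  # abyss
```

Adding a wildcard constraint in the Snakefile simply tightens the `.+` part of the generated expression.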

0 commit comments