samtools · cmdcolin · Dec 18, 2022 · Dec 19, 2022
diff --git a/CRAMv2.1.tex b/CRAMv2.1.tex
@@ -694,7 +694,7 @@ \subsubsection*{Encoding tag values}
 keys composed of the two letter tag abbreviation followed by the tag type as defined 
 in the SAM specification, for example `OQZ' for `OQ:Z'. The three bytes form a 
 big endian integer and are written as ITF8. For example, 3-byte representation 
-of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer 0x004F515A. 
+of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer 0x004F515A. 
 The integer is finally written as ITF8.
 
 \begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
@@ -1640,7 +1640,7 @@ \subsubsection*{BYTE\_ARRAY\_LEN }
 
 \subsubsection*{BYTE\_ARRAY\_STOP }
 
-Byte arrays are captured as a sequence of bytes teminated by a special stop byteFor 
+Byte arrays are captured as a sequence of bytes terminated by a special stop byteFor 
 example this could be a golomb encoding. The parameter for BYTE\_ARRAY\_STOP are 
 listed below:
 

diff --git a/CRAMv3.tex b/CRAMv3.tex
@@ -850,7 +850,7 @@ \subsubsection*{Tag values}
 The encodings used for different tags are stored in a map.
 The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above.
 Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$.
-For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
+For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
 
 \begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
 \hline

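The key construction and ITF8 byte stream quoted in the CRAMv3 hunk above can be checked with a short Python sketch. The function names `tag_key` and `itf8_encode` are hypothetical, and the byte layout assumes the 1 to 4 byte ITF8 forms as defined in the CRAM specification. The 5-byte form is deliberately left out, since a 24-bit tag key never needs it.

```python
def tag_key(tag: str, type_code: str) -> int:
    """Build the integer key (char1<<16) + (char2<<8) + type for a tag such as OQ:Z."""
    char1, char2 = tag.encode("ascii")       # two-letter tag id, e.g. b"OQ"
    (type_byte,) = type_code.encode("ascii")  # single type character, e.g. b"Z"
    return (char1 << 16) + (char2 << 8) + type_byte


def itf8_encode(value: int) -> bytes:
    """Encode a non-negative integer below 2**28 as ITF8 (assumed 1 to 4 byte forms only)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value < 0x80:          # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:        # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:      # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:    # 1110xxxx + 3 bytes
        return bytes([
            0xE0 | (value >> 24),
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    raise ValueError("5-byte ITF8 form is omitted from this sketch")


key = tag_key("OQ", "Z")
print(hex(key))                # 0x4f515a
print(itf8_encode(key).hex())  # e04f515a
```

The printed bytes agree with the `\{0xE0, 0x4F, 0x51, 0x5A\}` stream given in the corrected sentence.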
diff --git a/SAMtags.tex b/SAMtags.tex
@@ -494,7 +494,7 @@ \subsection{Base modifications}
 
 Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
 When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
-When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.}
+When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}
 
 This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
 This number series is comparable to the numbers in an {\tt MD} tag,

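The delta-coded skip list described in the SAMtags hunk above can be illustrated with a minimal Python sketch. The function name `mm_positions` is hypothetical, and the sketch assumes a forward-strand read, so no complementing is needed. It follows the same matching rule as the skip loop in `test/SAMtags/parse_mm.pl` (a base of `N` matches any sequence base) but is not a port of that script.

```python
def mm_positions(seq: str, base: str, deltas: list[int]) -> list[int]:
    """Return 0-based SEQ indices of the bases selected by a delta-coded skip list.

    Each delta says how many occurrences of `base` to skip before the next
    selected occurrence, counted from just after the previous selection.
    """
    positions = []
    i = 0  # current index into seq
    for delta in deltas:
        while True:
            if i >= len(seq):
                raise ValueError("skip list runs past the end of the sequence")
            if base == "N" or seq[i] == base:
                if delta == 0:
                    positions.append(i)  # this occurrence is the selected one
                    i += 1
                    break
                delta -= 1               # skip this occurrence
            i += 1
    return positions


# C occurs at indices 1, 3, 4 and 6 in this toy sequence.
print(mm_positions("ACGCCAC", "C", [1, 0]))  # [3, 4]
print(mm_positions("ACGCCAC", "C", [0, 2]))  # [1, 6]
```

Raising on an overlong skip list is a choice made for this sketch; the text in the hunk does not say how a parser should react in that case.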
diff --git a/VCFv4.4.tex b/VCFv4.4.tex
@@ -607,12 +607,12 @@ \subsubsection{Genotype fields}
  \end{tabular}
 
  \item PSO (List of integers): List of phase set ordinals.
- For each phase-set name, defines the order in which variants are encountered when traversing a derivate chromosome.
+ For each phase-set name, defines the order in which variants are encountered when traversing a derivative chromosome.
  The missing value '$.$' should be used when the corresponding PSO value is missing.
  For each phase-set name, PSO should be defined if any allele with that phase-set name on any record is symbolic structural variant or in breakpoint notation.
  Variants in breakpoint notation must have the same PSL and PSO on both records.
 
- Without explicitly specifying the derivate chromosome traversal order, multiple derivate chromosome reconstructions are possible.
+ Without explicitly specifying the derivative chromosome traversal order, multiple derivative chromosome reconstructions are possible.
  Take for example this tandem duplication in a triploid organism with SNVs (ID/QUAL/FILTER columns removed for clarity):
 
  \vspace{0.5em}
@@ -829,7 +829,7 @@ \section{INFO keys used for structural variants}
  \item BFB - breakage fusion bridge
  \item DOUBLEMINUTE - Double minute
 \end{itemize}
-The sematics of other $EVENTTYPE$ values is implementation-defined.
+The semantics of other $EVENTTYPE$ values is implementation-defined.
 The use of $EVENT$ is not restricted to structural variation and can also be used to associate non-symbolic alleles.
 Such linking is useful for scenarios such as kataegis or when there is variant position ambiguity in segmentally duplicated regions.
 
@@ -2555,7 +2555,7 @@ \subsection{Changes between VCFv4.4 and VCFv4.3}
 \item Added tandem repeat support ($<$CNV:TR$>$, RN, RUS, RUL, RB, CIRB, RUC, CIRUC, RUB)
 \item Redefined INFO CN as allele-specific copy number and FORMAT CN as total copy number.
 \item Redefined INFO and FORMAT CN to support non-integer copy numbers.
-\item Added support for phasing and derivate chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
+\item Added support for phasing and derivative chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
 \item Added SVCLAIM to disambiguate copy number based $<$DEL$>$ and $<$DUP$>$ variants from breakpoint based ones.
 \item Conceptually separated variant detection and interpretation.
 \item Added EVENTTYPE/EVENT to enable the multiple records encoding complex genomic rearrangements to be grouped together.

diff --git a/crypt4gh.tex b/crypt4gh.tex
@@ -271,7 +271,7 @@ \subsection{File Structure}
 \draw (header packet.four split south) to (data encryption packet.north west);
 \draw (header packet.five split south) to (data encryption packet.north east);
 \node (data encryption packet notes) at (data encryption packet -| file notes) [notes] {
- \textbf{Data Encyption Packet (plain-text)} \\
+ \textbf{Data Encryption Packet (plain-text)} \\
  Stores $K_{data}$
 };
 

diff --git a/refget.md b/refget.md
@@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied
 - VMC
  - VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence
 
-Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used.
+Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used.
 
 ### Checksum Choice
 

diff --git a/test/SAMtags/README.md b/test/SAMtags/README.md
@@ -2,7 +2,7 @@ Mm and Ml auxiliary tags
 ========================
 
 The purpose of these test files is to test parsing of the Mm and Ml
-tags. These succint Mm and Ml tags are present in the .sam files,
+tags. These succinct Mm and Ml tags are present in the .sam files,
 with a more human readable expanded form in the .txt files.
 Developers should check whether their implementation is able to
 convert between the two forms.

diff --git a/test/SAMtags/parse_mm.pl b/test/SAMtags/parse_mm.pl
@@ -73,7 +73,7 @@ sub rc {
 
  my $i = 0; # I^{th} bosition in sequence
  foreach my $delta (split(",", $pos)) {
- # Skip $delta occurences of $base
+ # Skip $delta occurrences of $base
  do {
  $delta-- if ($base eq "N" || $base eq $seq[$i]);
  $i++;

diff --git a/test/sam/README.md b/test/sam/README.md
@@ -234,7 +234,7 @@ CIGAR
 - Reads entirely consisting of insertions (no bases on ref)
  - At pos 1; every base is prior to start of ref
 - Neighbouring matching ops, eg 1D1D, 10M10M
-- (Cicular genomes? needs more work.)
+- (Circular genomes? needs more work.)
 - Very large CIGAR strings (BAM has a 64K limit so tools that parse
  SAM into in-memory BAM may fail).
 
@@ -403,7 +403,7 @@ Aux
  - General syntax
  - Other types (including case change variants of above; I, z, etc)
  - Aux tag not 2 chars
- - Aux tag occuring multiple times
+ - Aux tag occurring multiple times
 
 
 Todo

diff --git a/test/sam/compare_sam.pl b/test/sam/compare_sam.pl
@@ -85,8 +85,8 @@
  # Validate MD and NM only if partialmd & 'file' set, otherwise
  # discard it. Ie:
  #
- # 1: if file 1 has NM/MD keep in file 2, othewise discard from file2
- # 2: if file 2 has NM/MD keep in file 1, othewise discard from file1
+ # 1: if file 1 has NM/MD keep in file 2, otherwise discard from file2
+ # 2: if file 2 has NM/MD keep in file 1, otherwise discard from file1
  # 3: if file 1 and file 2 both have NM/MD keep, otherwise discard.
  if (exists $opts{partialmd}) {
  if ($opts{partialmd} & 2) {