diff --git a/CRAMv2.1.tex b/CRAMv2.1.tex index b6ebcac33..e4bf97f0b 100644 --- a/CRAMv2.1.tex +++ b/CRAMv2.1.tex @@ -694,7 +694,7 @@ \subsubsection*{Encoding tag values} keys composed of the two letter tag abbreviation followed by the tag type as defined in the SAM specification, for example `OQZ' for `OQ:Z'. The three bytes form a big endian integer and are written as ITF8. For example, 3-byte representation -of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer 0x004F515A. +of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer 0x004F515A. The integer is finally written as ITF8. \begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|} @@ -1640,7 +1640,7 @@ \subsubsection*{BYTE\_ARRAY\_LEN } \subsubsection*{BYTE\_ARRAY\_STOP } -Byte arrays are captured as a sequence of bytes teminated by a special stop byteFor +Byte arrays are captured as a sequence of bytes terminated by a special stop byteFor example this could be a golomb encoding. The parameter for BYTE\_ARRAY\_STOP are listed below: diff --git a/CRAMv3.tex b/CRAMv3.tex index 8d833e098..3f61db181 100644 --- a/CRAMv3.tex +++ b/CRAMv3.tex @@ -850,7 +850,7 @@ \subsubsection*{Tag values} The encodings used for different tags are stored in a map. The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above. Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$. -For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}. +For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}. 
\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|} \hline diff --git a/SAMtags.tex b/SAMtags.tex index 7e50de58c..e19ec290b 100644 --- a/SAMtags.tex +++ b/SAMtags.tex @@ -494,7 +494,7 @@ \subsection{Base modifications} Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools. When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided. -When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.} +When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.} This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field. This number series is comparable to the numbers in an {\tt MD} tag, diff --git a/VCFv4.4.tex b/VCFv4.4.tex index 055d5b9f9..b435e0e6d 100644 --- a/VCFv4.4.tex +++ b/VCFv4.4.tex @@ -607,12 +607,12 @@ \subsubsection{Genotype fields} \end{tabular} \item PSO (List of integers): List of phase set ordinals. - For each phase-set name, defines the order in which variants are encountered when traversing a derivate chromosome. 
+ For each phase-set name, defines the order in which variants are encountered when traversing a derivative chromosome. The missing value '$.$' should be used when the corresponding PSO value is missing. For each phase-set name, PSO should be defined if any allele with that phase-set name on any record is symbolic structural variant or in breakpoint notation. Variants in breakpoint notation must have the same PSL and PSO on both records. - Without explicitly specifying the derivate chromosome traversal order, multiple derivate chromosome reconstructions are possible. + Without explicitly specifying the derivative chromosome traversal order, multiple derivative chromosome reconstructions are possible. Take for example this tandem duplication in a triploid organism with SNVs (ID/QUAL/FILTER columns removed for clarity): \vspace{0.5em} @@ -829,7 +829,7 @@ \section{INFO keys used for structural variants} \item BFB - breakage fusion bridge \item DOUBLEMINUTE - Double minute \end{itemize} -The sematics of other $EVENTTYPE$ values is implementation-defined. +The semantics of other $EVENTTYPE$ values is implementation-defined. The use of $EVENT$ is not restricted to structural variation and can also be used to associate non-symbolic alleles. Such linking is useful for scenarios such as kataegis or when there is variant position ambiguity in segmentally duplicated regions. @@ -2555,7 +2555,7 @@ \subsection{Changes between VCFv4.4 and VCFv4.3} \item Added tandem repeat support ($<$CNV:TR$>$, RN, RUS, RUL, RB, CIRB, RUC, CIRUC, RUB) \item Redefined INFO CN as allele-specific copy number and FORMAT CN as total copy number. \item Redefined INFO and FORMAT CN to support non-integer copy numbers. 
-\item Added support for phasing and derivate chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ) +\item Added support for phasing and derivative chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ) \item Added SVCLAIM to disambiguate copy number based $<$DEL$>$ and $<$DUP$>$ variants from breakpoint based ones. \item Conceptually separated variant detection and interpretation. \item Added EVENTTYPE/EVENT to enable the multiple records encoding complex genomic rearrangements to be grouped together. diff --git a/crypt4gh.tex b/crypt4gh.tex index 8d40a7ee8..0655a906d 100644 --- a/crypt4gh.tex +++ b/crypt4gh.tex @@ -271,7 +271,7 @@ \subsection{File Structure} \draw (header packet.four split south) to (data encryption packet.north west); \draw (header packet.five split south) to (data encryption packet.north east); \node (data encryption packet notes) at (data encryption packet -| file notes) [notes] { - \textbf{Data Encyption Packet (plain-text)} \\ + \textbf{Data Encryption Packet (plain-text)} \\ Stores $K_{data}$ }; diff --git a/refget.md b/refget.md index 180df2655..f6e9a2d48 100644 --- a/refget.md +++ b/refget.md @@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied - VMC - VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence -Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used. +Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used. 
### Checksum Choice diff --git a/test/SAMtags/README.md b/test/SAMtags/README.md index d268a6cbd..ababb4e47 100644 --- a/test/SAMtags/README.md +++ b/test/SAMtags/README.md @@ -2,7 +2,7 @@ Mm and Ml auxiliary tags ======================== The purpose of these test files is to test parsing of the Mm and Ml -tags. These succint Mm and Ml tags are present in the .sam files, +tags. These succinct Mm and Ml tags are present in the .sam files, with a more human readable expanded form in the .txt files. Developers should check whether their implementation is able to convert between the two forms. diff --git a/test/SAMtags/parse_mm.pl b/test/SAMtags/parse_mm.pl index b8cd21928..b723533c3 100755 --- a/test/SAMtags/parse_mm.pl +++ b/test/SAMtags/parse_mm.pl @@ -73,7 +73,7 @@ sub rc { my $i = 0; # I^{th} bosition in sequence foreach my $delta (split(",", $pos)) { - # Skip $delta occurences of $base + # Skip $delta occurrences of $base do { $delta-- if ($base eq "N" || $base eq $seq[$i]); $i++; diff --git a/test/sam/README.md b/test/sam/README.md index fe9624d61..73b4a79d0 100644 --- a/test/sam/README.md +++ b/test/sam/README.md @@ -234,7 +234,7 @@ CIGAR - Reads entirely consisting of insertions (no bases on ref) - At pos 1; every base is prior to start of ref - Neighbouring matching ops, eg 1D1D, 10M10M -- (Cicular genomes? needs more work.) +- (Circular genomes? needs more work.) - Very large CIGAR strings (BAM has a 64K limit so tools that parse SAM into in-memory BAM may fail). @@ -403,7 +403,7 @@ Aux - General syntax - Other types (including case change variants of above; I, z, etc) - Aux tag not 2 chars - - Aux tag occuring multiple times + - Aux tag occurring multiple times Todo diff --git a/test/sam/compare_sam.pl b/test/sam/compare_sam.pl index d89e315a6..23b67d3a4 100755 --- a/test/sam/compare_sam.pl +++ b/test/sam/compare_sam.pl @@ -85,8 +85,8 @@ # Validate MD and NM only if partialmd & 'file' set, otherwise # discard it. 
Ie: # - # 1: if file 1 has NM/MD keep in file 2, othewise discard from file2 - # 2: if file 2 has NM/MD keep in file 1, othewise discard from file1 + # 1: if file 1 has NM/MD keep in file 2, otherwise discard from file2 + # 2: if file 2 has NM/MD keep in file 1, otherwise discard from file1 # 3: if file 1 and file 2 both have NM/MD keep, otherwise discard. if (exists $opts{partialmd}) { if ($opts{partialmd} & 2) {
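Review note on the CRAMv3.tex hunk: the worked OQ:Z example touched by this change can be checked with a short sketch. It assumes the key construction quoted in the hunk, $(char1<<16) + (char2<<8) + type$, and the usual ITF8 layout in which the number of leading 1 bits in the first byte gives the count of following bytes. The function names `tag_key` and `itf8_encode` are hypothetical and are not taken from any existing implementation. The 5-byte ITF8 form is deliberately left out, since a 3-byte tag key never needs it.

```python
def tag_key(tag: str, type_code: str) -> int:
    """Build the integer key from a two-letter tag and a one-letter type code."""
    if len(tag) != 2 or len(type_code) != 1:
        raise ValueError("expected a 2-character tag and a 1-character type")
    return (ord(tag[0]) << 16) + (ord(tag[1]) << 8) + ord(type_code)


def itf8_encode(value: int) -> bytes:
    """Encode a non-negative integer below 2**28 as ITF8 (1 to 4 bytes)."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value < 0x80:  # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:  # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:  # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:  # 1110xxxx + 3 bytes
        return bytes([
            0xE0 | (value >> 24),
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    raise ValueError("5-byte ITF8 form is omitted from this sketch")


key = tag_key("OQ", "Z")
encoded = itf8_encode(key)
print(hex(key), encoded.hex())
```

Under these assumptions the key for OQ:Z is 0x004F515A and the encoded stream is E0 4F 51 5A, which agrees with the byte stream stated in the corrected sentence.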