Fix some typos #692

Closed
wants to merge 2 commits into from
4 changes: 2 additions & 2 deletions CRAMv2.1.tex
@@ -694,7 +694,7 @@ \subsubsection*{Encoding tag values}
keys composed of the two letter tag abbreviation followed by the tag type as defined
in the SAM specification, for example `OQZ' for `OQ:Z'. The three bytes form a
big endian integer and are written as ITF8. For example, 3-byte representation
- of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer 0x004F515A.
+ of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer 0x004F515A.
The integer is finally written as ITF8.

\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
@@ -1640,7 +1640,7 @@ \subsubsection*{BYTE\_ARRAY\_LEN }

\subsubsection*{BYTE\_ARRAY\_STOP }

- Byte arrays are captured as a sequence of bytes teminated by a special stop byteFor
+ Byte arrays are captured as a sequence of bytes terminated by a special stop byteFor
example this could be a golomb encoding. The parameter for BYTE\_ARRAY\_STOP are
listed below:

2 changes: 1 addition & 1 deletion CRAMv3.tex
@@ -850,7 +850,7 @@ \subsubsection*{Tag values}
The encodings used for different tags are stored in a map.
The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above.
Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$.
- For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
+ For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.

\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
\hline
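
As a sanity check on the worked example in the hunk above, the following Python sketch rebuilds the tag key from `(char1<<16) + (char2<<8) + type` and writes it as ITF8. The helper names `tag_key` and `itf8_encode_small` are hypothetical and not taken from any CRAM library. The encoder is an assumption-laden sketch that only handles values below 0x10000000 (one to four bytes), which is enough for any 24-bit tag key.

```python
def tag_key(tag: str, type_code: str) -> int:
    """Build the integer key (char1 << 16) + (char2 << 8) + type."""
    if len(tag) != 2 or len(type_code) != 1:
        raise ValueError("expected a two-letter tag and a one-letter type code")
    c1, c2 = tag.encode("ascii")
    (t,) = type_code.encode("ascii")
    return (c1 << 16) + (c2 << 8) + t


def itf8_encode_small(value: int) -> bytes:
    """Encode a non-negative integer below 0x10000000 as ITF8 (1 to 4 bytes).

    The count of leading 1 bits in the first byte gives the number of
    continuation bytes. The 5-byte form is intentionally left out of this sketch.
    """
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    if value < 0x80:
        return bytes([value])
    if value < 0x4000:
        return bytes([(value >> 8) | 0x80, value & 0xFF])
    if value < 0x200000:
        return bytes([(value >> 16) | 0xC0, (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:
        return bytes([
            (value >> 24) | 0xE0,
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    raise ValueError("5-byte ITF8 values are not handled by this sketch")


key = tag_key("OQ", "Z")
print(hex(key), itf8_encode_small(key).hex())
```

For `OQ:Z` this reproduces the integer key and the four-byte stream quoted in the changed line.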
2 changes: 1 addition & 1 deletion SAMtags.tex
@@ -494,7 +494,7 @@ \subsection{Base modifications}

Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
- When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.}
+ When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}

This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
This number series is comparable to the numbers in an {\tt MD} tag,
8 changes: 4 additions & 4 deletions VCFv4.4.tex
@@ -607,12 +607,12 @@ \subsubsection{Genotype fields}
\end{tabular}

\item PSO (List of integers): List of phase set ordinals.
- For each phase-set name, defines the order in which variants are encountered when traversing a derivate chromosome.
+ For each phase-set name, defines the order in which variants are encountered when traversing a derivative chromosome.
The missing value '$.$' should be used when the corresponding PSO value is missing.
For each phase-set name, PSO should be defined if any allele with that phase-set name on any record is symbolic structural variant or in breakpoint notation.
Variants in breakpoint notation must have the same PSL and PSO on both records.

- Without explicitly specifying the derivate chromosome traversal order, multiple derivate chromosome reconstructions are possible.
+ Without explicitly specifying the derivative chromosome traversal order, multiple derivative chromosome reconstructions are possible.
Take for example this tandem duplication in a triploid organism with SNVs (ID/QUAL/FILTER columns removed for clarity):

\vspace{0.5em}
@@ -829,7 +829,7 @@ \section{INFO keys used for structural variants}
\item BFB - breakage fusion bridge
\item DOUBLEMINUTE - Double minute
\end{itemize}
- The sematics of other $EVENTTYPE$ values is implementation-defined.
+ The semantics of other $EVENTTYPE$ values is implementation-defined.
The use of $EVENT$ is not restricted to structural variation and can also be used to associate non-symbolic alleles.
Such linking is useful for scenarios such as kataegis or when there is variant position ambiguity in segmentally duplicated regions.

@@ -2555,7 +2555,7 @@ \subsection{Changes between VCFv4.4 and VCFv4.3}
\item Added tandem repeat support ($<$CNV:TR$>$, RN, RUS, RUL, RB, CIRB, RUC, CIRUC, RUB)
\item Redefined INFO CN as allele-specific copy number and FORMAT CN as total copy number.
\item Redefined INFO and FORMAT CN to support non-integer copy numbers.
- \item Added support for phasing and derivate chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
+ \item Added support for phasing and derivative chromosome reconstruction in the presence of SVs (PSL, PSO, PSQ)
\item Added SVCLAIM to disambiguate copy number based $<$DEL$>$ and $<$DUP$>$ variants from breakpoint based ones.
\item Conceptually separated variant detection and interpretation.
\item Added EVENTTYPE/EVENT to enable the multiple records encoding complex genomic rearrangements to be grouped together.
2 changes: 1 addition & 1 deletion crypt4gh.tex
@@ -271,7 +271,7 @@ \subsection{File Structure}
\draw (header packet.four split south) to (data encryption packet.north west);
\draw (header packet.five split south) to (data encryption packet.north east);
\node (data encryption packet notes) at (data encryption packet -| file notes) [notes] {
- \textbf{Data Encyption Packet (plain-text)} \\
+ \textbf{Data Encryption Packet (plain-text)} \\
Stores $K_{data}$
};

2 changes: 1 addition & 1 deletion refget.md
@@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied
- VMC
- VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence

- Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used.
+ Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used.

### Checksum Choice

2 changes: 1 addition & 1 deletion test/SAMtags/README.md
@@ -2,7 +2,7 @@ Mm and Ml auxiliary tags
========================

The purpose of these test files is to test parsing of the Mm and Ml
- tags. These succint Mm and Ml tags are present in the .sam files,
+ tags. These succinct Mm and Ml tags are present in the .sam files,
with a more human readable expanded form in the .txt files.
Developers should check whether their implementation is able to
convert between the two forms.
2 changes: 1 addition & 1 deletion test/SAMtags/parse_mm.pl
@@ -73,7 +73,7 @@ sub rc {

my $i = 0; # I^{th} bosition in sequence
foreach my $delta (split(",", $pos)) {
- # Skip $delta occurences of $base
+ # Skip $delta occurrences of $base
do {
$delta-- if ($base eq "N" || $base eq $seq[$i]);
$i++;
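
For readers unfamiliar with the delta list that this part of `parse_mm.pl` walks through, here is a minimal Python sketch of the same skip-counting idea: each number says how many occurrences of the stated base to skip before the next modified one. The function name `mm_positions` is hypothetical. The sketch assumes a forward-strand read whose sequence is already in its original orientation, and it ignores reverse-complement handling and the `.`/`?` flag.

```python
from typing import List


def mm_positions(seq: str, base: str, deltas: List[int]) -> List[int]:
    """Return 0-based indices in seq that the delta list points at.

    A base of "N" matches every position, mirroring the check in the
    Perl loop shown in the hunk.
    """
    positions = []
    i = 0
    for delta in deltas:
        while True:
            if i >= len(seq):
                raise ValueError("delta list runs past the end of the sequence")
            if base == "N" or seq[i] == base:
                if delta == 0:
                    # This occurrence is the one being reported.
                    positions.append(i)
                    i += 1
                    break
                # Otherwise this occurrence is one of the skipped ones.
                delta -= 1
            i += 1
    return positions


print(mm_positions("ACCGCAC", "C", [1, 0]))
```

With the toy sequence `ACCGCAC`, the list `1,0` skips the first `C` and reports the second, then reports the very next `C` after it.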
4 changes: 2 additions & 2 deletions test/sam/README.md
@@ -234,7 +234,7 @@ CIGAR
- Reads entirely consisting of insertions (no bases on ref)
- At pos 1; every base is prior to start of ref
- Neighbouring matching ops, eg 1D1D, 10M10M
- - (Cicular genomes? needs more work.)
+ - (Circular genomes? needs more work.)
- Very large CIGAR strings (BAM has a 64K limit so tools that parse
SAM into in-memory BAM may fail).

@@ -403,7 +403,7 @@ Aux
- General syntax
- Other types (including case change variants of above; I, z, etc)
- Aux tag not 2 chars
- - Aux tag occuring multiple times
+ - Aux tag occurring multiple times


Todo
4 changes: 2 additions & 2 deletions test/sam/compare_sam.pl
@@ -85,8 +85,8 @@
# Validate MD and NM only if partialmd & 'file' set, otherwise
# discard it. Ie:
#
- # 1: if file 1 has NM/MD keep in file 2, othewise discard from file2
- # 2: if file 2 has NM/MD keep in file 1, othewise discard from file1
+ # 1: if file 1 has NM/MD keep in file 2, otherwise discard from file2
+ # 2: if file 2 has NM/MD keep in file 1, otherwise discard from file1
# 3: if file 1 and file 2 both have NM/MD keep, otherwise discard.
if (exists $opts{partialmd}) {
if ($opts{partialmd} & 2) {