#642 recommended but not required meta-information structured header field order
d-cameron committed Aug 22, 2022
1 parent 6100896 commit e1acf3f
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions VCFv4.4.draft.tex
@@ -122,18 +122,21 @@ \subsection{Meta-information lines}
\verb|##|\emph{key}\verb|=<|\emph{key}\verb|=|\emph{value}\verb|,|\emph{key}\verb|=|\emph{value}\verb|,|\emph{key}\verb|=|\emph{value}\verb|,|\ldots\verb|>|
\end{quote}
All structured lines require an ID which must be unique within their type, i.e., within all the meta-information lines with the same ``\verb|##|\emph{key}\verb|=|'' prefix.
For all of the structured lines (\verb|##INFO|, \verb|##FORMAT|, \verb|##FILTER|, etc.) described in this specification, extra fields can be included after the default fields.
For all of the structured lines (\verb|##INFO|, \verb|##FORMAT|, \verb|##FILTER|, etc.) described in this specification, optional fields can be included.
For example:
\begin{verbatim}
##INFO=<ID=ALLELEID,Number=A,Type=String,Description="Allele ID",Source="ClinVar",Version="20220804">
\end{verbatim}
In the above example, the extra fields of ``Source'' and ``Version'' are provided.
In the above example, the optional fields of ``Source'' and ``Version'' are provided.
The values of optional fields must be written as quoted strings, even for numeric values.
Other structured lines not defined by this specification may also be used; the only default field for such lines is the required \verb|ID| field.
Other structured lines not defined by this specification may also be used; the only required field for such lines is the \verb|ID| field.
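
The uniqueness rule for IDs can be checked mechanically. The following is a minimal sketch, assuming the structured lines have already been reduced to (key, ID) pairs; the function name \verb|duplicate_ids| is hypothetical and not part of this specification.

```python
from collections import Counter

def duplicate_ids(entries):
    """Return the (key, ID) pairs that occur more than once.

    `entries` is an iterable of (key, ID) tuples, e.g. ("INFO", "DP").
    IDs only need to be unique within their own key, so ("INFO", "DP")
    and ("FORMAT", "DP") do not clash.
    """
    counts = Counter(entries)
    return sorted(pair for pair, count in counts.items() if count > 1)

if __name__ == "__main__":
    pairs = [("INFO", "DP"), ("FORMAT", "DP"), ("INFO", "DP")]
    print(duplicate_ids(pairs))
```

An empty result means every ID is unique within its type.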

It is recommended in VCF and required in BCF that the header includes tags describing the reference and contigs backing the data contained in the file.
These tags are based on the SQ field from the SAM spec; all tags are optional (see the VCF example above).

To aid human readability, the order of fields should be ID, Number, Type, Description, then any optional fields.
Implementations must not rely on the order of the fields within structured lines and are not required to preserve field ordering.
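
One way to avoid depending on field order is to read each structured line into a mapping keyed by field name. The sketch below is illustrative only: the function name \verb|parse_structured_line| is hypothetical, and the handling of backslash escapes inside quoted values is an assumption rather than something stated in this section.

```python
import re

# Matches "##KEY=<...>" and captures the key and the text between the brackets.
_STRUCTURED = re.compile(r"^##([^=]+)=<(.*)>$")

def parse_structured_line(line):
    """Return (key, fields) where fields maps field name to its string value."""
    match = _STRUCTURED.match(line.strip())
    if match is None:
        raise ValueError("not a structured meta-information line")
    key, body = match.group(1), match.group(2)
    fields = {}
    i, n = 0, len(body)
    while i < n:
        eq = body.find("=", i)
        if eq == -1:
            raise ValueError("field without '='")
        name = body[i:eq]
        i = eq + 1
        if i < n and body[i] == '"':
            # Quoted value: read up to the closing quote.
            # Assumption: a backslash escapes the character that follows it.
            i += 1
            chars = []
            while i < n and body[i] != '"':
                if body[i] == "\\" and i + 1 < n:
                    i += 1
                chars.append(body[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted value")
            i += 1  # step past the closing quote
            value = "".join(chars)
        else:
            # Unquoted value: runs until the next comma or the end of the body.
            end = body.find(",", i)
            if end == -1:
                end = n
            value = body[i:end]
            i = end
        fields[name] = value
        if i < n:
            if body[i] != ",":
                raise ValueError("expected ',' between fields")
            i += 1
    return key, fields

if __name__ == "__main__":
    line = ('##INFO=<ID=ALLELEID,Number=A,Type=String,'
            'Description="Allele ID",Source="ClinVar",Version="20220804">')
    print(parse_structured_line(line))
```

Because the result is a dictionary, two lines that differ only in field order compare equal after parsing, and lookups such as \verb|fields["ID"]| do not depend on position.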

Meta-information lines are optional, but if they are present then they must be completely well-formed.
Other than \verb|##fileformat|, they may appear in any order.
Note that BCF, the binary counterpart of VCF, requires that all entries are present.
@@ -150,7 +153,7 @@ \subsubsection{File format}


\subsubsection{Information field format}
INFO fields are described as follows (first four keys are required, source and version are recommended):
INFO meta-information lines are structured lines with required fields ID, Number, Type, and Description, and recommended optional fields Source and Version:

\begin{verbatim}
##INFO=<ID=ID,Number=number,Type=type,Description="description",Source="source",Version="version">
@@ -177,29 +180,31 @@ \subsubsection{Information field format}
Source and Version values likewise must be surrounded by double-quotes and specify the annotation source (case-insensitive, e.g.\ \verb|"dbsnp"|) and exact version (e.g.\ \verb|"138"|), respectively for computational use.

\subsubsection{Filter field format}
FILTERs that have been applied to the data are described as follows:
FILTER meta-information lines are structured lines with required fields ID and Description that define the possible content of the FILTER column in the VCF records:

\begin{verbatim}
##FILTER=<ID=ID,Description="description">
\end{verbatim}

\subsubsection{Individual format field format}
Genotype fields specified in the FORMAT field are described as follows:
FORMAT meta-information lines are structured lines with required fields ID, Number, Type, and Description that define the possible content of the per-sample/genotype columns in the VCF records:

\begin{verbatim}
##FORMAT=<ID=ID,Number=number,Type=type,Description="description">
\end{verbatim}

Possible Types for FORMAT fields are: Integer, Float, Character, and String (this field is otherwise defined precisely as the INFO field).
The Number field is defined as per the INFO Number field.

\subsubsection{Alternative allele field format} \label{altfield}
Symbolic alternate alleles are described as follows:
ALT meta-information lines are structured lines with required fields ID and Description that describe the possible symbolic alternate alleles in the ALT column of the VCF records:

\begin{verbatim}
##ALT=<ID=type,Description="description">
\end{verbatim}
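
The required fields listed for INFO, FILTER, FORMAT, and ALT lines can be collected into a small lookup table. This is a sketch under stated assumptions: the names \verb|REQUIRED_FIELDS| and \verb|missing_required_fields| are hypothetical, the input is assumed to be an already parsed mapping of field names to values, and any key not listed falls back to requiring only ID, which is the rule given for lines not defined by the specification.

```python
# Required fields per structured line type, as listed in the text above.
REQUIRED_FIELDS = {
    "INFO": ("ID", "Number", "Type", "Description"),
    "FORMAT": ("ID", "Number", "Type", "Description"),
    "FILTER": ("ID", "Description"),
    "ALT": ("ID", "Description"),
}

def missing_required_fields(key, fields):
    """Return the required field names absent from `fields` for this key.

    Assumption: keys not in the table only need an ID field.
    Field order is deliberately ignored.
    """
    required = REQUIRED_FIELDS.get(key, ("ID",))
    return [name for name in required if name not in fields]

if __name__ == "__main__":
    print(missing_required_fields("INFO", {"ID": "DP", "Type": "Integer"}))
    print(missing_required_fields("FILTER", {"Description": "Low quality", "ID": "q10"}))
```

Optional fields such as Source and Version are never reported, since only the required names are checked.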

\noindent \textbf{Structural Variants} \newline
In symbolic alternate alleles for imprecise structural variants, the ID field indicates the type of structural variant, and can be a colon-separated list of types and subtypes.
In symbolic alternate alleles for structural variants, the ID field indicates the type of structural variant, and can be a colon-separated list of types and subtypes.
ID values are case sensitive strings and must not contain whitespace, commas or angle brackets.
The first level type must be one of the following:
\begin{itemize}
@@ -232,7 +237,6 @@ \subsubsection{Alternative allele field format} \label{altfield}
##ALT=<ID=M,Description="IUPAC code M = A/C">
\end{verbatim}


\subsubsection{Assembly field format}
Breakpoint assemblies for structural variations may use an external file:
\begin{verbatim}