Appendix

Query operators

Various query forms have operators for use with field values. Available operators are:

  • =
    • Exact match (case-insensitive).
  • contains
    • Match to a partial string (case-insensitive), e.g. searching for clonal complex 'contains' st-11 would return all STs belonging to the ST-11 complex.
  • starts with
    • Match to values that start with the search term (case-insensitive).
  • ends with
    • Match to values that end with the search term (case-insensitive).
  • >
    • Greater than the search term.
  • >=
    • Greater than or equal to the search term.
  • <
    • Less than the search term.
  • <=
    • Less than or equal to the search term.
  • NOT
    • Match to values that do not equal the search term (case-insensitive).
  • NOT contain
    • Match to values that do not contain the search term (case-insensitive).
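
As an illustration of the semantics described above, several of these operators can be sketched in Python. This is not the BIGSdb implementation: the `matches` function and both operator tables are hypothetical names, case-insensitivity is modelled with `str.casefold()`, and the comparison operators are assumed to act on numeric values.

```python
import operator


def _fold(value):
    """Normalise a value for case-insensitive comparison."""
    return str(value).casefold()


# Hypothetical table of text operators, mirroring the list above.
TEXT_OPERATORS = {
    "=": lambda value, term: _fold(value) == _fold(term),
    "contains": lambda value, term: _fold(term) in _fold(value),
    "starts with": lambda value, term: _fold(value).startswith(_fold(term)),
    "NOT": lambda value, term: _fold(value) != _fold(term),
    "NOT contain": lambda value, term: _fold(term) not in _fold(value),
}

# Assumption: comparisons are numeric; a real query form may also
# compare dates or text depending on the field type.
COMPARISON_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(value, op, term):
    """Return True if `value` satisfies `op` against the search `term`."""
    if op in TEXT_OPERATORS:
        return TEXT_OPERATORS[op](value, term)
    if op in COMPARISON_OPERATORS:
        return COMPARISON_OPERATORS[op](float(value), float(term))
    raise ValueError(f"Unsupported operator: {op!r}")


# Illustrative values only; not taken from a real database.
complexes = ["ST-11 complex", "ST-41/44 complex", "ST-269 complex"]
print([c for c in complexes if matches(c, "contains", "st-11")])
# ['ST-11 complex']
```

Keeping the operators in lookup tables makes the case-handling rule for each one explicit and easy to compare against the list.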

Sequence tag flags

Sequences tagged in the sequence bin can have features indicated by specific flags. The presence of these flags can be queried. These are a superset of :ref:`flags available for allele sequences <allele_sequence_flags>`. Available flags are:

  • alternative start codon
    • A start codon other than ATG, GTG, or TTG is used. This can be the case with some yeasts.
  • ambiguous read
    • Genome sequence contains ambiguous nucleotides in coding sequence.
  • apparent misassembly
    • Sequence has very high identity to an existing allele in one region but looks completely different in another.
  • atypical
    • Catch-all term for a sequence that is unusual compared to other alleles of locus.
  • contains IS element
    • Coding sequence is interrupted by insertion sequence.
  • downstream fusion
    • No stop codon present resulting in translation continuing.
  • frameshift
    • Frameshift in sequence relative to other alleles, not resulting in internal stop codon.
  • indel
    • Insertion/deletion in sequence that is uncommon compared to other alleles.
  • internal stop codon
    • Frameshift in sequence relative to other alleles, resulting in internal stop codon.
  • no start codon
    • No apparent start codon in immediate vicinity of usual start.
  • no stop codon
    • No stop codon in immediate vicinity of usual stop.
  • phase variable: off
    • Coding sequence has a homopolymeric run with a frameshift resulting in a stop codon preventing complete translation.
  • truncated
    • Coding sequence is unusually short resulting in a truncated protein (not the same as running off the end of a contig).
  • upstream fusion
    • No apparent start codon in immediate vicinity of usual start, likely due to a gene fusion (sequence is transcribed together with upstream coding sequence).

Allele sequence flags

Sequences can be flagged with specific attributes, which are searchable when running a sequence attribute query. These are a subset of :ref:`flags available for tagged sequences <sequence_tag_flags>` and are mainly for use with whole genome MLST type data. Multiple flags can be selected by Ctrl-clicking the list. Available flags are:

  • alternative start codon
    • A start codon other than ATG, GTG, or TTG is used. This can be the case with some yeasts.
  • atypical
    • Catch-all term for a sequence that is unusual compared to other alleles of locus.
  • contains IS element
    • Coding sequence is interrupted by insertion sequence.
  • downstream fusion
    • No stop codon present resulting in translation continuing.
  • frameshift
    • Frameshift in sequence relative to other alleles, not resulting in internal stop codon.
  • indel
    • Insertion/deletion in sequence that is uncommon compared to other alleles.
  • internal stop codon
    • Frameshift in sequence relative to other alleles, resulting in internal stop codon.
  • no start codon
    • No apparent start codon in immediate vicinity of usual start.
  • no stop codon
    • No stop codon in immediate vicinity of usual stop.
  • phase variable: off
    • Coding sequence has a homopolymeric run with a frameshift resulting in a stop codon preventing complete translation.
  • truncated
    • Coding sequence is unusually short resulting in a truncated protein (not the same as running off the end of a contig).
  • upstream fusion
    • No apparent start codon in immediate vicinity of usual start, likely due to a gene fusion (sequence is transcribed together with upstream coding sequence).
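
The subset relationship between the two flag lists in this appendix can be checked with a small Python sketch. The variable names are hypothetical, and the sets are transcribed from the lists above rather than read from a BIGSdb database.

```python
# Flags listed under "Allele sequence flags" (transcribed from this appendix).
ALLELE_SEQUENCE_FLAGS = frozenset({
    "alternative start codon",
    "atypical",
    "contains IS element",
    "downstream fusion",
    "frameshift",
    "indel",
    "internal stop codon",
    "no start codon",
    "no stop codon",
    "phase variable: off",
    "truncated",
    "upstream fusion",
})

# Flags listed under "Sequence tag flags": the allele flags plus two more.
SEQUENCE_TAG_FLAGS = ALLELE_SEQUENCE_FLAGS | {
    "ambiguous read",
    "apparent misassembly",
}

# Every allele sequence flag is also a sequence tag flag.
assert ALLELE_SEQUENCE_FLAGS <= SEQUENCE_TAG_FLAGS

# Flags that apply only to sequences tagged in the sequence bin.
tag_only = sorted(SEQUENCE_TAG_FLAGS - ALLELE_SEQUENCE_FLAGS)
print(tag_only)
# ['ambiguous read', 'apparent misassembly']
```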