update schema #27 #65
cimendes committed Jan 18, 2022
1 parent f92768d commit ccc05ab
Showing 8 changed files with 440 additions and 702 deletions.
31 changes: 0 additions & 31 deletions schema/AMR Tool Output Comparison - Labels and terms redux.csv

This file was deleted.

38 changes: 38 additions & 0 deletions schema/PHA4GE AMR Gene & Variant Specification.csv
@@ -0,0 +1,38 @@
Interface Label,Required/Optional,Definition,Ontology,Value Type,Example,Guidance,Values
Analysis Software Name,Required,"A name of a computer package, application, method or function used for the analysis of data.",OBI:0002894,String,amrfinder,Not typically included in an AMR prediction output. The user may have to provide the value.,
Analysis Software Version,Required,A form or variant of software used to analyze data.,OBI:0002896,String,1.2.5,Not typically included in an AMR prediction output. The user may have to provide the value. Semantic versioning is strongly recommended.,
Gene Name,Required,"The name of a gene, (typically) assigned by a person and/or according to a naming scheme. It may contain white space characters and is typically more intuitive and readable than a gene symbol. It (typically) may be used to identify similar genes in different species and to derive a gene symbol.",OBI:0002878,String,type A-1 chloramphenicol O-acetyltransferase,The long name for the gene/protein. Gene names or gene product names are acceptable. ,
Gene Symbol,Required,The short name of a gene; a single word that does not contain white space characters. It is typically derived from the gene name.,OBI:0002877,String,"catA1, blaOXA-101","Acceptable values may represent gene symbols or gene product symbols. ""Gene symbol"" and ""allele"" (based on amino acid variations) may be used interchangeably due to nuances in resistance gene family nomenclature. Parsers should include values from the ""gene symbol"" and ""allele"" fields in this category.",
Genetic Variation Type,Required,The class of genetic variation detected.,,Enums,Gene presence detected,,"['Gene presence detected', 'Mutation variation detected', 'Both gene presence and mutation variation detected']"
Input File Name,Required,The name of a file containing molecular sequence data.,OBI:0002874,String,ERR3581801,Tools can have multiple input files (genomic/amino acid). Multiple input files will be accommodated by their corresponding parser.,
Reference Accession,Required,An identifier that specifies an individual sequence record in a public sequence repository.,OBI:0002885,String,NF000491.1,,
Reference Database Name,Required,An identifier of a biological or bioinformatics database.,OBI:0002883,String,"NCBI, ResFinder",,
Reference Database Version,Required,An identifier tracking the state of a database as it changes over time.,OBI:0002884,String,,,
Antimicrobial Agent,Optional,"A substance that kills or slows the growth of microorganisms, including bacteria, viruses, fungi and protozoans.",CHEBI:33281,String,CHLORAMPHENICOL,This should describe a specific agent. Standardized terms from the ChEBI ontology should be used. Find terms using https://www.ebi.ac.uk/ols/ontologies/chebi. Can be left blank if not provided.,
Coverage (%),Optional,The percentage of the reference sequence covered by the sequence of interest.,OBI:0002880,Float,90,"The sequences can refer to reads and genome, nucleotides and genes, or amino acids and proteins. This value should be normalized and expressed as a percentage. Do not include the ""%"" sign in the value. Either ""% Coverage (breadth)"" or ""Coverage (breadth)"" is required.",
Coverage (depth),Optional,The average number of reads representing a given nucleotide in the reconstructed sequence.,GENEPIO:0000092,Float,56,The value is expressed as a fold value e.g. 56x.,
Coverage (ratio),Optional,The ratio of the reference sequence covered by the sequence of interest.,OBI:0002881,String,450/500,"The sequences can refer to reads and genome, nucleotides and genes, or amino acids and proteins. This value should not be normalized; express it as the ratio of actual positions being compared, e.g. 450/500. Either ""% Coverage (breadth)"" or ""Coverage (breadth)"" is required.",
Drug Class,Optional,"In an antimicrobial context, a drug class is a set of antibiotic molecules, including antibiotic/adjuvant combination medications, with similar chemical structures, molecular targets, and/or modes and mechanisms of action.",ARO:3005165,String,Phenicol,Standardized terms from the ChEBI ontology should be used. Find terms using https://www.ebi.ac.uk/ols/ontologies/chebi. Can be left blank if not provided.,
Input Gene Length,Optional,The length (number of positions) of a target gene sequence submitted by a user.,OBI:0002891,Int,657,,
Input Gene Start,Optional,The position of the first nucleotide in a gene sequence being analyzed (input gene sequence).,OBI:0002977,Int,18,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Input Gene Stop,Optional,The position of the last nucleotide in a gene sequence being analyzed (input gene sequence).,OBI:0002978,Int,921,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Input Protein Length,Optional,The length (number of positions) of a protein target sequence submitted by a user.,OBI:0002892,Int,219,,
Input Protein Start,Optional,The position of the first amino acid in a protein sequence being analyzed (input protein sequence).,OBI:0002975,Int,6,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Input Protein Stop,Optional,The position of the last amino acid in a protein sequence being analyzed (input protein sequence).,OBI:0002976,Int,307,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Input Sequence ID,Optional,An identifier of molecular sequence(s) or entries from a molecular sequence database.,GENEPIO:0001800,String,DAAGAT010000041.1,Can be left blank if not provided.,
Nucleotide mutation,Optional,The nucleotide sequence change(s) detected in the sequence being analyzed compared to a reference.,,String,NC_000023.10:g.33038255C>A,"Use HGVS syntax. Mutations described with the c. (coding DNA), g. (genomic), or r. (RNA) prefixes should be included here.",
Nucleotide mutation interpretation,Optional,The description of the HGVS encoded nucleotide mutation(s) for clinical interpretation.,,String,C --> A at the position aligning to 33038255 in the reference genome,Generated computationally from the HGVS string.,
Predicted Phenotype,Optional,A characteristic of an organism that is predicted rather than directly measured or observed.,,String,Expected to contribute to antimicrobial resistance,"For all AMR tools, the expected value is ""contributes to resistance"".",
Predicted Phenotype Confidence Level,Optional,The level of confidence in a predicted phenotype.,,String,"Low, Medium, High",The level of confidence that the mutation contributes to the phenotype.,
Protein mutation,Optional,The protein sequence change(s) detected in the sequence being analyzed compared to a reference.,,String,LRG_199p1:p.Trp24Cys,Use HGVS syntax. Mutations described with the p. (protein) prefix should be included here. The 3-letter amino acid code is preferred.,
Protein mutation interpretation,Optional,The description of the HGVS encoded protein mutation(s) for clinical interpretation.,,String,amino acid Trp24 is changed to a Cys,Generated computationally from the HGVS string.,
Reference Gene Length,Optional,The length (number of positions) of a gene reference sequence retrieved from a database.,OBI:0002888,Int,657,,
Reference Gene Start,Optional,The position of the first nucleotide in a reference gene sequence (sequence being used for comparison).,OBI:0002981,Int,1,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Reference Gene Stop,Optional,The position of the last nucleotide in a reference sequence (sequence being used for comparison).,OBI:0002982,Int,900,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Reference Protein Length,Optional,The length (number of positions) of a protein reference sequence retrieved from a database.,OBI:0002889,Int,219,,
Reference Protein Start,Optional,The position of the first amino acid in a reference protein sequence (sequence being used for comparison).,OBI:0002979,Int,1,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Reference Protein Stop,Optional,The position of the last amino acid in a reference protein sequence (sequence being used for comparison).,OBI:0002980,Int,300,"The value should be biologically relevant, and should not start with 0. Can be left blank if not provided.",
Resistance mechanism,Optional,"Antibiotic resistance mechanisms evolve naturally via natural selection through random mutation, but they can also be engineered by applying an evolutionary stress on a population.",ARO:1000002,String,target alteration,Standardized terms from the ARO ontology should be used. Find terms using https://www.ebi.ac.uk/ols/ontologies/aro. Can be left blank if not provided.,
Strand Orientation,Optional,The orientation of a genomic element on the double-stranded molecule.,OBI:0002876,String,+,"Values should be sense or antisense, or the corresponding short form ""+"" or ""-"". The terms ""positive"" and ""negative"" should be avoided. Can be left blank if not provided.",
Sequence Identity,Optional,Sequence identity is the number (%) of matches (identical characters) in positions from an alignment of two molecular sequences.,OBI:0002882,Float,1,"The sequences can refer to reads and genome, nucleotides and genes, or amino acids and proteins. The value should be expressed as a percentage. Do not include the ""%"" sign in the value.",
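
The specification above can drive validation directly: the "Required/Optional", "Value Type", and "Values" columns are enough to check a parsed record. The following is only a minimal sketch of that idea, not part of this commit. The function names `load_spec` and `validate` are hypothetical. The embedded excerpt keeps the real header and four real labels but abridges the Definition and Guidance text. Record values are assumed to arrive as strings, as they would from a tab-separated tool report.

```python
import ast
import csv
import io

# Abridged excerpt of the specification: real header and labels,
# with Definition and Guidance shortened for illustration.
SPEC_EXCERPT = '''Interface Label,Required/Optional,Definition,Ontology,Value Type,Example,Guidance,Values
Analysis Software Name,Required,Name of the analysis software.,OBI:0002894,String,amrfinder,,
Genetic Variation Type,Required,The class of genetic variation detected.,,Enums,Gene presence detected,,"['Gene presence detected', 'Mutation variation detected', 'Both gene presence and mutation variation detected']"
Reference Gene Start,Optional,First nucleotide in a reference gene sequence.,OBI:0002981,Int,1,,
Sequence Identity,Optional,Percentage of matching positions.,OBI:0002882,Float,1,,
'''

# Casts used to test whether a string value fits the declared Value Type.
CASTS = {"String": str, "Int": int, "Float": float}


def load_spec(text):
    """Map each Interface Label to the rules taken from its row."""
    spec = {}
    for row in csv.DictReader(io.StringIO(text)):
        # The Values column holds a Python-style list literal for enums.
        allowed = ast.literal_eval(row["Values"]) if row["Values"] else None
        spec[row["Interface Label"]] = {
            "required": row["Required/Optional"] == "Required",
            "type": row["Value Type"],
            "allowed": allowed,
        }
    return spec


def validate(record, spec):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for label, rule in spec.items():
        value = record.get(label, "")
        if value in ("", None):
            # Blank optional fields are fine; blank required fields are not.
            if rule["required"]:
                problems.append(f"missing required field: {label}")
            continue
        if rule["type"] == "Enums":
            if value not in (rule["allowed"] or []):
                problems.append(f"{label}: {value!r} is not an allowed value")
        else:
            try:
                CASTS[rule["type"]](value)
            except ValueError:
                problems.append(f"{label}: expected {rule['type']}, got {value!r}")
    # Flag labels that the specification does not define.
    for label in record:
        if label not in spec:
            problems.append(f"unknown field: {label}")
    return problems


if __name__ == "__main__":
    spec = load_spec(SPEC_EXCERPT)
    record = {
        "Analysis Software Name": "amrfinder",
        "Genetic Variation Type": "Gene presence detected",
        "Reference Gene Start": "1",
    }
    print(validate(record, spec))
```

Reading the rules from the CSV rather than hard-coding them means a later schema update only requires reloading the file. Only the rows present in the excerpt are checked here; a real check would load the full specification.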