A searchable and exportable tab-separated table of the following metadata is now available.
🔹 italics = potential considerations
metadata | definition | reference of definition[<url_to_definition>] | expected unit of measurement | example | sources (where this or similar metadata field is mentioned) | |
---|---|---|---|---|---|---|
samp_name | A local identifier or name for the material sample used for extracting nucleic acids and for subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. | MIXS:0001107 | free text with identifier | e.g. ISDsoil1 | GSC MIxS/MIGS Bacteria (“GSC MIXS: MIGSBacteria”) | |
seq_meth | Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103) | MIXS:0000050 | <name_of_seq_machine>[ontology] | e.g. 454 Genome Sequencer FLX [OBI:0000702] | GSC MIxS/MIGS Bacteria (“GSC MIXS: MIGSBacteria”), ENA Metadata Validation: Instrument (“ENA Metadata Validation: Instrument”) | |
lib_layout | Specify whether to expect single, paired, or other configuration of reads | MIXS:0000041 | free text string | e.g. single-end | GSC MIxS/MIGS Bacteria (“GSC MIXS: MIGSBacteria”) | |
lib_source | The lib_source specifies the type of source material that is being sequenced | Link to permitted values | Free text from selected list of values | e.g. GENOMIC, METAGENOMIC, TRANSCRIPTOMIC, etc. | ENA Metadata Validation: Source (“ENA Metadata Validation: Source”) | |
lib_strategy | Sequencing technique intended for this library | Link to permitted values | Free text from selected list of values | e.g. WGS, WGA, Amplicon, etc. | ENA Metadata Validation: Strategy (“ENA Metadata Validation: Strategy”) | |
lib_selection | Whether any method was used to select and/or enrich the material being sequenced | Link to permitted values | Free text from selected list of values | e.g. RANDOM, PCR, cDNA_oligo_dT etc. | ENA Metadata Validation: Selection (“ENA Metadata Validation: Selection”) | |
nucl_acid_ext | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample | MIXS:0000037 | PMID, DOI, URL | e.g. https://doi.org/10.1038/s41586-020-2192-1 | GSC MISAG (“GSC MIXS: MIGSBacteria”), The FAIR Cookbook (Rocca-Serra et al. 2022) | |
nucl_acid_amp | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids | MIXS:0000038 | PMID, DOI, URL | e.g. https://phylogenomics.me/protocols/16s-pcr-protocol/ | GSC MISAG (“GSC MIXS: MIGSBacteria”) | |
spots | Number of reads in the library (sequencing depth), assigned at submission | Link to submission of genomes | integer value + unit of measurement | e.g. 32,283,453 OR 32.3M | Adapted from NCBI-SRA (Leinonen et al. 2011) | |
bases | Number of base pairs (nucleotides) in the library, assigned at submission | Link to submission of genomes | integer value + unit of measurement | e.g. 6,400,000 or 6.4M | Adapted from NCBI-SRA (Leinonen et al. 2011) | |
average_length | bases divided by spots, calculated by the user | Link to use in database | Integer value calculated by tool | e.g. 198 | Calculated as basepairs_count/sequence_count | |
sequence_count_qual (optional) | Number of reads in the library (sequencing depth) after quality filtering | Link to resource to calculate | integer value + % | | SRA-Tinder (NCBI Hackathons) | |
basepairs_count_qual (optional) | Number of base pairs (nucleotides) in the library after quality filtering | Link to resource to calculate | integer value + % | | SRA-Tinder (NCBI Hackathons) | |
checksum | Hash value for data integrity | Link to ENA MD5 Checksum | string with checksum | e.g. MD5: cbc41d0e49636872a765b950cb7f410a | Data transfer and data integrity |
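The two calculated fields in the table above, average_length and checksum, can be derived with a few lines of Python. The sketch below is only an illustration: the function names `average_length` and `md5_checksum` are hypothetical, rounding to the nearest integer is an assumption (the table only says "integer value"), and MD5 is used because it is the algorithm shown in the checksum example.

```python
import hashlib
from pathlib import Path


def average_length(bases: int, spots: int) -> int:
    """Return bases divided by spots, rounded to the nearest integer.

    Rounding is an assumption; the table only asks for an integer value.
    """
    if spots <= 0:
        raise ValueError("spots must be a positive integer")
    return round(bases / spots)


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file.

    The file is read in chunks so that large sequence files
    do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For instance, a made-up library with 198,000 bases in 1,000 spots has an average_length of 198, and `md5_checksum(Path("reads.fastq.gz"))` (a placeholder file name) returns a 32-character string to record in the checksum field.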
metadata | definition | reference of definition[<url_to_definition>] | expected unit of measurement | example | sources (where this or similar metadata field is mentioned) | |
---|---|---|---|---|---|---|
run_ref | Comma separated list of run accession(s) (optional) | Link to reference | run_accession in the format SRR, ERR or DRR | e.g. RUN_REF accession = “ERR178314” | ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
Locus | Physical location of a gene on a chromosome (assigned at submission) | Link to example of use | free text string | e.g. “GAAA01000000 93507 rc mRNA”, “11q1.4-q2.1” | Adapted from ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
Description | Free text description of the transcriptome assembly (optional) | Link to reference | Free text string | e.g. TSA: Latimeria chalumnae voucher 08118, transcriptome shotgun assembly | Adapted from ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
DB Link | BioProject/Biosample/SRA assigned at submission | Link to study submission | PID | e.g. PRJNA1035062 | Adapted from ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
read_length/spot_length | Number of nucleotides in a transcript. Sequences should be greater than 200 bp in length. | Link to reference | integer value | e.g. 36 | Adapted from ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
Seq quality | The accuracy and reliability of the nucleotide sequence information obtained from RNA sequencing | Link to reference | integer value | | Adapted from ENA Submitting Transcriptome Assemblies (“ENA Submitting Transcriptome Assemblies”) | |
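The run_ref field above is a comma separated list of run accessions with an SRR, ERR or DRR prefix, which makes it easy to check before submission. The following is a minimal sketch under stated assumptions: the helper name `parse_run_ref` is hypothetical, and the pattern only requires one of the three prefixes followed by digits; it does not enforce any particular number of digits.

```python
import re

# Assumed pattern: SRR, ERR or DRR followed by one or more digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def parse_run_ref(value: str) -> list[str]:
    """Split a comma separated run_ref value and validate each accession."""
    accessions = [item.strip() for item in value.split(",") if item.strip()]
    invalid = [acc for acc in accessions if not RUN_ACCESSION.fullmatch(acc)]
    if invalid:
        raise ValueError(f"invalid run accession(s): {', '.join(invalid)}")
    return accessions
```

Calling `parse_run_ref("ERR178314")` returns a one-element list, while a value such as `"ERR178314, XYZ123"` raises a `ValueError` that names the offending entry.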
The rationale for the Minimum Information about a Genome Sequence (MIGS) specification is described in Field et al. (2008); for metatranscriptomics, see Advances and Challenges in Metatranscriptomic Analysis (Shakya et al. 2019). Both publications are available online. We also strongly encourage readers of this repository to consult the FAIR Cookbook recipes (Rocca-Serra et al. 2022) and to look for examples of submitted transcriptomic samples in the Transcriptome Shotgun Assembly Sequence Database (“Transcriptome Shotgun Assembly Sequence Database”).
“ENA Metadata Validation: Instrument.” https://ena-docs.readthedocs.io/en/latest/submit/reads/webin-cli.html#instrument.
“ENA Metadata Validation: Selection.” https://ena-docs.readthedocs.io/en/latest/submit/reads/webin-cli.html#selection.
“ENA Metadata Validation: Source.” https://ena-docs.readthedocs.io/en/latest/submit/reads/webin-cli.html#source.
“ENA Metadata Validation: Strategy.” https://ena-docs.readthedocs.io/en/latest/submit/reads/webin-cli.html#strategy.
“ENA Submitting Transcriptome Assemblies.” https://ena-docs.readthedocs.io/en/latest/submit/assembly/transcriptome.html.
Field, D., G. Garrity, T. Gray, N. Morrison, J. Selengut, P. Sterk, T. Tatusova, et al. 2008. “The Minimum Information about a Genome Sequence (MIGS) Specification.” Nature Biotechnology. https://doi.org/10.1038/nbt1360.
“GSC MIXS: MIGSBacteria.” https://genomicsstandardsconsortium.github.io/mixs/0010003/.
Leinonen, R., H. Sugawara, M. Shumway, and International Nucleotide Sequence Database Collaboration. 2011. “The Sequence Read Archive.” Nucleic Acids Research 39 (Database issue): D19–21. https://doi.org/10.1093/nar/gkq1019.
NCBI Hackathons. “SRA-Tinder: A Tool to Discover Related Sequence Read Archive (SRA) Experiments.” https://github.com/NCBI-Hackathons/SRA_Tinder.
Rocca-Serra, Philippe, Alasdair J G Gray, Alejandra Delfin Rossaro, Andrea Splendiani, Andrea Zaliani, Andreas Pippow, Anne Cambon-Thomsen, et al. 2022. “The FAIR Cookbook.” https://github.com/FAIRplus/the-fair-cookbook/.
Shakya, Migun, Christopher Quince, James H Campbell, and Zamin K Yang. 2019. “Advances and Challenges in Metatranscriptomic Analysis.” Frontiers in Genetics 10 (September): 904. https://doi.org/10.3389/fgene.2019.00904.
“Transcriptome Shotgun Assembly Sequence Database.” https://www.ncbi.nlm.nih.gov/genbank/tsa/.