Home
Dan Fornika edited this page Oct 14, 2018 · 2 revisions
- Samples collected from marine environments around the world
- Water sample collected
- Filtered to isolate viral component and bacteria (+viruses) component
- Selected for dsDNA viruses only
- Fractions underwent shotgun metagenomic sequencing
- Assembled into contigs
- Generated RPKM values
- Compared against reference databases (MetaCyc, COG, KEGG, and 3 more) using FAST; the RPKM values for all enzymes in a pathway were combined using a set of pathway-specific rules
- Output was an RPKM value for each enriched pathway in every sample
- EBI Link: https://www.ebi.ac.uk/metagenomics/studies/ERP001736
- Metadata Link: https://www.ebi.ac.uk/ena/submit/tara-oceans-checklist
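- RPKM - After the contigs were assembled, the reads were aligned back to the contigs, and these alignments were used to generate RPKM values. RPKM is essentially a normalized genomic abundance metric (reads per kilobase per million mapped reads; the usual expansion says "per kilobase of transcript", but here the kilobases are of assembled sequence rather than of a transcript)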
- Sample IDs (`df_MASTERTABLE` `SAMPLE` field):
  - Sample IDs that start with 'c' include both bacterial and viral fractions
  - Sample IDs that start with 'ERR' are from samples that were passed through a finer filter, so are viral only
- Type (`df_MASTERTABLE` `TYPE` field):
  - `SINGLE` includes only one fraction
  - `MULTI` includes the viral and bacterial fraction data in a single analysis
- Collection Date information exists - Simon Rao
- See comment below for more details
- It's okay to make the data public
- Tara Oceans Project - the project that ran the sampling expedition: http://ocean-microbiome.embl.de/companion.html
- International consortium of oceanographers and marine biologists - used a standardized sample collection process, so data are comparable across samples
- First expedition - photic samples - didn’t sample very deep
- Pathway Tools - prediction engine (licence required) - the MetaCyc identifiers were generated with it
- PathoLogic - has harmonized names
- New idea - metabolically functional genes encoded in viruses - more widespread than imagined before
- Talked about this paper: http://www.pnas.org/content/108/39/E757.short
- Cyanobacteria normally have fast turnover. In response to viral infection they slow down and halt photosynthesis (sequestering the viruses and protecting neighbouring cells). The virus carries genes that are part of the photosystem, which overcomes this defence mechanism and promotes photosynthesis and cell division
- Pathway tools - KEGG Atlas - have diagrams for metabolism - recommended using these
- Envisions this turning into a manuscript - Nature Scientific Data publication
- A heatmap showing the distribution of pathways is a good starting point (something similar to KEGG Atlas would be ideal though)
- Want to be able to do things like compare samples in the Indian Ocean to those in another ocean
- Pathways by location heatmap
- Metavirome - attracted to certain pathways - want to visualize the pathways that are affected
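
The RPKM definition and the sample ID conventions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rpkm` and `sample_fraction` are hypothetical, the sample IDs in the usage note are made up, and the actual pipeline that produced the values in `df_MASTERTABLE` may have computed RPKM differently.

```python
def rpkm(mapped_reads: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads.

    mapped_reads       -- reads aligned to the feature (e.g. a contig region)
    feature_length_bp  -- length of that feature in base pairs
    total_mapped_reads -- all mapped reads in the sample
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length in kb) / (total reads in millions)
    return mapped_reads * 1e9 / (feature_length_bp * total_mapped_reads)


def sample_fraction(sample_id: str) -> str:
    """Classify a SAMPLE value by its prefix, following the notes above."""
    if sample_id.startswith("ERR"):
        return "viral only"          # passed through the finer filter
    if sample_id.startswith("c"):
        return "bacterial + viral"
    return "unknown"                 # prefix not covered by the notes
```

For example, 500 reads mapped to a 2,000 bp feature in a sample with 10 million mapped reads gives `rpkm(500, 2000, 10_000_000) == 25.0`, and `sample_fraction("ERR0000001")` (a made-up ID) returns `"viral only"`.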
MetaCyc Notes
- Reference database of enzymes and metabolic pathways
- Mostly small molecule pathways (but updated versions add macromolecular metabolic pathways)
- Used as the reference by the PathoLogic tool to predict the metabolic network of an organism from annotated genome files - PathoLogic generates Pathway/Genome databases - BioCyc stores the databases generated by SRI
- Used to generate organism-specific pathway/genome databases
- Curated from experimentally validated results/academic papers
- Map
- Data points are plotted to the map with latitude and longitude values
- Want to be able to query by location, depth, other metadata (temperature, salinity, etc)
- Clicking on a sample should pull up data on sample information, pathway information, etc. (Want some figures to make data visual - likely by metabolic category) and link out to MetaCyc information
- Analysis Functionality
- Want differential comparison of metabolic pathway activation for samples with given set of characteristics
- Want to have a way to filter out pathways that are generally present everywhere
- Interactive KEGG Atlas-like Visualization (if time permits)
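
The analysis goals above (selecting samples by metadata, filtering out pathways present everywhere, and comparing pathway abundance between groups) could be prototyped roughly as follows. This is a minimal sketch, not the project's implementation: the data layout (plain dictionaries), the function names, the `max_prevalence` threshold, the pseudocount, and the sample and pathway IDs are all assumptions made for illustration. Metadata matching is by exact equality only; range queries on depth or temperature would need extra logic.

```python
import math
from statistics import mean


def select_samples(metadata, **criteria):
    """Return sample IDs whose metadata equals every given criterion."""
    return [s for s, m in metadata.items()
            if all(m.get(k) == v for k, v in criteria.items())]


def drop_ubiquitous(abundance, max_prevalence=0.9):
    """Keep pathways detected (RPKM > 0) in at most max_prevalence of samples."""
    n = len(abundance)
    pathways = {p for row in abundance.values() for p in row}
    keep = set()
    for p in pathways:
        present = sum(1 for row in abundance.values() if row.get(p, 0.0) > 0)
        if present / n <= max_prevalence:
            keep.add(p)
    return keep


def log2_fold_change(abundance, group_a, group_b, pathways, pseudocount=1.0):
    """log2 ratio of mean RPKM in group_a vs group_b for each pathway."""
    result = {}
    for p in pathways:
        a = mean(abundance[s].get(p, 0.0) for s in group_a)
        b = mean(abundance[s].get(p, 0.0) for s in group_b)
        # The pseudocount avoids division by zero for absent pathways.
        result[p] = math.log2((a + pseudocount) / (b + pseudocount))
    return result


# Hypothetical data: sample ID -> metadata, and sample ID -> {pathway: RPKM}.
metadata = {
    "S1": {"ocean": "Indian"}, "S2": {"ocean": "Indian"},
    "S3": {"ocean": "Pacific"}, "S4": {"ocean": "Pacific"},
}
abundance = {
    "S1": {"PWY-A": 5.0, "PWY-B": 7.0},
    "S2": {"PWY-A": 3.0, "PWY-B": 7.0},
    "S3": {"PWY-A": 4.0},
    "S4": {"PWY-A": 6.0},
}

indian = select_samples(metadata, ocean="Indian")
pacific = select_samples(metadata, ocean="Pacific")
variable = drop_ubiquitous(abundance)   # PWY-A is in every sample, so it is dropped
lfc = log2_fold_change(abundance, indian, pacific, variable)
```

A fold change on group means is the simplest possible comparison; a real analysis would likely want a statistical test and multiple-testing correction before calling a pathway differentially abundant.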