- 13:30-14:15 Internal archiving
- What should we keep (clean up)?
- How should we manage this?
- 16:45-17:30 External archiving: submissions to public repositories
- What should we archive?
- Where should we archive? (GEO vs ArrayExpress)
- How can we improve the process?
- How to reduce the effort required?
- Monday 20th Feb 2016 @ 12:30am
- Monday 13th Feb 2017 @ 11:00am
- Keeping track of path to archive
- Add path into Redmine when archiving projects
- Report to generate from Redmine: all archived projects with their link to archive
- Readme file inside project folder as well as uploaded into Redmine
- Software file with version at bottom of all reports as well as software version of the generation of fastq
- Every project runs in a Docker container?
- What do we keep? All analysis or only the last one? Need for cleanup stage.
- Don't want to keep several copies of large files (e.g. bam)
- Internal archive to record what has been done or use existing external ones?
- Lab notebook does not always record everything
- Add extra comments when samples submitted? Maybe better at project level, metadata same as plate layout in Redmine but not always the same version as the one in LIMS
- How should we gather the metadata for each project?
- When? As early as possible
- Do we need an extra database for metadata or in spreadsheet attached to Redmine?
- Wet-lab from bench scientist
- Sequencing library from Genomics
- Analysis from Bioinformatics
- At submission when scientists should know everything.
- Generate submission spreadsheet using a script need to be more generic and robust,
- and in version control
- How to motivate Bioinformatics Core to follow good practice?
- Set up a check list for some else to verify a project status?
- i.e. checking that all the files are present and well-documented, not that the analysis is "correct"
- Store metadata locally and automatically generate the GEO submission. OR Create GEO submission at the end of every project.
- submission does not get made public automatically
- Guidance on preferences for submitting to GEO vs ArrayExpress
- ArrayExpress more stringent, able to submit BAM files
- GEO easier to import into Bioconductor (GEOquery library)?
- What would we prefer as analysts downloading someone else's data; fastq or bam?
- Some funders / projects might demand restricted access to their data
- e.g. data where patients are potentially identify-able
- might need submission to E.G.A
- How to find the fastq based on sample names?