Serratus is an Open-Science project. Our aim is to unlock all of Earth's viruses in public data, freely and with 100% transparency.
We welcome all scientists and developers to contribute.
Our data is free and available to everyone. If you need help accessing or using Serratus data, please let us know; we would love to help. We are set up to run assemblies and retrieve full-length viruses where available. We ask for nothing in return.
Below are a few outlines of areas that need development. The skill sets listed are all optional; the most important trait is to be a "self-starter" who is passionate about learning. We're all volunteers and will help one another get there.
The Serratus command-line interface is currently a mixture of scripts, Jupyter notebooks, and hand-tweaked Terraform configuration files. The goal is to create a streamlined command-line interface for running Serratus and to write the associated documentation. The outcome will be a readily deployable workflow.
Skills: BASH scripting, written communication, AWS systems, Terraform deployment.
Our web interface, www.serratus.io, is under active development. We are looking for new and interesting ways to display and make sense of this ocean of data. In particular, we're developing an RdRP-focused characterization tool called palmID.
Skills: JavaScript, R, Docker, SQL, UX Design.
The Serratus data spans all of Earth, all domains of life, and thousands of ecological niches. We are trying to make sense of the global distribution of RNA viruses using a "data-driven" (unsupervised) philosophy. The goal is to compile available virus-host data and use it for the computational annotation of novel sequences. You can get a rough idea of how this will work in the palmID web interface.
Skills: SQL/Graph databases, Virus-host modeling, Ecoinformatics, R, Machine Learning.
There are thousands of novel viruses and virus families uncovered through our RdRP search. If you would like to focus on one particular group of viruses, characterize them and make sense of the data, we can help :)
Skills: Virology domain-knowledge, phylogenetics, bioinformatics, writing.
At the heart of Serratus is finding ways to optimize sequence search at a massive scale. If you're keen to get into the "guts" of a bunch of software and look for ways to speed things up, this is always a goal.
Skills: C/C++, sequence search, AWS systems, optimization, wizardry.
Everything we didn't think of. If you have a cool idea and think it can fit in, we're happy to hear it and work together.
Skim this document and email us to say hello and join our Slack!
- Development is done through git.
- Data is hosted on a public AWS S3 bucket.
- Deployment of Serratus resources can be done on a personal AWS account.
- Experiments are run using Jupyter Notebooks.
- Contact the team: join our Slack (type /join #serratus) or email artem AT rRNA DOT ca
To find and solve an open development problem, see our Project Page. This is a prioritized list of "Open Tasks" that need to be done, "Tasks in Progress" that others are currently working on, "Code Review", and "Completed Tasks".
You can also browse all tasks, which are organized as "Issues" on GitHub.
Feel free to comment on any issue, even one you're not assigned to, if you have a helpful suggestion.
If you'd like to work on a given task, simply add a comment saying so on the issue and it will be "Assigned" to you.
If you have an idea you'd like to develop, would like to run an experiment, or require additional documentation, let the other developers know what you're doing by creating an "Issue" on GitHub. The general template to start from is:
### Problem / Objective
< Briefly outline a problem you are solving / the research objectives and hypothesis you are testing >
### Proposed Solution / Methods
< How are you planning on solving the problem / experimental design to test the hypothesis >
### Additional Resources
< Outline any additional information you require to do this task or resources you'll need access to >
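If you draft issues from a script or notebook, the template above can be filled in programmatically. The following is a minimal sketch only: `render_issue_body` is a hypothetical helper written for illustration and is not part of the Serratus tooling. It assumes the three section headings shown in the template and simply returns the Markdown text to paste into a new issue.

```python
def render_issue_body(problem: str, solution: str, resources: str = "None") -> str:
    """Return Markdown for a new issue using the three template sections.

    Hypothetical helper for illustration; not part of the Serratus codebase.
    """
    sections = [
        ("Problem / Objective", problem),
        ("Proposed Solution / Methods", solution),
        ("Additional Resources", resources),
    ]
    # Refuse to produce an issue with a blank section.
    for title, text in sections:
        if not text.strip():
            raise ValueError(f"Section '{title}' is empty")
    # One '###' heading per section, separated by blank lines.
    blocks = [f"### {title}\n{text.strip()}" for title, text in sections]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Placeholder text only; replace with your own task description.
    print(render_issue_body(
        problem="Describe the problem or hypothesis here.",
        solution="Describe the planned approach here.",
    ))
```

The output is plain Markdown, so it can be pasted directly into the issue form.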
There is no formal structure to the Serratus team; everyone is encouraged to take full ownership of the components they develop. For publications, authorship is determined by the ICMJE Guidelines, with specific emphasis on the requirement that authors be directly involved in the collaboration.
To achieve our objective of providing high-quality CoV sequence data to the global research effort, Serratus ensures:
- All software development is open-source and freely available (GPLv3)
- We adopt the INSDC Release Policy (2002) for Serratus data, databases and derivative analyses
- All data generated, raw and processed, will be freely available in the public domain in accordance with the Bermuda Principles of the Human Genome Project.