
Set up AWS megatests #80

Closed
4 tasks
ewels opened this issue Nov 7, 2020 · 3 comments
Labels
enhancement New feature or request


ewels commented Nov 7, 2020

AWS megatests is now running nicely, and we’re trying to set up all (or most) nf-core pipelines to run with a big dataset. For this pipeline, we need to identify a set of public data to use for the benchmark runs.

The idea is that this will run automatically for every release of the nf-core/hic pipeline. The results will then be publicly accessible from s3 and viewable through the website: https://nf-co.re/hic/results - this means that people can manually compare differences in output between pipeline releases if they wish.

We need a dataset that is as “normal” as possible: mouse or human, sequenced relatively recently, with a bunch of replicates, etc. It can be a fairly large project.

I'm hoping that @nservant can help here, but suggestions from anyone and everyone are more than welcome! ✋🏻

In practical terms, once decided we need to:

- [ ] Upload the FastQ files to s3: `s3://nf-core-awsmegatests/hic/input_data/` (I can help with this)
- [ ] Update `test_full.config` to work with these file paths
- [ ] Check `.github/workflows/awsfulltest.yml` (should be no changes required, I think?)
- [ ] Merge, and try running the dev branch manually
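For the second task, `test_full.config` would then point at the uploaded s3 paths. A minimal sketch following the usual nf-core full-test convention (the FastQ glob and genome below are hypothetical placeholders, not the final values):

```groovy
// test_full.config -- sketch only; paths and genome are assumptions
params {
    config_profile_name        = 'Full test profile'
    config_profile_description = 'Full-size test dataset for AWS megatests'

    // Hypothetical input glob under the megatests bucket -- replace with
    // the FastQ files actually uploaded in the first task
    input  = 's3://nf-core-awsmegatests/hic/input_data/*_R{1,2}.fastq.gz'

    // Reference genome, assuming human or mouse data as discussed above
    genome = 'GRCh38'
}
```

The AWS full-test workflow then picks this up via `-profile test_full`, so no pipeline code changes should be needed beyond the config itself.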
@ewels added the enhancement (New feature or request) label Nov 7, 2020

nservant commented Jan 26, 2021

Hi @ewels

I think we could go for this dataset, which is a reference in the field:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96107

However, the sequencing depth is huge ... we are talking about 2.5 billion reads per sample.
See https://www.ncbi.nlm.nih.gov/sra?term=SRX2636666

So I would only select one of the SRA files for the test ...
Which sequencing depth sounds reasonable for AWS testing?

N
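The subsampling nservant suggests could be sketched as below: compute the fraction needed to go from the full depth to a chosen target, then downsample each FastQ with `seqtk sample`. The 100 M target is purely illustrative, an assumption rather than anything agreed in this thread:

```shell
# Work out a subsample fraction for one SRA run.
TOTAL_READS=2500000000   # ~2.5 billion reads per sample (from the comment above)
TARGET_READS=100000000   # hypothetical target depth for AWS testing

FRACTION=$(awk -v t="$TARGET_READS" -v n="$TOTAL_READS" 'BEGIN { printf "%.3f", t / n }')
echo "subsample fraction: $FRACTION"

# seqtk can then downsample each mate with the same seed so pairs stay in sync:
#   seqtk sample -s100 sample_R1.fastq.gz "$FRACTION" | gzip > sub_R1.fastq.gz
#   seqtk sample -s100 sample_R2.fastq.gz "$FRACTION" | gzip > sub_R2.fastq.gz
```

Using a fixed seed (`-s100`) on both mates is what keeps the read pairing intact after subsampling.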

nservant added a commit to nservant/nf-core-hic that referenced this issue Jan 4, 2023