Merge pull request #228 from CDCgov/add-bs-pkgs-support-ick4
Add support for custom metadata validation & custom BioSample packages
jessicarowell authored Dec 18, 2024
2 parents b3f11e0 + 570fe24 commit 1ffdc5f
Showing 12 changed files with 1,204 additions and 1,573 deletions.
16 changes: 15 additions & 1 deletion README.md
@@ -62,7 +62,7 @@ mamba install -c bioconda nextflow
### 5. Update the default submissions config file with your NCBI username and password, and run the following nextflow command to execute the scripts with default parameters and the local run environment:
```
# update this config file (you don't have to use vim)
- vim bin/config_files/default_config.yaml
+ vim conf/submission_config.yaml
# test command for virus reads
nextflow run main.nf -profile test,<singularity|docker|conda> --virus
```
@@ -80,6 +80,20 @@ nextflow run main.nf -profile <docker|singularity> --species bacteria --submissi
```
Refer to the wiki for more information on input parameters and use cases.

### 7. Custom metadata validation and custom BioSample package

TOSTADAS defaults to the Pathogen.cl.1.0 (Pathogen: clinical or host-associated; version 1.0) NCBI BioSample package for submissions to the BioSample repository. To submit with a different BioSample package:
1. Change the package name in the submission config file (`conf/submission_config.yaml`). Choose one of the available [NCBI BioSample packages](https://www.ncbi.nlm.nih.gov/biosample/docs/packages/).
2. Add the necessary fields for your BioSample package to your input Excel file.
3. Add those fields as keys to the JSON file (`assets/custom_meta_fields/example_custom_fields.json`) and fill in the following entries for each key as needed:
   - `replace_empty_with`: TOSTADAS replaces any empty cell in this field with this value. (Example: NCBI expects a value for every mandatory field, so you may want empty cells to become "Not Provided".)
   - `new_field_name`: TOSTADAS replaces the field name from your metadata Excel file with this value. (Example: your weekly metadata Excel files use 'animal_environment' but NCBI expects 'animal_env'; specify this once in the JSON file and the name will be changed on every run.)
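
To make the effect of `replace_empty_with` and `new_field_name` concrete, here is a minimal Python sketch of how such rules could be applied to metadata rows. It illustrates the behavior described above and is not the TOSTADAS implementation: the function `apply_custom_fields`, the row layout (one dict per sample), and the sample values are all assumptions made for this example.

```python
import json


def apply_custom_fields(rows, custom_fields):
    """Apply replace_empty_with / new_field_name rules to metadata rows.

    rows: list of dicts, one per sample (column name -> cell value).
    custom_fields: dict parsed from a custom fields JSON file.
    Hypothetical helper for illustration only.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for field, rules in custom_fields.items():
            if field not in new_row:
                continue
            value = new_row.pop(field)
            # Fill empty cells only when a non-empty replacement is configured.
            replacement = rules.get("replace_empty_with", "")
            if value in (None, "") and replacement:
                value = replacement
            # Rename the column only when a non-empty new name is configured.
            target = rules.get("new_field_name") or field
            new_row[target] = value
        result.append(new_row)
    return result


# Rules shaped like the entries in the custom fields JSON file.
custom_fields = json.loads("""
{
  "strain": {"replace_empty_with": "Not Provided", "new_field_name": "strain"},
  "animal_environment": {"replace_empty_with": "", "new_field_name": "animal_env"}
}
""")

# One made-up metadata row with an empty 'strain' cell.
rows = [{"sample_name": "FL0004", "strain": "", "animal_environment": "farm"}]
print(apply_custom_fields(rows, custom_fields))
```

In this sketch an empty `replace_empty_with` or `new_field_name` means "leave as is", which matches how the example JSON files use empty strings; whether TOSTADAS treats empty values the same way is an assumption.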

**Submit to a custom BioSample package**
```
nextflow run main.nf -profile <docker|singularity> --species virus --submission --annotation --genbank true --sra true --biosample true --output_dir <path/to/output/dir/> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --custom_fields_file <path/to/metadata_custom_fields.json>
```

## Get in Touch
If you need to report a bug, suggest new features, or just say “thanks”, [open an issue](https://github.com/CDCgov/tostadas/issues/new/choose) and we’ll try to get back to you as soon as possible!

6 changes: 0 additions & 6 deletions assets/custom_meta_fields/example_custom_fields.json
@@ -1,19 +1,13 @@
{
"test_field_1": {
"type": "String ",
"samples": ["Fl0004", "IL0005", "FL0015", "FL00234", 8],
"replace_empty_with": "not populated",
"new_field_name": "new_field_name"
},
"test_field_2": {
"type": "float",
"samples": ["Fl0004"],
"replace_empty_with": "",
"new_field_name": "new_field_name2"
},
"test_field_3": {
"type": "Boolean",
"samples": ["All ", "any random sample name"],
"replace_empty_with": "",
"new_field_name": ""
}
15 changes: 15 additions & 0 deletions assets/custom_meta_fields/onehealth_biosample_pkg.json
@@ -0,0 +1,15 @@
{
"strain": {
"type": "String ",
"replace_empty_with": "Not Provided",
"new_field_name": "strain"
},
"source_type": {
"replace_empty_with": "Not Provided",
"new_field_name": "source_type"
},
"animal_environment": {
"replace_empty_with": "",
"new_field_name": "animal_env"
}
}
Binary file modified assets/metadata_template.xlsx
Binary file not shown.
43 changes: 0 additions & 43 deletions bin/config_files/default_config.yaml

This file was deleted.

137 changes: 0 additions & 137 deletions bin/config_files/seqsender_main_config.yaml

This file was deleted.
