Add support for custom metadata validation & custom BioSample packages #228

Merged
merged 20 commits into from
Dec 18, 2024
Commits
40b0777
updates to add custom fields to Attributes in BioSample submission.xm…
Nov 21, 2024
c45928a
some corrections to metadata validation to get the custom fields to a…
Nov 21, 2024
e82b8f3
ignore the test fields in case they are accidentally submitte as cust…
Nov 21, 2024
cac359b
change sex to host_sex and age to host_age in metadata_template.xlsx
Nov 21, 2024
5e0b515
fixing custom metadata functionality
Dec 4, 2024
57e3cde
Remove extraneous Unnamed column after import
Dec 4, 2024
be95743
Rewrite the custom fields script to correct field name changes
Dec 6, 2024
1527a5d
Fix code to skip submissions where reqd files not provided
Dec 6, 2024
d6ff671
correction to update_submission main.nf to match new optional params
Dec 6, 2024
8ff83df
ignore JSON keys that are not used and fix error file write location
Dec 6, 2024
7ec624d
don't print org_id if not provided
Dec 9, 2024
51d4ea3
Update README to add section for custom metadata and update JSON
Dec 9, 2024
64fb7f2
move example submission config file to conf/ folder and remove bin/co…
Dec 9, 2024
1670d79
add example templates for One Health Enterics BioSample package
Dec 9, 2024
9c29a38
add better handling for JSON Vs. Excel file discrepancies
Dec 16, 2024
f36ce41
add error handling for empty input metadata spreadsheets
Dec 17, 2024
d3ffe27
correction to code that checks metadata file custom fields against th…
Dec 17, 2024
62e8c03
Merge branch 'dev' into add-bs-pkgs-support-ick4
Dec 18, 2024
d439b85
Check for duplicate columns, read_excel automatically makes them uniq…
Dec 18, 2024
570fe24
Add better error logging for the connect function which sometimnes er…
Dec 18, 2024
16 changes: 15 additions & 1 deletion README.md
@@ -62,7 +62,7 @@ mamba install -c bioconda nextflow
### 5. Update the default submissions config file with your NCBI username and password, and run the following nextflow command to execute the scripts with default parameters and the local run environment:
```
# update this config file (you don't have to use vim)
-vim bin/config_files/default_config.yaml
+vim conf/submission_config.yaml
# test command for virus reads
nextflow run main.nf -profile test,<singularity|docker|conda> --virus
```
@@ -80,6 +80,20 @@ nextflow run main.nf -profile <docker|singularity> --species bacteria --submissi
```
Refer to the wiki for more information on input parameters and use cases

### 7. Custom metadata validation and custom BioSample package

TOSTADAS defaults to the Pathogen.cl.1.0 (Pathogen: clinical or host-associated; version 1.0) NCBI BioSample package for submissions to the BioSample repository. To submit with a different BioSample package:
1. Change the package name in `conf/submission_config.yaml`. Choose one of the available [NCBI BioSample packages](https://www.ncbi.nlm.nih.gov/biosample/docs/packages/).
2. Add the necessary fields for your BioSample package to your input Excel file.
3. Add those fields as keys to the JSON file (`assets/custom_meta_fields/example_custom_fields.json`) and set the following values for each key as needed:
   - `replace_empty_with`: TOSTADAS replaces any empty cell in this field with this value. (Example application: NCBI expects a value for every mandatory field, so you may want empty cells to become "Not Provided".)
   - `new_field_name`: TOSTADAS replaces the field name in your metadata Excel file with this value. (Example application: you receive weekly metadata Excel files that specify 'animal_environment' but NCBI expects 'animal_env'; specify this once in the JSON file and it will be changed on every run.)
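
As a rough illustration of what these two keys do, here is a minimal Python sketch. It is not the TOSTADAS implementation: the function name `apply_custom_fields` and the row-as-dictionary representation are assumptions made for this example, and it assumes that an empty `replace_empty_with` or `new_field_name` means "leave unchanged".

```python
import json

# Hypothetical custom-fields spec using the two JSON keys described above.
CUSTOM_FIELDS = json.loads("""
{
  "source_type": {"replace_empty_with": "Not Provided", "new_field_name": "source_type"},
  "animal_environment": {"replace_empty_with": "", "new_field_name": "animal_env"}
}
""")


def apply_custom_fields(row, spec):
    """Return a copy of one metadata row with the custom-field rules applied.

    `row` maps column names to cell values (an assumed representation of one
    spreadsheet row). Columns not named in `spec` pass through unchanged.
    """
    out = {}
    for name, value in row.items():
        rules = spec.get(name)
        if rules is None:
            out[name] = value
            continue
        # Treat None and blank strings as empty cells.
        is_empty = value is None or str(value).strip() == ""
        fill = rules.get("replace_empty_with", "")
        if is_empty and fill != "":
            value = fill
        # Assumption: an empty or missing new_field_name keeps the original name.
        new_name = rules.get("new_field_name") or name
        out[new_name] = value
    return out


row = {"sample_name": "FL0004", "source_type": "", "animal_environment": "farm"}
print(apply_custom_fields(row, CUSTOM_FIELDS))
# {'sample_name': 'FL0004', 'source_type': 'Not Provided', 'animal_env': 'farm'}
```

In this sketch the empty `source_type` cell is filled with "Not Provided" and the `animal_environment` column is renamed to `animal_env`, while `sample_name` is untouched because it has no entry in the JSON.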

**Submit to a custom BioSample package**
```
nextflow run main.nf -profile <docker|singularity> --species virus --submission --annotation --genbank true --sra true --biosample true --output_dir <path/to/output/dir/> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --custom_fields_file <path/to/metadata_custom_fields.json>
```

## Get in Touch
If you need to report a bug, suggest new features, or just say “thanks”, [open an issue](https://github.com/CDCgov/tostadas/issues/new/choose) and we’ll try to get back to you as soon as possible!

6 changes: 0 additions & 6 deletions assets/custom_meta_fields/example_custom_fields.json
@@ -1,19 +1,13 @@
 {
     "test_field_1": {
-        "type": "String ",
-        "samples": ["Fl0004", "IL0005", "FL0015", "FL00234", 8],
         "replace_empty_with": "not populated",
         "new_field_name": "new_field_name"
     },
     "test_field_2": {
-        "type": "float",
-        "samples": ["Fl0004"],
         "replace_empty_with": "",
         "new_field_name": "new_field_name2"
     },
     "test_field_3": {
-        "type": "Boolean",
-        "samples": ["All ", "any random sample name"],
         "replace_empty_with": "",
         "new_field_name": ""
     }
15 changes: 15 additions & 0 deletions assets/custom_meta_fields/onehealth_biosample_pkg.json
@@ -0,0 +1,15 @@
{
"strain": {
"type": "String ",
"replace_empty_with": "Not Provided",
"new_field_name": "strain"
},
"source_type": {
"replace_empty_with": "Not Provided",
"new_field_name": "source_type"
},
"animal_environment": {
"replace_empty_with": "",
"new_field_name": "animal_env"
}
}
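
Several commits in this PR concern discrepancies between the custom-fields JSON and the metadata spreadsheet, and ignoring JSON keys that are not used. The following Python sketch shows one way such a check could look. It is an assumption-laden illustration, not the code from this PR: `check_custom_fields`, `USED_KEYS`, and the sample column list are hypothetical names introduced here.

```python
import json

# Keys this sketch treats as meaningful; any other key is reported as ignored.
USED_KEYS = {"replace_empty_with", "new_field_name"}


def check_custom_fields(spec, columns):
    """Compare a custom-fields spec against spreadsheet column names.

    Returns (missing, ignored): fields in the spec with no matching column,
    and a mapping of field name to the keys this sketch does not use.
    """
    missing = sorted(name for name in spec if name not in columns)
    ignored = {}
    for name, rules in spec.items():
        extra = sorted(set(rules) - USED_KEYS)
        if extra:
            ignored[name] = extra
    return missing, ignored


spec = json.loads("""
{
  "strain": {"type": "String ", "replace_empty_with": "Not Provided", "new_field_name": "strain"},
  "source_type": {"replace_empty_with": "Not Provided", "new_field_name": "source_type"},
  "animal_environment": {"replace_empty_with": "", "new_field_name": "animal_env"}
}
""")
columns = ["sample_name", "strain", "source_type"]  # hypothetical header row
missing, ignored = check_custom_fields(spec, columns)
print(missing)  # ['animal_environment']
print(ignored)  # {'strain': ['type']}
```

Here `animal_environment` is flagged because the hypothetical spreadsheet has no such column, and the `type` key under `strain` is reported as ignored rather than treated as an error.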
Binary file modified assets/metadata_template.xlsx
Binary file not shown.
43 changes: 0 additions & 43 deletions bin/config_files/default_config.yaml

This file was deleted.

137 changes: 0 additions & 137 deletions bin/config_files/seqsender_main_config.yaml

This file was deleted.
