docs: Check & update of configuration defaults (first part) #532
base: main
I think having all these examples will be really helpful to people getting started with the pipeline (in the CUBI environment)!
The generated config YAMLs will have quite long lines, though ;) But I'd rather have very long comments providing information and context than no information at all.
features:
  path: /data/cephfs-1/work/projects/cubit/current/static_data/annotation/GENCODE/19/GRCh37/gencode.v19.annotation.gtf

# Step Configuration ==============================================================================
While we're at it, we can correct the headers, which all just say "Step Configuration".
# static_data_config:
#   cosmic:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/COSMIC/v90/GRCh38/CosmicAll.vcf.gz
#   dbnsfp:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbNSFP/3.5/GRCh38/dbNSFP.txt.gz
#   dbsnp:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbSNP/b147/GRCh38/common_all_20160407.vcf.gz
#   reference:
#     path: /fast/work/groups/cubi/projects/biotools/static_data/reference/GRCh38.d1.vd1/GRCh38.d1.vd1.fa
#   features:
#     path: /fast/work/groups/cubi/projects/biotools/static_data_by_ref/GRCh38/annotation/GENCODE/36/gencode.v36.primary_assembly.annotation.gtf
These are just the GRCh38 versions of the static_data_config defaults, correct? That should be mentioned in the description.
@@ -43,6 +43,9 @@ class TargetCoverageReportEntry(SnappyModel):
  - name: IDT_xGen_V1_0
    pattern: "xGen Exome Research Panel V1\\.0*"
    path: "path/to/targets.bed"

BED files for many Agilent exome panels can be found in
/fast/work/groups/cubi/projects/biotools/static_data/exome_panel/Agilent
The existing comment (not the one you added) seems to be incorrect, as I'm quite sure the path will be mapped to the name "IDT_xGen_V1_0", not "default" ;)
I'm also not sure whether the pattern actually matches what the description claims.
I'm not sure what you mean here: the pattern is a correct (albeit strange) regular expression, and a biomedsheet entry matching such a pattern is associated with the name, which is finally mapped to a path.
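To make the "strange" part concrete, here is a small sketch of what the pattern accepts. In a double-quoted YAML string, "\\." is unescaped to "\.", so the regex itself requires a literal "V1." followed by zero or more "0" characters. The thread does not say whether the pipeline uses re.match, re.search or re.fullmatch, so the sketch (an assumption, not the pipeline's code) compares fullmatch and search on a few hypothetical kit strings:

```python
import re

# Regex as received after YAML unescapes "\\." to "\.":
# literal "V1." followed by zero or more "0" characters.
PATTERN = r"xGen Exome Research Panel V1\.0*"

# Hypothetical kit strings, as they might appear in a sample sheet.
candidates = [
    "xGen Exome Research Panel V1.0",
    "xGen Exome Research Panel V1.",
    "xGen Exome Research Panel V1.000",
    "xGen Exome Research Panel V1.01",
    "xGen Exome Research Panel V1x0",
]

# fullmatch: the whole string must match; search: any substring may match.
full = {c: re.fullmatch(PATTERN, c) is not None for c in candidates}
partial = {c: re.search(PATTERN, c) is not None for c in candidates}

for c in candidates:
    print(f"{c!r}: fullmatch={full[c]}, search={partial[c]}")
```

So "V1.0" matches as intended, but so do "V1." and "V1.000". Under search semantics, "V1.01" also matches because its "V1.0" prefix already satisfies the pattern.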
path_baits: str
"""
Different exome panels cannot be accommodated here, because the selection method used for coverage is not used.
What exactly does this mean?
It means that the target_coverage sub-step picks the name of the panel from the sample sheet and then maps it to actual files using this weird mapping scheme implemented in the __init__.py code. This allows multiple exome kits within the same dataset.
That is not the case for path_baits in the somatic/mbcs part. There, we point directly to the exome kit BED file, and the sample sheet is not queried.
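The difference between the two mechanisms can be sketched as follows. This is a hypothetical illustration, not the code in __init__.py: TargetKitEntry, resolve_kit, the sample names and the use of re.match with first-match-wins ordering are all assumptions. Each sample's kit string from the sample sheet selects its own entry (name, pattern, path, as in the YAML example in the diff), whereas a path_baits-style setting is one path shared by every sample.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TargetKitEntry:
    # Hypothetical mirror of one name/pattern/path entry from the YAML config.
    name: str
    pattern: str
    path: str


def resolve_kit(kit_from_sheet: str, entries: List[TargetKitEntry]) -> Optional[TargetKitEntry]:
    """Return the first entry whose pattern matches the sample-sheet kit string.

    Assumption: entries are tried in order with re.match; the real code may differ.
    """
    for entry in entries:
        if re.match(entry.pattern, kit_from_sheet):
            return entry
    return None


ENTRIES = [
    TargetKitEntry("IDT_xGen_V1_0", r"xGen Exome Research Panel V1\.0*", "path/to/targets.bed"),
]

# Per-sample lookup (target_coverage style): each sample's kit picks its own BED file,
# so several exome kits can coexist in one dataset. Sample names are made up.
samples: Dict[str, str] = {
    "P001-T1": "xGen Exome Research Panel V1.0",
    "P002-T1": "Some Other Kit",
}
resolved = {sample: resolve_kit(kit, ENTRIES) for sample, kit in samples.items()}

# Direct setting (path_baits style): one path for all samples, no sample-sheet query.
PATH_BAITS = "path/to/targets.bed"
baits_per_sample = {sample: PATH_BAITS for sample in samples}
```

With the direct setting, a second kit in the same dataset would silently receive the same BED file, which is what the path_baits docstring is warning about.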
First attempt. Only ngs_mapping & somatic_variant_calling are used.
IMPORTANT NOTE: The default config is now out of sync with the STAR version in the environment. Is it OK to change the STAR version in the wrapper's environment in this PR, or should I create another PR?