docs: Check & update of configuration defaults (first part) #532

ericblanc20 · 2024-07-05T13:46:49Z

First attempt. Only ngs_mapping and somatic variant calling are covered.

IMPORTANT NOTE: The default config is now out of sync with the STAR environment version. Is it OK to change the STAR version in the wrapper's environment, or should I create another PR?

github-actions · 2024-07-05T13:48:24Z

Please format your Python code with ruff: make fmt
Please check your Python code with ruff: make check
Please format your Snakemake code with snakefmt: make snakefmt

You can trigger all lints locally by running make lint

tedil

I think having all these examples will be really helpful to people getting started with the pipeline (in the CUBI environment)!
The config yamls generated will have quite long lines, though ;) but I'd rather have very long comments providing information and context than no information at all.

tedil · 2024-07-05T15:03:14Z

snappy_pipeline/apps/tpls/project_config.yaml

+  features:
+    path: /data/cephfs-1/work/projects/cubit/current/static_data/annotation/GENCODE/19/GRCh37/gencode.v19.annotation.gtf
+
+# Step Configuration ==============================================================================


While we're at it, we can correct the headers, which all just say "Step Configuration".

tedil · 2024-07-05T15:04:31Z

snappy_pipeline/apps/tpls/project_config.yaml

+# static_data_config:
+#   cosmic:
+#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/COSMIC/v90/GRCh38/CosmicAll.vcf.gz
+#   dbnsfp:
+#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbNSFP/3.5/GRCh38/dbNSFP.txt.gz
+#   dbsnp:
+#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbSNP/b147/GRCh38/common_all_20160407.vcf.gz
+#   reference:
+#     path: /fast/work/groups/cubi/projects/biotools/static_data/reference/GRCh38.d1.vd1/GRCh38.d1.vd1.fa
+#   features:
+#     path: /fast/work/groups/cubi/projects/biotools/static_data_by_ref/GRCh38/annotation/GENCODE/36/gencode.v36.primary_assembly.annotation.gtf


These are just the GRCh38 versions of the static_data_config defaults, correct? That should be mentioned in the description.

tedil · 2024-07-05T15:06:36Z

snappy_pipeline/workflows/ngs_mapping/model.py

@@ -43,6 +43,9 @@ class TargetCoverageReportEntry(SnappyModel):
      - name: IDT_xGen_V1_0
        pattern: "xGen Exome Research Panel V1\\.0*"
        path: "path/to/targets.bed"
+
+    Bed file for many Agilent exome panels can be found in
+    /fast/work/groups/cubi/projects/biotools/static_data/exome_panel/Agilent


The existing comment (not the one you added) seems to be incorrect, as I'm quite sure the path will be mapped to the name "IDT_xGen_V1_0", not "default" ;)
And I'm also not sure whether the pattern actually matches what the description claims.

I'm not sure what you mean here: the pattern is a correct (albeit strange) regular expression, and a biomedsheet entry matching such a pattern is associated with the name, which is in turn mapped to a path.
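To make the pattern question concrete, here is a minimal sketch of the pattern → name → path lookup described above. It is not the pipeline's code: PANELS and resolve_panel are hypothetical names, and the use of re.fullmatch is an assumption (the actual matching call in the workflow may differ, for instance prefix matching). The pattern string is the one from the docstring after YAML unescaping, so "\\." becomes "\.".

```python
import re
from typing import Optional

# Hypothetical table mirroring the docstring example; not the pipeline's real data structure.
PANELS = [
    {
        "name": "IDT_xGen_V1_0",
        "pattern": r"xGen Exome Research Panel V1\.0*",
        "path": "path/to/targets.bed",
    },
]


def resolve_panel(library_kit: str) -> Optional[dict]:
    """Return the first entry whose pattern matches the whole kit string, else None.

    Assumption: whole-string matching via re.fullmatch.
    """
    for entry in PANELS:
        if re.fullmatch(entry["pattern"], library_kit):
            return entry
    return None
```

Under these assumptions, "0*" means "zero or more zeros", so the pattern accepts "V1.0", but also "V1." and "V1.00", which is probably what makes it look strange. A string such as "xGen Exome Research Panel V1.1" is rejected by whole-string matching, though it would be accepted by prefix matching with re.match.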

tedil · 2024-07-05T15:07:29Z

snappy_pipeline/workflows/ngs_mapping/model.py

    path_baits: str
+    """
+    Different exome panels cannot be accomodated here, because the selection method used for coverage is not used.


What exactly does this mean?

It means that the target_coverage sub-step picks the name of the panel from the sample sheet, and then maps it to the actual files using the weird mapping scheme implemented in the __init__.py code. This allows multiple exome kits within the same dataset.
This is not the case for path_baits in the somatic/mbcs bit. There, we point directly to the exome kit BED file, without querying the sample sheet.
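The difference between the two selection methods can be sketched as follows. Everything here is illustrative: KITS, SAMPLE_KITS, PATH_BAITS, both function names, the second kit entry and all paths are hypothetical, and the real lookup goes through the sample sheet and the __init__.py code rather than plain dictionaries.

```python
import re
from typing import Optional

# Hypothetical kit table: (name, pattern, path). The second entry and both paths are invented.
KITS = [
    ("IDT_xGen_V1_0", r"xGen Exome Research Panel V1\.0*", "path/to/idt_targets.bed"),
    ("Agilent_V6", r"Agilent SureSelect Human All Exon V6.*", "path/to/agilent_targets.bed"),
]

# Hypothetical sample sheet excerpt: sample name -> library kit string.
SAMPLE_KITS = {
    "sample-1": "xGen Exome Research Panel V1.0",
    "sample-2": "Agilent SureSelect Human All Exon V6",
}

# Hypothetical single BED file, standing in for a path_baits setting.
PATH_BAITS = "path/to/single_kit.bed"


def targets_for_sample(sample: str) -> Optional[str]:
    """target_coverage style: look up the sample's kit, then map it to a BED file."""
    kit = SAMPLE_KITS[sample]
    for _name, pattern, path in KITS:
        if re.match(pattern, kit):
            return path
    return None


def baits_for_sample(sample: str) -> str:
    """path_baits style: every sample gets the same file; the sample sheet is not consulted."""
    return PATH_BAITS
```

In this sketch, targets_for_sample returns a different BED file for each of the two samples, whereas baits_for_sample returns the same file for both, which is why a dataset mixing exome kits cannot be handled through path_baits.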

coveralls · 2024-07-24T16:18:28Z

coverage: 85.869% (+0.07%) from 85.8%
when pulling 4fe223b on 528-check-and-update-config-defaults
into 17c1a87 on main.

docs: Check & update of configuration defaults (first part)

a7712da

ericblanc20 requested a review from tedil July 5, 2024 13:46

ericblanc20 linked an issue Jul 5, 2024 that may be closed by this pull request

Check and update config defaults #528

Open

8 tasks

tedil reviewed Jul 5, 2024


ericblanc20 added 4 commits July 23, 2024 11:47

fix: make linting and continuous integration tests happy

f117400

docs: typos & explicit genome release

7f3fd69

docs: Added description of somatic variants config

f524bdb

style: make linting happy

4fe223b
