Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrangler in sif #54

Merged
merged 12 commits into from
Jan 16, 2024
Merged

Wrangler in sif #54

merged 12 commits into from
Jan 16, 2024

Conversation

alfredsimkin
Copy link
Contributor

Incorporating Charlie's edits to put miptools wrangler snakemake in a sif file

@@ -1,6 +1,6 @@
configfile: 'variant_calling.yaml'
output_folder='/opt/analysis'
log_folder=config['output_directory']+'/run_settings'
log_folder='/opt/analysis/run_settings'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense - since snakemake is now inside the sif, run settings are now getting sent to a sif file internal location

@@ -43,6 +43,7 @@ singularity_bindings="-B $project_resources:/opt/project_resources
-B $species_resources:/opt/species_resources
-B $wrangler_directory:/opt/data
-B $output_directory:/opt/analysis
-B /home/charlie/projects/MIPTools_wrangler_in_sif/snakemake:/opt/snakemake
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This location probably shouldn't be hard coded. Maybe it gets changed in a later commit?

@@ -1,3 +1,9 @@
########################################################
# README
# this file uses variant_calling.yaml for its parameters
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping variant_calling and check_run_stats with the same yaml file makes sense - variant_calling depends on the outputs of check_run_stats to only perform variant calling on the haplotypes that mapped to correct locations of genome

@@ -0,0 +1,61 @@
#################################################
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is a subset of the variant_calling.sh script (only doing the stats part, for people who don't want to commit to a week and a half of variant calling but still want to get the basic run stats that variant calling performs at the beginning)

input_sample_sheet_directory: /home/charlie/projects/miptools_data/miptools_test-data/test_data

input_sample_sheet_name: sample_list.tsv
input_sample_sheet: /home/charlie/projects/miptools_data/miptools_test-data/test_data/sample_list.tsv
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice fix - decreases the number of variables a user has to enter into the yaml file

@@ -43,6 +55,7 @@ singularity_bindings="-B $project_resources:/opt/project_resources
-B $output_folder:/opt/analysis
-B $input_sample_sheet_directory:/opt/input_sample_sheet_directory
-B $fastq_dir:/opt/data
-B /home/charlie/projects/MIPTools_wrangler_in_sif/snakemake:/opt/snakemake
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping this path isn't hardcoded into the final shell script. Will look for a future commit that removes it

eval $(parse_yaml wrangler_by_sample.yaml)

function parse_sample_sheet_directory {
readarray -d "/" -t strarr <<< "$input_sample_sheet"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool Unix function for splitting up a string into an array. I'll have to try and remember this, thanks for sharing it

@@ -43,7 +43,6 @@ singularity_bindings="-B $project_resources:/opt/project_resources
-B $output_folder:/opt/analysis
-B $input_sample_sheet_directory:/opt/input_sample_sheet_directory
-B $fastq_dir:/opt/data
-B /home/charlie/projects/MIPTools_wrangler_in_sif/snakemake:/opt/snakemake
Copy link
Contributor Author

@alfredsimkin alfredsimkin Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent - I see this is where the hardcoded shell script bindings are getting taken back out. I also get why you added them in the first place now - allowed you to test changes to snakemake without having to build a new sif


rule copy_files:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why the rule "copy_files" is not needed anymore. I think you (Charlie) explained this to me but I can't remember it. Is it because the setup run snakemake script already takes care of the copying?

setup_snakefile='setup_run.smk',
finish_snakefile='finish_run.smk',
input_configfile='wrangler_by_sample.yaml',
in_scripts='scripts'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rule 'extract_by_arm" not need any input, or just not need these old copy_files inputs? github parsed the changes to look like no input is needed

script for this. Input is an arms file and a sample sheet. Output is an arms
file with rearranged columns and a two column file with names of all mips
and names of all samples (with no pairing between columns of any given row).
'''
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes - this rule probably isn't needed if it got moved to the setup_run.smk script

mem_mb=40000,
time_min=1440,
nodes=20
threads: 20
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where the copy_files rule got moved to, I see now. It looks like github swapped the file contents of the setup and finish_run steps somehow, and showed the changes that would convert one file into the other as if they had been made line by line in each file (instead of swapping out the entire contents of both files)

@alfredsimkin alfredsimkin merged commit ded8e51 into master Jan 16, 2024
@alfredsimkin alfredsimkin deleted the wrangler_in_sif branch January 16, 2024 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants