
Can the profile correctly handle snakemake groups? #25

Closed · SilasK opened this issue Aug 30, 2019 · 16 comments

SilasK commented Aug 30, 2019

Snakemake-Profiles/doc#9

percyfal (Collaborator) commented Sep 3, 2019

Hi,

I would need to take a look at this, as I have yet to use groups. Reading through the issue you referenced, it seems you need to process job_properties for 'type'? I'll set up a simple test case, see what errors occur, and work my way forward from there.

SilasK (Author) commented Sep 3, 2019

I checked for the type. But then how can we set the name, memory, and thread arguments?
Maybe you want to move this issue to the Snakemake-Profiles/doc repository, as it is a general issue.

SilasK closed this as completed Nov 2, 2019
jdblischak commented:

Has anyone found a workaround for naming group jobs? I searched this repo and couldn't find anything. I like to name my jobs and Slurm output files after the rule. Here are the various things I attempted:

I do this in slurm-submit.py:

SBATCH_DEFAULTS = """
job-name=smk-{rule}-{wildcards}
output=logs/{rule}/{rule}-{wildcards}-%j.out
"""

The advice in Snakemake-Profiles/doc#9 is:

You should check for job_properties["type"] to be single to see if it is an ordinary job. For groups, this will be group.

So I tried adding the following hack to slurm-submit.py:

if job_properties["type"] == "single":
    SBATCH_DEFAULTS = """
    job-name=smk-{rule}-{wildcards}
    output=logs/{rule}/{rule}-{wildcards}-%j.out
    """
elif job_properties["type"] == "group":
    SBATCH_DEFAULTS = """
    job-name=smk-{groupid}-{wildcards}
    output=logs/{groupid}/{groupid}-{wildcards}-%j.out
    """
else:
    SBATCH_DEFAULTS = """
    """

This failed because it doesn't understand {groupid}. This is strange since it does understand {rule}, which also comes from job_properties. It turns out that rule is specifically added:

if hasattr(job, "rule"):
    _variables.update(dict(rule=job.rule))

So I did the same thing with groupid, making it replace rule when submitting a group job:

    if hasattr(job, "groupid"):
        _variables.update(dict(rule=job.groupid))

This fixed the job name and output file passed to Slurm. However, it then failed because Snakemake sums the threads for group jobs. I was trying to run 50 jobs per group, which requested 50 CPUs, so the jobs failed because none of the compute nodes have 50 CPUs. This is apparently a known issue:

snakemake/snakemake#872 (comment)

In summary: if you're trying to use group jobs with Slurm, be warned that it's not going to be straightforward.
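
To pull the pieces together, here is a minimal sketch of the naming workaround described above. It is an illustration, not the profile's actual code: it assumes the submit script receives the jobscript path as its only argument and uses read_job_properties from snakemake.utils (the helper cluster submit scripts typically rely on); the log layout just mirrors the defaults shown earlier.

#!/usr/bin/env python3
"""Illustrative only: derive a Slurm job name for single and group jobs."""
import sys

from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)

# Single jobs carry a rule name; group jobs carry a groupid instead.
if job_properties.get("type") == "group":
    name = job_properties.get("groupid", "group")
else:
    name = job_properties.get("rule", "job")

# Group jobs may have no wildcards, so fall back to a constant.
wildcards = job_properties.get("wildcards", {})
wc = ".".join(f"{k}={v}" for k, v in wildcards.items()) or "unique"

sbatch_options = {
    "job-name": f"smk-{name}-{wc}",
    "output": f"logs/{name}/{name}-{wc}-%j.out",
}
print(" ".join(f"--{key}={value}" for key, value in sbatch_options.items()))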

SilasK (Author) commented Apr 22, 2021

I use this solution for the naming of group jobs. https://github.com/metagenome-atlas/clusterprofile/blob/91b294cfe1ae65c1d3099309284e7b92a8e19df9/%7B%7Bcookiecutter.profile_name%7D%7D/scheduler.py#L28

However, be aware that the resource definition is another challenge.

jdblischak commented:

@SilasK Thanks for sharing your solution! I also like the organization of your scripts and parameters.

<rant>I find the Slurm profile in this repo to be very confusing. I'm jumping back and forth between various Python scripts, and never know which variables are available in which context. Honestly I am frustrated that --cluster-config is being deprecated. It wasn't perfect, but it got the job done. I am heavily invested in Snakemake, and am very grateful for all it's enabled me to get done over the years, but this new --profile setup is a hard sell to new users shopping around for a pipeline tool. "Oh you want to submit your rules to your cluster? OK, here's a pile of Python scripts for you to maintain." </rant>

SilasK (Author) commented Apr 23, 2021

I understand that @johanneskoester develops snakemake mainly for the cloud. (This is an implicit request for better HPC support.)

I think the --profile flag is a good idea for bundling some of snakemake's command-line options. Unfortunately, not all profiles are well maintained.
I also don't understand all the argparse in many of the cluster submission scripts. As far as I can tell, it is never used.
With my generic cluster profile I tried to make a template that can be pre-set by the developer and can work on all the main cluster systems. It works reasonably well for my pipeline. The key mapping (e.g. resource to command-line option) makes it reasonably modular.

Has --cluster-config been deprecated since v6? I hadn't noticed.

The only thing I need --cluster-config for is to define different queues/partitions. Ideally I would have a submission script that automatically chooses the best queue.
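
For example, a submission script could pick the partition from the requested resources. A hypothetical sketch; the partition names, resource keys, and thresholds are made up and would need to be adapted to a real cluster:

def choose_partition(resources):
    """Pick a partition from requested resources (made-up names and thresholds)."""
    mem_mb = resources.get("mem_mb", 2000)
    minutes = resources.get("time_min", 60)
    if mem_mb > 128000:
        return "bigmem"
    if minutes > 12 * 60:
        return "long"
    return "normal"

# In a submit script this would be fed from the parsed job properties, e.g.
# partition = choose_partition(job_properties.get("resources", {}))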

jdblischak commented:

From https://snakemake.readthedocs.io/en/stable/executing/cli.html#CLUSTER:

--cluster-config, -u

A JSON or YAML file that defines the wildcards used in 'cluster' for specific rules, instead of having them specified in the Snakefile. For example, for rule 'job' you may define: { 'job' : { 'time' : '24:00:00' } } to specify the time for rule 'job'. You can specify more than one file. The configuration files are merged with later values overriding earlier ones. This option is deprecated in favor of using --profile, see docs.

Default: []

I interpret this as we shouldn't be using it for new pipelines.

And to be clear, I do like some of the new features enabled by specifying the resources directly in the Snakefile as opposed to a separate config file. For example, the ability to request more resources on restart attempts is extremely helpful. It lets me run the majority of jobs with lower memory resources, and then the few that require more can be restarted with more memory. My frustration is that it's no longer as easy to accomplish tasks like setting default parameters for all jobs or specifying less common arguments (i.e. things other than threads and memory). I don't feel like I am trying to accomplish anything unique to me (naming jobs and slurm log files with rule information, specifying slurm job requirements per rule in addition to threads and memory), and I worry that I am re-inventing the wheel. At this point I feel like my options are: 1) incorporate --cluster-config and hope it doesn't get completely removed for a while, or 2) dive into these Python scripts with pdb to develop a better mental model of how they interact and how I can configure them.

percyfal (Collaborator) commented:

Hi @jdblischak and @SilasK,

thanks for the comments. I'll try to answer or address the points you raised as best I can.

I'll start by addressing the cluster-config comments. The readthedocs section on cluster-config states:

Cluster Configuration (deprecated)

While still being possible, cluster configuration has been deprecated by the introduction of Profiles.

This has been the case since - what, version 5.*? - but apparently the option is still available. That it still remains probably reflects the fact that many users still rely on it instead of profiles, mainly(?) because it makes it easy to set default values and offers a simple mapping to rule names (in short, simplicity). My personal solution to setting defaults has been to rely on schemas that define rule resources, but that's probably not an easy sell for everyone. I'm frankly not sure what the best solution is, but if there is a generic solution to be had, the place for that discussion should be a snakemake issue - maybe as a general comment on why cluster-config is deprecated in the first place? I have been looking at how rule configuration is implemented in the lsf profile, but due to lack of time decided to stick with the current setup for now.

Which brings me to maintenance. @SilasK I don't know if your comment concerning maintenance was aimed at this particular profile or profiles in general, but suffice to say that I'm finding it difficult to find the time to work on improvements in particular and on the profile in general. The lack of a generic solution for how to configure rule resources has also been an impediment.

Similarly, the comment on argparse @SilasK - did that refer to the slurm profile?

Finally, I'm sorry to hear you find the profile confusing @jdblischak. If there are any specifics you want to discuss, feel free to open a separate issue.

Cheers,

Per

SilasK (Author) commented Apr 27, 2021

I just wanted to add that using default-resources in the profile may be a better way to set default resources.

@percyfal Yes, I'm speaking about profiles in general, but in the slurm profile there is argparse usage whose purpose I don't understand.
@percyfal What would be needed for a general solution? I would say all cluster systems have the same resources: CPUs, memory, time, and log files.

@jdblischak You say you want to:

  • naming jobs and slurm log files with rule information
    I think my solution gives you this.
  • specifying slurm job requirements per rule in addition to threads and memory
    What more do you need? You can also define string-based resource keywords in newer versions of snakemake (see the sketch below)
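
To illustrate the second point, a sketch of what string-valued resources can look like in a rule. The rule name, output, and command are placeholders, and it assumes a Snakemake version recent enough to accept non-integer resources; the qos keyword only has an effect if the submit script maps it to an sbatch option.

rule heavy_step:
    output:
        "results/heavy.txt"
    threads: 8
    resources:
        mem_mb=16000,
        time="12:00:00",  # string-valued resource, accepted by newer Snakemake versions
        qos="long"        # extra keyword; only useful if the profile maps it to --qos
    shell:
        "do_heavy_work > {output}"  # placeholder command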

jdblischak commented:

@percyfal First let me start by thanking you for all your work to create and maintain this profile! It helped me get up and running quickly.

My frustration is more aimed at the fact that all of your efforts are needed at all. Interfacing with a job scheduler has always been a key feature for me when using Snakemake, so offloading it to separate repos with individual maintainers seems fragile. What if Slurm introduces a breaking change? Even if it gets quickly fixed in this repository, how will all the users of this profile update their projects? They can't pip install the latest version, and it's likely they made various changes to customize the profile for their needs, so they'd have to merge it manually. And this is assuming they even know about this repo. If the person found the Snakemake pipeline from a URL in a scientific paper, they will likely have little idea on how to update it.

I'm frankly not sure what the best solution is, but if there is a generic solution to be had, the place for that discussion should be in a snakemake issue - maybe as a general comment on why cluster-config is deprecated in the first place?

As may be obvious, I was a heavy Snakemake user in 2015-16, and have recently returned to it. The decision to deprecate functionality is a tough one, so I assume they had good reasons. Looking at the changelog, support for profiles was first added in Sep 2017 in 4.1.0, and the deprecation notice for cluster-config was added to the docs in Jan 2020.

However, I haven't been able to find much documentation of this decision. The GitHub Issues only go back to 2019 (when it migrated), the old Bitbucket repo appears to no longer have the old Issues, and I couldn't find anything in the Google Group.

In general I would expect the documentation on how to use profiles on an HPC to be either in the official docs or in the Snakemake-Profiles organization, but both are scant, as has been noted by others, e.g. in Snakemake-Profiles/doc#17

https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
https://snakemake.readthedocs.io/en/stable/executing/cluster.html
https://github.com/Snakemake-Profiles/doc

If there are any specifics you want to discuss, feel free to open a separate issue.

@percyfal Will do! Again, thanks for all your work to maintain this useful resource.

I just wanted to add that using default-resources in the profile may be a better way to set default resources.

@SilasK Thanks for this advice. I agree, and I am going to look into using --default-resources.

What more do you need? You can also define string-based resource keywords in newer versions of snakemake

At the moment I want to add qos. How can I specify a default for all rules but then allow a rule to override it? I know I could specify a default in SBATCH_DEFAULTS in slurm-submit.py to apply the same qos to all jobs. But I'm at a loss as to how to make it dynamically update if a specific rule defines qos in its resources field. I'd probably start by editing RESOURCE_MAPPING and see whether the error messages are informative. But this seems inefficient, especially considering that I know I am not the first person who wanted to add an additional dynamic resource like mem_mb. From #24, it seems like one solution is to use a --cluster-config file. But then what is the advantage of using --profile if I am still using the deprecated option it was supposed to replace? Specifying default arguments passed to Snakemake is convenient, but I was already effectively doing this by having a bash script that submitted Snakemake with all my preferred options.
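
What I have in mind is roughly the following merging logic: a rule-level resources entry overrides a profile-wide default. This is a sketch with hypothetical names, defaults, and key list, not the profile's actual code; the hard-coded job_properties dict stands in for what read_job_properties would return.

# Hypothetical merging logic; the defaults and key list are illustrative.
defaults = {"qos": "normal", "partition": "general"}  # profile-wide defaults

# Stand-in for read_job_properties(jobscript): a rule that sets resources: qos="long"
job_properties = {"type": "single", "rule": "heavy_step",
                  "resources": {"mem_mb": 16000, "qos": "long"}}

options = dict(defaults)
# Any resource key matching a known sbatch option overrides the default,
# so the rule-level qos="long" wins over the profile-wide "normal".
for key in ("qos", "partition", "time", "mem"):
    if key in job_properties.get("resources", {}):
        options[key] = job_properties["resources"][key]

sbatch_args = " ".join(f"--{k}={v}" for k, v in options.items())
print(sbatch_args)  # --qos=long --partition=general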

SilasK (Author) commented Apr 27, 2021

Which brings me to maintenance. @SilasK
Your profile is extremely well maintained. I was speaking in general.

percyfal (Collaborator) commented:

I just wanted to add that using default-resources in the profile may be a better way to set default resources.

Thanks @SilasK for bringing this to my attention, I hadn't noticed this option (too many options, too little time).

@percyfal Yes, I'm speaking about profiles in general, but in the slurm profile there is argparse usage whose purpose I don't understand.

It's just used to parse and retrieve the jobscript name AFAICT - I'm saying that because it's actually not my implementation :)

@percyfal What would be needed for a general solution? I would say all cluster systems have the same resources: CPUs, memory, time, and log files.

I was thinking more about what would replace --cluster-config, if anything at all. At least that solution was unique in the sense that there was a one-to-one mapping from sections in cluster-config.yaml to rule names. While putting rule configurations in the general configuration file adds the benefit of being able to request more resources on restarts (as @jdblischak alluded to), I'm not aware of any recommendation on how this configuration should be set up - nor should there be, I guess, as configuration files are set up in so many different ways. The alternative would be to take the approach implemented in the lsf profile, where an additional config file lsf.yaml is used to configure rule resources (roughly as sketched below, but without using the --cluster-config option). I have considered this solution but decided to continue relying on --cluster-config for now. In any case, it would be nice if all profiles adopted the same strategy to make it easier for users. Maybe @mbhall88 can comment on this?
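
For concreteness, on the submit side that style of approach amounts to something like the following. The file name, section layout, and rule names here are assumptions for illustration, not the lsf profile's actual schema.

# Hypothetical per-rule configuration lookup; file name and layout are assumed.
from pathlib import Path

import yaml  # PyYAML


def rule_options(rule_name, config_path="cluster_config.yaml"):
    """Merge a __default__ section with the section for one rule, if present."""
    path = Path(config_path)
    config = {}
    if path.exists():
        config = yaml.safe_load(path.read_text()) or {}
    options = dict(config.get("__default__", {}))
    options.update(config.get(rule_name, {}))
    return options

# With a file containing
#   __default__: {partition: normal, time: "01:00:00"}
#   heavy_step:  {partition: bigmem, time: "24:00:00"}
# rule_options("heavy_step") returns {'partition': 'bigmem', 'time': '24:00:00'}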

percyfal (Collaborator) commented:

@percyfal First let me start by thanking you for all your work to create and maintain this profile! It helped me get up and running quickly.

Thanks, that means a lot!

My frustration is more aimed at the fact that all of your efforts are needed at all. Interfacing with a job scheduler has always been a key feature for me when using Snakemake, so offloading it to separate repos with individual maintainers seems fragile. What if Slurm introduces a breaking change? Even if it gets quickly fixed in this repository, how will all the users of this profile update their projects? They can't pip install the latest version, and it's likely they made various changes to customize the profile for their needs, so they'd have to merge it manually. And this is assuming they even know about this repo. If the person found the Snakemake pipeline from a URL in a scientific paper, they will likely have little idea on how to update it.

Yes, I see your point. Your suggestion would be that the profile should preferably be installed via pip and/or conda?

At the moment I want to add qos. How can I specify a default for all rules but then allow a rule to override it? I know I could specify a default in SBATCH_DEFAULTS in slurm-submit.py to apply the same qos to all jobs. But I'm at a loss as to how to make it dynamically update if a specific rule defines qos in its resources field. I'd probably start by editing RESOURCE_MAPPING and see whether the error messages are informative. But this seems inefficient, especially considering that I know I am not the first person who wanted to add an additional dynamic resource like mem_mb. From #24, it seems like one solution is to use a --cluster-config file. But then what is the advantage of using --profile if I am still using the deprecated option it was supposed to replace? Specifying default arguments passed to Snakemake is convenient, but I was already effectively doing this by having a bash script that submitted Snakemake with all my preferred options.

Let me think about this and get back to you later - maybe in a separate issue? Anyhow, thanks for the thoughtful comments.

jdblischak commented:

Your suggestion would be that the profile should preferably be installed via pip and/or conda?

Yep! So much of this profile is boilerplate code, not the user's specific configuration. In other words, the software and the user config are too intertwined. I'd prefer them to be separate, so I could do a hypothetical conda update smk-profile-slurm to get the latest code without having to edit my project-specific config files.

Let me think about this and get back to you later - maybe in a separate issue? Anyhow, thanks for the thoughtful comments.

I found a solution that works for me. I'll open a new Issue to share. Thanks!

mbhall88 (Member) commented:

Maybe @mbhall88 can comment on this?

The current solution I use for cluster configuration in the LSF profile is general in the sense that there is nothing specific to LSF in it. How I parse it within the profile may be LSF specific (possibly not).

Sorry, I haven't read everything in this issue. Was there anything else you wanted me to comment on @percyfal ?

percyfal (Collaborator) commented:

Maybe @mbhall88 can comment on this?

The current solution I use for cluster configuration in the LSF profile is general in the sense that there is nothing specific to LSF in it. How I parse it within the profile may be LSF specific (possibly not).

Sorry, I haven't read everything in this issue. Was there anything else you wanted me to comment on @percyfal ?

Not really, I was just curious to hear your thoughts on providing support for rule/cluster config. Sorry to drag you into the discussion :-) Thanks for the feedback!
