Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download: Improved container priority and resolution #2364

Merged
merged 10 commits into from
Jul 17, 2023

Conversation

MatthiasZepper
Copy link
Member

This PR improves the code in Download that detects and resolves container images in modules and configs.

Previously, a loop was used to prioritize http:// downloads over docker URIs. For example, from

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
   'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
   'biocontainers/fastqc:0.11.9--0' }"

the strings 'singularity', 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' and 'biocontainers/fastqc:0.11.9--0' were extracted into a list. Then we iterated over this list and broke the loop as soon as a match with a URL regex or Docker regex was found.

This approach was flaky, because 'singularity' or 'apptainer' are valid Docker URIs. Thus, unless it was explicitly ignored as a false positive, it prevented further evaluation and detection of the real image paths. Now, evaluation continues in any case, which should make it more resilient for further syntax updates. Furthermore, the old approach also relied on the convention that the URL would always be listed before the Docker URI.

The old approach failed entirely for early DSL2, where a separate container declaration was used:

  if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
      container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
  } else {
      container "quay.io/biocontainers/fastqc:0.11.9--0"
  }

For each container declaration, a new loop was created, such that it was not possible to prioritize the direct download over the Docker URI for early DSL2. Therefore, each image was once downloaded and once pulled.

This PR now changes the approach to first gather all Docker URIs and download URLs from all modules and configs into a joint, global list. Subsequently, a new helper function prioritize_direct_download() removes all Docker URIs from the list for which also direct download URLs are contained. Consolidating container URIs/URLs from different modules is now possible for the first time.

In addition to the improved function, a new test has been devised that runs on mock modules in the tests/data directory. So any future module standard should be reflected by a mock module in that folder in the future.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

@MatthiasZepper MatthiasZepper changed the base branch from master to dev July 5, 2023 21:04
@nf-core nf-core deleted a comment from github-actions bot Jul 5, 2023
…w class are stateful, i.e. if the function is called without a parameter that was provided in a previous call, then it is still present.
@codecov
Copy link

codecov bot commented Jul 6, 2023

Codecov Report

Merging #2364 (e459c66) into dev (d21770d) will increase coverage by 0.04%.
The diff coverage is 97.14%.

@@            Coverage Diff             @@
##              dev    #2364      +/-   ##
==========================================
+ Coverage   72.76%   72.80%   +0.04%     
==========================================
  Files          78       78              
  Lines        8889     8888       -1     
==========================================
+ Hits         6468     6471       +3     
+ Misses       2421     2417       -4     
Impacted Files Coverage Δ
nf_core/download.py 60.21% <97.14%> (+0.73%) ⬆️

... and 2 files with indirect coverage changes

@MatthiasZepper MatthiasZepper requested a review from ewels July 6, 2023 16:07
@MatthiasZepper MatthiasZepper self-assigned this Jul 7, 2023
Copy link
Member

@mirpedrol mirpedrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only some small comments

nf_core/download.py Outdated Show resolved Hide resolved
valid_containers = list(filter(either_url_or_docker.match, container_value_defs))

if valid_containers:
cleaned_matches = cleaned_matches + valid_containers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand, what's the difference between cleaned_matches and valid_containers? Are they obtained with the same regex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned matches is the list (line 784) that collects all valid_containers from the loop over the raw findings (line 796).

this_container = docker_match.group(0)
break
# all implemented options exhausted. Nothing left to be done:
log.error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be warning? It doesn't stop execution, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Until 2.8, the download would exit and fail if a container image download failed. But that meant that pipelines could not be downloaded with container images, if just one image failed for whatever reason. I briefly discussed this with mashehu, and we agreed that the friendlier user experience would be to continue the download.

Like so, the user only has to manually troubleshoot one image download, in contrast to somehow obtaining 40 images or so one by one. Therefore, this changed behaviour was implemented with release 2.9.

"""
d = {}
for c in container_list:
log.info(c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a message? For example:

Suggested change
log.info(c)
log.info(f"Evaluating container {c}")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehm...no, because that log.info() statement was never meant to stay in there, as it just clutters the output. Excellent catch, I have deleted it now!

That was a leftover of a debugging process, when I failed to understand why certain images showed up in the container_list even if they were not submitted to the function. This happened because the function retained the state from a previous execution, due to my oversight regarding the distinction between Class and Static methods. I assumed the behaviour of a Static method without explicitly defining it as one.)

#
@with_temporary_folder
@mock.patch("nf_core.utils.fetch_wf_config")
def test_find_container_images_modules(self, tmp_path, mock_fetch_wf_config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will be very useful, thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome!

It will benefit me as well, as I won't be surprised by a sudden new module syntax. Once we settle on a new template syntax and modify the modules create and lint accordingly, we can also directly include one in this test to ensure that downloads can also resolve the container paths accurately.

@MatthiasZepper MatthiasZepper merged commit 0fe54b0 into nf-core:dev Jul 17, 2023
@MatthiasZepper MatthiasZepper deleted the ImprovedContainerPriority branch July 17, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants