Download: Improved container priority and resolution #2364
Conversation
…s solution was prone to false positives and negatives.
…ne string which I abused as comment.
…w class are stateful, i.e. if the function is called without a parameter that was provided in a previous call, then it is still present.
Codecov Report
@@            Coverage Diff             @@
##               dev    #2364      +/-   ##
==========================================
+ Coverage    72.76%   72.80%   +0.04%
==========================================
  Files           78       78
  Lines         8889     8888       -1
==========================================
+ Hits          6468     6471       +3
+ Misses        2421     2417       -4
LGTM, only some small comments
valid_containers = list(filter(either_url_or_docker.match, container_value_defs))

if valid_containers:
    cleaned_matches = cleaned_matches + valid_containers
Just so I understand, what's the difference between cleaned_matches and valid_containers? Are they obtained with the same regex?
cleaned_matches is the list (line 784) that collects all valid_containers from the loop over the raw findings (line 796).
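For context, a minimal sketch of how the two names relate. The regex contents and the raw_findings data here are made up; only the filter, the if, and the concatenation mirror the diff above, and the line numbers refer to nf_core/download.py as mentioned in the reply.

```python
import re

# hypothetical stand-in for the pattern used in nf_core/download.py
either_url_or_docker = re.compile(r"^(https?://|[\w.-]+/[\w.-]+:[\w.-]+)")

# hypothetical raw findings: one sub-list per container declaration
raw_findings = [
    ["singularity", "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"],
    ["biocontainers/fastqc:0.11.9--0"],
]

cleaned_matches = []  # collects valid containers across ALL raw findings (cf. line 784)
for container_value_defs in raw_findings:  # loop over the raw findings (cf. line 796)
    # valid_containers holds the matches of ONE raw finding only
    valid_containers = list(filter(either_url_or_docker.match, container_value_defs))
    if valid_containers:
        cleaned_matches = cleaned_matches + valid_containers

print(cleaned_matches)
# ['https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0', 'biocontainers/fastqc:0.11.9--0']
```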
this_container = docker_match.group(0)
break
# all implemented options exhausted. Nothing left to be done:
log.error(
Should this be a warning? It doesn't stop execution, no?
Indeed. Until 2.8, the download would exit and fail if a container image download failed. But that meant that a pipeline could not be downloaded with its container images if just one image failed for whatever reason. I briefly discussed this with mashehu, and we agreed that the friendlier user experience would be to continue the download.
That way, the user only has to troubleshoot one image download manually, instead of somehow obtaining 40 or so images one by one. Therefore, this changed behaviour was implemented with release 2.9.
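A minimal sketch of that behaviour, with hypothetical structure and names rather than the actual nf_core/download.py code: a failed image is logged with log.error and remembered, but the loop over the remaining images continues.

```python
import logging

logging.basicConfig()
log = logging.getLogger(__name__)

def fetch(image):
    # stand-in for the actual direct download / singularity pull step
    if image.endswith("broken"):
        raise OSError("download failed")

def download_images(images):
    failed = []
    for image in images:
        try:
            fetch(image)
        except OSError as err:
            # logged as an error, but execution continues with the next image
            log.error(f"Could not obtain {image}: {err}")
            failed.append(image)
    return failed  # the user only needs to troubleshoot these manually

print(download_images(["fastqc:0.11.9--0", "something:broken", "multiqc:1.14--0"]))
# ['something:broken']
```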
nf_core/download.py
Outdated
""" | ||
d = {} | ||
for c in container_list: | ||
log.info(c) |
Can we add a message? For example:
- log.info(c)
+ log.info(f"Evaluating container {c}")
Ehm... no, because that log.info() statement was never meant to stay in there; it just clutters the output. Excellent catch, I have deleted it now!
That was a leftover from debugging, when I failed to understand why certain images showed up in the container_list even though they were not submitted to the function. This happened because the function retained state from a previous execution, due to my oversight regarding the distinction between class and static methods: I assumed the behaviour of a static method without explicitly defining it as one.
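For illustration, a minimal sketch of how a Python method can silently retain state between calls, shown here via the mutable-default-argument pitfall; the class and method names are hypothetical and this is not the actual nf_core/download.py code.

```python
class Downloader:
    # the default list is created once at definition time, so values appended in one
    # call are still present in the next call that omits the parameter
    def find_container_images(self, container_list=[]):
        container_list.append("biocontainers/fastqc:0.11.9--0")
        return container_list

d = Downloader()
print(d.find_container_images())  # ['biocontainers/fastqc:0.11.9--0']
print(d.find_container_images())  # ['biocontainers/fastqc:0.11.9--0', 'biocontainers/fastqc:0.11.9--0']
```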
#
@with_temporary_folder
@mock.patch("nf_core.utils.fetch_wf_config")
def test_find_container_images_modules(self, tmp_path, mock_fetch_wf_config):
This test will be very useful, thanks!
You're welcome!
It will benefit me as well, as I won't be surprised by a sudden new module syntax. Once we settle on a new template syntax and modify the modules create and lint commands accordingly, we can also directly include one in this test to ensure that downloads can resolve the container paths accurately.
This PR improves the code in Download that detects and resolves container images in modules and configs.
Previously, a loop was used to prioritize http:// downloads over Docker URIs. For example, from a module's container declaration, the strings 'singularity', 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' and 'biocontainers/fastqc:0.11.9--0' were extracted into a list. We then iterated over this list and broke out of the loop as soon as a match with a URL regex or a Docker regex was found.

This approach was flaky, because 'singularity' or 'apptainer' are themselves valid Docker URIs. Unless they were explicitly ignored as false positives, they prevented further evaluation and detection of the real image paths. Now, evaluation continues in any case, which should make the code more resilient to further syntax updates. Furthermore, the old approach relied on the convention that the URL would always be listed before the Docker URI.
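To illustrate the false positive, here is a minimal sketch with made-up patterns rather than the actual regexes in nf_core/download.py: a bare word like 'singularity' is itself a syntactically valid Docker image reference, so the old loop stopped before ever reaching the real image paths.

```python
import re

url_regex = re.compile(r"^https?://")                              # hypothetical URL pattern
docker_regex = re.compile(r"^[a-z0-9][a-z0-9._/-]*(:[\w.-]+)?$")   # hypothetical Docker pattern

extracted = [
    "singularity",
    "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0",
    "biocontainers/fastqc:0.11.9--0",
]

this_container = None
for candidate in extracted:
    if url_regex.match(candidate) or docker_regex.match(candidate):
        this_container = candidate
        break  # old behaviour: stop at the first match

print(this_container)  # 'singularity' -- a false positive, not a usable image path
```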
The old approach also failed entirely for early DSL2 modules, where separate container declarations were used: for each container declaration, a new loop was created, so it was not possible to prioritize the direct download over the Docker URI. Therefore, each image was both downloaded directly and pulled.
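A sketch of the early-DSL2 situation, with illustrative data rather than real module content: the Singularity URL and the Docker URI live in two separate container declarations, so a loop that runs once per declaration sees only a single candidate and cannot prefer the URL over the URI.

```python
# Two separate early-DSL2 container declarations (illustrative), each yielding its own
# single-element candidate list, so cross-declaration prioritization was impossible:
declarations = [
    ["https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"],  # container "<url>"
    ["quay.io/biocontainers/fastqc:0.11.9--0"],                        # container "<docker URI>"
]

containers = []
for candidates in declarations:   # the old code started a fresh loop per declaration
    containers.append(candidates[0])

print(containers)
# both entries survive -> the same image was downloaded directly AND pulled via Docker
```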
This PR changes the approach: all Docker URIs and download URLs from all modules and configs are first gathered into a joint, global list. Subsequently, a new helper function prioritize_direct_download() removes every Docker URI from the list for which a direct download URL is also present. Consolidating container URIs/URLs from different modules is thereby possible for the first time.

In addition to the improved function, a new test has been added that runs on mock modules in the tests/data directory, so any future module standard should also be reflected by a mock module in that folder.
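A simplified sketch of the idea behind prioritize_direct_download(), consistent with the d = {} / for c in container_list fragment visible in the diff above but not the actual implementation: key every entry by its trailing image name and tag, and let a direct download URL overwrite a Docker URI for the same image.

```python
import re

def prioritize_direct_download(container_list):
    d = {}
    for c in container_list:
        # everything after the last '/' (e.g. 'fastqc:0.11.9--0') serves as the key,
        # so a download URL and a Docker URI for the same image collide
        key = re.sub(r".*/(.*)", r"\1", c)
        if key not in d or c.startswith("https://"):
            d[key] = c  # a direct download URL takes precedence over a Docker URI
    return sorted(d.values())

containers = [
    "biocontainers/fastqc:0.11.9--0",
    "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0",
    "quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0",
]
print(prioritize_direct_download(containers))
# ['https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0',
#  'quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0']
```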
PR checklist
CHANGELOG.md is updated
docs is updated