
Allow for gbasf2 projects with multiple output sub<xy> directories #122

Merged


meliache
Collaborator

@meliache meliache commented Sep 14, 2021

Previously, the assumption was that all outputs are saved in a single, final sub00 directory. But in future releases, jobs with many outputs (> 1000) can have outputs in additional sub<XY> directories.

This means that when downloading, we will also get multiple sub<XY> directories in the temporary download directory. However, the b2luigi user currently expects all downloaded files to be in a common directory. Therefore, after the download has completed, we move the contents of all sub* directories into the final output directory.
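A minimal Python sketch of this flattening step, assuming that the temporary download directory contains one or more `sub*` directories which hold the output files directly. The helper name `flatten_sub_dirs` is hypothetical and this is not necessarily how b2luigi implements it; the sketch refuses to overwrite an existing file rather than silently clobbering it.

```python
import shutil
from pathlib import Path


def flatten_sub_dirs(download_dir, output_dir):
    """Move the contents of every sub* directory of download_dir into output_dir.

    Hypothetical helper illustrating the approach described above.
    Returns the sorted list of moved file names.
    """
    download_dir = Path(download_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Only consider directories named sub*, e.g. sub00, sub01, ...
    for sub_dir in sorted(p for p in download_dir.glob("sub*") if p.is_dir()):
        for src in sorted(sub_dir.iterdir()):
            dst = output_dir / src.name
            # Files from different sub<xy> directories must not collide.
            if dst.exists():
                raise FileExistsError(f"{dst} already exists, refusing to overwrite")
            shutil.move(str(src), str(dst))
            moved.append(src.name)
        # The sub directory is empty now and can be removed.
        sub_dir.rmdir()
    return sorted(moved)
```

For example, `download/sub00/a.root` and `download/sub01/b.root` end up as `out/a.root` and `out/b.root` after `flatten_sub_dirs("download", "out")`.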

We can then still do the file-comparison cross-check that the download was complete by replacing `sub<XY>` with a wildcard in the remote `gb2_ds_list` command.
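The cross-check can be sketched in Python as follows, assuming the output dataset path ends in a `sub<XY>` component made of digits (e.g. `sub00`) and that the logical file names printed by `gb2_ds_list` have already been parsed into a list of strings. Both helper names are hypothetical; the actual `gb2_ds_list` invocation and its output format are not reproduced here.

```python
import re
from pathlib import Path, PurePosixPath


def wildcard_sub_dir(dataset_path):
    """Replace a trailing /sub<XY> directory with /sub* (hypothetical helper)."""
    return re.sub(r"/sub\d+/?$", "/sub*", dataset_path)


def download_is_complete(remote_lfns, local_dir):
    """Compare remote file names with the files in the flattened local directory.

    remote_lfns is assumed to be a list of logical file names, already parsed
    from the output of the wildcarded gb2_ds_list query.
    """
    remote_names = {PurePosixPath(lfn).name for lfn in remote_lfns}
    local_names = {p.name for p in Path(local_dir).iterdir() if p.is_file()}
    return remote_names == local_names
```

Comparing sets of base names relies on file names being unique across all sub<XY> directories, which is the same assumption the flattening step makes.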

So far I have only tested that the wildcards work for gb2_ds_get and gb2_ds_list, but before merging I will run a complete gbasf2 task with this branch to check that everything still works.

Resolves #80

Previously, the assumption was that all outputs are saved in a final sub00
directory. But from gbasf2 release v5r1p3, to be released on 2021-09-16, jobs
with many outputs (> 1000) can have outputs in additional sub<xy> directories.

This means that when downloading, we will also get multiple sub<xy> directories in
the download. However, the b2luigi user currently expects all downloaded
files to be in a common directory, so after a completed download, we move
the contents of all sub* directories into the final output directory.

We can then still cross-check that the download was complete by replacing sub<xy>
with a wildcard when doing the remote gb2_ds_list command.

See #80
@meliache meliache added enhancement New feature or request gbasf2 Concerns the gbasf2/grid b2luigi wrapper labels Sep 14, 2021
@meliache meliache self-assigned this Sep 14, 2021
@codecov-commenter

Codecov Report

Merging #122 (c7a7573) into main (4aaf849) will decrease coverage by 0.16%.
The diff coverage is 8.33%.


@@            Coverage Diff             @@
##             main     #122      +/-   ##
==========================================
- Coverage   56.89%   56.73%   -0.17%     
==========================================
  Files          23       23              
  Lines        1494     1500       +6     
==========================================
+ Hits          850      851       +1     
- Misses        644      649       +5     
| Impacted Files | Coverage Δ |
|---|---|
| b2luigi/batch/processes/gbasf2.py | 39.05% <8.33%> (-0.34%) ⬇️ |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Instead of "/*/" for matching different sub<xy> directories, use "/sub*/" to
prevent other subdirectories from being matched (even if that should never
happen, better safe than sorry).
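The difference between the two patterns can be illustrated with `PurePosixPath.match` from the Python standard library, where `*` matches within a single path component. The paths below are made-up examples ("log" stands in for any unexpected subdirectory), and this sketch does not claim to reproduce how gbasf2 itself expands wildcards.

```python
from pathlib import PurePosixPath

# Made-up example paths; "log" stands in for any unexpected subdirectory.
paths = [
    "/belle/user/me/project/sub00/a.root",
    "/belle/user/me/project/sub01/b.root",
    "/belle/user/me/project/log/c.root",
]


def matching(pattern):
    """Return the paths that fully match the given absolute glob-style pattern."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


broad = matching("/belle/user/me/project/*/*")       # also matches log/c.root
narrow = matching("/belle/user/me/project/sub*/*")   # only sub<xy> directories
print(len(broad), len(narrow))  # prints "3 2"
```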
@meliache meliache merged commit a952a6c into main Sep 15, 2021
@meliache meliache deleted the feature/allow-for-gbasf2-jobs-with-multiple-output-subs branch September 15, 2021 18:43

Successfully merging this pull request may close these issues.

Future proof gbasf2 batch by downloading all sub<xy> directories