provide support for granule wildcard patterns in data downloader #138
* Issues/91 (podaac#92) * added citation creation tests and functionality to subscriber and downloader * added verbose option to create_citation_file command, previously hard coded * updated changelog (whoops) and fixed regression test: 1. Issue where the citation file now downloaded affected the counts 2. Issue where the logic for determining if a file modified time was changing or not was picking up the new citation file which _always_ gets rewritten to update the 'last accessed' date. * updated request to include exec_info in warning; fixed issue with params not being a dictionary caused errors * changed a warning to debug for citation file. fixed test issues * Enable debug logging during regression tests and set max parallel workflows to 2 * added output to pytest * fixed test to only look for downlaoded data files not citation file due to 'random' cmr errors when creating a citation. * added mock testing and retry on 503 * added 503 fixes Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * fixed issues where token was not proagated to CMR queries (podaac#95) * Misc fixes (podaac#101) * added ".tiff" to default extensions to address podaac#100 * removed 'warning' message on not downloading all data to close podaac#99 * updated help documentation for start/end times to close podaac#79 * added version update, updates to CHANGELOG * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * Update python-app.yml * updated poetry version Version matches build/test versions. * Issues/98 (podaac#107) * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * added EDL (not cmr-token) based get, list,delete, refresh token * updated token regression tests * updates and tests for subscriber moving to EDL. 
* marked tests as regression test * Update subscriber/podaac_data_downloader.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_data_subscriber.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * added exec info to errors, cleaned up some log statements Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Issues/109 (podaac#111) * Develop (podaac#103) * Issues/91 (podaac#92) * added citation creation tests and functionality to subscriber and downloader * added verbose option to create_citation_file command, previously hard coded * updated changelog (whoops) and fixed regression test: 1. Issue where the citation file now downloaded affected the counts 2. Issue where the logic for determining if a file modified time was changing or not was picking up the new citation file which _always_ gets rewritten to update the 'last accessed' date. * updated request to include exec_info in warning; fixed issue with params not being a dictionary caused errors * changed a warning to debug for citation file. fixed test issues * Enable debug logging during regression tests and set max parallel workflows to 2 * added output to pytest * fixed test to only look for downlaoded data files not citation file due to 'random' cmr errors when creating a citation. 
* added mock testing and retry on 503 * added 503 fixes Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * fixed issues where token was not proagated to CMR queries (podaac#95) * Misc fixes (podaac#101) * added ".tiff" to default extensions to address podaac#100 * removed 'warning' message on not downloading all data to close podaac#99 * updated help documentation for start/end times to close podaac#79 * added version update, updates to CHANGELOG * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * Update python-app.yml Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * updated poetry version Version matches build/test versions. * Update README.md * Update podaac_data_downloader.py Fixing for issues 109 - adding capability to download by granule-name * Update Downloader.md Fixed the help file * added changelog entries, regressiont ests * added poetry lock cleanup Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> * added README information and updates (podaac#113) * fixed pymock issues... again * Extension regex (podaac#121) * extend -e option to handle regular expressions (podaac#115) * Develop into Main (1.12.0) (podaac#114) * Issues/91 (podaac#92) * added citation creation tests and functionality to subscriber and downloader * added verbose option to create_citation_file command, previously hard coded * updated changelog (whoops) and fixed regression test: 1. Issue where the citation file now downloaded affected the counts 2. Issue where the logic for determining if a file modified time was changing or not was picking up the new citation file which _always_ gets rewritten to update the 'last accessed' date. 
* updated request to include exec_info in warning; fixed issue with params not being a dictionary caused errors * changed a warning to debug for citation file. fixed test issues * Enable debug logging during regression tests and set max parallel workflows to 2 * added output to pytest * fixed test to only look for downlaoded data files not citation file due to 'random' cmr errors when creating a citation. * added mock testing and retry on 503 * added 503 fixes Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * fixed issues where token was not proagated to CMR queries (podaac#95) * Misc fixes (podaac#101) * added ".tiff" to default extensions to address podaac#100 * removed 'warning' message on not downloading all data to close podaac#99 * updated help documentation for start/end times to close podaac#79 * added version update, updates to CHANGELOG * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * Update python-app.yml * updated poetry version Version matches build/test versions. * Issues/98 (podaac#107) * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * added EDL (not cmr-token) based get, list,delete, refresh token * updated token regression tests * updates and tests for subscriber moving to EDL. 
* marked tests as regression test * Update subscriber/podaac_data_downloader.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_data_subscriber.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Update subscriber/podaac_access.py Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * added exec info to errors, cleaned up some log statements Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> * Issues/109 (podaac#111) * Develop (podaac#103) * Issues/91 (podaac#92) * added citation creation tests and functionality to subscriber and downloader * added verbose option to create_citation_file command, previously hard coded * updated changelog (whoops) and fixed regression test: 1. Issue where the citation file now downloaded affected the counts 2. Issue where the logic for determining if a file modified time was changing or not was picking up the new citation file which _always_ gets rewritten to update the 'last accessed' date. * updated request to include exec_info in warning; fixed issue with params not being a dictionary caused errors * changed a warning to debug for citation file. fixed test issues * Enable debug logging during regression tests and set max parallel workflows to 2 * added output to pytest * fixed test to only look for downlaoded data files not citation file due to 'random' cmr errors when creating a citation. 
* added mock testing and retry on 503 * added 503 fixes Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * fixed issues where token was not proagated to CMR queries (podaac#95) * Misc fixes (podaac#101) * added ".tiff" to default extensions to address podaac#100 * removed 'warning' message on not downloading all data to close podaac#99 * updated help documentation for start/end times to close podaac#79 * added version update, updates to CHANGELOG * added token get,delete, refresh and list operations * Revert "added token get,delete, refresh and list operations" This reverts commit 15aba90. * Update python-app.yml Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> * updated poetry version Version matches build/test versions. * Update README.md * Update podaac_data_downloader.py Fixing for issues 109 - adding capability to download by granule-name * Update Downloader.md Fixed the help file * added changelog entries, regressiont ests * added poetry lock cleanup Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> * added README information and updates (podaac#113) * fixed pymock issues... again Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> * extend -e option to handle regular expressions formerly, -e could not handle PTM_\d+ extensions without the user explicitly calling all of them. 
--------- Co-authored-by: mike-gangl <59702631+mike-gangl@users.noreply.github.com> Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> * added dcoumentation and tests for regex * converted defaults to regexes, added gtiff test --------- Co-authored-by: Peter Mao <peter.mao@gmail.com> Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> * closes 118. retries was never hit because range is not end inclusive. (podaac#119) * closes 118. retries was never hit ebcause range is not end inclusive. * updated test to catch now-thrown exception * added --dry-run option, docs, and test cases (podaac#124) * added --dry-run option, docs, and test cases * Update subscriber/podaac_data_downloader.py Added more elegant way of download limit application Co-authored-by: Stepheny Perez <skorper@users.noreply.github.com> --------- Co-authored-by: Stepheny Perez <skorper@users.noreply.github.com> * Issues/70 (podaac#117) * added code for updating version * added chagnelog * moved version check into __main__ instead of on import of the module * added sorting of releases from github to find latest release. * added authenticated (option) access to github API to rpevent rate limiting * separate out auth/token regression tests * Issues/127 (podaac#128) * added token sensitivity filter to remove tokens from CMR queries * added changelog updates * updated some lingering merge issues (huh?) 
* updated regression test * updated ubuntu versions * removed 18.04 ubuntu from workflows/actions * version and documentation updates (podaac#130) * 1.13.1 changelog and dependecny updates * fixed formatting from unsaved merges --------- Co-authored-by: Frank Greguska <Francis.Greguska@jpl.nasa.gov> Co-authored-by: Frank Greguska <89428916+frankinspace@users.noreply.github.com> Co-authored-by: sureshshsv <45676320+sureshshsv@users.noreply.github.com> Co-authored-by: sureshshsv <suresh.vannan@jpl.nasa.gov> Co-authored-by: Peter Mao <peter.mao@gmail.com> Co-authored-by: Stepheny Perez <skorper@users.noreply.github.com>
Here's some evidence that these updates still have the expected outcome after fixes caught by @skorper:
@jjmcnelis A few more things..
Thanks for your patience, @skorper. I'm out of my element. I made edits to each of CHANGELOG.md, Downloader.md, and to the help text for the -gr option inside podaac_data_downloader.py to expand on its use with wildcard patterns. Downloader.md now links to the CMR Search API docs describing this wildcard search feature, which functions in exactly the same way through our tool as it does for the REST API parameters (many more of which support wildcards than just Granule UR, but this one is the most useful to expose to users through the downloader tool, IMO). Let me know if these updates don't meet our standards and I'll take another shot at it right away. Thanks again.
Thank you @jjmcnelis! The last thing would be to resolve merge conflicts, then I will approve 🙂
Shailen expressed a need for this capability in SWOTCalVal (indirectly, this seems like the most straightforward way to support selective downloading w/o a dedicated UMM field to query on).
I added 2 lines to the subscriber/podaac_data_downloader.py script so that CMR wildcard functionality is supported through the existing options. The change adds the wildcard pattern option to the request parameters whenever the user passes a GranuleUR (via -gr) containing '*' or '?'.
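A minimal sketch of that check, not the actual diff: it assumes the request parameters are assembled as a list of key/value tuples and that the CMR option key is `options[GranuleUR][pattern]`. The helper name `build_granule_params`, the `page_size` default, and the `POCLOUD` provider value are hypothetical and only here for illustration.

```python
def build_granule_params(short_name, provider, granule, page_size=2000):
    """Build CMR granule-search parameters for one GranuleUR value.

    Hypothetical helper: the real script may name and order these differently.
    """
    params = [
        ("page_size", page_size),
        ("provider", provider),
        ("ShortName", short_name),
        ("GranuleUR", granule),
    ]
    # Only ask CMR to treat the value as a pattern when a wildcard is present,
    # so exact-name lookups keep behaving as before.
    if "*" in granule or "?" in granule:
        params.append(("options[GranuleUR][pattern]", "true"))
    return params


# "POCLOUD" is an assumed provider id, used only for this example.
for key, value in build_granule_params(
    "SWOTCalVal_GNSS_L2_1.0", "POCLOUD", "SWOTCalVal_WM_GNSS_L2_*"
):
    print(key, "=", value)
```

Keeping the option conditional means a plain granule name never triggers pattern matching on the CMR side.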
This supports Shailen's use case where he wants to selectively download granules by campaign (SWOTCalVal). Here are the invocations for two example cases.

WM and TM campaigns, based on prior knowledge of the filename convention:

    $ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_??_GNSS_L2_*' -d ./data/
    [2023-06-14 14:12:12,727] {podaac_data_downloader.py:270} INFO - Found 2 total files to download
    [2023-06-14 14:12:19,628] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:19.628547 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_T2_GNSS_L2_Rec11_20230201T221500_20230201T232230_20230227T220903.nc
    [2023-06-14 14:12:21,952] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:21.952641 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
    [2023-06-14 14:12:21,952] {podaac_data_downloader.py:324} INFO - Downloaded Files: 2
    [2023-06-14 14:12:21,952] {podaac_data_downloader.py:325} INFO - Failed Files: 0
    [2023-06-14 14:12:21,952] {podaac_data_downloader.py:326} INFO - Skipped Files: 0
    [2023-06-14 14:12:22,329] {podaac_data_downloader.py:334} INFO - END

WM campaign, ...:

    $ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_WM_GNSS_L2_*' -d ./data/
    [2023-06-14 14:12:29,910] {podaac_data_downloader.py:270} INFO - Found 1 total files to download
    [2023-06-14 14:12:35,532] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:35.532384 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
    [2023-06-14 14:12:35,532] {podaac_data_downloader.py:324} INFO - Downloaded Files: 1
    [2023-06-14 14:12:35,532] {podaac_data_downloader.py:325} INFO - Failed Files: 0
    [2023-06-14 14:12:35,532] {podaac_data_downloader.py:326} INFO - Skipped Files: 0
    [2023-06-14 14:12:35,845] {podaac_data_downloader.py:334} INFO - END
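For anyone who wants to sanity-check a pattern before hitting CMR, here is a rough local approximation using Python's `fnmatch`. It is a sketch only: it assumes the GranuleUR equals the filename without the `.nc` extension, and `fnmatch` also gives `[...]` a special meaning that CMR patterns may not share, so treat it as a guide for `*` and `?` only. The `select` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Granule names taken from the download URLs in the logs above,
# with the ".nc" extension dropped (assumed to match the GranuleUR).
GRANULES = [
    "SWOTCalVal_T2_GNSS_L2_Rec11_20230201T221500_20230201T232230_20230227T220903",
    "SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845",
]


def select(pattern, names=GRANULES):
    """Return the names that match a '*' / '?' wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


both = select("SWOTCalVal_??_GNSS_L2_*")     # any two-character campaign code
wm_only = select("SWOTCalVal_WM_GNSS_L2_*")  # WM campaign only

print(len(both), len(wm_only))  # prints: 2 1
```

The counts line up with the "Found 2 total files" and "Found 1 total files" lines in the two runs.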
This needs further testing by someone besides me.