Merge branch 'develop' into smap_investigation
mike-gangl committed Nov 15, 2021
2 parents 481c3da + a9daf76 commit 531d168
Showing 6 changed files with 62 additions and 27 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -5,7 +5,7 @@ name: Python Build

on:
push:
branches: [ main ]
branches: [ main, develop ]
pull_request:
branches: [ main ]

12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)

## [1.7.0]
### Added
- Added ability to call a process on downloaded files. [Thanks to Joe Sapp](https://github.com/sappjw)

### Changed
- Turned the -e option into 'additive' mode (multiple -e options are allowed). [Thanks to Joe Sapp](https://github.com/sappjw)

### Deprecated
### Removed
### Fixed
### Security

## [1.6.1]
### Added
- added warning for more than 2k granules
50 changes: 28 additions & 22 deletions README.md
@@ -33,36 +33,38 @@ you should now have access to the subscriber CLI:

```
$> podaac-data-subscriber -h
usage: podaac-data-subscriber [-h] -c COLLECTION -d OUTPUTDIRECTORY [-m MINUTES] [-b BBOX] [-e [EXTENSIONS [EXTENSIONS ...]]] [-ds DATASINCE] [--version] [--verbose]
usage: podaac_data_subscriber.py [-h] -c COLLECTION -d OUTPUTDIRECTORY [-sd STARTDATE] [-ed ENDDATE] [-b BBOX] [-dc] [-dydoy] [-dymd] [-dy] [--offset OFFSET] [-m MINUTES]
[-e EXTENSIONS] [--process PROCESS_CMD] [--version] [--verbose] [-p PROVIDER]
optional arguments:
-h, --help show this help message and exit
-c COLLECTION, --collection-shortname COLLECTION
The collection shortname for which you want to retrieve data.
-d OUTPUTDIRECTORY, --data-dir OUTPUTDIRECTORY
The directory where data products will be downloaded.
-sd STARTDATE, --start-date STARTDATE
The ISO date time before which data should be retrieved. For Example, --start-date 2021-01-14T00:00:00Z
-ed ENDDATE, --end-date ENDDATE
The ISO date time after which data should be retrieved. For Example, --end-date 2021-01-14T00:00:00Z
-b BBOX, --bounds BBOX
The bounding rectangle to filter result in. Format is W Longitude,S Latitude,E Longitude,N Latitude without spaces. Due to an issue with parsing
arguments, to use this command, please use the -b="-180,-90,180,90" syntax when calling from the command line. Default: "-180,-90,180,90".
-dc Flag to use cycle number for directory where data products will be downloaded.
-dydoy Flag to use start time (Year/DOY) of downloaded data for directory where data products will be downloaded.
-dymd Flag to use start time (Year/Month/Day) of downloaded data for directory where data products will be downloaded.
-dy Flag to use start time (Year) of downloaded data for directory where data products will be downloaded.
--offset OFFSET Flag used to shift timestamp. Units are in hours, e.g. 10 or -10.
-m MINUTES, --minutes MINUTES
How far back in time, in minutes, should the script look for data. If running this script as a cron, this value should be equal to or greater than how often your
cron runs (default: 60 minutes).
-b BBOX, --bounds BBOX
The bounding rectangle to filter result in. Format is W Longitude,S Latitude,E Longitude,N Latitude without spaces. Due to an issue with parsing arguments, to use
this command, please use the -b="-180,-90,180,90" syntax when calling from the command line. Default: "-180,-90,180,90".
-e [EXTENSIONS [EXTENSIONS ...]], --extensions [EXTENSIONS [EXTENSIONS ...]]
The extensions of products to download. Default is [.nc, .h5]
-sd STARTDATE, --start-date STARTDATE
The ISO date time before which data should be retrieved. For Example, --start-date 2021-01-14T00:00:00Z
-ed ENDDATE, --end-date ENDDATE
The ISO date time after which data should be retrieved. For Example, --end-date 2021-01-14T00:00:00Z
How far back in time, in minutes, should the script look for data. If running this script as a cron, this value should be equal to or greater than how
often your cron runs (default: 60 minutes).
-e EXTENSIONS, --extensions EXTENSIONS
The extensions of products to download. Default is [.nc, .h5, .zip]
--process PROCESS_CMD
Processing command to run on each downloaded file (e.g., compression). Can be specified multiple times.
--version Display script version information and exit.
--verbose Verbose mode.
-p PROVIDER, --provider PROVIDER
Specify a provider for collection search. Default is POCLOUD.
```

One can also call the python package directly:
@@ -95,7 +97,8 @@ For setting up your authentication, see the notes on the `netrc` file below.

Usage:
```
usage: podaac-data-subscriber [-h] -c COLLECTION -d OUTPUTDIRECTORY [-m MINUTES] [-b BBOX] [-e [EXTENSIONS [EXTENSIONS ...]]] [-ds DATASINCE] [--version]
usage: podaac_data_subscriber.py [-h] -c COLLECTION -d OUTPUTDIRECTORY [-sd STARTDATE] [-ed ENDDATE] [-b BBOX] [-dc] [-dydoy] [-dymd] [-dy] [--offset OFFSET]
[-m MINUTES] [-e EXTENSIONS] [--version] [--verbose] [-p PROVIDER]
```

To run the script, the following parameters are required:
@@ -165,7 +168,7 @@ CMR token successfully deleted
No data! What gives?! Oh... because I'm not using any flags, I'm only looking back 60 minutes.

```
podaac-data-subscriber -c CYGNSS_L1_CDR_V1.0 -d myData -start-date 2021-02-25T00:00:00Z
podaac-data-subscriber -c CYGNSS_L1_CDR_V1.0 -d myData --start-date 2021-02-25T00:00:00Z
2021-07-29 14:33:11.249343 SUCCESS: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/CYGNSS_L1_CDR_V1.0/cyg03.ddmi.s20210228-000000-e20210228-235959.l1.power-brcs-cdr.a10.d10.nc
...
```
@@ -206,7 +209,7 @@ The subscriber allows the placement of downloaded files into one of several dire
To automatically run and update a local file system with data files from a collection, one can use a syntax like the following:

```
10 * * * * podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d /path/to/data/VIIRS_N20-OSPO-L2P-v2.61 -e .nc .h5 -m 60 -b="-180,-90,180,90" --verbose >> ~/.subscriber.log
10 * * * * podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d /path/to/data/VIIRS_N20-OSPO-L2P-v2.61 -e .nc -e .h5 -m 60 -b="-180,-90,180,90" --verbose >> ~/.subscriber.log
```

@@ -232,17 +235,20 @@ podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d ./data -b="-180,-90,180,90

### Setting extensions

Some collections have many files. To download a specific set of files, you can set the extensions on which downloads are filtered. By default, ".nc" and ".h5" files are downloaded by default.
Some collections have many files. To download a specific set of files, you can set the extensions on which downloads are filtered. By default, ".nc", ".h5", and ".zip" files are downloaded.

```
-e [EXTENSIONS [EXTENSIONS ...]], --extensions [EXTENSIONS [EXTENSIONS ...]]
The extensions of products to download. Default is [.nc, .h5]
-e EXTENSIONS, --extensions EXTENSIONS
The extensions of products to download. Default is [.nc, .h5, .zip]
```

An example of the -e usage:
An example of `-e` usage; note that the `-e` option is additive:
```
podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d ./data -e .nc .h5
podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d ./data -e .nc -e .h5
```
### Run a post-download process

Using the `--process` option, you can run a simple command against each file as soon as it is downloaded. The command is invoked as `<command> <path/to/file>`, so you can run something like `--process gzip` to gzip every downloaded file. More advanced processing (piping, running a process on a directory, etc.) is not supported at this time.
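
A minimal sketch of one such invocation, reusing the VIIRS collection from the examples above (exact output will vary with your data and system):

```
podaac-data-subscriber -c VIIRS_N20-OSPO-L2P-v2.61 -d ./data --process gzip
```

Per the help text above, `--process` can be given multiple times; each command is run against the newly downloaded file in turn.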


### Changing how far back the script looks for data
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@
long_description = fh.read()

setup(name='podaac-data-subscriber',
version='1.6.1',
version='1.7.0',
description='PO.DAAC Data Subscriber Command Line Tool',
url='https://github.com/podaac/data-subscriber',
long_description=long_description,
21 changes: 19 additions & 2 deletions subscriber/podaac_data_subscriber.py
@@ -25,11 +25,12 @@
import os
from os import makedirs
from os.path import isdir, basename, join, splitext
import subprocess
from urllib.parse import urlencode
from urllib.request import urlopen, urlretrieve
from datetime import datetime, timedelta

__version__ = "1.6.1"
__version__ = "1.7.0"

LOGLEVEL = os.environ.get('SUBSCRIBER_LOGLEVEL', 'WARNING').upper()
logging.basicConfig(level=LOGLEVEL)
@@ -207,7 +208,9 @@ def create_parser():
parser.add_argument("--offset", dest="offset", help = "Flag used to shift timestamp. Units are in hours, e.g. 10 or -10.") # noqa E501

parser.add_argument("-m", "--minutes", dest="minutes", help = "How far back in time, in minutes, should the script look for data. If running this script as a cron, this value should be equal to or greater than how often your cron runs (default: 60 minutes).", type=int, default=60) # noqa E501
parser.add_argument("-e", "--extensions", dest="extensions", help = "The extensions of products to download. Default is [.nc, .h5, .zip]", default=[".nc", ".h5", ".zip"], nargs='*') # noqa E501
parser.add_argument("-e", "--extensions", dest="extensions", help = "The extensions of products to download. Default is [.nc, .h5, .zip]", default=None, action='append') # noqa E501
parser.add_argument("--process", dest="process_cmd", help = "Processing command to run on each downloaded file (e.g., compression). Can be specified multiple times.", action='append')


parser.add_argument("--version", dest="version", action="store_true",help="Display script version information and exit.") # noqa E501
parser.add_argument("--verbose", dest="verbose", action="store_true",help="Verbose mode.") # noqa E501
@@ -244,6 +247,7 @@ def run():

short_name = args.collection
extensions = args.extensions
process_cmd = args.process_cmd

data_path = args.outputDirectory
# You should change `data_path` to a suitable download path on your file system.
@@ -387,6 +391,8 @@


#filter list based on extension
if not extensions:
extensions = [".nc", ".h5", ".zip"]
filtered_downloads = []
for f in downloads:
for extension in extensions:
@@ -491,6 +497,16 @@ def prepare_cycles_output(data_cycles, prefix, file):
write_path = join(prefix, cycle_dir, basename(file))
return write_path

def process_file(output_path):
if not process_cmd:
return
else:
for cmd in process_cmd:
if args.verbose:
print(f'Running: {cmd} {output_path}')
subprocess.run(cmd.split() + [output_path],
check=True)

for f in downloads:
try:
for extension in extensions:
@@ -506,6 +522,7 @@ def prepare_cycles_output(data_cycles, prefix, file):
output_path = prepare_cycles_output(
cycles, data_path, f)
urlretrieve(f, output_path)
process_file(output_path)
print(str(datetime.now()) + " SUCCESS: " + f)
success_cnt = success_cnt + 1
except Exception as e:
2 changes: 1 addition & 1 deletion tests/test_subscriber.py
@@ -26,7 +26,7 @@ def test_validate():
a = validate(["-c", "viirs", "-d", "/data", "-b=-180,-90,180,90", "-m", "100"])
assert a.minutes == 100, "should equal 100"

a = validate(["-c", "viirs", "-d", "/data", "-b=-180,-90,180,90", "-e", ".txt", ".nc"])
a = validate(["-c", "viirs", "-d", "/data", "-b=-180,-90,180,90", "-e", ".txt", "-e", ".nc"])
assert ".txt" in a.extensions
assert ".nc" in a.extensions

