Download GPM data on TACC #1

Open
mosoriob opened this issue May 7, 2021 · 5 comments

Comments

@mosoriob
Contributor

mosoriob commented May 7, 2021

We need to download the GPM data.

  • @dnfeldman Do we have a script for GPM data?
  • If not, @khider, what is the endpoint, and how can we download it?
@dnfeldman
Collaborator

@mosoriob I have a very hacky script to download GPM data: it generates a curl command for each file that needs to be downloaded and writes them all into a bash script. Since this was (supposedly) a one-time effort, it should probably be rewritten in a more maintainable manner. In the meantime, let me know if this will suffice.

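import datetime
import os

# Earthdata credentials are read from the environment.
# The variable names below are an assumption; use whatever names your setup defines.
earthdata_username = os.environ["EARTHDATA_USERNAME"]
earthdata_password = os.environ["EARTHDATA_PASSWORD"]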

def generate_download_links_for_date(input_date, download_dir):
	"""Return one curl command per half-hourly IMERG file (48 in total) for input_date."""
	day_commands = []

	day_of_year = input_date.strftime("%j")
	year = input_date.strftime("%Y")
	date_str = input_date.strftime("%Y%m%d")

	start = datetime.datetime(input_date.year, input_date.month, input_date.day, 0, 0, 0)
	num_thirty_min_intervals = 24 * 2

	for i in range(num_thirty_min_intervals):
		interval_start = start + datetime.timedelta(minutes=30*i)
		interval_end = start + datetime.timedelta(minutes=30*(i+1)) - datetime.timedelta(seconds=1)

		interval_start_str = interval_start.strftime("%H%M%S")
		interval_end_str = interval_end.strftime("%H%M%S")

		minutes_str = str(30*i).zfill(4)

		url_prefix = f"https://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/GPM_L3/GPM_3IMERGHHE.06/{year}/{day_of_year}"
		filename = f"3B-HHR-E.MS.MRG.3IMERG.{date_str}-S{interval_start_str}-E{interval_end_str}.{minutes_str}.V06B.HDF5.nc4"
		download_url = f"{url_prefix}/{filename}"
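		download_target = f"{download_dir}/{day_of_year}/{filename}"

		# -n reads the Earthdata login from ~/.netrc, -c/-b save and reuse the URS session cookie,
		# and -L follows the redirect to the login server.
		curl_command = f"curl -n -c ~/.urs_cookies -b ~/.urs_cookies -L --url {download_url} --create-dirs -o {download_target}"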

		day_commands.append(curl_command)


	return day_commands


commands = []

date_start = datetime.datetime.strptime("2014-08-01", "%Y-%m-%d")
date_end = datetime.datetime.strptime("2014-09-01", "%Y-%m-%d")

arya_download_dir = f"/data/mint/gpm_{date_start.strftime('%Y%m%d')}_{date_end.strftime('%Y%m%d')}"

delta_days = (date_end - date_start).days

# +1 so that date_end itself is included in the download range
for i in range(delta_days+1):
	cur_date = date_start + datetime.timedelta(days=i)

	commands += generate_download_links_for_date(cur_date, arya_download_dir)

netrc_string = f"machine urs.earthdata.nasa.gov login {earthdata_username} password {earthdata_password}"

with open("download_gpm.sh", "w") as f:
	f.write("#!/bin/bash\n")
	# curl -n looks for .netrc in the home directory, so create it (and the cookie file) there.
	f.write(f'rm -f ~/.netrc && touch ~/.netrc && echo "{netrc_string}" >> ~/.netrc && chmod 0600 ~/.netrc\n')
	f.write("rm -f ~/.urs_cookies && touch ~/.urs_cookies\n")
	f.write("\n".join(commands) + "\n")
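
One way the script above could be made more maintainable, offered as a rough sketch rather than a tested replacement: pull the file-name logic into a small function that can be checked without touching the network. The name `imerg_half_hour_files` is hypothetical; the URL prefix and file-name pattern are copied from the script.

```python
import datetime

# URL prefix copied from the script above.
BASE_URL = "https://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/GPM_L3/GPM_3IMERGHHE.06"


def imerg_half_hour_files(day):
    """Yield (url, filename) for each of the 48 half-hourly files of `day`.

    Hypothetical helper: it only reproduces the naming used in the script.
    """
    start = datetime.datetime(day.year, day.month, day.day)
    for i in range(48):
        begin = start + datetime.timedelta(minutes=30 * i)
        # Each interval ends one second before the next one starts.
        end = begin + datetime.timedelta(minutes=30, seconds=-1)
        filename = (
            f"3B-HHR-E.MS.MRG.3IMERG.{day:%Y%m%d}"
            f"-S{begin:%H%M%S}-E{end:%H%M%S}.{30 * i:04d}.V06B.HDF5.nc4"
        )
        yield f"{BASE_URL}/{day:%Y}/{day:%j}/{filename}", filename


files = list(imerg_half_hour_files(datetime.date(2014, 8, 1)))
print(len(files))    # 48
print(files[1][1])   # 3B-HHR-E.MS.MRG.3IMERG.20140801-S003000-E005959.0030.V06B.HDF5.nc4
```

With the names isolated like this, the download step (curl, or an HTTP library) can be swapped out without touching the naming logic.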

@khider

khider commented May 11, 2021

This is the 30-minute product, i.e. the one that takes quite a while to download, correct?

We also need CHIRPS.

Endpoint: https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_6-hourly/

Do we have a script for that one as well?

@dnfeldman
Collaborator

^ Yeah, it's the 30-minute one, and it usually takes a while to download (~10-30 s per file).
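
For a rough sense of scale, here is a back-of-the-envelope estimate for the date range used in the script above (2014-08-01 through 2014-09-01, inclusive). It assumes files are downloaded one at a time at the quoted 10-30 s each.

```python
import datetime

date_start = datetime.date(2014, 8, 1)
date_end = datetime.date(2014, 9, 1)

num_days = (date_end - date_start).days + 1   # the end date is included
num_files = num_days * 48                     # 48 half-hourly files per day

low_hours = num_files * 10 / 3600
high_hours = num_files * 30 / 3600
print(f"{num_files} files, roughly {low_hours:.1f} to {high_hours:.1f} hours")
# 1536 files, roughly 4.3 to 12.8 hours
```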

And we don't have download scripts for CHIRPS; UCSB folks were pushing data to the data catalog directly.
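
If a CHIRPS download script is needed, one possible starting point is to read the file names from the server's directory index and turn them into download URLs. The sketch below assumes the endpoint serves a plain HTML index with one link per file, which has not been verified here; `LinkCollector`, `file_urls`, and the sample file name are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Endpoint quoted earlier in this thread.
CHIRPS_URL = "https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_6-hourly/"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def file_urls(index_html, base_url=CHIRPS_URL):
    """Return absolute URLs for the file links found in a directory index."""
    parser = LinkCollector()
    parser.feed(index_html)
    # Skip sort links ("?C=N;O=D"), parent-directory links, and subdirectories.
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if not href.startswith(("?", "/", "..")) and not href.endswith("/")
    ]


# Made-up index snippet, for illustration only.
sample = (
    '<a href="../">Parent</a> <a href="?C=N;O=D">Name</a> '
    '<a href="example_file.bin">example_file.bin</a>'
)
print(file_urls(sample))
```

In real use, the index HTML would come from something like `urllib.request.urlopen(CHIRPS_URL).read().decode()`, and each resulting URL could then be fetched with curl or `urllib.request.urlretrieve`.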

@khider

khider commented May 12, 2021

Since they left the program, I'm assuming they are no longer pushing anything?

@dnfeldman
Collaborator

Yeah, it doesn't look like there has been any new activity for over a year.
