Cli 017 rw1 glacier locations #261 (Open)

jwinik wants to merge 10 commits into master from cli_017_rw1_glacier_locations (base: master)
Changes shown are from 7 of the 10 commits.

Commits:
738a061  Initial commit (jwinik)
946e0fc  renamed columns in both shapefiles (jwinik)
78b9552  updated readme (jwinik)
574de84  correcting S3 upload (jwinik)
6a02184  updated READme (jwinik)
1b11c63  Added reviewer bio (jwinik)
5920be6  Updated methodology (jwinik)
6c7f62a  Apply suggestions from code review (jwinik)
67edfd9  Apply suggestions from code review (jwinik)
f4a3be1  Apply suggestions from code review (jwinik)
@@ -0,0 +1,25 @@
## {Resource Watch Public Title} Dataset Pre-processing

This file describes the data pre-processing that was done to [the Glacier Locations](http://glims.colorado.edu/glacierdata/) for [display on Resource Watch](https://resourcewatch.org/data/explore/cli017-Glacier-Extents_replacement?section=All+data&selectedCollection=&zoom=3&lat=0&lng=0&pitch=0&bearing=0&basemap=dark&labels=light&layers=%255B%257B%2522dataset%2522%253A%2522ad218d82-058b-4b8e-b790-44fb6d4b531f%2522%252C%2522opacity%2522%253A1%252C%2522layer%2522%253A%25221ab0f13b-b3cf-46fb-add5-2b802df9a9eb%2522%257D%255D&aoi=&page=1&sort=most-viewed&sortDirection=-1&topics=%255B%2522glacier%2522%255D).
The source provided the two shapefiles in a zipped folder:

1. GLIMS Glacier Locations (points)
2. GLIMS Glacier Extent (polygons)
Below, we describe the steps used to download the shapefiles and format them to upload to Carto.
1. Download the zipped folder and import the shapefiles as geopandas dataframes.
2. Rename columns to match the Carto tables and delete unnecessary columns.
3. Re-upload to Carto and Resource Watch.
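No SQL or GEE code was used for this dataset; the processing was done in Python with geopandas. The snippet below is a minimal sketch of steps 1-3, with the download URL, column renames, dropped columns, and output names taken from the full script in this folder; the local paths and the sorted-glob ordering are illustrative assumptions rather than exact reproductions of the script.

```
import glob
import os
import urllib.request
from zipfile import ZipFile

import geopandas as gpd

# 1. download the zipped folder from GLIMS and extract it
url = 'https://www.glims.org/download/latest'
os.makedirs('data', exist_ok=True)
raw_zip = 'data/glims_download.zip'   # illustrative local path
urllib.request.urlretrieve(url, raw_zip)
with ZipFile(raw_zip) as zipped:
    zipped.extractall('data/glims_download')

# read the point and polygon shapefiles as geopandas dataframes
# (assumes the point shapefile sorts before the polygon/extent shapefile)
shapefiles = sorted(glob.glob('data/glims_download/**/glims_p*.shp', recursive=True))
gdf_points, gdf_extent = (gpd.read_file(shp) for shp in shapefiles)

# 2. rename columns to match the Carto tables and drop unnecessary columns
gdf_points.columns = ['the_geom' if c == 'geometry' else c for c in gdf_points.columns]
gdf_points = gdf_points.set_geometry('the_geom')

extent_renames = {'length': 'glacier_length', 'geometry': 'the_geom'}
gdf_extent.columns = [extent_renames.get(c, c) for c in gdf_extent.columns]
gdf_extent = gdf_extent.set_geometry('the_geom')
gdf_extent = gdf_extent.drop(['loc_unc_x', 'loc_unc_y', 'glob_unc_x', 'glob_unc_y'], axis=1)

# 3. save the processed shapefiles, which are then re-uploaded to Carto
gdf_points.to_file('data/cli_017_rw1_glacier_locations_locations.shp', driver='ESRI Shapefile')
gdf_extent.to_file('data/cli_017_rw1_glacier_locations_extent.shp', driver='ESRI Shapefile')
```

In the full script, the saved shapefiles are pushed to Carto with the repo's `util_carto.upload_to_carto` helper, and the raw and processed files are also archived to S3.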
Please see the [Python script](https://github.com/resource-watch/data-pre-processing/tree/cli_017_rw1_glacier_locations/cli_017_rw1_glacier_locations) for more details on this processing.
You can view the processed Glacier Locations dataset [on Resource Watch](https://resourcewatch.org/data/explore/cli017-Glacier-Extents_replacement?section=All+data&selectedCollection=&zoom=3&lat=0&lng=0&pitch=0&bearing=0&basemap=dark&labels=light&layers=%255B%257B%2522dataset%2522%253A%2522ad218d82-058b-4b8e-b790-44fb6d4b531f%2522%252C%2522opacity%2522%253A1%252C%2522layer%2522%253A%25221ab0f13b-b3cf-46fb-add5-2b802df9a9eb%2522%257D%255D&aoi=&page=1&sort=most-viewed&sortDirection=-1&topics=%255B%2522glacier%2522%255D).
You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/cli_017_glacier_extent.zip), or [from the source website](http://www.glims.org/download/).
###### Note: This dataset processing was done by [Jason Winik](https://www.wri.org/profile/jason-winik), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
cli_017_rw1_glacier_locations/cli_017_rw1_glacier_locations.py
111 changes: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
import pandas as pd
import geopandas as gpd
import urllib.request
import glob
import requests
import os
import sys
utils_path = os.path.join(os.path.abspath(os.getenv('PROCESSING_DIR')),'utils')
if utils_path not in sys.path:
    sys.path.append(utils_path)
import util_files
import util_cloud
import util_carto
import logging
from zipfile import ZipFile

# Set up logging
# Get the top-level logger object
logger = logging.getLogger()
for handler in logger.handlers: logger.removeHandler(handler)
logger.setLevel(logging.INFO)
# make it print to the console
console = logging.StreamHandler()
logger.addHandler(console)
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# name of table on Carto where you want to upload data
# this should be a table name that is not currently in use
dataset_name = 'cli_017_rw1_glacier_locations' #check

logger.info('Executing script for dataset: ' + dataset_name)
# create a new sub-directory within your specified dir called 'data'
# within this directory, create files to store raw and processed data
data_dir = util_files.prep_dirs(dataset_name)

'''
Download data and save to your data directory
'''
# download the data from the source
url = "https://www.glims.org/download/latest"
raw_data_file = os.path.join(data_dir, os.path.basename(url)+'.zip')
r = urllib.request.urlretrieve(url, raw_data_file)
# unzip source data
raw_data_file_unzipped = raw_data_file.split('.')[0]
zip_ref = ZipFile(raw_data_file, 'r')
zip_ref.extractall(raw_data_file_unzipped)
zip_ref.close()

'''
Process Data
'''
# load the point and polygon (extent) shapefiles from the extracted folder
# (note: this assumes glob returns the point shapefile first and the extent shapefile second)
shapefile = glob.glob(os.path.join(raw_data_file_unzipped,'glims_download_82381', 'glims_p*.shp'))
gdf_points = gpd.read_file(shapefile[0])
gdf_extent = gpd.read_file(shapefile[1])

# rename the geometry column of the points shapefile to match the Carto table
gdf_points.columns = ['the_geom' if x == 'geometry' else x for x in gdf_points.columns]

# rename columns of the extent shapefile to match the Carto table
extent_col_change = {'length': 'glacier_length', 'geometry': 'the_geom'}
gdf_extent.columns = [extent_col_change.get(x,x) for x in gdf_extent.columns]

# remove excess extent columns
columns_to_remove = ['loc_unc_x', 'loc_unc_y', 'glob_unc_x', 'glob_unc_y']
gdf_extent = gdf_extent.drop(columns_to_remove, axis=1)

# set the geometry of gdf_points and gdf_extent
gdf_points = gdf_points.set_geometry('the_geom')
gdf_extent = gdf_extent.set_geometry('the_geom')

# save processed datasets to shapefiles
processed_data_points = os.path.join(data_dir, dataset_name +'_locations.shp')
gdf_points.to_file(processed_data_points, driver='ESRI Shapefile')

processed_data_extent = os.path.join(data_dir, dataset_name +'_extent.shp')
gdf_extent.to_file(processed_data_extent, driver='ESRI Shapefile')

processed_files = [processed_data_extent, processed_data_points]

'''
Upload processed data to Carto
'''
logger.info('Uploading processed data to Carto.')
util_carto.upload_to_carto(processed_data_points, 'LINK')
util_carto.upload_to_carto(processed_data_extent, 'LINK')

'''
Upload original data and processed data to Amazon S3 storage
'''
# initialize AWS variables
aws_bucket = 'wri-public-data'
s3_prefix = 'resourcewatch/'

logger.info('Uploading original data to S3.')
# Copy the raw data into a zipped file to upload to S3
raw_data_dir = os.path.join(data_dir, dataset_name+'.zip')
with ZipFile(raw_data_dir,'w') as zip:
    zip.write(raw_data_file, os.path.basename(raw_data_file))
# Upload raw data file to S3
uploaded = util_cloud.aws_upload(raw_data_dir, aws_bucket, s3_prefix+os.path.basename(raw_data_dir))

logger.info('Uploading processed data to S3.')
# Copy the processed data into a zipped file to upload to S3
processed_data_dir = os.path.join(data_dir, dataset_name+'_edit.zip')
with ZipFile(processed_data_dir,'w') as zip:
    # write each processed shapefile into the zip archive
    for processed_file in processed_files:
        zip.write(processed_file, os.path.basename(processed_file))
# Upload processed data file to S3
uploaded = util_cloud.aws_upload(processed_data_dir, aws_bucket, s3_prefix+os.path.basename(processed_data_dir))
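
A note on the S3 step above: an ESRI shapefile is really a bundle of sidecar files (.shp plus .shx, .dbf, .prj, and often .cpg), so a zip that contains only the paths listed in `processed_files` would not be readable on its own. The sketch below shows one way the zipping could pick up every component; the variable names follow the script, the paths are illustrative, and the glob-on-basename approach is an assumption rather than what the final script necessarily does.

```
import glob
import os
from zipfile import ZipFile

# illustrative stand-ins for the variables defined in the script above
os.makedirs('data', exist_ok=True)
processed_files = ['data/cli_017_rw1_glacier_locations_extent.shp',
                   'data/cli_017_rw1_glacier_locations_locations.shp']
processed_data_dir = 'data/cli_017_rw1_glacier_locations_edit.zip'

with ZipFile(processed_data_dir, 'w') as zipped:
    for shp_path in processed_files:
        # match each .shp along with its sidecar files (.shx, .dbf, .prj, .cpg, ...)
        for component in glob.glob(os.path.splitext(shp_path)[0] + '.*'):
            zipped.write(component, os.path.basename(component))
```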
Review comment: What's the RW public title?