Inherit any existing rasterio environment during stack
#133
Comments
I just found this issue after working on a NASA-deployed JupyterHub instance that is able to access data on S3 without any additional configuration. As a workaround, I can pass the default env into stack via gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(always=dict(session=rio.session.AWSSession(boto3.Session()))). Does that seem like something that can be upstreamed into stackstac? Happy to open a PR if so.
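Spelled out, a minimal sketch of that workaround (items stands in for whatever STAC items are being stacked; boto3.Session() is assumed to pick up the hub's ambient credentials):

import boto3
import rasterio as rio
import stackstac

# hand one shared AWS session to every GDAL read that stackstac performs
env = stackstac.DEFAULT_GDAL_ENV.updated(
    always=dict(session=rio.session.AWSSession(boto3.Session()))
)
stack = stackstac.stack(items, gdal_env=env)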
Update: I just tried to use distributed with this setup and, unsurprisingly, the session is not picklable.
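A quick way to see that (hypothetical snippet; distributed has to pickle the task graph, and boto3/botocore sessions hold unpicklable state):

import pickle

import boto3
import rasterio as rio

session = rio.session.AWSSession(boto3.Session())
pickle.dumps(session)  # raises: boto3 sessions aren't picklable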
+1 for inheriting the rasterio environment! This week I came across a weird case where I needed to read data from two S3 sources, each with different access credentials (a company bucket and a NASA bucket). Unfortunately, something about the AWS credentials that I passed to stackstac.stack interferes with reads that rely on my default credentials.

I have my company AWS access credentials stored in environment variables, which has never failed me, but when I add separate credentials into the mix via gdal_env, reads from the company bucket start failing.

To access the NASA data directly from S3, you can get a set of temporary S3 credentials with your Earthdata login credentials. I figured out that I could pass those credentials to stackstac.stack via gdal_env. I can't produce a truly reproducible example with the private bucket situation, but here is what I am seeing:
import os

import boto3
import pystac
import rasterio
import requests
import stackstac
items = pystac.ItemCollection(...)
# the items describe image assets in a private bucket that I can access with
# AWS credentials stored in environment variables
stack = stackstac.stack(items=items)
nasa_items = pystac.ItemCollection(...)
# request AWS credentials for direct read access
netrc_creds = {}
with open(os.path.expanduser("~/.netrc")) as f:
    for line in f:
        key, value = line.strip().split(" ")
        netrc_creds[key] = value
url = requests.get(
    "https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials",
    allow_redirects=False,
).headers["Location"]
creds = requests.get(
    url, auth=(netrc_creds["login"], netrc_creds["password"])
).json()
nasa_stack = stackstac.stack(
    items=nasa_items,
    gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(
        always=dict(
            session=rasterio.session.AWSSession(
                boto3.Session(
                    aws_access_key_id=creds["accessKeyId"],
                    aws_secret_access_key=creds["secretAccessKey"],
                    aws_session_token=creds["sessionToken"],
                    region_name="us-west-2",
                )
            )
        )
    ),
)
items = pystac.ItemCollection(...)
# the items describe image assets in a private bucket that I can access with
# AWS credentials stored in environment variables
stack = stackstac.stack(items=items)

This fails with AWS access denied errors! Maybe I am setting up gdal_env incorrectly? A very basic read operation using rasterio.Env works fine for both buckets, though:

hls_tif = "s3://lp-prod-protected/HLSL30.020/HLS.L30.T15UXP.2022284T165821.v2.0/HLS.L30.T15UXP.2022284T165821.v2.0.Fmask.tif"
private_tif = "s3://private-bucket/lol.tif"

# read from NASA
with rasterio.Env(
    session=rasterio.session.AWSSession(
        boto3.Session(
            aws_access_key_id=creds["accessKeyId"],
            aws_secret_access_key=creds["secretAccessKey"],
            aws_session_token=creds["sessionToken"],
            region_name="us-west-2",
        )
    )
):
    with rasterio.open(hls_tif) as src:
        print(src.profile)

# read from private bucket
with rasterio.Env(
    session=rasterio.session.AWSSession(
        boto3.Session(
            aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
            aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
            region_name="us-east-1",
        )
    )
):
    with rasterio.open(private_tif) as src:
        print(src.profile)

# read from NASA again
with rasterio.Env(
    session=rasterio.session.AWSSession(
        boto3.Session(
            aws_access_key_id=creds["accessKeyId"],
            aws_secret_access_key=creds["secretAccessKey"],
            aws_session_token=creds["sessionToken"],
            region_name="us-west-2",
        )
    )
):
    with rasterio.open(hls_tif) as src:
        print(src.profile)

My workaround for now is to do all of the work in my original private bucket first, then do the work in the NASA bucket afterwards. It works, but it is not a satisfying solution!
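In other words (a hypothetical sketch of that sequencing, using the stacks built above):

# do all the private-bucket work first, under the env-var credentials...
private_result = stack.mean(dim="time").compute()
# ...then the NASA work, under the temporary session
nasa_result = nasa_stack.mean(dim="time").compute()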
Does it work if you have different sessions?
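For illustration only (this snippet is not from the thread): one way to read "different sessions" is to give the private-bucket stack its own explicit session too, rather than letting it fall back to environment variables:

stack = stackstac.stack(
    items=items,
    gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(
        always=dict(
            session=rasterio.session.AWSSession(
                boto3.Session(
                    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
                    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
                    region_name="us-east-1",
                )
            )
        )
    ),
)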
In #132 I noticed a snippet along these lines (a reconstruction; not the exact code from that issue):
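# hypothetical reconstruction of the #132 usage: wrap the stack call
# in a rasterio.Env to set extra GDAL options
import rasterio
import stackstac

with rasterio.Env(GDAL_HTTP_MAX_RETRY=3, GDAL_HTTP_RETRY_DELAY=5):
    stack = stackstac.stack(items)  # these settings won't survive to compute time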
That doesn't currently work the way you'd expect (the environment settings you've just created will be ignored at compute time), but it might be a pretty intuitive way to set extra GDAL options without mucking around with LayeredEnvs and the defaults.

We could even deprecate support for passing in a LayeredEnv directly, since it's far more complexity than most users would need, and erring on the side of fewer options is usually better.

There's some complexity around the fact that, theoretically, different types of Readers are supported, though in practice this is not at all true. Nonetheless, it might be worth extending the Reader protocol to expose either a DEFAULT_ENV: ClassVar[LayeredEnv] attribute or a get_default_env() -> LayeredEnv classmethod.

Then ultimately, within items_to_dask, we'd pull the default env for the specified reader type and merge it with any currently-set options (via rio.env.getenv()).
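A rough sketch of what that could look like (illustrative names and merge logic; only Reader, DEFAULT_ENV, get_default_env, and items_to_dask come from the comment above):

from typing import ClassVar, Protocol

import rasterio
from stackstac.rio_env import LayeredEnv  # assumed import path

class Reader(Protocol):
    # option A: a class attribute...
    DEFAULT_ENV: ClassVar[LayeredEnv]

    # ...or option B: a classmethod
    @classmethod
    def get_default_env(cls) -> LayeredEnv:
        ...

def resolve_env(reader_type: type[Reader]) -> LayeredEnv:
    # called from items_to_dask: take the reader's default env and
    # overlay whatever GDAL options are set in the current rasterio env
    current = rasterio.env.getenv() if rasterio.env.hasenv() else {}
    return reader_type.get_default_env().updated(always=current)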