Scale/offset with raster:bands.{scale|offset}
#55
There are some important implications to supporting this use case.
Related to that is scaling information that is supplied in the raster file itself (depending on the format, these things can have various levels of being "standard"). For example, netcdf reads with GDAL ignore scaling info and return raw pixel values, while reads from the same file with the netcdf library return scaled pixels. I'm not sure if TIFF has any "standard" tags like that.
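For what it's worth, a small sketch (the file path is a placeholder) of how one can check what GDAL/rasterio reports for a given file; rasterio exposes the scaling metadata but does not apply it on read:

```python
import rasterio

# Placeholder path; any GDAL-readable file works for this check.
with rasterio.open("example.tif") as src:
    print(src.scales, src.offsets)  # per-band scale/offset as reported by GDAL
    raw = src.read(1)               # pixel values come back unscaled
    scaled = raw * src.scales[0] + src.offsets[0]  # applying them is left to the caller
```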
Agreed.
Does
no, unless rasterio/gdal do it, which I'm pretty sure they don't for COG/netcdf, but who knows what other gdal drivers do or don't do
there is "disagree", but then there is also "stac does not include it but data does".
Yeah, I'd say if rasterio+GDAL isn't applying scale+offset for the "common" data formats, we can't expect to protect against some other GDAL reader applying it under the hood. So I think the answer is "use
The issue is that we can't tell if STAC data is "just a description of a storage scheme" or "description and instruction to convert at load time". For formats like COG there is no "automatic" scaling at read time done by GDAL, but some TileDB driver or whatnot might perform that scaling as part of the read, and it might also be useful to record that metadata in STAC. I guess having "read without further scaling" is enough to deal with those cases for now; later on, if it becomes a problem, we can change the default based on the format of the source.
With the new Element84 Sentinel-2 collection, it was decided that
Is this the time to implement the scale/offset application? Or perhaps at least mention @gadomski's approach here? Alternatively, should this be pushed further down the stack (pun intended) to e.g.

EDIT: With @gadomski's solution, there is a small caveat of:
@idantene my understanding is that right now this collection is in a bit of flux (this is from what I hear from other users, I have not been working with this data recently): some items have pixel scaling applied to data and some do not, and this makes data loading particularly tricky.
Not sure what you mean by that, as

My preference here is to either:
Most flexible solution would have to:
Right now, work is underway to refactor loading parts of
Pinging @robbibt as it relates to "unifying odc-stac/datacube-core", and @woodcockr because of "virtual product light".
Working with all our EO workflows, the only sensible option is "possibly allow user hook to decide per-item what scaling should be used". In part this is because using data provider metadata is required and is also processing-version dependent: the absence of a "scale and offset applied" flag may require knowing which processing version you are dealing with (it's implicit). This is where we are heading with the ODC hyperspectral native loader support, currently prototyped in odc-stac but to be added to datacube-core once we get the kinks out.
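For illustration only, a hypothetical per-item hook might look like the sketch below; the hook name, signature, and property names are assumptions, not an existing odc-stac (or datacube) API:

```python
from typing import Optional, Tuple

from pystac import Item


def scaling_hook(item: Item) -> Optional[Tuple[float, float]]:
    """Hypothetical per-item hook returning (scale, offset) to apply at load time,
    or None when the pixels are already scaled.

    The property names and the version rule below are illustrative assumptions;
    real collections often encode this implicitly via the processing version.
    """
    if item.properties.get("provider:scaling_applied", False):
        return None  # provider already applied scaling, read values as-is
    baseline = item.properties.get("s2:processing_baseline", "00.00")
    # Processing-version-dependent rule (values are made up for illustration)
    offset = -0.1 if baseline >= "04.00" else 0.0
    return 0.0001, offset
```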
Thanks (once again) for the very detailed reply @Kirill888! All of that sounds great, and I guess I'll just wait for future updates (/subscribe 😉).
Sorry, I was being a bit laconic there. I meant that maybe
That's correct — Element 84 is building a new Sentinel 2 L2A collection,
@idantene can you provide more detail on the desired behavior here? E.g. an example? Thanks!
Sure @gadomski - we currently apply a similar filtering/transformation internally - but consider:
Our current approach is e.g.

```python
def fetch_sentinel2_timeseries(..., item_filter=None):
    # ...
    query = stac_client.search(...)
    try:
        items = query.item_collection()
    except Exception as e:  # noqa, capture any fetching exceptions and raise them as runtime errors
        raise RuntimeError from e
    if item_filter is not None:
        items = [item for item in items if item_filter(item)]
    # ...
```

For the time being, the only
To me, pystac-client is meant to be a relatively general tool to fetch data from a STAC API. What you're describing (to me) feels more use-case specific, and is already supported more-or-less by the workflow you describe. Happy to continue chatting about desired behaviors, but we should probably move the discussion over to an issue on https://github.com/stac-utils/pystac-client to avoid the noise here on odc-stac.

As an aside, I would recommend using:

```python
item_search = Client.open(...).search(...)
items = [item for item in item_search.items() if item_filter(item)]
```
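For completeness, a hypothetical `item_filter` for the Sentinel-2 case above could be a simple predicate on item properties; the property name here is an assumption and should be checked against the actual item metadata:

```python
def item_filter(item) -> bool:
    # Hypothetical predicate: keep only items where the provider has NOT already
    # applied the BOA offset, so scaling can be applied uniformly afterwards.
    # "earthsearch:boa_offset_applied" is an assumed property name, not a guarantee.
    return not item.properties.get("earthsearch:boa_offset_applied", False)
```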
The raster extension provides scale and offset information. These scales and offsets should be applied to loaded data, if available. Right now, we're having to work around it like this:
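A minimal sketch of that kind of workaround (illustrative only: the endpoint, query parameters, and band name are assumptions, and the linked notebook's code may differ):

```python
import odc.stac
import pystac_client

# Search a STAC API for some Sentinel-2 L2A items (endpoint and query are assumptions)
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
items = list(
    catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[12.3, 55.6, 12.7, 55.8],
        datetime="2023-06",
    ).items()
)

# odc-stac returns raw pixel values; scale/offset from raster:bands is applied manually
ds = odc.stac.load(items, bands=["red"], chunks={})

band = items[0].assets["red"].extra_fields.get("raster:bands", [{}])[0]
scale = band.get("scale", 1.0)
offset = band.get("offset", 0.0)
ds["red"] = ds["red"] * scale + offset
```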
Full notebook example that includes the workaround in Cell 4 here.