Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stacking along bands #19

Merged
merged 12 commits into from
Jul 30, 2019
Merged

Stacking along bands #19

merged 12 commits into from
Jul 30, 2019

Conversation

jsignell
Copy link
Member

First pass at implementing stacking. This PR does the stacking over the band dimension. I think that creating a dataset from the different bands is probably straightforward enough after you have the xarray objects. See the example notebook to try it out. Works like:

Screen Shot 2019-07-16 at 4 29 37 PM

Some notes:

  • I am currently doing a join on title keys to populate the the description on the entry.
  • I am allowing joins as long as the type matches, so I'm not checking that the resolution is the same which could result in sparse arrays.

Closes #12

@jhamman
Copy link
Collaborator

jhamman commented Jul 17, 2019

Thanks @jsignell! This looks really cool. Ping @matthewhanson and @scottyhq for some inital thoughts on the functionality here.

intake_stac/catalog.py Outdated Show resolved Hide resolved
if item['path_as_pattern'] != pattern:
raise ValueError(
'Band stacking failed because band url did not '
'generalize')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error could also use a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is harder to test because usually hrefs contain the band key in the url at a predictable location, but this is testing for the case where the key is not at a fixed place in the url. I would have to write a whole new broken catalog to test for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did improve the wording though to hopefully make it clearer what is going on.

Copy link
Collaborator

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just a few nits about tests/errors. I really want @matthewhanson and @scottyhq to take a look here to get their take on the api.

@scottyhq
Copy link
Collaborator

This is great, thanks for getting the ball rolling @jsignell ! I hope to have more time in the coming months to dive into intake-stac more. A few thoughts come to mind quickly looking this over.

Each STAC Item can have different band keys. I think we'll have to enumerate them when loading a Catalog to validate the inputs to stack_bands(). For example the Landsat-8 catalog has 'B1', "B2'... etc, but Sentinel-2 has "B02_10m", "B03_10m", "B04_20m" etc. - see more here https://github.com/radiantearth/stac-spec/tree/master/item-spec/examples).

Also, what should we do if two bands are requested that have different pixel postings (aka resolution)? throw an error or resample to a common grid? In the example above the posting is encoded in the name for Sentinel-2 Items, but not for Landsat8, so you'd need to either read the file metadata or have a lookup table:

Band Number µm Resolution
1 0.433–0.453 30 m
2 0.450–0.515 30 m
3 0.525–0.600 30 m
4 0.630–0.680 30 m
5 0.845–0.885 30 m
6 1.560–1.660 30 m
7 2.100–2.300 30 m
8 0.500–0.680 15 m
9 1.360–1.390 30 m
10 10.6-11.2 100 m
11 11.5-12.5 100 m

https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/

@jsignell
Copy link
Member Author

Each STAC Item can have different band keys.

Right the user would explicitly pass a list of the band keys. I guess if they don't pass such a list we could stack all the bands. Would that be useful?

Also, what should we do if two bands are requested that have different pixel postings (aka resolution)? throw an error or resample to a common grid?

Right now we generate a sparse grid, but not sure if that is smart or not. It is at least correct. But maybe the user would expect some warning instead? That would be harder to implement, but I guess that is the type of metadata that STAC encapsulates, so maybe not impossible.

@matthewhanson
Copy link
Collaborator

Thanks for this @jsignell , overall the functionality looks good, a couple notes:

  • in sat-stac, you can also retrieve an asset by common_name of the band. That is, if an asset contains a single band, and that single band has a common_name, such as "red", "green", "nir", "swir16", then you can get the asset information with Item.asset(key), e.g., Item.asset("red"). Being able to specify the asset by the actual asset key OR by common name would be a valuable feature

  • There are several scenarios which different assets might not be on the same grid. For instance, with wide swath SAR data the entire 'scene' may be broken up into sub-swaths that cover different areas, assets could potentially be in different projections, or as pointed out, the resolutions could be different.

One approach could be to validate that the total size in pixels of stacked bands are equal in both dimensions. This is mostly guaranteed to catch most of the problem cases above, however it also means retrieving header information from the assets in order to validate. A simpler easier approach might be just to check the resolution of the bands that are stacked (from the "eo:bands" property) and only allow equal res bands to be stacked.

For now especially I think the responsibility for ensuring assets are stackable is up to the user, and we shouldn't worry too much about including a lot of validation.

  • In the case where assets contain multiple bands, and two assets are stacked, I think this should still be possible (not sure if it is in this implementation?). So if there are 3 bands in each asset (perhaps RGB and NIR + SWIR bands) then the result should be a 6 band array.

In the next few weeks some of my other project work will be winding down and I'll finally get a chance to spend more time on this and will start playing with this library. Drew here at E84 is also working on the API interface (#11)

@jsignell
Copy link
Member Author

in sat-stac, you can also retrieve an asset by common_name of the band

I added the ability to stack using common_name as the name of the band

A simpler easier approach might be just to check the resolution of the bands that are stacked (from the "eo:bands" property) and only allow equal res bands to be stacked.

I have implemented that, but with a kwarg to disable (regrid=True) if people really do want to stack bands on different resolutions.

In the case where assets contain multiple bands, and two assets are stacked, I think this should still be possible (not sure if it is in this implementation?).

I don't quite follow what that would look like and I haven't seen an example in the test catalogs. Could you point me to one?

@jsignell
Copy link
Member Author

Does this look ready for merge or are there more cases that I need to consider?

@matthewhanson
Copy link
Collaborator

This looks good to me.

WRT to assets with multiple bands, I actually don't think there's an example catalog with assets that contain multiple bands. However, there is the potential that an asset could be a datafile containing multiple bands. I think we can push that feature until there's catalogs that require it.

@matthewhanson
Copy link
Collaborator

Oh there is one more thing @jsignell , can you update the target branch to be develop rather than master? master is the most recently released version, while develop is the most recent version of the code.

@jsignell jsignell changed the base branch from master to develop July 30, 2019 15:23
@jsignell
Copy link
Member Author

Ok I changed the target branch to develop. Looks like I don't have the rights to restart the travis job. Someone else might want to do that to check that it works as expected with the change in target branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants