Reading STAC output from satsearch library #256

Closed
scottyhq opened this issue Jan 21, 2021 · 9 comments

@scottyhq

It appears pystac cannot currently read the results from the https://github.com/sat-utils/sat-search library. Using the validation code in the pystac docs does not return any error, but I'm guessing the JSON that sat-search returns is in fact not valid for some reason. @matthewhanson or @lossyrob?

import satsearch #v0.3.0
import pystac #v0.5.4

bbox = [35.48, -3.24, 35.58, -3.14]
dates = '2020-07-01/2020-08-15'
URL='https://earth-search.aws.element84.com/v0'
results = satsearch.Search.search(url=URL,
                                  collections=['sentinel-s2-l2a-cogs'],
                                  datetime=dates,
                                  bbox=bbox,    
                                  sort=['-properties.datetime'])

# 18 items found
items = results.items() #satstac.itemcollection.ItemCollection
print(len(items))
items.save('my-s2-l2a-cogs.json')

# validation returns empty list
import json
from pystac.validation import validate_dict
with open('my-s2-l2a-cogs.json') as f:
    js = json.load(f)
print(validate_dict(js))

# KeyError: 'links'
cat = pystac.read_file('my-s2-l2a-cogs.json')
@scottyhq
Author

Full traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-27-8e028b95925b> in <module>
     23 
     24 # KeyError: 'links'
---> 25 cat = pystac.read_file('my-s2-l2a-cogs.json')

~/miniconda3/envs/intake-stac-gui/lib/python3.7/site-packages/pystac/__init__.py in read_file(href)
     69         by the JSON read from the file located at HREF.
     70     """
---> 71     return STACObject.from_file(href)
     72 
     73 

~/miniconda3/envs/intake-stac-gui/lib/python3.7/site-packages/pystac/stac_object.py in from_file(cls, href)
    521 
    522         if cls == STACObject:
--> 523             o = STAC_IO.stac_object_from_dict(d, href=href)
    524         else:
    525             o = cls.from_dict(d, href=href)

~/miniconda3/envs/intake-stac-gui/lib/python3.7/site-packages/pystac/serialization/__init__.py in stac_object_from_dict(d, href, root)
     35 
     36     if info.object_type == STACObjectType.CATALOG:
---> 37         return Catalog.from_dict(d, href=href, root=root)
     38 
     39     if info.object_type == STACObjectType.COLLECTION:

~/miniconda3/envs/intake-stac-gui/lib/python3.7/site-packages/pystac/catalog.py in from_dict(cls, d, href, root)
    780     @classmethod
    781     def from_dict(cls, d, href=None, root=None):
--> 782         catalog_type = CatalogType.determine_type(d)
    783 
    784         d = deepcopy(d)

~/miniconda3/envs/intake-stac-gui/lib/python3.7/site-packages/pystac/catalog.py in determine_type(stac_json)
     54         self_link = None
     55         relative = False
---> 56         for link in stac_json['links']:
     57             if link['rel'] == 'self':
     58                 self_link = link

KeyError: 'links'

@matthewhanson
Member

So it looks like the error is because the single file STAC that sat-search saves does not have a links object, which it should because it's supposed to be a valid STAC catalog.

I can cut a 0.3.1 release for this (probably need to add stac_version as well), but to make sure PySTAC will still read it as valid, can you try adding an empty links array (and stac_version) to the input file and try to read with PySTAC again?
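
A minimal sketch of the failure and the proposed workaround, using plain dicts only. The `find_self_link` function is a hypothetical stand-in for the loop shown in the traceback, not PySTAC's actual code, and the `stac_version` value is the one used later in this thread rather than anything confirmed here.

```python
# Hypothetical stand-in for the loop in the traceback; not PySTAC's actual code.
def find_self_link(stac_json):
    """Return the first link whose rel is 'self', or None if there is none."""
    for link in stac_json["links"]:  # raises KeyError when 'links' is absent
        if link["rel"] == "self":
            return link
    return None


# Assumed shape of the saved sat-search output: a FeatureCollection with no 'links' key.
saved = {"type": "FeatureCollection", "features": []}

try:
    find_self_link(saved)
    failed = False
    missing_key = None
except KeyError as err:
    failed = True
    missing_key = err.args[0]

# Suggested workaround: add an empty links array (and a stac_version).
patched = dict(saved, links=[], stac_version="1.0.0-beta.2")
self_link = find_self_link(patched)
```

With the empty `links` array in place the loop simply finds nothing, so the read gets past this point; whether the rest of the file validates is a separate question.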

@scottyhq
Author

scottyhq commented Jan 22, 2021

Thanks @matthewhanson, following the example search results in this repo
https://raw.githubusercontent.com/stac-utils/pystac/develop/tests/data-files/examples/1.0.0-beta.2/extensions/single-file-stac/examples/example-search.json

And the test file here
https://github.com/stac-utils/pystac/blob/2e566060647db2711cb98dbe3ba68a5d3f9411ca/tests/extensions/test_single_file_stac.py

I was able to determine the following must be added to sat-search results at a minimum:

{
  "links": [],
  "id": "sat-search-results",
  "stac_version": "1.0.0-beta.2",
  "description": "sat search results",  
  "stac_extensions": [
    "single-file-stac"
  ],
  "type": "FeatureCollection",
.
.
.

Note that other validation errors show up against 1.0.0-beta.2, so I'd suggest adding a test over in sat-search that does the following (example using the modified search output):

sfs = pystac.read_file('my-s2-l2a-cogs.json')
sfs.validate() #ValidationError: [-180, -90, 180, 90] is not of type 'object'

I added stac_version beta.2 because the validation schema isn't found for beta.1: Exception: Could not read uri https://schemas.stacspec.org/v1.0.0-beta.1/catalog-spec/json-schema/catalog.json
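
For anyone who wants to patch an existing file in the meantime, here is a rough sketch that fills in the minimum fields listed in this comment. The helper name `add_minimum_fields` is made up for illustration, and the field values are just the ones from this thread.

```python
import copy
import json

# Minimum top-level fields listed in this comment; values are the ones used in this thread.
MINIMUM_FIELDS = {
    "links": [],
    "id": "sat-search-results",
    "stac_version": "1.0.0-beta.2",
    "description": "sat search results",
    "stac_extensions": ["single-file-stac"],
}


def add_minimum_fields(feature_collection):
    """Return a shallow copy with any missing minimum fields filled in (hypothetical helper)."""
    patched = dict(feature_collection)
    for key, value in MINIMUM_FIELDS.items():
        # Existing values win; defaults are deep-copied so callers can mutate them safely.
        patched.setdefault(key, copy.deepcopy(value))
    return patched


# Stand-in for json.load() on the saved search output.
example = {"type": "FeatureCollection", "features": []}
patched = add_minimum_fields(example)
print(json.dumps(patched, indent=2))
```

This only gets the file past `pystac.read_file`; it does not address the other validation errors mentioned in this comment.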

@scottyhq
Author

@lossyrob I'm still a bit perplexed on the pystac side why single-file-stac behaves differently from pystac.Catalog, specifically iterating over items contained in the search results:

test_url = 'https://raw.githubusercontent.com/stac-utils/pystac/develop/tests/data-files/examples/1.0.0-beta.2/extensions/single-file-stac/examples/example-search.json'
sfs = pystac.read_file(test_url)
sfs.validate()
print(type(sfs)) #<class 'pystac.catalog.Catalog'>

# Does not iterate Items in SFS catalog
for f in sfs.get_all_items():
    print(f.id)
    
# Get Items from 'features' attribute instead
for f in sfs.ext['single-file-stac'].features:
    print(f.id) #LC80370332018039LGN00, LC80340332018034LGN00

@lossyrob
Member

lossyrob commented Feb 1, 2021

@scottyhq the single file stac extension was changed at some point in the spec from being an independent object (an ItemCollection, back when it was part of the core spec) to inheriting from Catalog. I don't think there was a lot of thought put into how a single file stac catalog should override the behaviors of Catalog, and there are some complexities around it that make the expected implementation a bit tricky.

E.g. - since a single file STAC is a catalog, can it contain links to child items as well? What does that mean for what get_all_items should return? When you call 'add_item', would you expect it to be stated in the links like a regular catalog, or should it go into the features?

We had a STAC call today where single file STACs got brought up, and there was a question about why they inherit from Catalog and whether that should be the case. I feel like this is an instance where the single-file-stac extension clashes with the core Catalog type enough to make me feel like it shouldn't be a Catalog extension.

Keeping it as an extension, I'm not sure what to do here - the core functionality of "get_all_items" looks for any child links that are items and returns them. In this case, the items are populated in an extension field, and are considered distinct. To modify get_all_items to also return the features, we could add the ability for extensions to add to the list of items that the catalog gathers - that would take a bit of refactoring that, to be honest, feels a bit off, but it's doable.
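
To make the distinction concrete, here is a small sketch on plain dicts. Both functions are hypothetical illustrations rather than PySTAC code, the catalog `id` is made up, and the two feature ids are the ones printed in the example earlier in this thread.

```python
def items_from_links(catalog_json):
    """Hrefs of links with rel == 'item', which is all a link-based walk can see."""
    return [
        link["href"]
        for link in catalog_json.get("links", [])
        if link.get("rel") == "item"
    ]


def items_from_features(catalog_json):
    """Ids of items embedded directly in the 'features' field."""
    return [feature["id"] for feature in catalog_json.get("features", [])]


# Trimmed-down single file STAC: the items live in 'features', not in 'links'.
single_file_stac = {
    "type": "FeatureCollection",
    "id": "example-search",  # hypothetical id
    "links": [],
    "features": [
        {"id": "LC80370332018039LGN00"},
        {"id": "LC80340332018034LGN00"},
    ],
}

print(items_from_links(single_file_stac))
print(items_from_features(single_file_stac))
```

A walk over the links comes back empty even though two items are embedded, which matches the behavior reported above.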

Perhaps a better way to approach it would be to consider a single file stac as an ItemCollection and not a Catalog, and bake ItemCollection in as an additional STACObject type in PySTAC (which it was until it was dropped from the spec). We could have conversion logic from ItemCollection -> Catalog and Catalog -> ItemCollection, which would be more straightforward. This would make PySTAC a bit out of sync with the spec (e.g. it doesn't currently include any concepts from stac-api-spec, and single-file-stac is currently a Catalog extension but the way we would work with it is not extension-like). This might be better to go into a separate library - you're already using sat-search to get at the item collection, and there's been a lot of talk about a PySTAC-based stac API client that could serve those same needs.

Given that ItemCollection is a STAC API concept, I think my ideal conclusion would be:

  • single-file-stac is removed as an extension of Catalog, and is instead an extension of ItemCollection
  • Working with single-file-stac is implemented in a PySTAC-based stac API library, and not encapsulated in PySTAC
  • In that other library, there'd be logic to convert a SingleFileSTAC <-> pystac.Catalog/Collection
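
As a very rough sketch of what that conversion logic might look like, again on plain dicts: the function names, the `./<id>.json` href scheme, and the in-memory `items_by_href` mapping are all assumptions made for illustration, and the resulting catalog dict is not meant to be a complete or valid STAC catalog.

```python
def item_collection_to_catalog(item_collection, catalog_id, description):
    """Build a catalog-like dict with one 'item' link per feature.

    Returns the catalog and a mapping from each link href to its item,
    standing in for the item files a real catalog would point at.
    """
    items_by_href = {
        f"./{feature['id']}.json": feature
        for feature in item_collection["features"]
    }
    catalog = {
        "id": catalog_id,
        "description": description,
        "links": [{"rel": "item", "href": href} for href in items_by_href],
    }
    return catalog, items_by_href


def catalog_to_item_collection(catalog, items_by_href):
    """Resolve the catalog's 'item' links back into a FeatureCollection."""
    features = [
        items_by_href[link["href"]]
        for link in catalog["links"]
        if link["rel"] == "item"
    ]
    return {"type": "FeatureCollection", "features": features}


search_result = {
    "type": "FeatureCollection",
    "features": [{"id": "item-a"}, {"id": "item-b"}],  # hypothetical items
}
catalog, items_by_href = item_collection_to_catalog(
    search_result, "search-results", "converted search results"
)
round_trip = catalog_to_item_collection(catalog, items_by_href)
```

The point is only that the two shapes carry the same items: one embeds them, the other links to them.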

In the short term, though, I'm not sure if you are blocked without a workaround, which I think we can come up with. Besides having to iterate over the features instead of using get_all_items, and having to add things to the JSON, are there any other issues popping up around using PySTAC for your use case?

@scottyhq
Author

scottyhq commented Feb 2, 2021

Thanks @lossyrob for the thoughtful details. I originally opened this issue while trying to switch from a sat-stac dependency to pystac over in intake/intake-stac#72, so I am mostly concerned with reading rather than writing SingleFileStac. It would be great to get @matthewhanson to weigh in on the various options; it's true that sat-stac has a separate ItemCollection class for SingleFileStac.

From the perspective of a 'new user' who hasn't been following the discussion, it feels like the STAC search APIs should return something that tools designed around the core spec can consume. But if I'm following, you'd need something like the sat-search library to receive a SingleFileStac response and have a .save() function to write out a valid pystac.Catalog?

No rush here really. Iterating over features works, but I did have to spend a bit of time looking at the tests in this library to figure out how to do it. I expected sfs.get_all_items() to work because sfs is a pystac.Catalog and, it seems, inherits that method. In the short term, a separate class or subclass that overrides whatever methods don't work could alleviate some confusion.

@gadomski
Member

gadomski commented Nov 8, 2022

Tying up loose ends:

  • pystac-client is now the preferred library for searching STAC APIs, replacing sat-search
  • Single file STAC is a removed concept:
    # Single File STAC is a removed concept; is being reworked as of

I think the only thing missing from @lossyrob's suggestions is a convenience method to convert between ItemCollections and Catalogs. @scottyhq is that still something that would be useful? Or can we consider this issue fixed? FYSA here's what your example looks like with pystac-client:

import pystac
import pystac_client
import json
from pystac_client import Client
from pystac import ItemCollection
from pystac.validation import validate_dict


print(pystac.__version__)
print(pystac_client.__version__)

bbox = [35.48, -3.24, 35.58, -3.14]
dates = "2020-07-01/2020-08-15"
URL = "https://earth-search.aws.element84.com/v0"
client = Client.open(URL)
results = client.search(
    collections=["sentinel-s2-l2a-cogs"],
    datetime=dates,
    bbox=bbox,
)

# 18 items found
items = results.item_collection()
print(len(items))
items.save_object("my-s2-l2a-cogs.json")

# validating an ItemCollection doesn't make sense, as there isn't a jsonschema for it.
item_collection = ItemCollection.from_file("my-s2-l2a-cogs.json")

@scottyhq
Author

scottyhq commented Nov 9, 2022

I think the only thing missing from @lossyrob's suggestions is a convenience method to convert between ItemCollections and Catalogs. @scottyhq is that still something that would be useful? Or can we consider this issue fixed?

Thanks for checking, @gadomski! Feel free to close this issue. But I do think a Catalog <-> ItemCollection utility would be very useful. See the discussion of why here: gjoseph92/stackstac#86

@gadomski
Member

gadomski commented Nov 9, 2022

Cool, thanks for pointing me at that issue. If we decide it should be a PySTAC thing we can make a new issue to capture. 🥂

@gadomski closed this as not planned Nov 9, 2022