
Embeddings with land use land cover fields, or other attributes #84

Closed
3 tasks done
weiji14 opened this issue Dec 10, 2023 · 3 comments
Labels
operational (Coordination tasks between the sub-teams of Clay), question (Further information is requested)

Comments

@weiji14
Contributor

weiji14 commented Dec 10, 2023

Opening a parallel thread to #35 to ask what other attributes are needed alongside the vector embeddings themselves.

Currently, we have implemented (or are about to implement):

But @Clay-foundation/ode, it seems like you would require more than just the embeddings and spatiotemporal metadata for the Web App?

Do we have metadata or other datasets that have already been computed for this area? There's a breakdown of the land cover given above, for example; do we have that for each of the chips?

I'm interested in exploring the relationships between these embeddings and known datasets over the area.

Originally posted by @MaceGrim in #35 (comment)

To be clear, is this extra land cover metadata something that falls on @Clay-foundation/devseed's plate, or can @Clay-foundation/ode use the spatiotemporal metadata from #73 to find the land cover type statistics? Besides land cover type, what other attributes are worth adding to the embedding file?

weiji14 added the question (Further information is requested) and operational (Coordination tasks between the sub-teams of Clay) labels on Dec 10, 2023
@danhammer
Collaborator

I defer to @brunosan on the question:

To be clear, is this extra land cover metadata something that falls on @Clay-foundation/devseed's plate, or can @Clay-foundation/ode use the spatiotemporal metadata from #73 to find the land cover type statistics? Besides land cover type, what other attributes are worth adding to the embedding file?

We will need this metadata to do some of the dynamic visualizations we showed in the past. We can add this metadata, but it won't be nearly as efficient as doing it within the imagery pipeline that @Clay-foundation/devseed already has.

@weiji14
Contributor Author

weiji14 commented Dec 12, 2023

The neural network model we have does not output Land Use Land Cover (LULC) classes, so there would need to be a separate pipeline for this. Note that the original sampling we did in #28 is based on WorldCover, which provides annual grids for 2020-2021, not exact LULC statistics on the acquisition date of the satellite imagery we ran the embedding on. We could perhaps use something like DynamicWorld, which has a 1-to-1 temporal match with Sentinel-2, but as far as I'm aware there is no STAC catalog for it, so it would take a lot of time to set up. Cc @yellowcap.
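
A minimal sketch of per-chip land cover statistics, assuming each chip's footprint has already been used to clip the WorldCover raster into a 2D NumPy array of class codes (the clipping step itself is not shown). The helper names `worldcover_fractions` and `dominant_class`, and treating 0 as nodata, are assumptions for illustration; the class codes follow the published WorldCover legend. The result describes the annual WorldCover composition, not the land cover on the acquisition date.

```python
import numpy as np

# ESA WorldCover legend: class code -> short column-friendly name.
WORLDCOVER_CLASSES = {
    10: "tree_cover",
    20: "shrubland",
    30: "grassland",
    40: "cropland",
    50: "built_up",
    60: "bare_sparse_vegetation",
    70: "snow_and_ice",
    80: "permanent_water_bodies",
    90: "herbaceous_wetland",
    95: "mangroves",
    100: "moss_and_lichen",
}


def worldcover_fractions(chip: np.ndarray, nodata: int = 0) -> dict[str, float]:
    """Fraction of valid pixels in each WorldCover class for one chip.

    Hypothetical helper: `chip` is assumed to be a 2D array of WorldCover
    class codes aligned to the embedding chip; pixels equal to `nodata`
    (assumed to be 0) are ignored.
    """
    valid = chip[chip != nodata]
    fractions = {name: 0.0 for name in WORLDCOVER_CLASSES.values()}
    if valid.size == 0:
        return fractions
    codes, counts = np.unique(valid, return_counts=True)
    for code, count in zip(codes.tolist(), counts.tolist()):
        name = WORLDCOVER_CLASSES.get(code)
        if name is not None:  # skip any code outside the legend
            fractions[name] = count / valid.size
    return fractions


def dominant_class(fractions: dict[str, float]) -> str:
    """Class with the largest share, or "nodata" if the chip had no valid pixels."""
    name = max(fractions, key=fractions.get)
    return name if fractions[name] > 0 else "nodata"
```

For example, a chip with values `[[10, 10], [40, 0]]` has three valid pixels, giving two thirds tree cover and one third cropland, so its dominant class is `tree_cover`. The per-class fractions could become numeric columns in the embedding file, and the dominant class a single categorical column.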

In #86, I've linked each row of embeddings to its source GeoTIFF file, so it should be possible to at least see the RGB image associated with each embedding. Looking at https://medium.com/earthrisemedia/how-we-judge-earth-observation-foundation-model-quality-part-1-intuition-building-623e527d560a, it seems that the visualization was created using https://github.com/nomic-ai/deepscatter, which expects a Parquet/Feather file with x, y and other categorical columns. The x and y columns can be derived from the geometry column of the current GeoParquet file, but the categorical columns would need some work.
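
A minimal sketch of the x/y + categorical table described above, assuming pandas is available and that the chip footprints have already been reduced to bounding-box columns `minx`, `miny`, `maxx`, `maxy` (the layout geopandas' `GeoSeries.bounds` returns). The x/y point is the bounding-box center, which equals the centroid only for axis-aligned rectangular chips; with geopandas, `gdf.geometry.centroid` is the more general option. The function name `to_deepscatter_table` and the `lulc` column are hypothetical placeholders, not the actual schema from #86.

```python
import pandas as pd


def to_deepscatter_table(chips: pd.DataFrame, category_cols: list[str]) -> pd.DataFrame:
    """Build an x/y + categorical table for a deepscatter-style scatterplot.

    Hypothetical helper: assumes `chips` has one row per embedding chip, with
    bounding-box columns minx, miny, maxx, maxy plus the categorical columns
    listed in `category_cols`.
    """
    table = pd.DataFrame(
        {
            # Center of each chip's bounding box (its centroid if the chip is
            # an axis-aligned rectangle in this coordinate system).
            "x": (chips["minx"] + chips["maxx"]) / 2,
            "y": (chips["miny"] + chips["maxy"]) / 2,
        },
        index=chips.index,
    )
    for col in category_cols:
        # Store labels as pandas categoricals rather than repeated strings.
        table[col] = chips[col].astype("category")
    # Drop any custom index so the output is a plain row-numbered table.
    return table.reset_index(drop=True)


chips = pd.DataFrame(
    {
        "minx": [0.0, 10.0],
        "miny": [0.0, 0.0],
        "maxx": [2.0, 12.0],
        "maxy": [4.0, 4.0],
        "lulc": ["tree_cover", "cropland"],
    }
)
table = to_deepscatter_table(chips, ["lulc"])
```

Writing the result with `table.to_parquet(path)` or `table.to_feather(path)` requires pyarrow to be installed; whether deepscatter needs further preprocessing of that file is not covered here.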

I can't promise that we can do this by the end of the year, but could you send a sample Parquet/Feather file that was used for the dynamic visualization, so that we can at least see what the data inside the categorical columns should look like?

@brunosan
Member

Closing this, since the items for creating embeddings and adding the source location, time and file are all done.

We still need to get better at the last point, exploring the embeddings, but that probably needs a narrower scope in a separate ticket.


3 participants