Support parquet and feather datasets from GeoPandas #196

Closed
Calychas opened this issue Apr 27, 2023 · 7 comments · Fixed by #812
Labels
Community (Issue/PR opened by the open-source community)

Comments

@Calychas

Calychas commented Apr 27, 2023

Description

Add two GeoPandas-based datasets, one for the Feather format and one for Parquet, to make loading and saving GeoDataFrames with geometries easier.

Context

GeoPandas is very useful when working with geospatial data, but at the moment the kedro-datasets plugin only supports GeoJsonDataset. It is possible to load and save Parquet and Feather files with geometries using the standard pandas datasets, but the geometry data then needs special treatment (e.g. parsing WKB and creating a GeoDataFrame manually).
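To illustrate the "special treatment" mentioned above, here is a minimal sketch of the manual step. It assumes the plain pandas dataset returns the geometry column as raw WKB bytes and that the CRS is EPSG:4326. The helper name `to_geodataframe`, the column names, and the sample data are made up for illustration.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Stand-in for what a plain pandas dataset might return:
# the geometry column holds raw WKB bytes rather than geometry objects.
plain_df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "geometry": [Point(0, 0).wkb, Point(1, 2).wkb],
    }
)


def to_geodataframe(df, geometry_col="geometry", crs="EPSG:4326"):
    """Hypothetical helper: parse a WKB column and build a GeoDataFrame.

    The CRS has to be supplied by hand here, because a plain DataFrame
    carries no CRS information (assumption: EPSG:4326).
    """
    geometry = gpd.GeoSeries.from_wkb(df[geometry_col], crs=crs)
    return gpd.GeoDataFrame(df.drop(columns=[geometry_col]), geometry=geometry)


gdf = to_geodataframe(plain_df)
```

A dedicated GeoPandas dataset would remove the need for this extra node or hook in every pipeline that reads geometries.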

Possible Implementation

Take the implementation of the existing geopandas.GeoJsonDataset as a starting point, then create geopandas.ParquetDataSet and geopandas.FeatherDataSet. I already have a working implementation privately and can try to add it here.

@noklam
Contributor

noklam commented Apr 27, 2023

@Calychas Awesome, would you be able to create a PR?

@noklam added the Community label (Issue/PR opened by the open-source community) on Apr 27, 2023
@Calychas
Author

@noklam Yes, I will try to do that sometime in the next 2 weeks.

@SajidAlamQB
Contributor

SajidAlamQB commented Apr 27, 2023

Awesome, thank you @Calychas!

@noklam
Contributor

noklam commented Jun 29, 2023

@Calychas Hey! Pinging to see if you still have some bandwidth to add support for this :)

@Calychas
Author

Calychas commented Jul 4, 2023

Hey @noklam! Sorry for the delay, I just came back to this task.

@Calychas
Author

Calychas commented Jul 4, 2023

Which dataset would you consider the gold-standard implementation to base the new datasets on? Looking over the existing datasets, I sometimes see minor discrepancies, e.g. some use BytesIO and some do not.

@noklam
Contributor

noklam commented Jul 4, 2023

@Calychas CSVDataSet is probably the most common one.
