[WIP] Initial commit of bulk reading into numpy arrays #540

Closed
wants to merge 10 commits into from

snorfalorpagus
Member

This is a first attempt at providing a bulk load of data into NumPy arrays, as suggested in #469.

It's a work in progress. I'm sure it's not perfectly optimized, and some edge cases aren't handled yet (e.g. NULL values in int columns, date fields).

Does this provide the right kind of interface for geopandas @jorisvandenbossche? Do you have any opinions @sgillies?
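To make the NULL edge case concrete: a plain NumPy integer array has no slot for a missing value, so a bulk reader has to pick a representation. The sketch below shows two common options on made-up data. The `raw` list and variable names are assumptions for illustration and are not taken from this PR.

```python
import numpy as np

# Hypothetical column values as a reader might collect them; None stands for an OGR NULL.
raw = [3, None, 7]

# Option 1: a masked array keeps the integer dtype and records NULLs in a separate mask.
mask = np.array([v is None for v in raw])
filled = np.array([0 if v is None else v for v in raw], dtype=np.int64)
as_masked = np.ma.masked_array(filled, mask=mask)

# Option 2: cast to float64 and use NaN, which loses exactness for integers beyond 2**53.
as_float = np.array([np.nan if v is None else v for v in raw], dtype=np.float64)
```

Which option fits best depends on what the consumer expects; the masked array preserves the dtype, while the float cast is simpler to hand on to other libraries.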

@coveralls

coveralls commented Jan 22, 2018

Coverage Status

Coverage decreased (-0.3%) to 84.065% when pulling dc4ab3d on snorfalorpagus:vectorized into 1938c44 on Toblerity:master.

@coveralls

coveralls commented Jan 22, 2018

Coverage Status

Coverage decreased (-0.2%) to 83.389% when pulling 37400b7 on snorfalorpagus:vectorized into 8867328 on Toblerity:master.

@jorisvandenbossche
Member

@snorfalorpagus Thanks a lot for exploring this!
To say whether this would output something useful for geopandas, I would first need to take a closer look (but in principle returning numpy arrays is fine). You now create one array per dtype? That would be perfect for geopandas, although from a more general user point of view, a 1D array per field might make more sense.
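The two layouts mentioned here can be sketched side by side. This is only an illustration on a made-up three-field layer; the field names, the `rows` data and the dictionary shapes are assumptions, not the interface in this PR.

```python
import numpy as np

# Hypothetical layer schema (field name -> dtype) and three features.
fields = {"id": "int64", "population": "int64", "area": "float64"}
rows = [(1, 100, 2.5), (2, 250, 4.0), (3, 75, 1.25)]
names = list(fields)

# Layout A: one 1D array per field.
per_field = {
    name: np.array([row[i] for row in rows], dtype=fields[name])
    for i, name in enumerate(names)
}

# Layout B: one 2D array per dtype, together with the column names it holds.
per_dtype = {}
for dtype in dict.fromkeys(fields.values()):
    cols = [n for n in names if fields[n] == dtype]
    block = np.column_stack([per_field[n] for n in cols])
    per_dtype[dtype] = (cols, block)
```

Layout A is easier to reason about field by field; layout B groups same-typed columns into one block, which is closer to how a dataframe library may store them internally.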

With regard to the TODO "best way to return geometries for shapely?", I would say either WKB or WKT, depending on which is the fastest/cheapest to create and convert afterwards (which I assume will be WKB?)
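As a rough illustration of why WKB tends to be the cheaper format to decode, the sketch below builds the WKB and WKT forms of a single 2D point using only the standard library. It does not measure anything and does not use shapely or OGR; the helper names `point_from_wkb` and `point_from_wkt` are hypothetical and handle points only.

```python
import struct

x, y = 1.5, -2.25

# WKB for a 2D point: byte-order flag (1 = little-endian), geometry type (1 = Point),
# followed by the two coordinates as float64.
wkb = struct.pack("<BIdd", 1, 1, x, y)

# WKT for the same point is text that has to be tokenized and parsed.
wkt = f"POINT ({x} {y})"


def point_from_wkb(buf):
    """Decode a 2D point from WKB with a single fixed-layout unpack."""
    endian = "<" if buf[0] == 1 else ">"
    geom_type, px, py = struct.unpack_from(endian + "Idd", buf, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D points")
    return px, py


def point_from_wkt(text):
    """Decode a 2D point from WKT by slicing the text and converting to float."""
    inner = text[text.index("(") + 1 : text.index(")")]
    px, py = (float(part) for part in inner.split())
    return px, py
```

Both decoders return the same coordinates; the WKB path reads fixed-width binary fields, while the WKT path goes through string handling and float parsing.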

@sgillies added this to the 1.8a3 milestone on Mar 20, 2018
@snorfalorpagus
Member Author

snorfalorpagus commented Mar 24, 2018

  • Support WKT or WKB
  • Support for all data types available in Fiona already
  • Support for ignoring fields
  • Improved iterating over features

@snorfalorpagus
Member Author

It looks like GDAL 1.x is having a problem with the datetime field? It also looks like we don't actually test this in the regular reader/writer.

length = OGR_L_GetFeatureCount(session.cogr_layer, 0)

data_fids = np.empty([length], dtype=object)
data_properties = {}
Contributor

Thoughts on using a structured array for data_properties?
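For reference, a structured array in place of a dict of per-field arrays could look roughly like the sketch below. The schema, the `records` data and the reuse of the name `data_properties` are assumptions for illustration; the PR code itself is not shown in full here.

```python
import numpy as np

# Hypothetical schema: one named, typed column per field.
dtype = np.dtype([("name", "U16"), ("population", "i8"), ("area", "f8")])

length = 3  # stands in for the feature count obtained from the layer
data_properties = np.empty(length, dtype=dtype)

# Fill one record per feature; each tuple must follow the field order of the dtype.
records = [("Alpha", 100, 2.5), ("Beta", 250, 4.0), ("Gamma", 75, 1.25)]
for i, rec in enumerate(records):
    data_properties[i] = rec

# Selecting a field by name returns a view onto that column.
population = data_properties["population"]
```

A structured array keeps all fields in one allocation and gives named column access, but string fields need a fixed width chosen up front, and values are stored row by row, so a column view is strided rather than contiguous.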

@brendan-ward
Contributor

For those finding this issue later and wanting to know how support for vectorized reading from OGR/GDAL evolved, see pyogrio.

Thanks again @snorfalorpagus for the initial work on this!

6 participants