[WIP] Initial commit of bulk reading into numpy arrays #540

snorfalorpagus · 2018-01-21T22:18:07Z

This is a first attempt at providing a bulk load of data into NumPy arrays, as suggested in #469.

It's a work in progress. I'm sure it's not perfectly optimized, and some edge cases aren't handled yet (e.g. NULL values in int columns, date fields).

Does this provide the right kind of interface for geopandas @jorisvandenbossche? Do you have any opinions @sgillies?

coveralls · 2018-01-22T13:30:02Z

Coverage decreased (-0.3%) to 84.065% when pulling dc4ab3d on snorfalorpagus:vectorized into 1938c44 on Toblerity:master.

coveralls · 2018-01-22T13:30:02Z

Coverage decreased (-0.2%) to 83.389% when pulling 37400b7 on snorfalorpagus:vectorized into 8867328 on Toblerity:master.

jorisvandenbossche · 2018-01-22T14:48:02Z

@snorfalorpagus Thanks a lot for exploring this!
To say whether this would output something useful for geopandas, I would first need to take a closer look (but in principle returning numpy arrays is fine). You now create one array per dtype? That would be perfect for geopandas, although from a more general user point of view, a 1D array per field might make more sense.
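To make the two layouts concrete, here is a minimal sketch in plain NumPy. It does not use the PR's code: the records, field names, and the `group_by_dtype` helper are all hypothetical and only illustrate the difference between "one 1D array per field" and "one 2D block per dtype".

```python
import numpy as np

# Hypothetical records standing in for features read from a layer.
records = [
    {"name": "a", "pop": 10, "area": 1.5, "height": 3.0},
    {"name": "b", "pop": 20, "area": 2.5, "height": 4.0},
]

# Layout 1: one 1D array per field (simple to reason about for general users).
per_field = {
    "name": np.array([r["name"] for r in records], dtype=object),
    "pop": np.array([r["pop"] for r in records], dtype=np.int64),
    "area": np.array([r["area"] for r in records], dtype=np.float64),
    "height": np.array([r["height"] for r in records], dtype=np.float64),
}


def group_by_dtype(columns):
    """Layout 2: stack same-dtype fields into one 2D block per dtype.

    Hypothetical helper. Returns {dtype: (field_names, 2D array)}, where
    column j of the array holds the values of field_names[j].
    """
    groups = {}
    for name, arr in columns.items():
        groups.setdefault(arr.dtype, []).append(name)
    return {
        dtype: (names, np.column_stack([columns[n] for n in names]))
        for dtype, names in groups.items()
    }


per_dtype = group_by_dtype(per_field)
float_names, float_block = per_dtype[np.dtype(np.float64)]
```

The per-field layout can always be regrouped into per-dtype blocks afterwards, as the helper shows, at the cost of one extra copy per block.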

With regard to the TODO "best way to return geometries for shapely?", I would say either WKB or WKT, depending on which is the fastest/cheapest to create and convert afterwards (which I assume will be WKB?)
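As a rough illustration of why WKB is likely the cheaper option, the sketch below encodes and decodes a single 2D point using only the standard library. The function names are hypothetical and this is not how Fiona, GDAL, or shapely implement it; it only shows that WKB is a fixed binary layout (byte order, geometry type, coordinates), while WKT requires formatting and parsing floats as text.

```python
import struct


def encode_point_wkb(x, y):
    # Byte-order marker 1 (little-endian), geometry type 1 (Point), two doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(buf):
    # The first byte tells us the byte order of everything that follows.
    fmt = "<" if buf[0] == 1 else ">"
    geom_type, x, y = struct.unpack(fmt + "Idd", buf[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only handles Point")
    return x, y


def decode_point_wkt(text):
    # Text form, e.g. "POINT (1.5 -2.25)": needs string slicing and float parsing.
    body = text.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError("this sketch only handles POINT")
    coords = body[body.index("(") + 1 : body.rindex(")")].split()
    return float(coords[0]), float(coords[1])


wkb = encode_point_wkb(1.5, -2.25)
point_from_wkb = decode_point_wkb(wkb)
point_from_wkt = decode_point_wkt("POINT (1.5 -2.25)")
```

WKB also round-trips doubles exactly, whereas WKT depends on how many digits the writer emits. Actual speed on real layers would still need to be measured.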

snorfalorpagus · 2018-03-24T12:47:12Z

Support WKT or WKB
Support for all data types available in Fiona already
Support for ignoring fields
Improved iterating over features

snorfalorpagus · 2018-03-24T17:06:57Z

It looks like GDAL 1.x has a problem with the datetime field. It also looks like we don't actually test datetime fields in the regular reader/writer.

snowman2 · 2019-01-30T18:56:28Z

fiona/_vectorized.pyx

+    length = OGR_L_GetFeatureCount(session.cogr_layer, 0)
+
+    data_fids = np.empty([length], dtype=object)
+    data_properties = {}


Thoughts on using a structured array for data_properties?
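For reference, a minimal sketch of what a structured array could look like in place of a dict of per-field arrays. The schema, field names, and row values here are hypothetical, and `length` is hard-coded rather than taken from `OGR_L_GetFeatureCount` as in the diff above.

```python
import numpy as np

# Hypothetical schema: one named, typed field per column, plus the feature id.
schema = np.dtype(
    [
        ("fid", np.int64),
        ("name", "U16"),  # fixed-width unicode, at most 16 characters
        ("pop", np.int32),
        ("area", np.float64),
    ]
)

length = 3  # stand-in for the layer's feature count
data = np.empty(length, dtype=schema)

# Hypothetical rows standing in for features read from the layer.
rows = [(0, "a", 10, 1.5), (1, "b", 20, 2.5), (2, "c", 30, 3.5)]
for i, row in enumerate(rows):
    data[i] = row

# Column access by field name returns a view onto the same memory.
pops = data["pop"]
second = data[1]
```

One trade-off to keep in mind: string fields in a structured array need either a fixed width, as here, or an object field, so variable-length text does not get the same compact storage as numeric fields.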

brendan-ward · 2021-10-08T15:25:20Z

For those finding this issue later and wanting to know how support for vectorized reading from OGR/GDAL evolved, see pyogrio.

Thanks again @snorfalorpagus for the initial work on this!

sgillies added this to the 1.8a3 milestone Mar 20, 2018

snorfalorpagus added 6 commits March 24, 2018 12:50

Initial commit of bulk reading into numpy arrays

dcce27b

Removed incorrect call to cythonize()

b7aec85

Append NumPy include dir, not overwrite

4742cad

Removed type from field name

395c0e5

Support ignore fields/geometry in read_vectorized

f8fecfc

Ignore fiona/_vectorized.c

6784292

snorfalorpagus force-pushed the vectorized branch from c1894e4 to 6784292 Compare March 24, 2018 13:48

snorfalorpagus added 4 commits March 24, 2018 14:30

Support binary fields in read_vectorized

58bcba9

Support datetime fields in read_vectorized

b58ff97

Comment tweaks

35a4308

Support for WKB as geometry type in read_vectorized

dc4ab3d

sgillies mentioned this pull request Mar 24, 2018

Fix for 32 vs 64-bit integer ambiguity #562

Closed

snorfalorpagus mentioned this pull request Mar 25, 2018

Test support for date/datetime/time fields #563

Merged

snorfalorpagus removed this from the 1.8a3 milestone Sep 6, 2018

snowman2 reviewed Jan 30, 2019

brendan-ward mentioned this pull request Apr 5, 2020

Make columnar-based access possible in addition to record-based model? #469

Closed

snorfalorpagus closed this Oct 8, 2021
