Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for fids filter with use_arrow=True #304

Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
00c487c
Add support for fids filter with use_arrow=True
theroggy Oct 1, 2023
9315845
Update CHANGES.md
theroggy Oct 1, 2023
d9cfebd
Add to docstring that the order is undefined
theroggy Oct 3, 2023
052838a
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Oct 12, 2023
db573d3
use_arrow can also impact performance
theroggy Oct 12, 2023
190460b
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Oct 13, 2023
26ec43e
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Dec 2, 2023
6f782e1
Make sure fids can be arraylike
theroggy Dec 2, 2023
4a2cc94
Raise explicit error if too many fids asked
theroggy Dec 3, 2023
49e478b
Add @requires_arrow_api to test
theroggy Dec 3, 2023
a28ac9c
Improve doc
theroggy Dec 3, 2023
d700a98
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Dec 4, 2023
e9bf6d6
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Dec 15, 2023
6951377
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Jan 27, 2024
34250e2
Cleanup the exception being raised.
theroggy Jan 27, 2024
f66a857
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Jan 27, 2024
f47b847
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Apr 5, 2024
70da3f9
Move change description in changelog from 0.7 to 0.8
theroggy Apr 5, 2024
e947129
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Apr 10, 2024
9ce488a
Apply feedback on PR
theroggy Apr 10, 2024
1e0c249
Add warning for old GDAL versions + test
theroggy Apr 11, 2024
602e30f
Fix + improve test
theroggy Apr 11, 2024
b437caf
Fix test
theroggy Apr 11, 2024
93b6919
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Apr 15, 2024
8d3bcf1
Merge remote-tracking branch 'upstream/main' into Add-support-for-fid…
theroggy Apr 17, 2024
599b068
Applied feedback
theroggy Apr 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
intersect the bounding box of the mask when using the Arrow interface for
some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (#299).
- Add support for `fids` filter with `use_arrow=True` (#304)

### Other changes

Expand Down
17 changes: 15 additions & 2 deletions pyogrio/_io.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -1253,7 +1253,11 @@ def ogr_open_arrow(
raise ValueError("forcing 2D is not supported for Arrow")

if fids is not None:
raise ValueError("reading by FID is not supported for Arrow")
if where is not None or bbox is not None or mask is not None or sql is not None or skip_features or max_features:
raise ValueError(
"cannot set both 'fids' and any of 'where', 'bbox', 'mask', "
"'sql', 'skip_features' or 'max_features'"
theroggy marked this conversation as resolved.
Show resolved Hide resolved
)

IF CTE_GDAL_VERSION < (3, 8, 0):
if skip_features:
Expand Down Expand Up @@ -1315,10 +1319,19 @@ def ogr_open_arrow(
geometry_name = get_string(OGR_L_GetGeometryColumn(ogr_layer))

fid_column = get_string(OGR_L_GetFIDColumn(ogr_layer))
fid_column_where = fid_column
# OGR_L_GetFIDColumn returns the column name if it is a custom column,
# or "" if not. For arrow, the default column name is "OGC_FID".
# or "" if not. For arrow, the default column name used to return the FID data
# read is "OGC_FID". When accessing the underlying datasource like when using a
# where clause, the default column name is "FID".
if fid_column == "":
fid_column = "OGC_FID"
fid_column_where = "FID"

# Use fids list to create a where clause, as arrow doesn't support direct fid
# filtering.
if fids is not None:
where = f"{fid_column_where} IN ({','.join(fids.astype(str))})"

# Apply the attribute filter
if where is not None and where != "":
Expand Down
4 changes: 3 additions & 1 deletion pyogrio/geopandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,9 @@ def read_dataframe(
the starting index is driver and file specific (e.g. typically 0 for
Shapefile and 1 for GeoPackage, but can still depend on the specific
file). The performance of reading a large number of features usings FIDs
is also driver specific.
is also driver specific. The order of the rows returned is undefined. If you
would like to sort based on FID, use ``fid_as_index=True`` to have the index of
the GeoDataFrame returned set to the FIDs of the features read.
sql : str, optional (default: None)
The SQL statement to execute. Look at the sql_dialect parameter for more
information on the syntax to use for the query. When combined with other
Expand Down
8 changes: 5 additions & 3 deletions pyogrio/tests/test_geopandas_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -408,10 +408,12 @@ def test_read_mask_where(naturalearth_lowres_all_ext, use_arrow):
assert np.array_equal(df.iso_a3, ["CAN"])


def test_read_fids(naturalearth_lowres_all_ext):
def test_read_fids(naturalearth_lowres_all_ext, use_arrow):
# ensure keyword is properly passed through
fids = np.array([1, 10, 5], dtype=np.int64)
df = read_dataframe(naturalearth_lowres_all_ext, fids=fids, fid_as_index=True)
fids = np.array([1, 5, 10], dtype=np.int64)
df = read_dataframe(
naturalearth_lowres_all_ext, fids=fids, fid_as_index=True, use_arrow=use_arrow
)
assert len(df) == 3
assert np.array_equal(fids, df.index.values)

Expand Down
Loading