Remove numpy as a hard dependency #1270

Merged: 2 commits merged on Oct 30, 2024

pyiceberg/io/pyarrow.py: 13 changes (11 additions, 2 deletions)
@@ -57,7 +57,6 @@
 )
 from urllib.parse import urlparse
 
-import numpy as np
 import pyarrow as pa
 import pyarrow.compute as pc
 import pyarrow.dataset as ds
@@ -812,7 +811,17 @@ def _combine_positional_deletes(positional_deletes: List[pa.ChunkedArray], start
         all_chunks = positional_deletes[0]
     else:
         all_chunks = pa.chunked_array(itertools.chain(*[arr.chunks for arr in positional_deletes]))
-    return np.subtract(np.setdiff1d(np.arange(start_index, end_index), all_chunks, assume_unique=False), start_index)
+
+    # Create the full range array with pyarrow
+    full_range = pa.array(range(start_index, end_index))
+    # When available, replace with Arrow generator to improve performance
+    # See https://github.com/apache/iceberg-python/issues/1271 for details
+
+    # Filter out values in all_chunks from full_range
+    result = pc.filter(full_range, pc.invert(pc.is_in(full_range, value_set=all_chunks)))
+
+    # Subtract the start_index from each element in the result
+    return pc.subtract(result, pa.scalar(start_index))
 
 
 def pyarrow_to_schema(
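
As a cross-check that the new pyarrow pipeline keeps the old `np.setdiff1d` semantics, here is a minimal pure-Python reference sketch. The helper name `combine_positional_deletes_reference` is hypothetical (it is not part of pyiceberg), and it assumes the deletes arrive as plain integer positions rather than a `pa.ChunkedArray`. Since `range(start_index, end_index)` is already sorted and free of duplicates, filtering it with a membership test gives the same sorted, de-duplicated output that `np.setdiff1d` returned, which is why `pc.filter` over `pc.invert(pc.is_in(...))` can stand in for it.

```python
from typing import Iterable, List


def combine_positional_deletes_reference(
    deleted_positions: Iterable[int], start_index: int, end_index: int
) -> List[int]:
    """Hypothetical pure-Python reference, not the pyiceberg implementation.

    Returns the positions in [start_index, end_index) that are NOT deleted,
    rebased so that start_index maps to 0.
    """
    # A set gives O(1) membership tests, playing the role of pc.is_in's value_set
    deleted = set(deleted_positions)
    # range() is already sorted and unique, so filtering it yields the same
    # sorted, de-duplicated result that np.setdiff1d produced
    kept = [pos for pos in range(start_index, end_index) if pos not in deleted]
    # Rebase to the start of the range, like pc.subtract(result, start_index)
    return [pos - start_index for pos in kept]


print(combine_positional_deletes_reference([3, 5, 5, 12], start_index=2, end_index=8))
# → [0, 2, 4, 5]
```

Deleted positions outside `[start_index, end_index)` and repeated deletes have no effect, matching `np.setdiff1d(..., assume_unique=False)`.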