[SPARK-54568][PYTHON] Avoid unnecessary pandas conversion in create dataframe from ndarray #53280

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:test_np_arrow

@zhengruifeng zhengruifeng commented Dec 2, 2025

What changes were proposed in this pull request?

Avoid the unnecessary pandas conversion when creating a DataFrame from a NumPy ndarray.

Why are the changes needed?

Before:
ndarray -> pandas DataFrame -> Arrow data

After:
ndarray -> Arrow data

The result will also be consistent with Connect mode:

    elif isinstance(data, np.ndarray):
        if _cols is None:
            if data.ndim == 1 or data.shape[1] == 1:
                _cols = ["value"]
            else:
                _cols = ["_%s" % i for i in range(1, data.shape[1] + 1)]
        if data.ndim == 1:
            if 1 != len(_cols):
                raise PySparkValueError(
                    errorClass="AXIS_LENGTH_MISMATCH",
                    messageParameters={
                        "expected_length": str(len(_cols)),
                        "actual_length": "1",
                    },
                )
            _table = pa.Table.from_arrays([pa.array(data)], _cols)
        else:
            if data.shape[1] != len(_cols):
                raise PySparkValueError(
                    errorClass="AXIS_LENGTH_MISMATCH",
                    messageParameters={
                        "expected_length": str(len(_cols)),
                        "actual_length": str(data.shape[1]),
                    },
                )
            _table = pa.Table.from_arrays(
                [pa.array(data[::, i]) for i in range(0, data.shape[1])], _cols
            )

Does this PR introduce any user-facing change?

no

How was this patch tested?

ci

Was this patch authored or co-authored using generative AI tooling?

no

@zhengruifeng
Contributor Author

thanks, merged to master

@zhengruifeng zhengruifeng deleted the test_np_arrow branch December 3, 2025 09:09
zhengruifeng added a commit that referenced this pull request Dec 4, 2025
….test_with_none_and_nan`

### What changes were proposed in this pull request?
There was a bug when creating a DataFrame from an ndarray containing NaN values:
with Arrow optimization enabled, NaN was incorrectly converted to null. The bug happened to be resolved by #53280.

### Why are the changes needed?
for test coverage

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #53305 from zhengruifeng/reenable_test_with_none_and_nan.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>