Commits
21 commits
ea770b9
Add test on different column names
EnricoMi Oct 6, 2022
9cbc7d9
Improve error on schema mismatch, add more tests
EnricoMi Oct 7, 2022
4ceb458
Reuse _test_merge in test_pandas_cogrouped_map.py as much as possible
EnricoMi Oct 10, 2022
055d321
Make _test_apply_in_pandas* static as in test_pandas_cogrouped_map.py
EnricoMi Oct 10, 2022
3374ef1
Simplify error assertion in python/pyspark/sql/tests/test_pandas_cogr…
EnricoMi Oct 10, 2022
91d2912
Verify pandas result in wrap_cogrouped_map_pandas_udf
EnricoMi Oct 11, 2022
f3626a4
Remove classmethods in test, add comments
EnricoMi Oct 11, 2022
c488f37
Fix python lint on lambda
EnricoMi Oct 11, 2022
468f127
Remove redundant test for empty DataFrame
EnricoMi Oct 11, 2022
c5dfe63
Also provide expected schema in error message
EnricoMi Oct 12, 2022
4953df9
Separate Missing, Unexpected, Expected and Schema by two spaces
EnricoMi Oct 12, 2022
134b10a
Add field name to ValueError and TypeError message
EnricoMi Oct 12, 2022
107e620
Fix test_arrow.py asserting inner exception
EnricoMi Oct 12, 2022
838eb7c
Limit missing / unexpected columns to 5, schema string to 1024 charac…
EnricoMi Oct 21, 2022
4ebc7b3
Remove argument n from create_array, use s.name instead
EnricoMi Oct 21, 2022
0d9e642
Simplify backslash use in expected message regexp
EnricoMi Nov 2, 2022
dc602f5
Call assign_cols_by_name once per dataframe, not once per group
EnricoMi Nov 5, 2022
15faddc
Remove limiting number of columns and schema in error messages
EnricoMi Nov 24, 2022
9ab23cb
Rework spacing of expected, missing, actual in error message
EnricoMi Nov 24, 2022
124cad0
Rename test class names to proper name
EnricoMi Feb 8, 2023
df27aa7
Assert full expected result, not only subset
EnricoMi Feb 8, 2023
python/pyspark/sql/pandas/serializers.py: 29 changes (19 additions, 10 deletions)
@@ -231,18 +231,25 @@ def create_array(s, t):
                 s = s.astype(s.dtypes.categories.dtype)
             try:
                 array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
+            except TypeError as e:
+                error_msg = (
+                    "Exception thrown when converting pandas.Series (%s) "
+                    "with name '%s' to Arrow Array (%s)."
+                )
+                raise TypeError(error_msg % (s.dtype, s.name, t)) from e
             except ValueError as e:
+                error_msg = (
+                    "Exception thrown when converting pandas.Series (%s) "
+                    "with name '%s' to Arrow Array (%s)."
+                )
                 if self._safecheck:
-                    error_msg = (
-                        "Exception thrown when converting pandas.Series (%s) to "
-                        + "Arrow Array (%s). It can be caused by overflows or other "
-                        + "unsafe conversions warned by Arrow. Arrow safe type check "
-                        + "can be disabled by using SQL config "
-                        + "`spark.sql.execution.pandas.convertToArrowArraySafely`."
+                    error_msg = error_msg + (
+                        " It can be caused by overflows or other "
+                        "unsafe conversions warned by Arrow. Arrow safe type check "
+                        "can be disabled by using SQL config "
+                        "`spark.sql.execution.pandas.convertToArrowArraySafely`."
                     )
-                    raise ValueError(error_msg % (s.dtype, t)) from e
-                else:
-                    raise e
+                raise ValueError(error_msg % (s.dtype, s.name, t)) from e
             return array
 
         arrs = []
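For readers outside the Spark codebase, here is a minimal, self-contained sketch of the same pattern using plain pandas and pyarrow rather than Spark internals (the helper name to_arrow, the column name 'id', and the target type int32 are illustrative, not taken from the PR). pyarrow's own conversion error does not say which column failed, so re-raising with the Series dtype, name, and target Arrow type is what makes the new message actionable:

import pandas as pd
import pyarrow as pa


def to_arrow(s: pd.Series, t: pa.DataType, safe: bool = True) -> pa.Array:
    # Convert a single pandas column to an Arrow array; on failure, re-raise
    # with the column's dtype, name, and target type, mirroring the PR's intent.
    try:
        return pa.Array.from_pandas(s, type=t, safe=safe)
    except ValueError as e:
        raise ValueError(
            "Exception thrown when converting pandas.Series (%s) "
            "with name '%s' to Arrow Array (%s)." % (s.dtype, s.name, t)
        ) from e


# 3_000_000_000 does not fit into int32, so the safe conversion fails and the
# re-raised error now points at column 'id' instead of an anonymous Series.
s = pd.Series([1, 2, 3_000_000_000], name="id")
try:
    to_arrow(s, pa.int32())
except ValueError as e:
    print(e)

Note that pyarrow reports safe-cast failures such as this overflow via pyarrow.lib.ArrowInvalid, which subclasses ValueError; that is why the except ValueError branch in the diff above is the one that appends the convertToArrowArraySafely hint when safe checks are enabled.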
@@ -265,7 +272,9 @@ def create_array(s, t):
                 # Assign result columns by position
                 else:
                     arrs_names = [
-                        (create_array(s[s.columns[i]], field.type), field.name)
+                        # the selected series has name '1', so we rename it to field.name
+                        # as the name is used by create_array to provide a meaningful error message
+                        (create_array(s[s.columns[i]].rename(field.name), field.type), field.name)
                         for i, field in enumerate(t)
                     ]
 
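A note on why the .rename(field.name) in the position-based branch matters: when the user's result pandas.DataFrame has no string column labels, its columns are positional integers, so a Series selected by position is named 0, 1, ... and that number is all create_array could report on failure. A small pandas-only sketch (the label 'value' stands in for an arbitrary schema field name):

import pandas as pd

# A result DataFrame without string column labels: its columns are the integers 0 and 1.
pdf = pd.DataFrame([[1, "a"], [2, "b"]])

s = pdf[pdf.columns[1]]
print(s.name)  # 1 -> unhelpful in a conversion error message

# Renaming the selection to the schema field's name gives the error a real column name.
s = s.rename("value")
print(s.name)  # value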