Fix/2089 support sets for pyarrow backend #2090

Merged
4 changes: 2 additions & 2 deletions dlt/common/libs/pyarrow.py
@@ -620,15 +620,15 @@ def row_tuples_to_arrow(
     )
     float_array = pa.array(columnar_known_types[field.name], type=pa.float64())
     columnar_known_types[field.name] = float_array.cast(field.type, safe=False)
-if issubclass(py_type, (dict, list)):
+if issubclass(py_type, (dict, list, set)):
     logger.warning(
         f"Field {field.name} was reflected as JSON type and needs to be serialized back to"
         " string to be placed in arrow table. This will slow data extraction down. You"
         " should cast JSON field to STRING in your database system ie. by creating and"
         " extracting an SQL VIEW that selects with cast."
     )
     json_str_array = pa.array(
-        [None if s is None else json.dumps(s) for s in columnar_known_types[field.name]]
+        [None if s is None else json.dumps(s) if not issubclass(type(s), set) else json.dumps(list(s)) for s in columnar_known_types[field.name]]
     )
     columnar_known_types[field.name] = json_str_array
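
For reviewers who want to see the behavior in isolation: the standard `json` module cannot serialize a `set`, which is why the changed line converts sets to lists first. The sketch below mirrors that logic using only the standard library. The helper name `serialize_json_column` and the sample data are hypothetical and are not part of this PR or of dlt.

```python
import json
from typing import Any, List, Optional


def serialize_json_column(values: List[Any]) -> List[Optional[str]]:
    """Hypothetical helper mirroring the comprehension changed in this diff."""
    serialized: List[Optional[str]] = []
    for value in values:
        if value is None:
            # NULLs stay NULL instead of becoming the string "null".
            serialized.append(None)
        elif isinstance(value, set):
            # json.dumps raises TypeError on a set, so convert to a list first.
            serialized.append(json.dumps(list(value)))
        else:
            serialized.append(json.dumps(value))
    return serialized


# Sample column mixing dict, list, set and NULL values (illustrative data only).
column = [{"a": 1}, [1, 2], {3}, None]
serialized = serialize_json_column(column)
```

Because sets are unordered, the element order in the resulting JSON array is not guaranteed for sets with more than one element. Only top-level sets are converted here, as in the diff; a set nested inside a dict or list would still raise `TypeError`.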
