
insert_rows_from_dataframe produces int64 is not JSON serializable #1348

Closed
1oglop1 opened this issue Sep 13, 2022 · 9 comments
Assignees: galz10
Labels: api: bigquery, priority: p2, status: investigating, type: bug

Comments


1oglop1 commented Sep 13, 2022

I believe that PR #786 introduced a regression in Client.insert_rows_from_dataframe: to_dataframe now produces the pandas nullable Int64 dtype, and _pandas_helpers.dataframe_to_json_generator cannot handle it properly.

rows_iter = _pandas_helpers.dataframe_to_json_generator(dataframe)

which results in the error: TypeError: Object of type int64 is not JSON serializable
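
A minimal illustration of the failure mode (a sketch, assuming only that numpy and pandas are installed): scalar values taken from a nullable Int64 column are numpy.int64 objects, which the stdlib json encoder rejects.

import json
import numpy as np
import pandas as pd

s = pd.Series([1, 2], dtype="Int64")  # the dtype to_dataframe now produces
value = s[0]                          # a numpy.int64 scalar, not a Python int

# Raises TypeError: Object of type int64 is not JSON serializable
json.dumps({"total": value})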

Environment details

  • OS type and version:
  • Python version: 3.9.10
  • google-cloud-bigquery version: 3.3.2

Steps to reproduce

  1. Query something: df = client.query(sql).to_dataframe()
  2. Inspect the dtypes (df.dtypes):
     member_id            object
     total                 Int64
     total_zero_amount     Int64
     total_amount        float64
  3. Try to insert the DataFrame into the table: client.insert_rows_from_dataframe(table=client.get_table("table_name"), dataframe=user_data)
  4. Get the error (full stack trace below; a condensed repro sketch follows this list)
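
A condensed, self-contained version of the steps above (a sketch; the query and table ID are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Any query result with an INT64 column will do: since PR #786,
# to_dataframe() maps INT64 to the pandas nullable Int64 dtype by default.
df = client.query("SELECT 1 AS total").to_dataframe()
table = client.get_table("project.dataset.table_name")  # placeholder table ID

# In the affected versions this raises:
# TypeError: Object of type int64 is not JSON serializable
client.insert_rows_from_dataframe(table=table, dataframe=df)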

Stack trace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [27], in <cell line: 3>()
      1 table = client.get_table(DB_TABLES.TAB_RESULTS.value)
----> 3 res = client.insert_rows_from_dataframe(table=table, dataframe=user_data)

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3465, in Client.insert_rows_from_dataframe(self, table, dataframe, selected_fields, chunk_size, **kwargs)
   3463 for _ in range(chunk_count):
   3464     rows_chunk = itertools.islice(rows_iter, chunk_size)
-> 3465     result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
   3466     insert_results.append(result)
   3468 return insert_results

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3415, in Client.insert_rows(self, table, rows, selected_fields, **kwargs)
   3406     raise ValueError(
   3407         (
   3408             "Could not determine schema for table '{}'. Call client.get_table() "
   3409             "or pass in a list of schema fields to the selected_fields argument."
   3410         ).format(table)
   3411     )
   3413 json_rows = [_record_field_to_json(schema, row) for row in rows]
-> 3415 return self.insert_rows_json(table, json_rows, **kwargs)

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3597, in Client.insert_rows_json(self, table, json_rows, row_ids, skip_invalid_rows, ignore_unknown_values, template_suffix, retry, timeout)
   3595 # We can always retry, because every row has an insert ID.
   3596 span_attributes = {"path": path}
-> 3597 response = self._call_api(
   3598     retry,
   3599     span_name="BigQuery.insertRowsJson",
   3600     span_attributes=span_attributes,
   3601     method="POST",
   3602     path=path,
   3603     data=data,
   3604     timeout=timeout,
   3605 )
   3606 errors = []
   3608 for error in response.get("insertErrors", ()):

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:759, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
    755 if span_name is not None:
    756     with create_span(
    757         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
    758     ):
--> 759         return call()
    761 return call()

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/api_core/retry.py:283, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
    279 target = functools.partial(func, *args, **kwargs)
    280 sleep_generator = exponential_sleep_generator(
    281     self._initial, self._maximum, multiplier=self._multiplier
    282 )
--> 283 return retry_target(
    284     target,
    285     self._predicate,
    286     sleep_generator,
    287     self._deadline,
    288     on_error=on_error,
    289 )

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/api_core/retry.py:190, in retry_target(target, predicate, sleep_generator, deadline, on_error)
    188 for sleep in sleep_generator:
    189     try:
--> 190         return target()
    192     # pylint: disable=broad-except
    193     # This function explicitly must deal with broad exceptions.
    194     except Exception as exc:

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/_http/__init__.py:479, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
    476 # Making the executive decision that any dictionary
    477 # data will be sent properly as JSON.
    478 if data and isinstance(data, dict):
--> 479     data = json.dumps(data)
    480     content_type = "application/json"
    482 response = self._make_request(
    483     method=method,
    484     url=url,
   (...)
    490     extra_api_info=extra_api_info,
    491 )

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type int64 is not JSON serializable
product-auto-label bot added the api: bigquery label Sep 13, 2022

1oglop1 commented Sep 13, 2022

cc @tswast

tswast added the type: bug and priority: p2 labels Sep 13, 2022

tswast commented Sep 13, 2022

Thanks for the report! This indeed appears to be a bug. I'll bring it up with the team to investigate.


tswast commented Sep 13, 2022

@1oglop1 Would you mind sharing the version of pandas you are using, too?


1oglop1 commented Sep 13, 2022

@tswast Thanks for responding so fast, our version is pandas-1.4.3

danilo-nzyte commented

Experiencing the same issue on pandas==1.5.0 too.

Ended up having to change the method to insert_rows_json and include an extra line of code that feels a bit messy, but it does the job:

import json

# Let pandas serialize each row itself (it knows how to render Int64 and
# dates), then parse the strings back into dicts for insert_rows_json.
json_rows = [
    json.loads(row)
    for row in df.apply(lambda x: x.to_json(date_format="iso"), axis=1)
]

errors = client.insert_rows_json(client.get_table(table_ref), json_rows)

The insert_rows_json method expects a list of dictionaries, while to_json() returns a string, hence the json.loads round-trip.

I have to use to_json so that pandas handles the conversion of the different types (specifically Int64 and date values).

If anybody thinks of a more efficient way of doing it, please let me know!
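
One alternative sketch that avoids the per-row to_json call (assuming the missing values in df are NaN/pd.NA): cast the frame to object dtype so numpy scalars become plain Python values, then swap missing values for None, which serializes as JSON null.

import pandas as pd

# Cast to object so Int64/float64 cells become plain Python int/float,
# then replace NaN/pd.NA with None before handing the frame to the client.
safe_df = df.astype(object).where(df.notnull(), None)

errors = client.insert_rows_from_dataframe(
    table=client.get_table(table_ref), dataframe=safe_df
)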

VovaViliLox commented

@tswast Any updates to this issue?

galz10 self-assigned this Apr 20, 2023
galz10 added the status: investigating label Apr 20, 2023

galz10 commented Apr 24, 2023

Hi @VovaViliLox, following your directions I did not run into the issue you reported; I was able to insert rows into the table. Could you please check again to see whether this issue has been fixed?


galz10 commented Apr 26, 2023

Since I cannot reproduce this issue, I'll close it. I used the latest version, 3.10.0, when trying to reproduce.

galz10 closed this as completed Apr 26, 2023
ana-silvia-s commented

Hi, I just ran into this issue using pandas 2.0.3 and google-cloud-bigquery 3.13.0:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 64
     60     print("Loaded recommendations")
     61     batch_recos["score"] = batch_recos["score"] * generate_noise(
     62         mean=0, stddev=0.5, recos=batch_recos
     63     )
---> 64     client.insert_rows_from_dataframe(
     65         table=destination_table, dataframe=batch_recos
     66     )
     67     print(f"Inserted into {destination_table}")
     68 print("Finished!")

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3537, in Client.insert_rows_from_dataframe(self, table, dataframe, selected_fields, chunk_size, **kwargs)
   3535 for _ in range(chunk_count):
   3536     rows_chunk = itertools.islice(rows_iter, chunk_size)
-> 3537     result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
   3538     insert_results.append(result)
   3540 return insert_results

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3480, in Client.insert_rows(self, table, rows, selected_fields, **kwargs)
   3471     raise ValueError(
   3472         (
   3473             "Could not determine schema for table '{}'. Call client.get_table() "
   3474             "or pass in a list of schema fields to the selected_fields argument."
   3475         ).format(table)
   3476     )
   3478 json_rows = [_record_field_to_json(schema, row) for row in rows]
-> 3480 return self.insert_rows_json(table, json_rows, **kwargs)

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3676, in Client.insert_rows_json(self, table, json_rows, row_ids, skip_invalid_rows, ignore_unknown_values, template_suffix, retry, timeout)
   3674 # We can always retry, because every row has an insert ID.
   3675 span_attributes = {"path": path}
-> 3676 response = self._call_api(
   3677     retry,
   3678     span_name="BigQuery.insertRowsJson",
   3679     span_attributes=span_attributes,
   3680     method="POST",
   3681     path=path,
   3682     data=data,
   3683     timeout=timeout,
   3684 )
   3685 errors = []
   3687 for error in response.get("insertErrors", ()):

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:808, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
    804 if span_name is not None:
    805     with create_span(
    806         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
    807     ):
--> 808         return call()
    810 return call()

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:349, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
    345 target = functools.partial(func, *args, **kwargs)
    346 sleep_generator = exponential_sleep_generator(
    347     self._initial, self._maximum, multiplier=self._multiplier
    348 )
--> 349 return retry_target(
    350     target,
    351     self._predicate,
    352     sleep_generator,
    353     self._timeout,
    354     on_error=on_error,
    355 )

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:191, in retry_target(target, predicate, sleep_generator, timeout, on_error, **kwargs)
    189 for sleep in sleep_generator:
    190     try:
--> 191         return target()
    193     # pylint: disable=broad-except
    194     # This function explicitly must deal with broad exceptions.
    195     except Exception as exc:

File /opt/conda/lib/python3.10/site-packages/google/cloud/_http/__init__.py:479, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
    476 # Making the executive decision that any dictionary
    477 # data will be sent properly as JSON.
    478 if data and isinstance(data, dict):
--> 479     data = json.dumps(data)
    480     content_type = "application/json"
    482 response = self._make_request(
    483     method=method,
    484     url=url,
   (...)
    490     extra_api_info=extra_api_info,
    491 )

File /opt/conda/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File /opt/conda/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File /opt/conda/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File /opt/conda/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type int64 is not JSON serializable

I will try out the "solution" danilo-nzyte proposed in this thread, but I would rather have some sort of official resolution, if possible! Thank you!
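
In the meantime, one alternative worth trying (a sketch; it assumes a batch load job is acceptable instead of streaming inserts, and the table ID is a placeholder): load_table_from_dataframe serializes the DataFrame via pyarrow/Parquet, which understands pandas nullable dtypes such as Int64.

from google.cloud import bigquery

client = bigquery.Client()

# Batch-load the DataFrame instead of streaming rows; pyarrow handles the
# nullable Int64 dtype during Parquet serialization.
job = client.load_table_from_dataframe(batch_recos, "project.dataset.table_name")
job.result()  # block until the load job completes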
