
insert_rows_from_dataframe produces int64 is not JSON serializable #1348

Closed
1oglop1 opened this issue Sep 13, 2022 · 9 comments
Assignees: galz10
Labels: api: bigquery, priority: p2, status: investigating, type: bug

Comments


1oglop1 commented Sep 13, 2022

I believe that PR #786 introduced a regression in Client.insert_rows_from_dataframe: to_dataframe now produces the pandas nullable Int64 dtype, and _pandas_helpers.dataframe_to_json_generator cannot handle it properly.

rows_iter = _pandas_helpers.dataframe_to_json_generator(dataframe)

which results in the error: TypeError: Object of type int64 is not JSON serializable
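
A minimal illustration of the failure mode (a sketch, assuming only that numpy and pandas are installed): scalar values taken from a nullable Int64 column are numpy.int64 objects, which the stdlib json encoder rejects.

import json
import numpy as np
import pandas as pd

s = pd.Series([1, 2], dtype="Int64")  # the dtype to_dataframe now produces
value = s[0]                          # a numpy.int64 scalar, not a Python int

# Raises TypeError: Object of type int64 is not JSON serializable
json.dumps({"total": value})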

Environment details

  • OS type and version:
  • Python version: 3.9.10
  • google-cloud-bigquery version: 3.3.2

Steps to reproduce

  1. Query something: df = client.query(sql).to_dataframe()
  2. Inspect the dtypes (df.dtypes):
     member_id            object
     total                 Int64
     total_zero_amount     Int64
     total_amount        float64
  3. Try to insert the DataFrame into the table: client.insert_rows_from_dataframe(table=client.get_table("table_name"), dataframe=user_data)
  4. Get the error (full stack trace below; a condensed repro sketch follows this list)
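
A condensed, self-contained version of the steps above (a sketch; the query and table ID are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Any query result with an INT64 column will do: since PR #786,
# to_dataframe() maps INT64 to the pandas nullable Int64 dtype by default.
df = client.query("SELECT 1 AS total").to_dataframe()
table = client.get_table("project.dataset.table_name")  # placeholder table ID

# In the affected versions this raises:
# TypeError: Object of type int64 is not JSON serializable
client.insert_rows_from_dataframe(table=table, dataframe=df)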

Stack trace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [27], in <cell line: 3>()
      1 table = client.get_table(DB_TABLES.TAB_RESULTS.value)
----> 3 res = client.insert_rows_from_dataframe(table=table, dataframe=user_data)

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3465, in Client.insert_rows_from_dataframe(self, table, dataframe, selected_fields, chunk_size, **kwargs)
   3463 for _ in range(chunk_count):
   3464     rows_chunk = itertools.islice(rows_iter, chunk_size)
-> 3465     result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
   3466     insert_results.append(result)
   3468 return insert_results

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3415, in Client.insert_rows(self, table, rows, selected_fields, **kwargs)
   3406     raise ValueError(
   3407         (
   3408             "Could not determine schema for table '{}'. Call client.get_table() "
   3409             "or pass in a list of schema fields to the selected_fields argument."
   3410         ).format(table)
   3411     )
   3413 json_rows = [_record_field_to_json(schema, row) for row in rows]
-> 3415 return self.insert_rows_json(table, json_rows, **kwargs)

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3597, in Client.insert_rows_json(self, table, json_rows, row_ids, skip_invalid_rows, ignore_unknown_values, template_suffix, retry, timeout)
   3595 # We can always retry, because every row has an insert ID.
   3596 span_attributes = {"path": path}
-> 3597 response = self._call_api(
   3598     retry,
   3599     span_name="BigQuery.insertRowsJson",
   3600     span_attributes=span_attributes,
   3601     method="POST",
   3602     path=path,
   3603     data=data,
   3604     timeout=timeout,
   3605 )
   3606 errors = []
   3608 for error in response.get("insertErrors", ()):

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:759, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
    755 if span_name is not None:
    756     with create_span(
    757         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
    758     ):
--> 759         return call()
    761 return call()

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/api_core/retry.py:283, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
    279 target = functools.partial(func, *args, **kwargs)
    280 sleep_generator = exponential_sleep_generator(
    281     self._initial, self._maximum, multiplier=self._multiplier
    282 )
--> 283 return retry_target(
    284     target,
    285     self._predicate,
    286     sleep_generator,
    287     self._deadline,
    288     on_error=on_error,
    289 )

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/api_core/retry.py:190, in retry_target(target, predicate, sleep_generator, deadline, on_error)
    188 for sleep in sleep_generator:
    189     try:
--> 190         return target()
    192     # pylint: disable=broad-except
    193     # This function explicitly must deal with broad exceptions.
    194     except Exception as exc:

File ~/Library/Caches/pypoetry/virtualenvs/venv-HcAOwm9Y-py3.9/lib/python3.9/site-packages/google/cloud/_http/__init__.py:479, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
    476 # Making the executive decision that any dictionary
    477 # data will be sent properly as JSON.
    478 if data and isinstance(data, dict):
--> 479     data = json.dumps(data)
    480     content_type = "application/json"
    482 response = self._make_request(
    483     method=method,
    484     url=url,
   (...)
    490     extra_api_info=extra_api_info,
    491 )

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File ~/.asdf/installs/python/3.9.10/lib/python3.9/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type int64 is not JSON serializable
product-auto-label bot added the api: bigquery label Sep 13, 2022

1oglop1 commented Sep 13, 2022

cc @tswast

tswast added the type: bug and priority: p2 labels Sep 13, 2022

tswast commented Sep 13, 2022

Thanks for the report! This indeed appears to be a bug. I'll bring it up with the team to investigate.


tswast commented Sep 13, 2022

@1oglop1 Would you mind sharing the version of pandas you are using, too?


1oglop1 commented Sep 13, 2022

@tswast Thanks for responding so fast, our version is pandas-1.4.3

danilo-nzyte commented

Experiencing the same issue on pandas==1.5.0 too.

Ended up having to change the method to insert_rows_json and include an extra line of code that feels a bit messy, but it does the job:

import json

# Let pandas serialize each row itself (it knows how to render Int64 and
# dates), then parse the strings back into dicts for insert_rows_json.
json_rows = [
    json.loads(row)
    for row in df.apply(lambda x: x.to_json(date_format="iso"), axis=1)
]

errors = client.insert_rows_json(client.get_table(table_ref), json_rows)

The insert_rows_json method expects a list of dictionaries, while to_json() returns a string, hence the json.loads round-trip.

I have to use to_json so that pandas handles the conversion of the different types (specifically Int64 and date values).

If anybody thinks of a more efficient way of doing it, please let me know!
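
One alternative sketch that avoids the per-row to_json call (assuming the missing values in df are NaN/pd.NA): cast the frame to object dtype so numpy scalars become plain Python values, then swap missing values for None, which serializes as JSON null.

import pandas as pd

# Cast to object so Int64/float64 cells become plain Python int/float,
# then replace NaN/pd.NA with None before handing the frame to the client.
safe_df = df.astype(object).where(df.notnull(), None)

errors = client.insert_rows_from_dataframe(
    table=client.get_table(table_ref), dataframe=safe_df
)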

VovaViliLox commented

@tswast Any updates to this issue?

galz10 self-assigned this Apr 20, 2023
galz10 added the status: investigating label Apr 20, 2023

galz10 commented Apr 24, 2023

Hi @VovaViliLox, following your directions I did not run into the issue you reported; I was able to insert rows into the table. Could you please check again to see whether this issue has been fixed?


galz10 commented Apr 26, 2023

Since I cannot reproduce this issue, I'll close it. I used the latest version, 3.10.0, when trying to reproduce.

galz10 closed this as completed Apr 26, 2023
ana-silvia-s commented

Hi, I just ran into this issue using pandas 2.0.3 and google-cloud-bigquery 3.13.0:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 64
     60     print("Loaded recommendations")
     61     batch_recos["score"] = batch_recos["score"] * generate_noise(
     62         mean=0, stddev=0.5, recos=batch_recos
     63     )
---> 64     client.insert_rows_from_dataframe(
     65         table=destination_table, dataframe=batch_recos
     66     )
     67     print(f"Inserted into {destination_table}")
     68 print("Finished!")

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3537, in Client.insert_rows_from_dataframe(self, table, dataframe, selected_fields, chunk_size, **kwargs)
   3535 for _ in range(chunk_count):
   3536     rows_chunk = itertools.islice(rows_iter, chunk_size)
-> 3537     result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
   3538     insert_results.append(result)
   3540 return insert_results

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3480, in Client.insert_rows(self, table, rows, selected_fields, **kwargs)
   3471     raise ValueError(
   3472         (
   3473             "Could not determine schema for table '{}'. Call client.get_table() "
   3474             "or pass in a list of schema fields to the selected_fields argument."
   3475         ).format(table)
   3476     )
   3478 json_rows = [_record_field_to_json(schema, row) for row in rows]
-> 3480 return self.insert_rows_json(table, json_rows, **kwargs)

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3676, in Client.insert_rows_json(self, table, json_rows, row_ids, skip_invalid_rows, ignore_unknown_values, template_suffix, retry, timeout)
   3674 # We can always retry, because every row has an insert ID.
   3675 span_attributes = {"path": path}
-> 3676 response = self._call_api(
   3677     retry,
   3678     span_name="BigQuery.insertRowsJson",
   3679     span_attributes=span_attributes,
   3680     method="POST",
   3681     path=path,
   3682     data=data,
   3683     timeout=timeout,
   3684 )
   3685 errors = []
   3687 for error in response.get("insertErrors", ()):

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:808, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
    804 if span_name is not None:
    805     with create_span(
    806         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
    807     ):
--> 808         return call()
    810 return call()

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:349, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
    345 target = functools.partial(func, *args, **kwargs)
    346 sleep_generator = exponential_sleep_generator(
    347     self._initial, self._maximum, multiplier=self._multiplier
    348 )
--> 349 return retry_target(
    350     target,
    351     self._predicate,
    352     sleep_generator,
    353     self._timeout,
    354     on_error=on_error,
    355 )

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:191, in retry_target(target, predicate, sleep_generator, timeout, on_error, **kwargs)
    189 for sleep in sleep_generator:
    190     try:
--> 191         return target()
    193     # pylint: disable=broad-except
    194     # This function explicitly must deal with broad exceptions.
    195     except Exception as exc:

File /opt/conda/lib/python3.10/site-packages/google/cloud/_http/__init__.py:479, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
    476 # Making the executive decision that any dictionary
    477 # data will be sent properly as JSON.
    478 if data and isinstance(data, dict):
--> 479     data = json.dumps(data)
    480     content_type = "application/json"
    482 response = self._make_request(
    483     method=method,
    484     url=url,
   (...)
    490     extra_api_info=extra_api_info,
    491 )

File /opt/conda/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File /opt/conda/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File /opt/conda/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File /opt/conda/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type int64 is not JSON serializable

I will try out the "solution" danilo-nzyte proposed in this thread, but I would rather have some sort of official resolution, if possible! Thank you!
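
In the meantime, one alternative worth trying (a sketch; it assumes a batch load job is acceptable instead of streaming inserts, and the table ID is a placeholder): load_table_from_dataframe serializes the DataFrame via pyarrow/Parquet, which understands pandas nullable dtypes such as Int64.

from google.cloud import bigquery

client = bigquery.Client()

# Batch-load the DataFrame instead of streaming rows; pyarrow handles the
# nullable Int64 dtype during Parquet serialization.
job = client.load_table_from_dataframe(batch_recos, "project.dataset.table_name")
job.result()  # block until the load job completes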
