insert_rows_from_dataframe produces int64 is not JSON serializable #1348
Comments
cc @tswast
Thanks for the report! This indeed appears to be a bug. I'll bring it up with the team to investigate.
@1oglop1 Would you mind sharing the version of pandas you are using, too?
@tswast Thanks for responding so fast, our version is
Experiencing the same issue. Ended up having to change the method to:

```python
json_rows = [json.loads(i) for i in df.apply(lambda x: x.to_json(date_format='iso'), axis=1)]
errors = client.insert_rows_json(
    client.get_table(table_ref), json_rows
)
```

If anybody thinks of a more efficient way of doing it, please let me know!
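A possibly more efficient variant of that workaround, sketched below as a suggestion rather than something from the thread (it reuses the df, table_ref, and client names from the snippet above): serialize the whole frame with a single to_json call instead of applying it row by row.

```python
import json

# One to_json call for the whole DataFrame; pandas handles numpy int64 itself,
# so no numpy scalar ever reaches the standard-library json encoder.
json_rows = json.loads(df.to_json(orient="records", date_format="iso"))

errors = client.insert_rows_json(client.get_table(table_ref), json_rows)
```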
@tswast Any updates to this issue?
Hi @VovaViliLox, following your directions I did not run into the issue you reported; I was able to insert rows into the table. Could you please check again to see if this issue was fixed?
Since I cannot reproduce this issue, I'll close it. I used the latest version, 3.10.0, when trying to reproduce.
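For reference, a minimal reproduction sketch one could run to re-check this; the dataset and table name are placeholders, and it assumes the target table has an INT64 column named x:

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()

# Nullable Int64 is the dtype the original report says to_dataframe now produces
# (attributed to PR #786), so build the frame with that dtype directly.
df = pd.DataFrame({"x": pd.array([1, 2, None], dtype="Int64")})

table = client.get_table("my_dataset.my_table")  # hypothetical table with INT64 column "x"
errors = client.insert_rows_from_dataframe(table=table, dataframe=df)
print(errors)
```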
Hi, I just ran into this issue using pandas 2.0.3 and bigquery 3.13.0:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[16], line 64
60 print("Loaded recommendations")
61 batch_recos["score"] = batch_recos["score"] * generate_noise(
62 mean=0, stddev=0.5, recos=batch_recos
63 )
---> 64 client.insert_rows_from_dataframe(
65 table=destination_table, dataframe=batch_recos
66 )
67 print(f"Inserted into {destination_table}")
68 print("Finished!")
File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3537, in Client.insert_rows_from_dataframe(self, table, dataframe, selected_fields, chunk_size, **kwargs)
3535 for _ in range(chunk_count):
3536 rows_chunk = itertools.islice(rows_iter, chunk_size)
-> 3537 result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
3538 insert_results.append(result)
3540 return insert_results
File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3480, in Client.insert_rows(self, table, rows, selected_fields, **kwargs)
3471 raise ValueError(
3472 (
3473 "Could not determine schema for table '{}'. Call client.get_table() "
3474 "or pass in a list of schema fields to the selected_fields argument."
3475 ).format(table)
3476 )
3478 json_rows = [_record_field_to_json(schema, row) for row in rows]
-> 3480 return self.insert_rows_json(table, json_rows, **kwargs)
File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:3676, in Client.insert_rows_json(self, table, json_rows, row_ids, skip_invalid_rows, ignore_unknown_values, template_suffix, retry, timeout)
3674 # We can always retry, because every row has an insert ID.
3675 span_attributes = {"path": path}
-> 3676 response = self._call_api(
3677 retry,
3678 span_name="BigQuery.insertRowsJson",
3679 span_attributes=span_attributes,
3680 method="POST",
3681 path=path,
3682 data=data,
3683 timeout=timeout,
3684 )
3685 errors = []
3687 for error in response.get("insertErrors", ()):
File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/client.py:808, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
804 if span_name is not None:
805 with create_span(
806 name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
807 ):
--> 808 return call()
810 return call()
File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:349, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
345 target = functools.partial(func, *args, **kwargs)
346 sleep_generator = exponential_sleep_generator(
347 self._initial, self._maximum, multiplier=self._multiplier
348 )
--> 349 return retry_target(
350 target,
351 self._predicate,
352 sleep_generator,
353 self._timeout,
354 on_error=on_error,
355 )
File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:191, in retry_target(target, predicate, sleep_generator, timeout, on_error, **kwargs)
189 for sleep in sleep_generator:
190 try:
--> 191 return target()
193 # pylint: disable=broad-except
194 # This function explicitly must deal with broad exceptions.
195 except Exception as exc:
File /opt/conda/lib/python3.10/site-packages/google/cloud/_http/__init__.py:479, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)
476 # Making the executive decision that any dictionary
477 # data will be sent properly as JSON.
478 if data and isinstance(data, dict):
--> 479 data = json.dumps(data)
480 content_type = "application/json"
482 response = self._make_request(
483 method=method,
484 url=url,
(...)
490 extra_api_info=extra_api_info,
491 )
File /opt/conda/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
226 # cached encoder
227 if (not skipkeys and ensure_ascii and
228 check_circular and allow_nan and
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
File /opt/conda/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
195 return encode_basestring(o)
196 # This doesn't pass the iterator directly to ''.join() because the
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
File /opt/conda/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
252 else:
253 _iterencode = _make_iterencode(
254 markers, self.default, _encoder, self.indent, floatstr,
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
File /opt/conda/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
160 def default(self, o):
161 """Implement this method in a subclass such that it returns
162 a serializable object for ``o``, or calls the base implementation
163 (to raise a ``TypeError``).
(...)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
TypeError: Object of type int64 is not JSON serializable

I will try out the "solution" danilo-nzyte proposed in this thread but would rather have some sort of official resolution, if possible! thank you!
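Not mentioned in the thread, but a possible alternative while the streaming path misbehaves: if a batch load job is acceptable instead of streaming inserts, Client.load_table_from_dataframe serializes the frame via pyarrow/Parquet and never calls json.dumps. A minimal sketch, reusing the destination_table and batch_recos names from the traceback above (assumes pyarrow is installed):

```python
# Load-job alternative: Parquet serialization handles numpy/pandas integer dtypes natively.
job = client.load_table_from_dataframe(batch_recos, destination_table)
job.result()  # block until the load job finishes
```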
I believe that this PR #786 introduced a regression in the function Client.insert_rows_from_dataframe: because to_dataframe now produces the type Int64, the function _pandas_helpers.dataframe_to_json_generator cannot handle it properly (see python-bigquery/google/cloud/bigquery/client.py, line 3461 in af0101d), which results in the error:

TypeError: Object of type int64 is not JSON serializable
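To illustrate the failure mode described above, here is a small standalone sketch (not from the issue) showing that the standard-library json encoder rejects numpy integer scalars unless they are converted to built-in ints first:

```python
import json
import numpy as np

value = np.int64(42)

try:
    json.dumps({"count": value})          # stdlib json has no encoder for numpy scalars
except TypeError as exc:
    print(exc)                            # Object of type int64 is not JSON serializable

print(json.dumps({"count": int(value)}))  # converting to a built-in int works: {"count": 42}
```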
Environment details
- Python version: 3.9.10
- google-cloud-bigquery version: 3.3.2
Steps to reproduce
```python
df = client.query(sql).to_dataframe()
client.insert_rows_from_dataframe(table=client.get_table("table_name"), dataframe=df)
```
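A workaround sketch for this reproduction (my own suggestion, not from the report): cast the frame to plain Python objects and swap pandas' NA/NaN for None before streaming, so json.dumps only ever sees built-in types.

```python
import pandas as pd

df = client.query(sql).to_dataframe()

# astype(object) turns numpy/pandas integers into built-in ints; where(...) replaces
# the remaining NA/NaN values with None so every cell is JSON serializable.
clean_df = df.astype(object).where(df.notna(), None)

client.insert_rows_from_dataframe(
    table=client.get_table("table_name"), dataframe=clean_df
)
```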
Stack trace