What happened?
I am building an application in Apache Beam and Python that runs on Google Cloud Dataflow. I am using ReadFromSpanner from apache_beam.io.gcp.experimental.spannerio. When the ReadFromSpanner step executes, I get this error:
Python sdk harness failed:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 366, in create_monitoring_info
return metrics_pb2.MonitoringInfo(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen _collections_abc>", line 949, in update
TypeError: bad argument type for built-in operation
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 212, in main
sdk_harness.run()
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 283, in run
getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 361, in _request_harness_monitoring_infos
).to_runner_api_monitoring_infos(None).values()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "apache_beam/metrics/execution.py", line 334, in apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
File "apache_beam/metrics/cells.py", line 76, in apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
File "apache_beam/metrics/cells.py", line 158, in apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 233, in int64_counter
return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 369, in create_monitoring_info
raise RuntimeError(
RuntimeError: Failed to create MonitoringInfo for urn beam:metric:io:api_request_count:v1 type <class 'type'> labels {labels} and payload {payload}
return metrics_pb2.MonitoringInfo(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen _collections_abc>", line 949, in update
TypeError: bad argument type for built-in operation
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 367, in <module>
main(sys.argv)
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 212, in main
sdk_harness.run()
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 283, in run
getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 361, in _request_harness_monitoring_infos
).to_runner_api_monitoring_infos(None).values()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "apache_beam/metrics/execution.py", line 334, in apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
File "apache_beam/metrics/cells.py", line 76, in apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
File "apache_beam/metrics/cells.py", line 158, in apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 233, in int64_counter
return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
I get this error only with tables that have more than a few hundred rows; on small tables the error does not occur. I tried increasing the number of workers and changing the machine type, but neither helped. (A rough sketch for narrowing down the row-count threshold is included after the pipeline example below.)
Example of my pipeline:
import apache_beam as beam
from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner, ReadOperation
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=project',
    '--region=us-east4',
    '--temp_location=gs-path',
    '--staging_location=gs-path',
    '--experiments=shuffle_mode=service',
    '--job_name=beam-test',
])

read_operations = [
    ReadOperation.table(table=table, columns=[column]),
]

with beam.Pipeline(options=options) as pipeline:
    all_users = pipeline | ReadFromSpanner(
        'project', 'instance', 'database',
        read_operations=read_operations,
    )
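
To narrow down the row-count threshold, a query-based read with a LIMIT can be swapped in and the limit raised until the failure appears. This is a rough sketch, assuming ReadOperation.query accepts a plain SQL string; my_table and my_column are placeholder names, not the real schema:

import apache_beam as beam
from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner, ReadOperation
from apache_beam.options.pipeline_options import PipelineOptions

# Same Dataflow options as in the pipeline above; values are placeholders.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=project',
    '--region=us-east4',
    '--temp_location=gs-path',
    '--staging_location=gs-path',
])

# Read a bounded number of rows; raise the LIMIT until the error reappears.
limited_read = [
    ReadOperation.query(sql='SELECT my_column FROM my_table LIMIT 500'),
]

with beam.Pipeline(options=options) as pipeline:
    rows = pipeline | ReadFromSpanner(
        'project', 'instance', 'database',
        read_operations=limited_read,
    )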
Versions:
apache_beam[gcp]==2.66.0
Python 3.11.8
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner