Description
Describe the bug
When I run the datahub-gc source from the CLI (datahub ingest -c path-to-recipe.yml), reporting fails towards the end of the run with the traceback below:
[2025-11-28 14:24:03,292] INFO {datahub.ingestion.source.gc.datahub_gc:168} - Time spent in stage <Execution request Cleanup at 2025-11-28 13:22:33.479504+00:00>: 89.81 seconds
[2025-11-28 14:24:03,293] WARNING {datahub.ingestion.run.pipeline:416} - Reporting failed on completion
Traceback (most recent call last):
  File "/home/david/git/datahub-proview/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 398, in _notify_reporters_on_ingestion_completion
    reporter.on_completion(
  File "/home/david/git/datahub-proview/.venv/lib/python3.10/site-packages/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py", line 245, in on_completion
    structured_report_str = json.dumps(report, indent=2)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not tuple
I think it fails on the contents of ingestion_stage_durations, whose keys are tuples rather than strings:
'ingestion_stage_durations': {(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Expired Token Cleanup at 2025-11-28 13:10:36.084722+00:00'): 0.05,
(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Truncate Indices at 2025-11-28 13:10:36.131842+00:00'): 1.96,
(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Soft Deleted Entities Cleanup at 2025-11-28 13:10:38.093062+00:00'): 234.78,
(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Data Process Cleanup at 2025-11-28 13:14:32.875259+00:00'): 480.6,
(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Execution request Cleanup at 2025-11-28 13:22:33.479504+00:00'): 89.81},
To Reproduce
Steps to reproduce the behavior:
- Start datahub (e.g. with datahub docker quickstart) and have some data in it
- Point the CLI to that instance with ~/.datahubenv
- Run datahub ingest -c datahub-gc.dhub.yml with a datahub-gc.dhub.yml like below:
source:
  type: datahub-gc
  config: {}
Expected behavior
The reporting should not crash.
Desktop (please complete the following information):
- OS: Linux (Debian 12.12)
Additional context
acryl-datahub==1.3.0