reporting crash when running datahub-gc source from CLI #15445

@daha

Description

Describe the bug
When I run the datahub-gc source via the CLI, e.g. datahub ingest -c path-to-recipe.yml, I get a crash towards the end, like below:

[2025-11-28 14:24:03,292] INFO     {datahub.ingestion.source.gc.datahub_gc:168} - Time spent in stage <Execution request Cleanup at 2025-11-28 13:22:33.479504+00:00>: 89.81 seconds
[2025-11-28 14:24:03,293] WARNING  {datahub.ingestion.run.pipeline:416} - Reporting failed on completion
Traceback (most recent call last):
  File "/home/david/git/datahub-proview/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 398, in _notify_reporters_on_ingestion_completion
    reporter.on_completion(
  File "/home/david/git/datahub-proview/.venv/lib/python3.10/site-packages/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py", line 245, in on_completion
    structured_report_str = json.dumps(report, indent=2)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/david/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not tuple

I think it crashes on the content of ingestion_stage_durations, which uses tuples as dict keys:

 'ingestion_stage_durations': {(<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Expired Token Cleanup at 2025-11-28 13:10:36.084722+00:00'): 0.05,
                               (<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Truncate Indices at 2025-11-28 13:10:36.131842+00:00'): 1.96,
                               (<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Soft Deleted Entities Cleanup at 2025-11-28 13:10:38.093062+00:00'): 234.78,
                               (<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Data Process Cleanup at 2025-11-28 13:14:32.875259+00:00'): 480.6,
                               (<IngestionHighStage._UNDEFINED: 'Ingestion'>, 'Execution request Cleanup at 2025-11-28 13:22:33.479504+00:00'): 89.81},
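The failure can be reproduced in isolation: json.dumps rejects dicts whose keys are tuples. A minimal sketch of the same shape (the Stage enum below is a hypothetical stand-in for IngestionHighStage, not the real class):

```python
import json
from enum import Enum


# Hypothetical stand-in for datahub's IngestionHighStage, for illustration only.
class Stage(Enum):
    _UNDEFINED = "Ingestion"


# Mimics the shape of ingestion_stage_durations in the report above:
# a (enum, str) tuple used as a dict key.
report = {
    "ingestion_stage_durations": {
        (Stage._UNDEFINED, "Expired Token Cleanup"): 0.05,
    }
}

try:
    json.dumps(report, indent=2)
except TypeError as exc:
    # json only allows str, int, float, bool or None as object keys
    print(exc)
```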

To Reproduce
Steps to reproduce the behavior:

  1. Start DataHub (e.g. via datahub docker quickstart) and have some data in it
  2. Point the CLI at DataHub via ~/.datahubenv
  3. Run datahub ingest -c datahub-gc.dhub.yml with a datahub-gc.dhub.yml like the one below:
source:
  type: datahub-gc
  config: {}

Expected behavior
The reporting should not crash.
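A possible workaround on the reporting side (a sketch only, not DataHub's actual fix; stringify_keys is a hypothetical helper) would be to coerce non-string dict keys to strings before calling json.dumps:

```python
import json


def stringify_keys(obj):
    """Recursively convert dict keys to strings so json.dumps succeeds."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj


# Same shape as the failing report: a tuple used as a dict key.
report = {
    "ingestion_stage_durations": {
        ("Ingestion", "Expired Token Cleanup"): 0.05,
    }
}

# After key conversion, serialization no longer raises TypeError.
print(json.dumps(stringify_keys(report), indent=2))
```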

Desktop (please complete the following information):

  • OS: Linux (Debian 12.12)

Additional context
acryl-datahub==1.3.0

Metadata

Labels: bug (Bug report)