
Conversation

@alamashir
Contributor

Description

Fixes #50387 - Bug in Data Fusion hook where start_pipeline crashes with KeyError: 'runId' when pipelines fail to start.

Problem

The hook used the multi-program start endpoint which returns HTTP 200 even on failures. The code accessed response_json[0]["runId"] without validation, causing a KeyError when the field was missing.
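A minimal reproduction of the failure mode (the response shape below is illustrative, not the exact CDAP payload): the multi-program endpoint answers HTTP 200 even when a program fails, and the failed entry carries an error description instead of a `runId` key, so the old unvalidated access raises `KeyError`.

```python
# Hypothetical failure-mode payload: HTTP 200, but no "runId" in the entry.
response_json = [{"statusCode": 400, "error": "Program not found"}]

try:
    run_id = response_json[0]["runId"]  # the old, unvalidated access
except KeyError:
    run_id = None  # this is the crash the fix prevents

assert run_id is None
```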

Solution

  • Switch to single-program start endpoint: POST .../apps/{app}/{program-type}s/{program-id}/start
  • Validate runId exists in response before accessing
  • Return clear error messages when pipelines fail to start
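A sketch of the validated flow described above. This is not the actual hook code: the function name, the injected `post` callable, and its `(status, body)` return shape are assumptions made to keep the example self-contained.

```python
# Sketch only -- illustrative names, not the real DataFusionHook implementation.
def start_program(post, instance_url, namespace, app, program_type, program_id):
    """POST to the single-program start endpoint and validate the runId."""
    url = (
        f"{instance_url}/v3/namespaces/{namespace}/apps/{app}/"
        f"{program_type}s/{program_id}/start"
    )
    status, body = post(url)  # injected HTTP callable returning (status, json_body)
    if status != 200:
        raise RuntimeError(f"Starting the pipeline failed with status {status}: {body}")
    run_id = body.get("runId")
    if run_id is None:
        # Clear error instead of a bare KeyError deep in the hook.
        raise RuntimeError(f"Pipeline start response contains no runId: {body}")
    return run_id
```

Injecting the HTTP callable keeps the validation logic testable without a live Data Fusion instance, which is also what the unit tests below rely on.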

Changes

Updated start_pipeline method to use correct CDAP endpoint and added response validation.

Breaking Changes

None - method signature and return type unchanged. Only the internal API endpoint changed.

Testing

pytest providers/google/tests/unit/google/cloud/hooks/test_datafusion.py -v
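An illustrative sketch of how the missing-`runId` test case could be structured (this is not the actual provider test code; the toy `start_pipeline` stand-in and mocked request are assumptions): the CDAP request is mocked to return a body without `runId`, and the hook is expected to raise a clear error instead of crashing with `KeyError`.

```python
# Illustrative test sketch -- names mirror the PR's new test but are not its code.
from unittest import mock

def start_pipeline(cdap_request, url):
    """Toy stand-in for the hook method with the added validation."""
    body = cdap_request(url)
    if "runId" not in body:
        raise ValueError(f"Starting a pipeline failed, no runId in response: {body}")
    return body["runId"]

def test_start_pipeline_should_fail_if_no_run_id():
    cdap_request = mock.MagicMock(return_value={"error": "program failed to start"})
    try:
        start_pipeline(cdap_request, "https://instance.example/v3/start")
        raise AssertionError("expected ValueError")
    except ValueError as exc:
        assert "runId" in str(exc)
```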

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Nov 25, 2025
@VladaZakharova
Contributor

hi there! thank you for the fix. Can you please also provide a screenshot of the green system tests running in the Airflow UI?

@alamashir
Contributor Author

@VladaZakharova Thank you for the feedback! I've verified the fix works correctly with all 45 unit tests passing, including a new test specifically for the error case (test_start_pipeline_should_fail_if_no_run_id).

Running the full system tests requires setting up and provisioning a Data Fusion instance. Given that the unit tests comprehensively cover the fix (switching from multi-program to single-program endpoint + runId validation), I believe they provide sufficient verification. However, if you'd like me to set up and run the full system tests, I'm happy to do so.

@shahar1
Contributor

shahar1 commented Nov 29, 2025

> However, if you'd like me to set up and run the full system tests, I'm happy to do so.

If you're able to run the system tests on your side and provide some screenshots, we'll be happy if you could do so.
If not, please let us know - I believe that Vlada's team could figure out a solution for that.

@alamashir
Contributor Author

@shahar1 @VladaZakharova if you can, that would be great.
I spent some time setting up a CDF session but ran into issues, and it would take me quite a while to set up. I can spend more time on it at the end of next week if needed, so if I don't hear back from you all by then, I'll do it.

…perly

The start_pipeline method was using the multi-program start endpoint which
returns HTTP 200 even when individual programs fail to start. This caused
a KeyError when trying to access the runId from error responses.

Changes:
- Updated start_pipeline to use single-program start endpoint
- Added validation to check if runId exists in response before accessing it
- Improved error messages to provide context about failures
- Updated tests to reflect new endpoint and added test for missing runId scenario

Fixes apache#50387
@potiuk potiuk force-pushed the fix-50387-datafusion-start-pipeline-runid-error branch from 975a6f7 to 0669841 Compare November 29, 2025 23:21
@potiuk potiuk merged commit a4f2b33 into apache:main Dec 20, 2025
82 checks passed
Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Jan 2, 2026
…perly (apache#58698)

* Fix Google Cloud Data Fusion hook to handle pipeline start errors properly

* Add spec to MagicMock for better static type checking
stegololz pushed a commit to stegololz/airflow that referenced this pull request Jan 9, 2026
shahar1 added a commit to shahar1/airflow that referenced this pull request Jan 17, 2026
jscheffl pushed a commit that referenced this pull request Jan 17, 2026

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Airflow Datafusion Hook: Bug in CDAP Program Start Status Validation & API Usage

5 participants