
Conversation

@alamashir
Contributor

Description

Fixes #50387 - Bug in Data Fusion hook where start_pipeline crashes with KeyError: 'runId' when pipelines fail to start.

Problem

The hook used the multi-program start endpoint which returns HTTP 200 even on failures. The code accessed response_json[0]["runId"] without validation, causing a KeyError when the field was missing.
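A minimal reproduction of the failure mode (the response shape below is illustrative, not the exact CDAP payload): the multi-program endpoint answers HTTP 200 even when a program fails, and the failed entry carries an error description instead of a `runId` key, so the old unvalidated access raises `KeyError`.

```python
# Hypothetical failure-mode payload: HTTP 200, but no "runId" in the entry.
response_json = [{"statusCode": 400, "error": "Program not found"}]

try:
    run_id = response_json[0]["runId"]  # the old, unvalidated access
except KeyError:
    run_id = None  # this is the crash the fix prevents

assert run_id is None
```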

Solution

  • Switch to single-program start endpoint: POST .../apps/{app}/{program-type}s/{program-id}/start
  • Validate runId exists in response before accessing
  • Return clear error messages when pipelines fail to start
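A sketch of the validated flow described above. This is not the actual hook code: the function name, the injected `post` callable, and its `(status, body)` return shape are assumptions made to keep the example self-contained.

```python
# Sketch only -- illustrative names, not the real DataFusionHook implementation.
def start_program(post, instance_url, namespace, app, program_type, program_id):
    """POST to the single-program start endpoint and validate the runId."""
    url = (
        f"{instance_url}/v3/namespaces/{namespace}/apps/{app}/"
        f"{program_type}s/{program_id}/start"
    )
    status, body = post(url)  # injected HTTP callable returning (status, json_body)
    if status != 200:
        raise RuntimeError(f"Starting the pipeline failed with status {status}: {body}")
    run_id = body.get("runId")
    if run_id is None:
        # Clear error instead of a bare KeyError deep in the hook.
        raise RuntimeError(f"Pipeline start response contains no runId: {body}")
    return run_id
```

Injecting the HTTP callable keeps the validation logic testable without a live Data Fusion instance, which is also what the unit tests below rely on.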

Changes

Updated start_pipeline method to use correct CDAP endpoint and added response validation.

Breaking Changes

None - method signature and return type unchanged. Only the internal API endpoint changed.

Testing

pytest providers/google/tests/unit/google/cloud/hooks/test_datafusion.py -v
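An illustrative sketch of how the missing-`runId` test case could be structured (this is not the actual provider test code; the toy `start_pipeline` stand-in and mocked request are assumptions): the CDAP request is mocked to return a body without `runId`, and the hook is expected to raise a clear error instead of crashing with `KeyError`.

```python
# Illustrative test sketch -- names mirror the PR's new test but are not its code.
from unittest import mock

def start_pipeline(cdap_request, url):
    """Toy stand-in for the hook method with the added validation."""
    body = cdap_request(url)
    if "runId" not in body:
        raise ValueError(f"Starting a pipeline failed, no runId in response: {body}")
    return body["runId"]

def test_start_pipeline_should_fail_if_no_run_id():
    cdap_request = mock.MagicMock(return_value={"error": "program failed to start"})
    try:
        start_pipeline(cdap_request, "https://instance.example/v3/start")
        raise AssertionError("expected ValueError")
    except ValueError as exc:
        assert "runId" in str(exc)
```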

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Nov 25, 2025
@VladaZakharova
Contributor

hi there! thank you for the fix. Can you please also provide a screenshot of the green system tests running in the Airflow UI?

@alamashir
Contributor Author

@VladaZakharova Thank you for the feedback! I've verified the fix works correctly with all 45 unit tests passing, including a new test specifically for the error case (test_start_pipeline_should_fail_if_no_run_id).

Running the full system tests requires setting up and provisioning a Data Fusion instance. Given that the unit tests comprehensively cover the fix (switching from multi-program to single-program endpoint + runId validation), I believe they provide sufficient verification. However, if you'd like me to set up and run the full system tests, I'm happy to do so.

@shahar1
Contributor

shahar1 commented Nov 29, 2025

> However, if you'd like me to set up and run the full system tests, I'm happy to do so.

If you're able to run the system tests on your side and provide some screenshots, we'll be happy if you could do so.
If not, please let us know - I believe that Vlada's team could figure out a solution for that.

@alamashir
Contributor Author

@shahar1 @VladaZakharova if you can, that would be great.
I spent some time setting up a CDF session but ran into issues, and it would take me quite a while to set up. I can spend more time on it at the end of next week if needed, so if I don't hear back from you all by then, I'll do it.

…perly

The start_pipeline method was using the multi-program start endpoint which
returns HTTP 200 even when individual programs fail to start. This caused
a KeyError when trying to access the runId from error responses.

Changes:
- Updated start_pipeline to use single-program start endpoint
- Added validation to check if runId exists in response before accessing it
- Improved error messages to provide context about failures
- Updated tests to reflect new endpoint and added test for missing runId scenario

Fixes apache#50387
@potiuk potiuk force-pushed the fix-50387-datafusion-start-pipeline-runid-error branch from 975a6f7 to 0669841 Compare November 29, 2025 23:21
@potiuk potiuk merged commit a4f2b33 into apache:main Dec 20, 2025
82 checks passed
Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Jan 2, 2026
…perly (apache#58698)

* Fix Google Cloud Data Fusion hook to handle pipeline start errors properly

* Add spec to MagicMock for better static type checking
stegololz pushed a commit to stegololz/airflow that referenced this pull request Jan 9, 2026
shahar1 added a commit to shahar1/airflow that referenced this pull request Jan 17, 2026
jscheffl pushed a commit that referenced this pull request Jan 17, 2026

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Airflow Datafusion Hook: Bug in CDAP Program Start Status Validation & API Usage

5 participants