prevent templated field logic checks in operators __init__ in BigQueryToPostgresOperator operator #36491

Merged
merged 1 commit into from
Dec 29, 2023
@@ -18,8 +18,6 @@
"""This module contains Google BigQuery to PostgreSQL operator."""
from __future__ import annotations

from typing import Sequence

from airflow.providers.google.cloud.transfers.bigquery_to_sql import BigQueryToSqlBaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

@@ -36,8 +34,6 @@ class BigQueryToPostgresOperator(BigQueryToSqlBaseOperator):
    :param postgres_conn_id: Reference to :ref:`postgres connection id <howto/connection:postgres>`.
    """

    template_fields: Sequence[str] = (*BigQueryToSqlBaseOperator.template_fields, "dataset_id", "table_id")
Contributor

Isn't this a breaking change?

I think the real issue is with:

    try:
        self.dataset_id, self.table_id = dataset_table.split(".")
    except ValueError:
        raise ValueError(f"Could not parse {dataset_table} as <dataset>.<table>") from None

and it will affect all operators that inherit from the base class.
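
As a rough illustration of the concern with doing this parsing in `__init__`, the sketch below applies the same split to a rendered value and to an unrendered template string. The helper name `parse_dataset_table` and the sample strings are hypothetical and are not part of the provider code.

```python
from __future__ import annotations


def parse_dataset_table(dataset_table: str) -> tuple[str, str]:
    """Split '<dataset>.<table>' the same way the quoted snippet does."""
    try:
        dataset_id, table_id = dataset_table.split(".")
    except ValueError:
        raise ValueError(f"Could not parse {dataset_table} as <dataset>.<table>") from None
    return dataset_id, table_id


# A value that is already rendered parses as intended.
rendered = parse_dataset_table("my_dataset.my_table")

# An unrendered template expression containing a single dot is split
# in the middle of the expression instead of between dataset and table.
unrendered = parse_dataset_table("{{ params.dataset_table }}")
```

Because `__init__` runs before templates are rendered, the second call is the situation the parsing logic would face for a templated argument.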

Contributor

@shahar1 shahar1 Jan 2, 2024

Come to think of it, it might indeed be breaking, since fields that don't exist in the parent's template_fields are removed from the child's definition.
I suggest reverting it for now.
@romsharon98 instead of deleting this line, try to hardcode all of the values that should be templated and see if it works (a bit ugly, but I don't have a better idea for now).
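
One way to picture why dropping the two names could change behaviour is a toy model of the order of operations: `__init__` runs first, and only afterwards are the attributes named in `template_fields` rendered. `ToyBase`, `ToyChild`, and `toy_render` below are hypothetical stand-ins, and the placeholder replacement is a simplification rather than Airflow's actual Jinja rendering.

```python
from __future__ import annotations


class ToyBase:
    """Stand-in for a base operator; only dataset_table is templated."""

    template_fields: tuple[str, ...] = ("dataset_table",)

    def __init__(self, dataset_table: str) -> None:
        self.dataset_table = dataset_table
        # Logic in __init__ only ever sees the raw, unrendered string.
        self.dataset_id, self.table_id = dataset_table.split(".")


class ToyChild(ToyBase):
    """Stand-in for a child that also lists the derived attributes."""

    template_fields = (*ToyBase.template_fields, "dataset_id", "table_id")


def toy_render(op: ToyBase, context: dict[str, str]) -> None:
    """Replace '{{ key }}' placeholders in each attribute named in template_fields."""
    for name in op.template_fields:
        value = getattr(op, name)
        for key, replacement in context.items():
            value = value.replace("{{ " + key + " }}", replacement)
        setattr(op, name, value)


context = {"ds": "sales", "tbl": "orders"}

base = ToyBase("{{ ds }}.{{ tbl }}")
toy_render(base, context)

child = ToyChild("{{ ds }}.{{ tbl }}")
toy_render(child, context)
```

In this toy model the child ends up with rendered `dataset_id` and `table_id`, while the base keeps the raw halves of the template string.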

Contributor Author

Thanks for your insight @eladkal!
Can you help me understand why this is a breaking change?

This is how I understand it:
Let's assume I revert the PR, so both "dataset_id" and "table_id" are templated fields for BigQueryToPostgresOperator.

But the parent constructor (BigQueryToSqlBaseOperator) always overwrites them with the line you mentioned.

So, as I understand it, the reverted line has no effect.

Member

@potiuk potiuk Jan 5, 2024

SCRATCH_THAT:

I think we should not treat it as breaking (at least as I understand it here).

I think the only scenario where it would matter is:

  1. Someone creates a custom operator derived from BigQueryToPostgresOperator
  2. The same someone adds new fields there "dataset_id" and "table_id"
  3. And expects them to be templated.

Even if it worked previously, that was accidental and unintended, and we should treat this change as a bug-fix. If someone adds new fields in a derived operator, it's their responsibility to add those fields to the templated fields.

While this change might technically break someone's implementation, IMHO we should treat it as a bugfix because:

a) there is a very low chance this will happen
b) while we are technically breaking something, we are bringing things back to how they were intended to work. Having those fields in this operator was accidental, not intentional

I will repeat it until it sticks: SemVer and the "breaking" classification are not about whether something is "technically" broken but about whether our intentions changed. If we applied the "breaking change" label to every change that changes behaviour, then pretty much every single bugfix would be "technically" breaking, because it changes behaviour.

UPDATE: I just realized I missed the parent class. Let me revise it.

Member

@eladkal is right - but we should not revert it - instead we should add those two fields to the base class.

Contributor

@shahar1 shahar1 Jan 5, 2024

Sounds good to me.
A note from the technical perspective of the validation pre-commit:
As the validation is currently based on very simplified AST parsing, it would be better for now to define the fields directly (i.e., template_fields = ['a', 'b']) rather than relying on the parent's fields (i.e., template_fields = (*ParentClass.template_fields, 'b')); otherwise the validation might fail.
The cost would be a minimal abuse of inheritance, which can be fixed later.
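
To make the AST point concrete, here is a minimal sketch of a purely literal check, assuming it only understands string constants. It is not the actual pre-commit implementation; `literal_template_fields`, `SOURCE`, and the `Parent`/`Child` classes are hypothetical names used for illustration.

```python
from __future__ import annotations

import ast

SOURCE = '''
class Parent:
    template_fields = ("a",)

class Child(Parent):
    template_fields = (*Parent.template_fields, "b")
'''


def literal_template_fields(source: str) -> dict[str, list[str] | None]:
    """Map each class name to its template_fields if written as string literals, else None."""
    result: dict[str, list[str] | None] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            # Handle both plain and annotated assignments.
            if isinstance(stmt, ast.Assign):
                targets, value = stmt.targets, stmt.value
            elif isinstance(stmt, ast.AnnAssign) and stmt.value is not None:
                targets, value = [stmt.target], stmt.value
            else:
                continue
            if not any(isinstance(t, ast.Name) and t.id == "template_fields" for t in targets):
                continue
            # A starred parent reference is an ast.Starred node, not a constant,
            # so a literal-only check cannot resolve it without importing the code.
            if isinstance(value, (ast.Tuple, ast.List)) and all(
                isinstance(e, ast.Constant) and isinstance(e.value, str) for e in value.elts
            ):
                result[node.name] = [e.value for e in value.elts]
            else:
                result[node.name] = None
    return result


fields = literal_template_fields(SOURCE)
```

Under this assumption the parent's fields are readable from the source text alone, while the child's definition is opaque to the check, which is the trade-off described above.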

Member

Correct. This is really what I also proposed: to move template_fields = ['dataset_id', 'table_id'] to BigQueryToSqlBaseOperator. This is where they belong, and this is what will make them consistent with the AST check.

Contributor

Followup PR #36663


    def __init__(
        self,
        *,