Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix update-common-sql-api-stubs pre-commit check #38915

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

Taragolis
Copy link
Contributor

This pre-commit check doesn't run due to wrong path and seems like it doesn't run for a long time.

I'm not sure that do we need this stub files, since provider drop support of Airflow 2.4 for a long time, and there is no any subclassing resolution required but this more for the future discussion.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk potiuk merged commit 4f169bd into apache:main Apr 11, 2024
41 checks passed
@Taragolis Taragolis deleted the fix-update-common-sql-api-stubs branch April 11, 2024 07:03
@potiuk
Copy link
Member

potiuk commented Apr 11, 2024

Contex: The main reason for this pre-commit was to detect (and see in PR) when someone modified common.sql interface in a breaking way. It's been implemented experimentally, and I think it's been a failed attempt (in the sense that a) it was broken for a long time b) nobody knew about it. Discussion here: https://lists.apache.org/thread/1by8ko8jrrp1xwxt5781bwn2tokxjodl

I think we should find a better way of doing that check and flag if somoene does a potentiall breaking change in common.sql (which should not happen) and also I think we should add a little better policy on bumping all common.sql dependencies to latest minor version - to avoid cases that somoene uses features from latest version but older common.sql is installed - but then I do not really want ot to be again the person who proposes and leads this alone - so maybe one of you would like to take a lead on it?

@Taragolis
Copy link
Contributor Author

One of the problem that if some contributor want to add some feature into the the specific dialect in half cases it add something into the common.sql rather than into the specific provider, how to prevent this - a have no idea, rather than prohibit this during the code review

How to prevent breaking changes? In most cases it could be done by make DBApiHook as Interface/Protocol only with bare implementation and move specific implementation into the particular hooks, right now a lot of sits into DBApiHook which might make think that something would work with any of the 24 Hooks but it is far away from the reality.

@Taragolis
Copy link
Contributor Author

Taragolis commented Apr 11, 2024

Maybe also run provider's DBApiHooks implementations agains non-main version. (New special tests!), when we run tests against min supported and max released version of the provider instead of code from the main.

I'm not sure that we ready for that, but it might help to find this kind of breaking changes.

WDYT @potiuk

@potiuk
Copy link
Member

potiuk commented Apr 11, 2024

One of the problem that if some contributor want to add some feature into the the specific dialect in half cases it add something into the common.sql rather than into the specific provider, how to prevent this - a have no idea, rather than prohibit this during the code review

Yes. I think we have to be simply careful with those - like in #38715 . The stub was mostly about the .pyi interface changes, but of course behavioural changes are only detectable (currently) by review. cc: @dabla - maybe good to get comments from a recent "fresh" contributor to those.

How to prevent breaking changes? In most cases it could be done by make DBApiHook as Interface/Protocol only with bare implementation and move specific implementation into the particular hooks, right now a lot of sits into DBApiHook which might make think that something would work with any of the 24 Hooks but it is far away from the reality.

To be perfectly honest - I think far from reality is not the right statement :) - I think some parts (especially format and serialization of returned data) might work a bit differently - but this was not even the primary use case for all DBApiHook (for a very long time returning data from DBApiHooks was treated as mis-use.

For the context - mostly it was about DML/DDL queries and making sure all the SQL-related operators that are implemented in common.sql work and their logic is not copied in all 24 providers (before making them comon we had several 10s of copies of those operators accross the 24 providers"

  • SQLColumnCheckOperator
  • SQLTableCheckOperator
  • SQLCheckOperator
  • SQLValueCheckOperator
  • SQLIntervalCheckOperator
  • SQLThresholdCheckOperator
  • BranchSQLOperator

But if you think we can make it less vulnerable for coupling problems by minimising implementation of hooks and possibly copying the code to hooks, yes, I think it's a good idea.

Maybe also run provider's DBApiHooks implementations agains non-main version. (New special tests!), when we run tests against min supported and max released version of the provider instead of code from the main.

Yes I thought about it - but then we have 12 MINOR versions of common.sql and several providers have different >= DIFFERENT version, So it's hard to make a simple and meanigful set of tests there - especially that just "import" is not really enough, we should also run actual tests, and we currently run tests only in main. This is why I proposed >= LAST_MAIN for all providers - then we will be generally forced to fix any incompatibilties in common.sql (which we do anyway BTW).

@Taragolis
Copy link
Contributor Author

The problem here, that we talk about implementation about SQL, and there is no one database which implements all features from the modern SQL Standard (SQL ANSI 92 outdated decades ago). And even if 2 databases implements one part from the standard it might do it differently and even implements partially, because SQL:Standard know as collection of outdated or not applicable in to the real word examples/types/statements.

The truth as usual somewhere in between "copy all into the generic" and "keep it all in providers".

For the context - mostly it was about DML/DDL queries and making sure all the SQL-related operators that are implemented in common.sql work and their logic is not copied in all 24 providers

I guess, I would like to propose look how it resolved in SQLAlchemy, as a better example. And it resolved on feature on the dialect, if some particular feature doesn't supported into the specific Dialect you get an error, rather than generic implementation.

So we might want to get this approach, e.g. assign specific class as class Attribute/Variable and we could check is it supported and how it should support:

  • not-implemented: (Default) This feature not implemented for the hook. There is should be an assumption that new features not implemented by default.
  • generic: Use generic approach. Generic approach sits behind the private/internal methods with final decorator. If it set to this value the DBApiHook as well as common operators will use internal methods
  • implemented: Opposite of the generic
  • legacy: it might be also an option to fallback with warning to the previous implementation
class DBApiHook:
    ...
    features: dict[str, Literal["not-implemented", "generic", "implemented"]] # or TypedDict

    @final
    def _generic_awesome_feature(self, foo=foo, bar=bar):
        return self._generic_awesome_feature(foo=foo, bar=bar)

    @final
    def awesome_feature(self, *, foo, bar) -> SomeAwesomeAnnotation:
        if (feature_flag := self.features.get("awesome", "not-implemented")) == "not-implemented":
            raise NotImplementedError(f"You {type(self).__name__} not implements/support this awesome feature")
        elif feature_flag == "implemented":
            return self.provider_awesome_feature(foo=foo, bar=bar)
        elif feature_flag != "generic":
            raise RuntimeError(f"Unknown option {feature_flag} ")
        return self._generic_awesome_feature(foo=foo, bar=bar)


    def provider_awesome_feature(self, *, foo, bar) -> SomeAwesomeAnnotation:
        raise NotImplementedError("You should really implements this one")

class SomeProviderHook(DBApiHook):

    @override  # PEP 698
    def provider_awesome_feature(self, *, foo, bar):
        return foo, foo

before making them comon we had several 10s of copies of those operators accross the 24 providers"

In the other hand we have a one copy which might not works correctly because originally it copied from the Postgres or MySQL and some of the stuff won't work correctly into the another or give a hope that it might work.

in all 24 providers

Small nit 24 Hooks, not providers. Some of them implements multiple hooks within the one provider

@potiuk
Copy link
Member

potiuk commented Apr 14, 2024

I don't think we disagree here.

The attempt so far in the DBAPiHook and common.sql was to make a "common" interface out of "not-that common" DBAPI + SQL - and then make it possible for openlineage integration to make use of those common cases.

Initially the assumption was that the "common" part of the API in common.sql relies on the "common" parts of both. Not trying to implement all features and dialects, but implement a subset of simple functionality which should be "same" for both. While we succeed in most cases, we failed in some others (maybe). We should likely identifiy where we failed and either make it common or turn it into the "feature" approach you described.

But I am not sure if we should aim to have a complete replacement for all the awesome features here in the DBApiHook - I think it was never a goal to have a "complete DB set of features" there - just very basic ones that should be possible to be common.

I have no good solution there how to limit future changes, but I feel that the "awesome fetures" if they are implemented, should be implemented in each hook sepearately, but if we feel that we should add more to the DBApiHook, then - I agree we should have some way you describe.

I don't think there is anything to stop us doing what you descrine now - if we want to add somethign "new" to to DBApiHook.

@dabla
Copy link
Contributor

dabla commented Apr 15, 2024

One of the problem that if some contributor want to add some feature into the the specific dialect in half cases it add something into the common.sql rather than into the specific provider, how to prevent this - a have no idea, rather than prohibit this during the code review

Yes. I think we have to be simply careful with those - like in #38715 . The stub was mostly about the .pyi interface changes, but of course behavioural changes are only detectable (currently) by review. cc: @dabla - maybe good to get comments from a recent "fresh" contributor to those.

How to prevent breaking changes? In most cases it could be done by make DBApiHook as Interface/Protocol only with bare implementation and move specific implementation into the particular hooks, right now a lot of sits into DBApiHook which might make think that something would work with any of the 24 Hooks but it is far away from the reality.

To be perfectly honest - I think far from reality is not the right statement :) - I think some parts (especially format and serialization of returned data) might work a bit differently - but this was not even the primary use case for all DBApiHook (for a very long time returning data from DBApiHooks was treated as mis-use.

For the context - mostly it was about DML/DDL queries and making sure all the SQL-related operators that are implemented in common.sql work and their logic is not copied in all 24 providers (before making them comon we had several 10s of copies of those operators accross the 24 providers"

  • SQLColumnCheckOperator
  • SQLTableCheckOperator
  • SQLCheckOperator
  • SQLValueCheckOperator
  • SQLIntervalCheckOperator
  • SQLThresholdCheckOperator
  • BranchSQLOperator

But if you think we can make it less vulnerable for coupling problems by minimising implementation of hooks and possibly copying the code to hooks, yes, I think it's a good idea.

Maybe also run provider's DBApiHooks implementations agains non-main version. (New special tests!), when we run tests against min supported and max released version of the provider instead of code from the main.

Yes I thought about it - but then we have 12 MINOR versions of common.sql and several providers have different >= DIFFERENT version, So it's hard to make a simple and meanigful set of tests there - especially that just "import" is not really enough, we should also run actual tests, and we currently run tests only in main. This is why I proposed >= LAST_MAIN for all providers - then we will be generally forced to fix any incompatibilties in common.sql (which we do anyway BTW).

Maybe we could enforce (dunno if it's possible) that when a Pull Request is created it is always a draft by default, and once review is done and changes/enhancements look okay/accepted to be merged, we first convert the pull request from draft to non-draft and then run those tests would be run on the pull request itself instead of in the main where it's actually to late. Dunno if it's feasible that only revieuwers could be allowed to put a PR out of draft and not the author (unless he's a reviewer ofc). That could be a way to enforce all test to be run before the PR is actually merged?

@potiuk
Copy link
Member

potiuk commented Apr 15, 2024

Maybe we could enforce (dunno if it's possible) that when a Pull Request is created it is always a draft by default, and once review is done and changes/enhancements look okay/accepted to be merged, we first convert the pull request from draft to non-draft and then run those tests would be run on the pull request itself instead of in the main where it's actually to late. Dunno if it's feasible that only revieuwers could be allowed to put a PR out of draft and not the author (unless he's a reviewer ofc). That could be a way to enforce all test to be run before the PR is actually merged?

It's not about draft vs. main.- it's more about adding even more complexity and cases to our infrastructure to do all those tests. You would have have to run current tests against old common.sql providers which is not guaranteed to work at all - even in normal circumstances because current tests can mock and patch internals of common.sql thet could have changed for example. We are already doing some of that for back-compatiblity tests of providers (running imports only for past versions of Airflow. But if someone would like to draft-PR any checks like that - they are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants