Update SDKs for google provider package #30067

Merged: 31 commits merged into apache:main on May 17, 2023

Contributor

@lwyszomi lwyszomi commented Mar 13, 2023

As everyone knows, the Google provider package has a lot of old dependencies. I would like to start migrating to the latest versions of the SDKs. For now we are blocked by some other dependencies because they require protobuf<4:

apache-beam
mysql-connector-python
yandexcloud
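
To see which installed packages cap protobuf in a given environment, something like the following could help. It is only a rough sketch: the function names are made up for illustration, and the regex is a heuristic for simple specifiers such as `<4`, `<=3.20` or `==3.20.3`, not a full PEP 440 parser (it ignores `~=`, for example).

```python
from __future__ import annotations

import re
from importlib import metadata

# Matches a requirement on protobuf itself, e.g. "protobuf<4.0" or
# "protobuf (>=3.19.0,<4.0.0dev)"; environment markers after ";" are ignored.
_PROTOBUF_REQ = re.compile(r"^\s*protobuf(?![\w.-])(?P<spec>[^;]*)", re.IGNORECASE)
# Captures upper-bound style clauses: operator, major version, optional minor version.
_BOUND = re.compile(r"(<=|<|==)\s*(\d+)(?:\.(\d+))?")


def caps_protobuf_below_4(requirement: str) -> bool:
    """Heuristically decide whether a requirement string excludes protobuf 4.x."""
    match = _PROTOBUF_REQ.match(requirement)
    if match is None:
        return False
    for operator, major, minor in _BOUND.findall(match.group("spec")):
        version = (int(major), int(minor or 0))
        if operator == "<" and version <= (4, 0):
            return True
        if operator in ("<=", "==") and version[0] < 4:
            return True
    return False


def installed_protobuf_blockers() -> dict[str, str]:
    """Map each installed distribution that caps protobuf below 4 to its requirement."""
    blockers: dict[str, str] = {}
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if caps_protobuf_below_4(requirement):
                blockers[str(dist.metadata["Name"])] = requirement
    return blockers
```

Calling `installed_protobuf_blockers()` in an environment with the three packages above installed should list whichever of them still declare such a cap; the result depends entirely on the installed versions.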

The Google SDKs also had a lot of breaking changes, so after updating we need to adjust the broken operators. I investigated how big this problem is, and I'm attaching the list of services with their status (services marked "need investigation" have some broken operators):

  • AutoML -> need investigation
  • BigQuery -> need investigation
  • BigTable -> need investigation
  • CloudBuild -> need investigation
  • CloudFunctions -> need investigation
  • CloudMemorystore -> OK
  • CloudSQL -> need investigation
  • Composer -> need adjustments in system tests
  • Compute -> need adjustments in system tests
  • DataLossPrevention -> OK
  • Dataflow -> need investigation
  • dataform -> OK
  • datafusion -> OK
  • dataplex -> need investigation
  • dataprep -> need investigation
  • Dataproc -> need investigation
  • dataprocMetastore -> need investigation
  • datastore -> need adjustments in system tests
  • GCS -> need investigation
  • kubernetesEngine -> OK
  • LifeSciences -> OK
  • MLEngine -> need investigation
  • NaturalLanguage -> need investigation
  • PubSub -> OK
  • Spanner -> OK
  • SpeechToText -> need investigation
  • SQLToSheets -> need investigation
  • StackDriver -> need investigation
  • StorageTransfer -> need investigation
  • Tasks -> need investigation
  • TextToSpeech -> need investigation
  • Transfers -> need investigation
  • Translate -> OK
  • TranslateSpeech -> need investigation
  • VertexAI -> need investigation
  • VideoIntelligence -> need investigation
  • Vision -> need investigation
  • Workflows -> OK

Fixes: #27292


@felicienveldema

felicienveldema commented Mar 21, 2023

I got referred here from #27292.
I'm experiencing an issue where I am not able to upgrade the deprecated Google Ads API 18.

I eventually get stuck because apache-airflow-providers-google depends on google-cloud-secret-manager < 2.x, which depends on protobuf 3; that causes my predicament. Higher versions depend on protobuf > 4.5.x.

Is there any progress on this ticket? Just wondering, but keep up the good work.

@lwyszomi
Contributor Author

@felicienveldema We are still working on the changes in the google-provider package, but we still have dependency problems with 3 packages because they still depend on protobuf<4.0; we will probably have updates in May. So I think we will have a new google-provider version supporting the latest SDK versions at the end of May or later.

@r-richmond
Contributor

@potiuk

I got curious about the 3 packages listed above that are causing issues. One of them, mysql-connector-python, has protobuf pinned at 3.20.3 in its latest version. How do you anticipate this conflict will be resolved? (After reviewing the commit activity & Oracle's ownership I'm assuming they will be slow to update, if they do at all; apologies if this is a bad assumption.)

@potiuk
Member

potiuk commented Mar 27, 2023

> I got curious about the 3 packages listed above that are causing issues. One of them, mysql-connector-python, has protobuf pinned at 3.20.3 in its latest version. How do you anticipate this conflict will be resolved? (After reviewing the commit activity & Oracle's ownership I'm assuming they will be slow to update, if they do at all; apologies if this is a bad assumption.)

I have not looked at it yet. Do you have any ideas?

@potiuk
Member

potiuk commented Mar 27, 2023

Generally the options are:

  • replace the library with something else
  • exclude such provider (and stop releasing it) that holds us back
  • vendor-in the library and bump the dependency
  • make the dependency optional and skip tests for it
  • work with the maintainers and actively help them to upgrade

So we have a number of options we can follow.

@potiuk
Member

potiuk commented Mar 27, 2023

I just opened an issue with yandex-cloud: yandex-cloud/python-sdk#71. I will also prepare support for disabling providers and excluding them if they are holding us back (cc: @eladkal).

I will also raise this to our devlist.

@potiuk
Member

potiuk commented Mar 27, 2023

Devlist discussion started: https://lists.apache.org/thread/j98bgw9jo7xr4fvjh27d6bfoyxr1omcm (CC: @eladkal especially). I am curious what you think.

@potiuk
Member

potiuk commented Mar 27, 2023

FYI: We have no problem with apache-beam: apache/beam#24599. They merged the protobuf bump 2 weeks ago, so we just need to wait for the next release.

@potiuk
Member

potiuk commented Mar 27, 2023

I also asked Oracle/MySQL in their forums (the only way we can do it), https://forums.mysql.com/read.php?50,708413, and we will see what they say. But I am also all for disabling mysql provider if they don't respond.

@potiuk
Member

potiuk commented Mar 27, 2023

We have first reaction: yandex-cloud/python-sdk#71 (comment)

@r-richmond
Contributor

> But I am also all for disabling mysql provider if they don't respond.

I'm also 100% for this (but I've got some bias in that I always use Postgres over mysql). I hesitated to suggest that since I was unsure if that would impact offering mysql as one of Airflow Meta DB backends.

@potiuk
Member

potiuk commented Mar 27, 2023

> I'm also 100% for this (but I've got some bias in that I always use Postgres over mysql). I hesitated to suggest that since I was unsure if that would impact offering mysql as one of Airflow Meta DB backends.

I have to check, but I think this actually has nothing to do with the mysql metadata backend. For that we use sqlalchemy, which has a few drivers to choose from. And I think our driver for CI/tests is mysqlclient, not mysql-connector-python.

BTW, another possibility is to rewrite the hooks to use mysqlclient. I might actually take a look at that.

@cgadam

cgadam commented Mar 29, 2023

Hi, is it too risky for Airflow to just update from google-ads v18.0.0 to v18.2.0? See: #30353

Today v11 of the Google Ads API is sunsetting: https://developers.google.com/google-ads/api/docs/sunset-dates which means that the current latest version of Airflow won't be officially compatible (due to its constraints file: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.2/constraints-3.7.txt) with any google-ads package that can actually interact with the Google Ads API (API calls will start failing).

The latest compatibility with a new API version was added in: https://github.com/googleads/google-ads-python/pull/672/files#diff-91c5b46dc84a94604a4e4d0caed9bf85590a2eddbb12d2e8dc80badf324a9dfbR9 (v17.0.0), which added support for v11 of the API.

v18.2.0 actually added support for v12 of the API. See here.
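
For anyone who wants to check what a given constraints file pins, here is a minimal sketch. The URL layout is taken from the link in this comment; the helper names are hypothetical, and the sample text is an illustrative excerpt, not the real file contents.

```python
from typing import Optional


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints-file URL, following the pattern of the link quoted above."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version pinned for `package` in constraints text, or None if absent."""
    for line in constraints_text.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.lower() == package.lower():
            return version
    return None


# Illustrative excerpt only (assumption): the real file has many more lines.
sample = "# constraints excerpt\ngoogle-ads==18.0.0\n"
url = constraints_url("2.5.2", "3.7")
pin = pinned_version(sample, "google-ads")
```

The file at `url` can be downloaded with any HTTP client and passed to `pinned_version` in place of `sample`.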

@cgadam

cgadam commented Mar 30, 2023

> Hi, is it too risky for Airflow to just update from google-ads v18.0.0 to v18.2.0? See: #30353
>
> Today v11 of the Google Ads API is sunsetting: https://developers.google.com/google-ads/api/docs/sunset-dates which means that the current latest version of Airflow won't be officially compatible (due to its constraints file: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.2/constraints-3.7.txt) with any google-ads package that can actually interact with the Google Ads API (API calls will start failing).
>
> The latest compatibility with a new API version was added in: https://github.com/googleads/google-ads-python/pull/672/files#diff-91c5b46dc84a94604a4e4d0caed9bf85590a2eddbb12d2e8dc80badf324a9dfbR9 (v17.0.0), which added support for v11 of the API.
>
> v18.2.0 actually added support for v12 of the API. See here.

We're in the dark night now. Sunset has passed 😅 We're now getting error: "Version v11 is deprecated. Requests to this version will be blocked."

@moiseenkov
Contributor

moiseenkov commented Mar 31, 2023

Hi everyone,
Speaking about disabling mysql-connector-python, I found that the current MySqlHook implementation allows users to choose which library to use in an Airflow connection: mysql-connector-python or mysqlclient (default). What is the reason for it?

I'm wondering because, after removing mysql-connector-python, this feature will no longer be needed and can be removed as well. However, new libraries might appear in the future, and we would probably need it back then, so it would be nice to keep it for future use even if only one option is available. WDYT @potiuk, should we keep it?

@potiuk
Member

potiuk commented Mar 31, 2023

> I'm wondering, because after removing the mysql-connector-python this feature will be no longer needed and can be removed as well. However, new libraries might appear in the future, and we will probably need it back then, so in this case it would be nice to save it for future use even if there will be only one option available. WDYT @potiuk, should we save it?

I have not thought about it yet. I am waiting a week for a response from Oracle (if it comes); according to our new policy that is "lazy consent" now, and I will take a closer look after that. There is also an option to turn mysql-connector-python into an ACTUALLY optional feature (which I think is the best option), i.e. make it an extra (we already have a few of those). In this case we should leave it.
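
A rough sketch of what "selectable client, optional dependency" could look like is below. This is not the actual MySqlHook code: `load_client`, `DEFAULT_CLIENTS` and the extra name are hypothetical, and only the two client names (mysqlclient, mysql-connector-python) come from this thread. The idea is to import the chosen library lazily, so a missing optional package only fails when a connection actually asks for it.

```python
import importlib
from types import ModuleType
from typing import Dict, Optional, Tuple

# client name -> (importable module, name of the optional extra that would provide it).
# The extra name "mysql-connector" is an assumption made for this sketch.
DEFAULT_CLIENTS: Dict[str, Tuple[str, Optional[str]]] = {
    "mysqlclient": ("MySQLdb", None),
    "mysql-connector-python": ("mysql.connector", "mysql-connector"),
}


def load_client(
    client_name: str = "mysqlclient",
    clients: Dict[str, Tuple[str, Optional[str]]] = DEFAULT_CLIENTS,
) -> ModuleType:
    """Import the requested client library lazily and fail with a helpful message."""
    if client_name not in clients:
        raise ValueError(
            f"Unknown client '{client_name}'; expected one of {sorted(clients)}"
        )
    module_name, extra = clients[client_name]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = f" Install the optional extra '{extra}'." if extra else ""
        raise RuntimeError(
            f"Client library '{client_name}' is not installed.{hint}"
        ) from exc
```

Keeping the mapping as data means the selection mechanism survives even if only one client remains, which is the point raised in the question above.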

@potiuk
Member

potiuk commented Mar 31, 2023

@cgadam: It is likely we will have a proposal for how to solve it soon. Would you be willing to test it if I give you access to a beta/pre-release of the google provider with an implemented workaround (with the intention of making it into the next release)?

Beata Kossakowska and others added 11 commits May 17, 2023 20:57
Changes:
- update train model that is used for prediction
- update version and runner for ApacheBeam in utils for MLEngine
- update connection inside async hook
Changes:
- fix tests/system/providers/google/cloud/dataprep/example_dataprep.py
- Secret Manager was missing updating to v2, now expects a request dict
- Compute ssh had a bug when no cmd_timeout was passed
- Cloud Build tests were improved/refactored in community, so deleting
old ones
- googleapiclient.errors.HttpError was incorrectly used in our tests; it
didn't matter before, but a change in the class makes HttpError()
raise an error in initialization the way we were using it before
- fix static checks

```
$ pytest tests/providers/google/cloud/
...
===== 2763 passed, 71 skipped, 21 warnings in 193.46s (0:03:13) =====
```
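
To illustrate the "now expects a request dict" note for Secret Manager, here is a minimal sketch. It assumes `client` is a `SecretManagerServiceClient` from google-cloud-secret-manager 2.x (the library is deliberately not imported here so the snippet stays self-contained); the helper names are hypothetical and this is not the provider's hook code.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the fully qualified resource name of a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def access_secret(client, project_id: str, secret_id: str, version: str = "latest") -> str:
    """Read a secret by passing a single request dict, as the 2.x client expects."""
    # Assumption: `client` behaves like secretmanager.SecretManagerServiceClient (2.x).
    request = {"name": secret_version_name(project_id, secret_id, version)}
    response = client.access_secret_version(request=request)
    return response.payload.data.decode("UTF-8")
```

Because the client is passed in, the function can be exercised in unit tests with a simple stand-in object that records the `request` it receives.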
@potiuk
Member

potiuk commented May 17, 2023

👀 👀 👀 👀

@kristopherkane
Contributor

I wanted to say thanks for all this work and I've been tracking it from a distance. I'm looking forward to the updated Dataproc libs for further enhancements to the Dataproc serverless operator.

@potiuk
Member

potiuk commented May 17, 2023

> I wanted to say thanks for all this work and I've been tracking it from a distance. I'm looking forward to the updated Dataproc libs for further enhancements to the Dataproc serverless operator.

Thanks in the name of all the people who worked on that (I was also just helping) - it's rare to get unsolicited positive feedback and a thank-you note. So rare :).

@potiuk
Member

potiuk commented May 17, 2023

Those are intermittent errors only (I need to make them more stable). Merging

@potiuk potiuk merged commit 28d1bf8 into apache:main May 17, 2023
@potiuk
Member

potiuk commented May 17, 2023

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

@potiuk
Member

potiuk commented May 20, 2023

CC: @ephraimbuddy -> I just realized we will need it - I marked this one also for 2.6.2. While the "code" changes aren't used in the release from 2.6.2, the "dependency" part (provider.yaml and generated/provider_dependencies.json) will be needed to properly build CI once we release the new google provider with all its deps

@eladkal
Contributor

eladkal commented May 20, 2023

> CC: @ephraimbuddy -> I just realized we will need it - I marked this one also for 2.6.2. While the "code" changes aren't used in the release from 2.6.2, the "dependency" part (provider.yaml and generated/provider_dependencies.json) will be needed to properly build CI once we release the new google provider with all its deps

I'm hoping that we will release 2.6.2 as a follow-up right after the provider wave is released.

@eladkal eladkal added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jun 9, 2023
potiuk pushed a commit that referenced this pull request Jun 9, 2023
* Update SDK versions for Google provider

* Adjust google ads operators to v12

Changes:
- fix tests/system/providers/google/cloud/bigquery/example_bigquery_queries.py
- fix tests/system/providers/google/cloud/bigquery/example_bigquery_queries_async.py

* Fix GCS system tests

* Fix CloudBuild unit test

* Update BigTable operators to accommodate new dependencies.

* Fix Cloud Tasks System tests

Tasks dag was quite flaky without the retry option in the run_task step,
but it's consistently green with the option set.

We also add a GCP_APP_ENGINE_LOCATION env variable since this depends on
the used GCP Project App Engine's location

* Add setup docstring to Tasks system tests.

* Update Vision operators to accommodate new dependencies.

Changes:
- fix methods for CloudVisionHook
- fix Vision Operators
- fix tests/providers/google/cloud/hooks/test_vision.py
- fix tests/providers/google/cloud/operators/test_vision.py
- fix tests/system/providers/google/cloud/vision/example_vision_annotate_image.py
- fix tests/system/providers/google/cloud/vision/example_vision_autogenerated.py
- fix tests/system/providers/google/cloud/vision/example_vision_explicit.py

* Update SpeechToText operators to accommodate new dependencies.

Changes:
- fix synthesize_speech method for CloudTextToSpeechHook
- fix CloudSpeechToTextRecognizeSpeechOperator
- fix tests/providers/google/cloud/operators/test_speech_to_text.py
- fix tests/providers/google/cloud/hooks/test_text_to_speech.py
- fix tests/providers/google/cloud/hooks/test_speech_to_text.py

* Update Translate Speech operators to accommodate new dependencies.

Changes:
- fix synthesize_speech method for CloudTextToSpeechHook
- fix CloudTranslateSpeechOperator
- tests/providers/google/cloud/operators/test_translate_speech.py

* Update VideoIntelligence operators to accommodate new dependencies.

Changes:
- fix annotate_video method for CloudVideoIntelligenceHook
- fix VideoIntelligence Operators
- fix tests/providers/google/cloud/hooks/test_video_intelligence.py
- fix tests/providers/google/cloud/operators/test_video_intelligence.py

* Update Compute Engine operators to accommodate new dependencies.

Changes:
- added wait_for_operation_complete() method to check the execution flow
- added new attribute cmd_timeout for ComputeEngineSSHHook

* Fix Stackdriver system test

This test has not worked because the Slack channel and credentials were
not set up. We now test the same operators by creating notification
channels and policy alerts against pubsub topics, which don't need to
exist before the test is run, making the test self-contained.

* Update Natural Language operators to accommodate new dependencies.

Changes:
- fix airflow/providers/google/cloud/operators/natural_language.py
- fix airflow/providers/google/cloud/hooks/natural_language.py
- fix tests/providers/google/cloud/hooks/test_natural_language.py
- fix tests/providers/google/cloud/operators/test_natural_language.py
- fix tests/system/providers/google/cloud/natural_language/example_natural_language.py

* Update Composer system tests.

Fix environment id to contain underscores.

* Update AutoML operators to accommodate new dependencies.

Changes:
- add timeout parameter to all long-running operations for operators
- fix tests/system/providers/google/cloud/automl/example_automl_dataset.py
- fix tests/system/providers/google/cloud/automl/example_automl_model.py
- fix tests/system/providers/google/cloud/automl/example_automl_nl_text_extraction.py
- fix tests/system/providers/google/cloud/automl/example_automl_vision_classification.py

* Fix Cloud SQL delete operator

For some delete instance operations, the operation stops being available ~9 seconds after completion, so we need a shorter sleep time to make sure we don't miss the DONE status.

* Update VertexAI operators to accommodate new dependencies.

* Add SQL to Sheets Test instructions

* Update Dataproc Metastore operators to accommodate new dependencies.

* Update Dataproc operators to accommodate new dependencies.

* Update Dataflow sys tests to new sdk

* Update Dataproc on gke operators to accommodate new dependencies.

* Update MLEngine operators to accommodate new dependencies.

Changes:
- update train model that is used for prediction
- update version and runner for ApacheBeam in utils for MLEngine
- update connection inside async hook

* Update Dataprep operators to accommodate new dependencies.

Changes:
- fix tests/system/providers/google/cloud/dataprep/example_dataprep.py

* Add Dataflow Go system test

* Update providers.yaml for google

* fixup! Update providers.yaml for google

* Google SDK Fixes after rebase

- Secret Manager was missing updating to v2, now expects a request dict
- Compute ssh had a bug when no cmd_timeout was passed
- Cloud Build tests were improved/refactored in community, so deleting
old ones
- googleapiclient.errors.HttpError was incorrectly used in our tests; it
didn't matter before, but a change in the class makes HttpError()
raise an error in initialization the way we were using it before
- fix static checks

* Fix Google providers type errors

---------

Co-authored-by: Lukasz Wyszomirski <wyszomirski@google.com>
Co-authored-by: Maksim Moiseenkov <maksim_moiseenkov@epam.com>
Co-authored-by: Eugene Kostieiev <kosteev@google.com>
Co-authored-by: Augusto Hidalgo <augustoh@google.com>
Co-authored-by: Beata Kossakowska <bkossakowska@google.com>
Co-authored-by: Ulada Zakharava <uladaz@google.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
(cherry picked from commit 28d1bf8)
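
The Cloud SQL delete fix in the commit message above comes down to polling more often than the window in which a finished operation stays visible (about 9 seconds, per that note). A minimal sketch of such a loop follows; `wait_for_done`, its parameters and the 5-second default are assumptions for illustration, not the provider's implementation.

```python
import time
from typing import Callable


def wait_for_done(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,  # assumed default, chosen to stay below the ~9 s window
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `get_status` until it returns "DONE" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "DONE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation not DONE after {timeout} seconds")
        # Sleeping longer than the visibility window risks never observing DONE.
        sleep(poll_interval)
```

Here `get_status` would wrap whatever call fetches the operation's state, and `sleep` is injectable so the loop can be tested without real waiting.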
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:google Google (including GCP) related issues
Development

Successfully merging this pull request may close these issues.

Support for Python 3.11 for Google Provider (upgrading all dependencies)