Description
Apache Airflow version
2.6.2
What happened
We are currently on AWS Provider 6.0.0 and want to upgrade to the latest version, 8.2.0. However, the GlueCrawlerOperator makes the upgrade difficult: the operator attempts to update the crawler's tags on every run. Because we manage our resource tagging through Terraform, we do not provide any tags to the operator, so all of the existing tags are deleted. Running the crawler at all also requires adding glue:GetTags and glue:UntagResource permissions to the relevant IAM roles.
It seems strange that the operator's default behaviour has been changed to modify infrastructure, especially as this differs from the GlueJobOperator, which only performs updates when certain parameters are set. Something similar could be done here: if no Tags key is present in the config dict, the tags are not modified at all. I'm not sure what the best approach is.
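To make the suggestion concrete, here is a minimal sketch of the kind of guard I have in mind. It is plain Python and not the provider's actual code: `should_sync_tags` and `plan_tag_changes` are hypothetical helper names, and the diffing logic assumes the tag update removes any key that is not in the config, which is what the behaviour described above suggests.

```python
from __future__ import annotations

from typing import Any, Optional


def should_sync_tags(config: dict[str, Any]) -> bool:
    """Hypothetical guard: only touch tags when the user supplied a "Tags" key."""
    return "Tags" in config


def plan_tag_changes(
    current: dict[str, str],
    desired: Optional[dict[str, str]],
) -> tuple[dict[str, str], list[str]]:
    """Return (tags_to_set, tag_keys_to_remove).

    `desired` is None when the config has no "Tags" key; in that case
    nothing is planned, so externally managed tags are left untouched.
    """
    if desired is None:
        return {}, []
    # Add or overwrite tags whose value differs from what the crawler has.
    to_set = {k: v for k, v in desired.items() if current.get(k) != v}
    # Remove tags that exist on the crawler but were not requested.
    to_remove = [k for k in current if k not in desired]
    return to_set, to_remove


# Tags managed outside Airflow (e.g. by Terraform), none passed in the config.
config = {"Name": "example-crawler"}  # hypothetical crawler name
existing = {"team": "data", "env": "prod"}

desired = config["Tags"] if should_sync_tags(config) else None
print(plan_tag_changes(existing, desired))
```

With this approach, an explicit empty `Tags` dict would still clear all tags, while omitting the key entirely would skip the tag calls, so the extra glue:GetTags and glue:UntagResource permissions would only be needed by users who opt in to tag management.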
What you think should happen instead
The crawler should run without making any alterations to the existing infrastructure.
How to reproduce
Run a GlueCrawlerOperator without a Tags key in its config against a crawler that already has tags.
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
Amazon 8.2.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct