Triggerer process dies with DB deadlock #23639

humit0 · 2022-05-11T08:03:17Z

Apache Airflow version

2.2.5

What happened

When many deferrable operators (e.g. TimeDeltaSensorAsync) are created, the triggerer component dies because of a DB deadlock.

[2022-05-11 02:45:08,420] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5397) starting
[2022-05-11 02:45:08,421] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5398) starting
[2022-05-11 02:45:09,459] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5400) starting
[2022-05-11 02:45:09,461] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5399) starting
[2022-05-11 02:45:10,503] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5401) starting
[2022-05-11 02:45:10,504] {triggerer_job.py:358} INFO - Trigger <airflow.triggers.temporal.DateTimeTrigger moment=2022-05-13T11:10:00+00:00> (ID 5402) starting
[2022-05-11 02:45:11,113] {triggerer_job.py:108} ERROR - Exception when executing TriggererJob._run_trigger_loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/triggerer_job.py", line 106, in _execute
    self._run_trigger_loop()
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/triggerer_job.py", line 127, in _run_trigger_loop
    Trigger.clean_unused()
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/airflow/models/trigger.py", line 91, in clean_unused
    session.query(TaskInstance).filter(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4063, in update
    update_op.exec_()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
    self._do_exec()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
    self._execute_stmt(update_stmt)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
    self.result = self.query._execute_crud(stmt, self.mapper)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
    return conn.execute(stmt, self._params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE task_instance SET trigger_id=%s WHERE task_instance.state != %s AND task_instance.trigger_id IS NOT NULL]
[parameters: (None, <TaskInstanceState.DEFERRED: 'deferred'>)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-05-11 02:45:11,118] {triggerer_job.py:111} INFO - Waiting for triggers to clean up
[2022-05-11 02:45:11,592] {triggerer_job.py:117} INFO - Exited trigger loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/airflow/cli/commands/triggerer_command.py", line 56, in triggerer
    job.run()
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 246, in run
    self._execute()
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/triggerer_job.py", line 106, in _execute
    self._run_trigger_loop()
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/triggerer_job.py", line 127, in _run_trigger_loop
    Trigger.clean_unused()
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/airflow/models/trigger.py", line 91, in clean_unused
    session.query(TaskInstance).filter(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 4063, in update
    update_op.exec_()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
    self._do_exec()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
    self._execute_stmt(update_stmt)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
    self.result = self.query._execute_crud(stmt, self.mapper)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
    return conn.execute(stmt, self._params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.8/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE task_instance SET trigger_id=%s WHERE task_instance.state != %s AND task_instance.trigger_id IS NOT NULL]
[parameters: (None, <TaskInstanceState.DEFERRED: 'deferred'>)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

What you think should happen instead

The triggerer process should not die with a deadlock error.
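
A minimal sketch of one way to approach this expectation, assuming the deadlock is transient and the cleanup statement is safe to re-run: retry when the database reports error 1213. `FakeOperationalError`, `retry_on_deadlock`, and `clean_unused_sketch` are hypothetical names made up for illustration; they are not Airflow or SQLAlchemy APIs, and this is not necessarily how the project addressed the issue.

```python
import time
from functools import wraps

MYSQL_DEADLOCK_ERRNO = 1213  # error code shown in the traceback above


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a driver error; carries the MySQL errno."""

    def __init__(self, errno, message):
        super().__init__(errno, message)
        self.errno = errno


def retry_on_deadlock(max_attempts=3, base_delay=0.0):
    """Re-run the wrapped callable when it fails with a deadlock error."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except FakeOperationalError as exc:
                    # Only deadlocks are retried; the last attempt re-raises.
                    if exc.errno != MYSQL_DEADLOCK_ERRNO or attempt == max_attempts:
                        raise
                    # Linear backoff before the next attempt.
                    time.sleep(base_delay * attempt)

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_attempts=3)
def clean_unused_sketch():
    """Simulated cleanup: deadlocks on the first call, then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise FakeOperationalError(1213, "Deadlock found when trying to get lock")
    return "cleaned"


result = clean_unused_sketch()
```

With a real database session, the failed transaction would most likely also need to be rolled back before the statement is retried; the sketch leaves that out.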

How to reproduce

Create the "test_timedelta" DAG and run it.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.sensors.time_delta import TimeDeltaSensorAsync

default_args = {
    "owner": "user",
    "start_date": datetime(2021, 2, 8),
    "retries": 2,
    "retry_delay": timedelta(minutes=20),
    "depends_on_past": False,
}

with DAG(
    dag_id="test_timedelta",
    default_args=default_args,
    schedule_interval="10 11 * * *",
    max_active_runs=1,
    max_active_tasks=2,
    catchup=False,
) as dag:
    start = DummyOperator(task_id="start")
    end = DummyOperator(task_id="end")
    for idx in range(800):
        tx = TimeDeltaSensorAsync(
            task_id=f"sleep_{idx}",
            delta=timedelta(days=3),
        )
        start >> tx >> end

Operating System

uname_result(system='Linux', node='d2845d6331fd', release='5.10.104-linuxkit', version='#1 SMP Thu Mar 17 17:08:06 UTC 2022', machine='x86_64', processor='')

Versions of Apache Airflow Providers

Deployment

Other Docker-based deployment

Deployment details

webserver: 1 instance
scheduler: 1 instance
worker: 1 instance (Celery)
triggerer: 1 instance
redis: 1 instance
Database: 1 instance (mysql)

Anything else

webserver: 172.19.0.9
scheduler: 172.19.0.7
triggerer: 172.19.0.5
worker: 172.19.0.8

MYSQL (SHOW ENGINE INNODB STATUS;)

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-05-11 07:47:49 139953955817216
*** (1) TRANSACTION:
TRANSACTION 544772, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 20, OS thread handle 139953861383936, query id 228318 172.19.0.5 airflow_user updating
UPDATE task_instance SET trigger_id=NULL WHERE task_instance.state != 'deferred' AND task_instance.trigger_id IS NOT NULL

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 125 page no 231 n bits 264 index ti_state of table `airflow_db`.`task_instance` trx id 544772 lock_mode X locks rec but not gap
Record lock, heap no 180 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 6; hex 717565756564; asc queued;;
 1: len 14; hex 746573745f74696d6564656c7461; asc test_timedelta;;
 2: len 9; hex 736c6565705f323436; asc sleep_246;;
 3: len 30; hex 7363686564756c65645f5f323032322d30352d30395431313a31303a3030; asc scheduled__2022-05-09T11:10:00; (total 36 bytes);


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 125 page no 47 n bits 128 index PRIMARY of table `airflow_db`.`task_instance` trx id 544772 lock_mode X locks rec but not gap waiting
Record lock, heap no 55 PHYSICAL RECORD: n_fields 28; compact format; info bits 0
 0: len 14; hex 746573745f74696d6564656c7461; asc test_timedelta;;
 1: len 9; hex 736c6565705f323436; asc sleep_246;;
 2: len 30; hex 7363686564756c65645f5f323032322d30352d30395431313a31303a3030; asc scheduled__2022-05-09T11:10:00; (total 36 bytes);
 3: len 6; hex 000000085001; asc     P ;;
 4: len 7; hex 01000001411e2f; asc     A /;;
 5: len 7; hex 627b6a250b612d; asc b{j% a-;;
 6: SQL NULL;
 7: SQL NULL;
 8: len 7; hex 72756e6e696e67; asc running;;
 9: len 4; hex 80000001; asc     ;;
 10: len 12; hex 643238343564363333316664; asc d2845d6331fd;;
 11: len 4; hex 726f6f74; asc root;;
 12: len 4; hex 8000245e; asc   $^;;
 13: len 12; hex 64656661756c745f706f6f6c; asc default_pool;;
 14: len 7; hex 64656661756c74; asc default;;
 15: len 4; hex 80000002; asc     ;;
 16: len 20; hex 54696d6544656c746153656e736f724173796e63; asc TimeDeltaSensorAsync;;
 17: len 7; hex 627b6a240472e0; asc b{j$ r ;;
 18: SQL NULL;
 19: len 4; hex 80000002; asc     ;;
 20: len 5; hex 80057d942e; asc   } .;;
 21: len 4; hex 80000001; asc     ;;
 22: len 4; hex 800021c7; asc   ! ;;
 23: len 30; hex 36353061663737642d363762372d343166382d383439342d636637333061; asc 650af77d-67b7-41f8-8494-cf730a; (total 36 bytes);
 24: SQL NULL;
 25: SQL NULL;
 26: SQL NULL;
 27: len 2; hex 0400; asc   ;;


*** (2) TRANSACTION:
TRANSACTION 544769, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 2
MySQL thread id 12010, OS thread handle 139953323235072, query id 228319 172.19.0.8 airflow_user updating
UPDATE task_instance SET start_date='2022-05-11 07:47:49.745773', state='running', try_number=1, hostname='d2845d6331fd', job_id=9310 WHERE task_instance.task_id = 'sleep_246' AND task_instance.dag_id = 'test_timedelta' AND task_instance.run_id = 'scheduled__2022-05-09T11:10:00+00:00'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 125 page no 47 n bits 120 index PRIMARY of table `airflow_db`.`task_instance` trx id 544769 lock_mode X locks rec but not gap
Record lock, heap no 55 PHYSICAL RECORD: n_fields 28; compact format; info bits 0
 0: len 14; hex 746573745f74696d6564656c7461; asc test_timedelta;;
 1: len 9; hex 736c6565705f323436; asc sleep_246;;
 2: len 30; hex 7363686564756c65645f5f323032322d30352d30395431313a31303a3030; asc scheduled__2022-05-09T11:10:00; (total 36 bytes);
 3: len 6; hex 000000085001; asc     P ;;
 4: len 7; hex 01000001411e2f; asc     A /;;
 5: len 7; hex 627b6a250b612d; asc b{j% a-;;
 6: SQL NULL;
 7: SQL NULL;
 8: len 7; hex 72756e6e696e67; asc running;;
 9: len 4; hex 80000001; asc     ;;
 10: len 12; hex 643238343564363333316664; asc d2845d6331fd;;
 11: len 4; hex 726f6f74; asc root;;
 12: len 4; hex 8000245e; asc   $^;;
 13: len 12; hex 64656661756c745f706f6f6c; asc default_pool;;
 14: len 7; hex 64656661756c74; asc default;;
 15: len 4; hex 80000002; asc     ;;
 16: len 20; hex 54696d6544656c746153656e736f724173796e63; asc TimeDeltaSensorAsync;;
 17: len 7; hex 627b6a240472e0; asc b{j$ r ;;
 18: SQL NULL;
 19: len 4; hex 80000002; asc     ;;
 20: len 5; hex 80057d942e; asc   } .;;
 21: len 4; hex 80000001; asc     ;;
 22: len 4; hex 800021c7; asc   ! ;;
 23: len 30; hex 36353061663737642d363762372d343166382d383439342d636637333061; asc 650af77d-67b7-41f8-8494-cf730a; (total 36 bytes);
 24: SQL NULL;
 25: SQL NULL;
 26: SQL NULL;
 27: len 2; hex 0400; asc   ;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 125 page no 231 n bits 264 index ti_state of table `airflow_db`.`task_instance` trx id 544769 lock_mode X locks rec but not gap waiting
Record lock, heap no 180 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 6; hex 717565756564; asc queued;;
 1: len 14; hex 746573745f74696d6564656c7461; asc test_timedelta;;
 2: len 9; hex 736c6565705f323436; asc sleep_246;;
 3: len 30; hex 7363686564756c65645f5f323032322d30352d30395431313a31303a3030; asc scheduled__2022-05-09T11:10:00; (total 36 bytes);

*** WE ROLL BACK TRANSACTION (1)
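
The report above can be read as a wait-for cycle between two transactions on the same task instance row. The sketch below transcribes the held and awaited locks from the report and checks for the cycle; the "triggerer" and "worker" labels come from matching the client IPs in the report (172.19.0.5 and 172.19.0.8) against the address list above. `find_cycle` and `decode_field` are hypothetical helpers written for this illustration, not part of MySQL or Airflow.

```python
# Lock ownership and waits, transcribed from the InnoDB report above.
holds = {
    "trx 544772 (triggerer)": "ti_state index record",
    "trx 544769 (worker)": "PRIMARY index record",
}
waits_for = {
    "trx 544772 (triggerer)": "PRIMARY index record",
    "trx 544769 (worker)": "ti_state index record",
}


def find_cycle(holds, waits_for):
    """Return the transactions forming a wait-for cycle, or [] if none."""
    owner = {lock: trx for trx, lock in holds.items()}
    for start in waits_for:
        path = [start]
        current = start
        while True:
            # Who owns the lock that the current transaction is waiting for?
            blocker = owner.get(waits_for.get(current))
            if blocker is None:
                break
            if blocker == start:
                return path
            if blocker in path:
                break
            path.append(blocker)
            current = blocker
    return []


def decode_field(hex_value):
    """Decode one hex-encoded field from an InnoDB record dump."""
    return bytes.fromhex(hex_value).decode("ascii")


cycle = find_cycle(holds, waits_for)
print(" -> ".join(cycle + cycle[:1]))

# Both lock records refer to the same (dag_id, task_id) pair.
contended_row = tuple(
    decode_field(h)
    for h in ("746573745f74696d6564656c7461", "736c6565705f323436")
)
print(contended_row)
```

Each transaction holds the record lock that the other one needs, on different indexes of the same row, which is why InnoDB has to roll one of them back.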

Airflow env

AIRFLOW__CELERY__RESULT_BACKEND=db+mysql://airflow_user:airflow_pass@mysql/airflow_db
AIRFLOW__CORE__DEFAULT_TIMEZONE=KST
AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE=KST
AIRFLOW_HOME=/home/deploy/airflow
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=30
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__WEBSERVER__SECRET_KEY=aoiuwernholo
AIRFLOW__DATABASE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql+mysqldb://airflow_user:airflow_pass@mysql/airflow_db

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

Fixes: apache#23639

Fixes: #23639

* Clean up in-line f-string concatenation (#23591) * Apply specific ID collation to root_dag_id too (#23536) In certain databases there is a need to set the collation for ID fields like dag_id or task_id to something different than the database default. This is because in MySQL with utf8mb4 the index size becomes too big for the MySQL limits. In past pull requests this was handled [#7570](https://github.com/apache/airflow/pull/7570), [#17729](https://github.com/apache/airflow/pull/17729), but the root_dag_id field on the dag model was missed. Since this field is used to join with the dag_id in various other models ([and self-referentially](https://github.com/apache/airflow/blob/451c7cbc42a83a180c4362693508ed33dd1d1dab/airflow/models/dag.py#L2766)), it also needs to have the same collation as other ID fields. This can be seen by running `airflow db reset` before and after applying this change while also specifying `sql_engine_collation_for_ids` in the configuration. Other related PRs [#19408](https://github.com/apache/airflow/pull/19408) * Add doc and sample dag for EC2 (#23547) * Helm chart 1.6.0rc1 (#23548) * Add sample dag and doc for S3ListOperator (#23449) * Add sample dag and doc for S3ListOperator * Fix doc * 19943 Grid view status filters (#23392) * Move tree filtering inside react and add some filters * Move filters from context to utils * Fix tests for useTreeData * Fix last tests. * Add tests for useFilters * Refact to use existing SimpleStatus component * Additional fix after rebase. * Update following bbovenzi code review * Update following code review * Fix tests. * Fix page flickering issues from react-query * Fix side panel and small changes. * Use default_dag_run_display_number in the filter options * Handle timezone * Fix flaky test Co-authored-by: Brent Bovenzi <brent.bovenzi@gmail.com> * Improve caching for multi-platform images. 
(#23562) This is another attempt to improve caching performance for multi-platform images as the previous ones were undermined by a bug in buildx multiplatform cache-to implementattion that caused the image cache to be overwritten between platforms, when multiple images were build. The bug is created for the buildx behaviour at https://github.com/docker/buildx/issues/1044 and until it is fixed we have to prpare separate caches for each platform and push them to separate tags. That adds a bit overhead on the building step, but for now it is the simplest way we can workaround the bug if we do not want to manually manipulate manifests and images. * Use inclusive words in apache airflow project (#23090) * Add exception to catch single line private keys (#23043) * Add sample dag and doc for S3ListPrefixesOperator (#23448) * Add sample dag and doc for S3ListPrefixesOperator * Fix static checks * Update min requirements for rich to 12.4.1 (#23604) * Add exportContext.offload flag to CLOUD_SQL_EXPORT_VALIDATION. (#23614) * Make Breeze help generation indepdent from having breeze installed (#23612) Generation of Breeze help requires breeze to be installed. However if you have locally installed breeze with different dependencies and did not run self-upgrade, the results of generation of the images might be different (for example when different rich version is used). 
This change works in the way that: * you do not have to have breeze installed at all to make it work * it always upgrades to latest breeze when it is not installed * but this only happens when you actually modified some breeze code * Add Quicksight create ingestion Hook and Operator (#21863) * Add Quicksight create ingestion Hook and Operator Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> * Add slim images to docker-stack docs index (#23601) * Fixed Kubernetes Operator large xcom content Defect (#23490) * [FEATURE] google provider - split GkeStartPodOperator execute (#23518) * Implement send_callback method for CeleryKubernetesExecutor and LocalKubernetesExecutor (#23617) * Fix: Exception when parsing log #20966 (#23301) * UnicodeDecodeError: 'utf-8' codec can't decode byte 0xXX in position X: invalid start byte File "/opt/work/python395/lib/python3.9/site-packages/airflow/hooks/subprocess.py", line 89, in run_command line = raw_line.decode(output_encoding).rstrip() # raw_line == b'\x00\x00\x00\x11\xa9\x01\n' UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 4: invalid start byte * Update subprocess.py * Update subprocess.py * Fix: Exception when parsing log #20966 * Fix: Exception when parsing log #20966 Another alternative is: try-catch it. e.g. ``` line = '' for raw_line in iter(self.sub_process.stdout.readline, b''): try: line = raw_line.decode(output_encoding).rstrip() except UnicodeDecodeError as err: print(err, output_encoding, raw_line) self.log.info("%s", line) ``` * Create test_subprocess.sh * Update test_subprocess.py * Added shell directive and license to test_subprocess.sh * Distinguish between raw and decoded lines as suggested by @uranusjr * simplify test Co-authored-by: muhua <microhuang@live.com> * Make provider doc preparation a bit more fun :) (#23629) Previously you had to manually add versions when changelog was modified. 
But why not to get a bit more fun and get the versions bumped automatically based on your assesment when reviewing the provideers rather than after looking at the generated changelog. * Prevent KubernetesJobWatcher getting stuck on resource too old (#23521) * Prevent KubernetesJobWatcher getting stuck on resource too old If the watch fails because "resource too old" the KubernetesJobWatcher should not retry with the same resource version as that will end up in loop where there is no progress. * Reset ResourceVersion().resource_version to 0 * [FEATURE] update K8S-KIND to 0.13.0 (#23636) * [FEATURE] add K8S 1.24 support (#23637) * Fix typo issue (#23633) * Fix assuming "Feature" answer on CI when generating docs (#23640) We have now different answers posisble when generating docs, and for testing we assume we answered randomly during the generation of documentation. * Simplify flash message for _airflow_moved tables (#23635) Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * Add index for event column in log table (#23625) * Don't run pre-migration checks for downgrade (#23634) These checks are only make sense for upgrades. Generally they exist to resolve referential integrity issues etc before adding constraints. In the downgrade context, we generally only remove constraints, so it's a non-issue. * Added postgres 14 to support versions(including breeze) (#23506) * Added postgres 14 to support versions(including breeze) * Add `RedshiftDeleteClusterOperator` support (#23563) Add support for `RedshiftDeleteClusterOperator`. This will help to clean resources using airflow operators when needed. 
In the current implementation, By default, I'm waiting until the cluster is completely removed to return immediately without waiting set `wait_for_completion` param to False - Add operator class - Add basic unit test - Add an example task - Add relevant documentation * Added kubernetes version (1.24) in README.md(for Main version(dev)), … (#23649) * Added kubernetes version (1.24) in README.md(for Main version(dev)), accidentally removed in merge cnflict. * Update README.md Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Fixed test and remove pytest.mark.xfail for test_exc_tb (#23650) * Fix k8s pod.execute randomly stuck indefinitely by logs consumption (#23497) (#23618) * [FEATURE] google provider - BigQueryInsertJobOperator log query (#23648) * Rename cluster_policy to task_policy (#23468) * Rename cluster_policy to task_policy * rename task_policy as example_task_policy. * Revert "Fix k8s pod.execute randomly stuck indefinitely by logs consumption (#23497) (#23618)" (#23656) This reverts commit ee342b85b97649e2e29fcf83f439279b68f1b4d4. * Prepare provider documentation 2022.05.11 (#23631) Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> * AIP45 Remove dag parsing in airflow run local (#21877) * remove `--` in `./breeze build-docs` command (#23671) * Synchronize support for Postgres and K8S in docs (#23673) We just added support for Postgres 14 and K8S 1.24 and since we did not have any changes to support either in main we are bringing the support to 2.3 line as well. This documentation syncs all remaining places where it should be updated. * Migrate Dataproc to new system tests design (#22777) * Add wildcard possibility to `package-filter` parametere (#23672) the glob parameters (for example `apache-airflow-providers-*`) did not work because only fixed list of parameters was allowed. 
This PR converts the package-filter parameter to stop verifying the value passed - so autocomplete continues to work but you should still be able to use glob. It also removes few places where the parameters were used with `--` separator. * Replace "absolute()" with "resolve()" in pathlib objects (#23675) TIL that absolute() is an undocumented in Pathlib and that we should use resolve() instead. So this is it. * Upgrade `pip` to latest released 22.1.0 version (#23665) We are finally able to get rid of the annoying false-positive warnings and we have finally a chance on having warning-free installation during docker builds. * Shorten max pre-commit hook name length (#23677) When names are too long, pre-commit output looks very ugly and takes up 2x lines. Here I reduce max length just a little bit further so that pre-commit output renders properly on a macbook pro 16" with terminal window splitting screen horizontally. * remove stale serialized dags (#22917) * Move around overflow, position and padding (#23044) * Fix expand/collapse all buttons (#23590) * communicate via customevents * Handle open group logic in wrapper * fix tests * Make grid action buttons sticky * Add default toggle fn * fix splitting task id by '.' * fix missing dagrun ids * Update doc and sample dag for Quicksight (#23653) * Use func.count to count rows (#23657) * Add git_source to DatabricksSubmitRunOperator (#23620) The existing `DatabricksSubmitRunOperator` is extended with the support for the `git_source` parameter which allows users to run notebook tasks from files committed to git repositories. If specified, any notebook task that is part of the payload will clone the repository and check out the commit, tag, or the tip of the specified branch. This is an alternative to dev repos ([docs](https://docs.databricks.com/repos/index.html)) where the checkout/update would have to be triggered manually. 
Public documentation for the feature available here: https://docs.databricks.com/dev-tools/api/latest/jobs.html (NB: as noted in the docs, the feature is currently in public preview). * Disable Flower by default from docker-compose (#23685) * Fix property name in breeze Shell Params (#23696) The rename from #23562 missed few shell_parms usage where it also should be replaced. * Clarify that bundle extras should not be used for PyPi installs (#23697) The bundle extras we have are only used for development and they should not be used to install airflow from PyPI. This update to documentation clarifies it. Closes: #23692 * Add environment check and build image check for more Breeze commands (#23687) Several commands of Breeze depends on docker, docker compose being available as well as breeze image. They will work fine if you "just" built the image but they might benefit from the image being rebuilt (to make sure all latest dependencies are installed in the image). The common checks done in "shell" command for that are now extracted to common utils and run as first thing in those commands that need it. * Add UI tests for /utils and /components (#23456) * Add UI tests for /utils and /components * add test for Table * Address PR feedback * Fix window prompt var * Fix TaskName test from rebase * fix lint errors * Add slim image to docs/docker-stack/README.md (#23710) * Use profiles to disable flower in docker-compose (#23709) * Ensure execution_timeout as timedelta (#23655) * Handle invalid date parsing in webserver views. (#23161) * Handle invalid date from query parameters in views. * Add tests. * Use common parsing helper. * Add type hint. * Remove unwanted error check. * Fix extra_links endpoint. * Add fields to CLOUD_SQL_EXPORT_VALIDATION. 
(#23724) * Add doc and sample dag for GCSToS3Operator (#23730) * Fix grid details header text overlap (#23728) Move top margin to each breadcrumb component to make sure that there is no overlap when the header wraps with long names. * Add version to migration prefix (#23564) We don't really need the alembic revision id in the filename. having version instead is much more useful. having both of them takes up too much space. * Add typing for airflow/configuration.py (#23716) * Add typing for airflow/configuration.py The configuraiton.py did not have typing information and it made it rather difficult to reason about it-especially that it went a few changes in the past that made it rather complex to understand. This PR adds typing information all over the configuration file * Remove titles from link buttons (#23736) * Disable flower in chart by default (#23737) * Add AWS project structure tests (re: AIP-47) (#23630) * Speech To Text assets & system tests migration (AIP-47) (#23643) Co-authored-by: Wojciech Januszek <januszek@google.com> * Add 'reschedule' to the serialized fields for the BaseSensorOperator (#23674) fix #23411 * Updated MongoDB logo (#23746) As per https://www.mongodb.com/brand-resources * Fix broken main branch (#23751) main branch is broken since https://github.com/apache/airflow/pull/23630 needed rebase before merge as https://github.com/apache/airflow/pull/23730 added the missing example dag * Allow more parameters to be piped through via execute_in_subprocess (#23286) * Increase timeout for Helm Chart executor upgrade tests (#23759) * Fix task log is not captured (#23684) when StandardTaskRunner runs tasks with exec Issue: https://github.com/apache/airflow/issues/23540 * Helm chart 1.6.0rc2 (#23754) * Fix doc description of [core] parallelism config setting (#23768) * Change `Github` to `GitHub` (#23764) * Add tagging image as latest for CI image wait (#23775) The "wait for image" step lacked --tag-as-latest which made the subsequent 
"fix-ownership" step run sometimes far longer than needed - because it rebuilt the image for fix-ownership case. Also the "fix-ownership" command has been changed to just pull the image if one is missing locally rather than build. This command might be run in an environment where the image is missing or any other image was build (for example in jobs where an image was build for different Python version) in this case the command will simply use whatever Python version is available (it does not matter), or in case no image is available, it will pull the image as the last resort. * Fix auto upstream dep when expanding non-templated field (#23771) If you tried to expand via xcom into a non-templated field without explicitly setting the upstream task dependency, the scheduler would crash because the upstream task dependency wasn't being set automatically. It was being set only for templated fields, but now we do it for both. * clearer method name in scheduler_job.py (#23702) * Fallback to parse dag_file when no dag in the db (#23738) * cleanup usage of `get_connections()`` from test suite (#23757) The function is deprecated and raises warnings https://github.com/apache/airflow/pull/10192 Replacing the usage with `get_connection()` * Maintain grid view selection on filtering upstream (#23779) * Maintain grid selection on filter upstream The grid view selection was being cleared when clicking "Filter Upstream". The selection should persist. Also, added a left margin to the "Reset root" button * fix linting * Fix ``SqliteHook`` compatibility with SQLAlchemy engine (#23790) Same as https://github.com/apache/airflow/pull/19508 but for Sqlite as described in https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#connect-strings to be able to create a Sqlalchemy engine from the URI itself. Without this, it currently fails with the following error due to how we create URI in Connections. 
An absolute path is denoted by starting with a slash, which means you need four slashes: ``` url = sqlite://%2Ftmp%2Fsqlite.db def create_connect_args(self, url): if url.username or url.password or url.host or url.port: > raise exc.ArgumentError( "Invalid SQLite URL: %s\n" "Valid SQLite URL forms are:\n" " sqlite:///:memory: (or, sqlite://)\n" " sqlite:///relative/path/to/file.db\n" " sqlite:////absolute/path/to/file.db" % (url,) ) E sqlalchemy.exc.ArgumentError: Invalid SQLite URL: sqlite://%2Ftmp%2Fsqlite.db E Valid SQLite URL forms are: E sqlite:///:memory: (or, sqlite://) E sqlite:///relative/path/to/file.db E sqlite:////absolute/path/to/file.db ``` * Fix python version used for cache preparation (#23785) Cache preparation on CI used the default (Python 3.7) version of the image. It had an influence only on the time of "full build needed" and for users who wanted to build the breeze image for a Python version different than the default Python 3.7. It had no big influence on "main" builds because in main we build images with "upgrade-to-newer-dependencies", which takes longer anyway. * Add `dttm` searchable field in audit log (#23794) * Further speed up fixing ownership in CI (#23782) After #23775 I noticed that there is yet another small improvement area in the CI build speed. Currently build-ci-image builds and pushes only "commit-tagged" images, but "fix-ownership" requires the "latest" image to run. This PR adds the --tag-as-latest option also to the build-image and build-prod-image commands - similarly as for pull-image and pull-prod-image. This will retag the "commit" images as latest in the build-ci-images step and save 1m on pulling the latest image before fix-ownership (bringing it back to 1s overhead) * Modify db clean to also catch the ProgrammingError exception (#23699) * Update the DMS Sample DAG and Docs (#23681) * postgres_operator_howto_guide.rst (#23789) Saying "**the** PostgreSQL database" confused me.
I thought it was implying that a user could/should connect to the airflow metadata db * Support host_name on Datadog provider (#23784) This is required to use other Datadog tenants like app.datadoghq.eu * Cloud SQL assets & system tests migration (AIP-47) (#23583) * Unbreak main after missing classes were added (#23819) * Fix python version command (#23818) * update CloudSqlInstanceImportOperator to CloudSQLImportInstanceOperator (#23800) * Reformat the whole AWS documentation (#23810) * Fix error when SnowflakeHook take empty list in `sql` param (#23767) * Grid data: do not load all mapped instances (#23813) * only get necessary task instances * add comment * encode_ti -> get_task_summary * Fix regression in ignoring symlinks (#23535) * [Issue#22846] allow option to encode or not encode UUID when uploading from Cassandra to GCS (#23766) * Fix provider import error matching (#23825) * Fix secrets rendered in UI when task is not executed. (#22754) * Fix retrieval of deprecated non-config values (#23723) It turned out that deprecation of config values did not work as intended. While deprecation worked fine when the value was specified in the configuration, it did not work when `run_as_user` was used. In those cases the "as_dict" option was used to generate a temporary configuration, and this temporary configuration contained the default value for the new configuration value - for example, the generated temporary configuration contained: ``` [database] sql_alchemy_conn=sqlite:///{AIRFLOW_HOME}/airflow.db ``` This happened even if the deprecated `core/sql_alchemy_conn` was set (and no new `database/sql_alchemy_conn` was set at the same time). This effectively broke run_as_user for old installations that had not converted to the new "database" configuration, because the tasks run with "run_as_user" used a wrong, empty sqlite database instead of the one configured for Airflow.
Also, while adding tests, it turned out that the mechanism was also not working as intended before - in case `_CMD` or `_SECRET` were used as environment variables rather than configuration. In those cases both _CMD and _SECRET should be evaluated during as_dict() evaluation, because the "run_as_user" might not have enough permission to run the command or retrieve the secret. The _cmd and _secret variables were only evaluated during as_dict() when they were in the config file (note that this only happens when include_cmd, include_env, include_secret are set to True). The changes implemented in this PR fix both problems: * the _CMD and _SECRET env vars are evaluated during as_dict when the respective include_* is set * the defaults are only set for the values that have deprecations in case the deprecations have no values set in any of these ways: * in config file * in env variable * in _cmd (via config file or env variable) * in _secret (via config file or env variable) Fixes: #23679 * Automatically reschedule stalled queued tasks in CeleryExecutor (v2) (#23690) Celery can lose tasks on worker shutdown, causing airflow to just wait on them indefinitely (may be related to celery/celery#7266). This PR expands the "stalled tasks" functionality which is already in place for adopted tasks, and adds the ability to apply it to all tasks such that these lost/hung tasks can be automatically recovered and queued up again. * Document fix for broken elasticsearch logs with 2.3.0+ upgrade (#23821) In certain upgrade paths, Airflow isn't given an opportunity to track the old `log_id_template`, so document the fix for folks who run into trouble. * Add tool to automatically update status of AIP-47 issues. (#23745) * Self upgrade when refreshing images (#23686) When you have two branches, you should self-upgrade breeze to make sure you use the version that is tied with your branch.
Usually we have two active branches - main and the last released line, so switching between them is not unlikely for maintainers. * Exclude missing tasks from the gantt view (#23627) * Exclude missing tasks from the gantt view Stops the gantt view from crashing if a task no longer exists in a DAG but there are TaskInstances for that task. * Fix tests * Don't use the root logger in KPO _suppress function (#23835) * Update Production Guide for Helm Chart docs (#23836) Explain that db initialization is not necessary if using the helm chart. * Helm chart 1.6.0 is released; bump chart version to 1.7.0-dev (#23840) * Add missing "airflow-constraints-reference" parameter (#23844) The build commands were missing the "airflow-constraints-reference" parameter and it always defaulted to constraints-main * Better fix for constraint-reference (#23845) The previous fix (#23844) broke main on package verification as the package verification used the same parameter that was set to empty. This change removes some remnants from the "bash" version where we had to check if the variable was empty, and also makes the "constraint" parameters accept default values from the current branch so that they are used for build commands as well. * Mask sensitive values for not-yet-running TIs (#23807) Alternative approach to #22754. Resolves #22738. * Add limit for JPype1 (#23847) The JPype1 limit has to be introduced because otherwise the 1.4.0 JPype1 breaks our ARM builds. The 1.4.0 did not release the sdist version of the package. This made our cache refresh job fail as the 1.4.0 version cannot be installed on the ARM image. The issue is captured in https://github.com/jpype-project/jpype/issues/1069 * Add "no-issue-needed" rule directly in CONTRIBUTING.rst (#23802) The rule was not really explained directly where you'd expect it; it was hidden deeply in the "triage" process, which many contributors would not even get to.
This PR adds an appropriate explanation and also explains that discussions are the preferred way to discuss things in Airflow rather than issues. * Handler parameter from `JdbcOperator` to `JdbcHook.run` (#23817) * Doc: Add column names for DB Migration Reference (#23853) Before the automation: https://airflow.apache.org/docs/apache-airflow/2.2.5/migrations-ref.html Currently (with missing column names): https://airflow.apache.org/docs/apache-airflow/2.3.0/migrations-ref.html * Fix exception trying to display moved table warnings (#23837) If you still have an old dangling table from the 2.2 migration this would fail. Make it more resilient and cope with both styles of moved table name * Update sample dag and doc for RDS (#23651) * Fix DataprocJobBaseOperator not being compatible with dotted names (#23439). (#23791) * job_name parameter is now sanitized, replacing dots by underscores. * Upgrade `pip` to 22.1.1 version (just released) (#23854) * Add better feedback to Breeze users about expected action timing (#23827) There are a few actions in Breeze that might take more or less time when invoked. This is mostly when you need to upgrade Breeze or update to the latest version of the image because some dependencies were added or the image was modified. While we have significantly improved the waiting time involved (and caching problems have been fixed to make it as fast as possible), there are still a few situations where you need good connectivity and a little time to run the upgrade, which is often not something you would like to lose time on when you need to do things fast. Usually Breeze does not force the user to perform such long actions - it allows the user to continue without doing them (either by timeout or by letting the user answer "no" to the question asked).
Previously Breeze did not inform the user about the expected time of running such an operation, but with this change it tells what the expected delay is - thus allowing the user to make an informed decision whether they want to run the upgrade or not. * Fix UnboundLocalError when sql is empty list in DbApiHook (#23816) * Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (#23815) * Add number of node params only for single-node cluster in RedshiftCreateClusterOperator (#23839) * Sql to gcs with exclude columns (#23695) * Add support for associating custom tags to job runs submitted via EmrContainerOperator (#23769) Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com> * Add Deferrable Databricks operators (#19736) * Fix Amazon EKS example DAG raises warning during Imports (#23849) Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> * Fix databricks tests (#23856) * Add __wrapped__ property to _TaskDecorator (#23830) Co-authored-by: Sanjay Pillai <sanjaypillai11 [at] gmail.com> * Highlight task states by hovering on legend row (#23678) * Rework the legend row and add the hover effect. * Move hoveredTaskState to state and fix merge conflicts. * Add tests. * Order of item in the LegendRow, add no_status support * Clean up f-strings in logging calls (#23597) * update K8S-KIND to 0.14.0 (#23859) * Replaced all days_ago functions with datetime functions (#23237) Co-authored-by: Dev232001 <thedevhooda@gmail.com> * Add clear DagRun endpoint. (#23451) * Ignore the DeprecationWarning in test_days_ago (#23875) Co-authored-by: alexkru <alexkru@wix.com> * Speed up Breeze experience on Mac OS (#23866) This change should significantly speed up the Breeze experience (and especially iterating over a change in Breeze) for MacOS users - regardless of whether you are using x86 or arm architecture. The problem with docker on MacOS is the particularly slow filesystem used to map sources from the Host to the Docker VM.
It is particularly bad when there are multiple small files involved. The improvement comes from two areas: * removing duplicate pycache cleaning * moving MyPy cache to docker volume When entering breeze we are - just in case - cleaning .pyc and __pycache__ files potentially generated outside of the docker container - this is particularly useful if you use a local IDE and you do not have bytecode generation disabled (we have it disabled in Breeze). Generating python bytecode might lead to various problems when you are switching branches and Python versions, so for Breeze development where the files change often anyway, disabling them and removing them when they are found is important. This happens on entering breeze and it might take a second or two depending on whether you have locally generated files. It could happen that the __init script was called twice (depending on which script was called) - therefore the time could be double what was actually needed. Also if you ever generated provider packages, the time could be much longer, because node_modules generated in provider sources were not excluded from searching (and on MacOS it takes a LOT of time). This also led to duplicate time on exit, as the initialization code installed traps that were also run twice. The traps however were rather fast so they had no negative influence on performance. The change adds a guard so that initialization is only ever executed once. The second part of the change is moving the mypy cache to a docker volume rather than using it from the local source folder (the default when complete sources are mounted). We were already using selective mounts to make sure MacOS filesystem slowness affects us in a minimal way - but with this change, the cache will be stored in a docker volume that does not suffer from the same problems as mounting volumes from the host.
The Docker volume is preserved until the `docker stop` command is run - which means that iterating over a change should be WAY faster now - observed speed-up were around 5x speedups for MyPy pre-commit. * Add default task retry delay config (#23861) * Move MappedOperator tests to mirror code location (#23884) At some point during the development of AIP-42 we moved the code for MappedOperator out of baseoperator.py to mappedoperator.py, but we didn't move the tests at the same time * Enable clicking on DAG owner in autocomplete dropdown (#23804) PR#18991 introduced directly navigating to a DAG when selecting one from the typeahead search results. Unfortunately, the search results also includes DAG owner names, and selecting one of those navigates to a DAG with that name, which almost certainly doesn't exist. This extends the autocompletion endpoint to return the type of result, and adjusts the typeahead selection to use this to know which way to navigate. * Document LocalKubernetesExecutor support in chart (#23876) * Avoid extra questions in `breeze build image` command. (#23898) Fixes: #23867 * Update INTHEWILD.md (#23892) * Split contributor's quick start into separate guides. (#23762) The foldable parts were not good. They made links not to work as well as they were not too discoverable. Fixes: #23174 * Avoid printing exception when exiting tests command (#23897) Fixes: #23868 * Move string arg evals to `execute()` in `EksCreateClusterOperator` (#23877) Currently there are string-value evaluations of `compute`, `nodegroup_role_arn`, and `fargate_pod_execution_role_arn` args in the constructor of `EksCreateClusterOperator`. These args are all listed as a template fields so it's entirely possible that the value(s) passed in to the operator is a Jinja expression or an `XComArg`. 
Either of these value types could cause a false-negative `ValueError` (in the case of unsupported `compute` values) or a `false-positive` (in the cases of explicit checks for the *arn values) since the values themselves have not been rendered. This PR moves the evaluations of these args to the `execute()` scope. * Update .readthedocs.yml (#23903) String instead of Int see https://docs.readthedocs.io/en/stable/config-file/v2.html * Make --file command in static-checks autocomplete file name (#23896) The --verbose and --dry-run commands caused the --files command to fail and the flag was "artificial" - it was equivalent to a bool flag. The actual files were taken from arguments. This PR fixes it by turning the arguments into multiple ``--file`` commands - each with its own completion for local files. * Chart: Update default airflow version to `2.3.1` (#23913) * Fix Breeze documentation typo (#23919) * Update environments documentation links (#23920) * `2.3.1` has been released (#23912) * Make CI and PROD image builds consistent (#23841) Simple refactoring to make the jobs more consistent. * Alphabetizes two tables (#23923) The rest of the page has consistently alphabetized tables. This commit fixes three `extras` that were not alphabetized. * Use "remote" pod when patching KPO pod as "checked" (#23676) When patching as "checked", we have to use the current version of the pod otherwise we may get an error when trying to patch it, e.g.: ``` Operation cannot be fulfilled on pods \"test-kubernetes-pod-db9eedb7885c40099dd40cd4edc62415\": the object has been modified; please apply your changes to the latest version and try again" ``` This error would not cause a failure of the task, since errors in `cleanup` are suppressed. However, it would fail to patch. I believe one scenario when the pod may be updated is when retrieving xcom, since the sidecar is terminated after extracting the value.
Concerning some changes in the tests re the "already_checked" label, it was added to a few "expected pods" recently, when we changed it to patch even in the case of a successful pod. Since we are changing the "patch" code to patch with the latest read on the pod that we have (i.e. using the `remote_pod` variable), and no longer the pod object stored on `k.pod`, the label no longer shows up in those tests (that's because in k.pod isn't actually a read of the remote pod, but just happens to get mutated in the patch function before it is used to actually patch the pod). Further, since the `remote_pod` is a local variable, we can't observe it in tests. So we have to read the pod using k8s api. _But_, our "find pod" function excludes "already checked" pods! So we have to make this configurable. So, now we have a proper integration test for the "already_checked" behavior (there was already a unit test). * Clarify manual merging of PR in release doc (#23928) It was not clear to me what this really means * Fix broken main (#23940) main breaks with `Traceback: /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module return _bootstrap._gcd_import(name[level:], package, level) tests/providers/amazon/aws/hooks/test_cloud_formation.py:31: in <module> class TestCloudFormationHook(unittest.TestCase): tests/providers/amazon/aws/hooks/test_cloud_formation.py:67: in TestCloudFormationHook @mock_cloudformation /usr/local/lib/python3.7/site-packages/moto/__init__.py:30: in f module = importlib.import_module(module_name, "moto") /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module return _bootstrap._gcd_import(name[level:], package, level) /usr/local/lib/python3.7/site-packages/moto/cloudformation/__init__.py:1: in <module> from .models import cloudformation_backends /usr/local/lib/python3.7/site-packages/moto/cloudformation/models.py:18: in <module> from .parsing import ResourceMap, OutputMap 
/usr/local/lib/python3.7/site-packages/moto/cloudformation/parsing.py:17: in <module> from moto.apigateway import models # noqa # pylint: disable=all /usr/local/lib/python3.7/site-packages/moto/apigateway/__init__.py:1: in <module> from .models import apigateway_backends /usr/local/lib/python3.7/site-packages/moto/apigateway/models.py:9: in <module> from openapi_spec_validator import validate_spec E ModuleNotFoundError: No module named 'openapi_spec_validator' ` Fix is already in place in moto https://github.com/spulec/moto/pull/5165 but version 3.1.11 wasn't released yet * Update INSTALL_PROVIDERS_FROM_SOURCES instructions. (#23938) * Add typing to Azure Cosmos Client Hook (#23941) New release of Azure Cosmos library has added typing information and it broke main builds with mypy verification. * Remove redundant register exit signals in `dag-processor` command (#23886) * Disable rebase workflow (#23943) The change of the release workflow in #23928 removed the reason why we should have the rebase workflow possible. We only needed to rebase when we merged the test branch into the stable branch, and since we are doing it manually, there is no more reason to have it in the GitHub UI. * Prevent UI from crashing if grid task instances are null (#23939) * UI fix for null task instances * improve tests without global vars * fix test data * Grid fix details button truncated and small UI tweaks (#23934) * Show details button and wrap on LegendRow. * Update following brent review * Fix display on small width * Rotate icon for a 'ReadLess' effect * Fix and speed up grid view (#23947) This fetches all TIs for a given task across dag runs, leading to significantly faster response times. It also fixes a bug where Nones were being passed to the UI when a new task was added to a DAG with existing runs. * Removes duplicate code block (#23952) There are two code blocks with identical text in the helm-chart docs. This commit removes one of them.
* Update dep for databricks #23917 (#23927) * Use '--subdir' argument value for standalong dag processor. (#23864) * Revert "Add limit for JPype1 (#23847)" (#23953) This turned out to be mistake in manual submission. Fixed on JPype1 side. This reverts commit 3699be49b24ef5a0a8d8de81a149af2c5a7dc206. * Faster grid view (#23951) * Disallow calling expand with no arguments (#23463) * [FEATURE] KPO use K8S hook (#22086) * Add cascade to `dag_tag` to `dag` foreignkey (#23444) Bulk delete does not work if the cascade behaviour of a foreignkey is set on python side(relationship configuration). To allow bulk delete of dags we need to setup cascade deletion in the DB. The warning on query.delete at https://docs.sqlalchemy.org/en/14/orm/session_basics.html#selecting-a-synchronization-strategy stated that: The operations do not offer in-Python cascading of relationships - it is assumed that ON UPDATE CASCADE and/or ON DELETE CASCADE is configured for any foreign key references which require it, otherwise the database may emit an integrity violation if foreign key references are being enforced. Another alternative is avoiding bulk delete of dags but I prefer we support bulk deletes. This will break offline sql generation for mssql(already broken before now :) ). Also, since there's only one foreign key in `dag_tag` table, I assume that the foreign key would be named `dag_tag_ibfk_1` in `mysql`. This avoided having to query the db for the name. The foreignkey is explicitly named now, would be easy for future upgrades * DagFileProcessorManager: Start a new process group only if current process not a session leader (#23872) * Introduce `flake8-implicit-str-concat` plugin to static checks (#23873) * Fix UnboundLocalError when sql is empty list in ExasolHook (#23812) * Fix inverted section levels in best-practices.rst (#23968) This PR fixes inverted levels in the sections added to the "Best Practices" document in #21879. 
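The `dag_tag` cascade change (#23444) described earlier in this list can be illustrated with a small sketch. It is not the Airflow migration itself: it assumes a simplified two-table schema, uses the standard-library `sqlite3` module instead of Airflow's metadata database, and the constraint name `dag_tag_dag_id_fkey` is a hypothetical name chosen for the example. It only shows why a bulk delete needs `ON DELETE CASCADE` declared in the database rather than on the Python side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed schema: the foreign key is explicitly named and
# carries ON DELETE CASCADE so the database removes dependent rows itself.
conn.executescript(
    """
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY);
    CREATE TABLE dag_tag (
        name TEXT NOT NULL,
        dag_id TEXT NOT NULL,
        PRIMARY KEY (name, dag_id),
        CONSTRAINT dag_tag_dag_id_fkey
            FOREIGN KEY (dag_id) REFERENCES dag (dag_id) ON DELETE CASCADE
    );
    INSERT INTO dag VALUES ('etl'), ('report');
    INSERT INTO dag_tag VALUES
        ('daily', 'etl'), ('team-a', 'etl'), ('daily', 'report');
    """
)

# Bulk delete: no per-row Python cascade runs here, so the tags of the
# deleted dag disappear only because the database cascades the delete.
conn.execute("DELETE FROM dag WHERE dag_id IN ('etl')")
conn.commit()

remaining_tags = conn.execute(
    "SELECT name, dag_id FROM dag_tag ORDER BY name"
).fetchall()
print(remaining_tags)
```

Without the `ON DELETE CASCADE` clause the same `DELETE` would raise an integrity error once foreign keys are enforced, which is the situation the SQLAlchemy warning quoted in that commit describes.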
* Add support to specify language name in PapermillOperator (#23916) * Add support to specify language name in PapermillOperator * Replace getattr() with simple attribute access * [23945] Icons in grid view for different dag types (#23970) * Helm logo no longer a link (#23977) * Fix links in documentation (#23975) * fix links * added right link to breeze * Add TaskInstance State 'REMOVED' to finished states and success states (#23797) Now that we support dynamic task mapping, we should have the 'REMOVED' state of task instances as a finished state because for dynamic tasks with a removed task instance, the dagrun would be stuck in running state if 'REMOVED' state is not in finished states. * Remove `xcom_push` from `DockerOperator` (#23981) * Fix missing shorthand for docker buildx rm -f (#23984) Latest version of buildx removed -f as shorthand for --force flag. * use explicit --mount with types of mounts rather than --volume flags (#23982) The --volume flag is an old style of specifying mounts used by docker, the newer and more explicit version is --mount where you have to specify type, source, destination in the form of key/value pairs. This is more explicit and avoids some guesswork when volumes are mounted (for example seems that on WSL2 volume name might be guessed as path wrongly). The change explicitly specifies which of the mounts are bind mounts and which are volume mounts. Another nice side effect of this change is that when source is missing, docker will not automatically create directories with the missing name but it will fail. This is nicer because before it led to creating directories when they were missing (for example .bash_aliases and similar). This allows us to avoid some cleanups to account for those files being created - instead we simply skip those mounts if the file/folder does not exist. 
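The `--mount` change (#23982) described just above can be sketched roughly as follows. This is an illustration under stated assumptions, not Breeze's actual code: the helper `build_mount_args`, the mount tuples, and all paths and volume names are hypothetical. It shows the two ideas from the commit message: every mount states its type explicitly, and a bind mount whose source is missing is skipped rather than handed to docker.

```python
import os
import tempfile
from typing import List, Tuple

# (mount type, source, target): hypothetical entries for illustration only.
Mount = Tuple[str, str, str]


def build_mount_args(mounts: List[Mount]) -> List[str]:
    """Turn mount descriptions into explicit ``--mount`` CLI arguments."""
    args: List[str] = []
    for mount_type, source, target in mounts:
        # A bind mount needs an existing host path. Skip it when the path is
        # missing; with --volume docker would have created a directory.
        if mount_type == "bind" and not os.path.exists(source):
            continue
        args += ["--mount", f"type={mount_type},src={source},dst={target}"]
    return args


with tempfile.TemporaryDirectory() as existing_dir:
    missing_file = os.path.join(existing_dir, "does-not-exist")
    mount_args = build_mount_args(
        [
            ("bind", existing_dir, "/opt/airflow/sources"),
            ("bind", missing_file, "/root/.bash_aliases"),
            ("volume", "mypy-cache-volume", "/opt/airflow/.mypy_cache"),
        ]
    )
    print(mount_args)
```

The resulting list could be appended to a `docker run` argument list; named volumes are passed through unchanged because docker creates them on demand.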
* Force colors in yarn test output in CI (#23986) * Fix breeze failures when there is no buildx installed on Mac (#23988) If you have no buildx plugin installed on Mac (for example when you use colima instead of Docker Desktop) the breeze check was failing - but buildx in fact is not needed to run typical breeze commands, and breeze already has support for it - it was just wrongly handled. * Replace generation of docker volumes to be done from python (#23985) The pre-commit to generate docker volumes in docker compose file is now written in Python and it also uses the newer "volume:" syntax to define the volumes mounted in the docker-compose. * Replace `use_task_execution_date` with `use_task_logical_date` (#23983) * Replace `use_task_execution_date` with `use_task_logical_date` We have some operators/sensors that use `*_execution_date` as the class parameters. This PR deprecate the usage of these parameters and replace it with `logical_date`. There is no change in functionality, under the hood the functionality already uses `logical_date` this is just about the parameters name as exposed to the users. * Remove pinning for xmltodict (#23992) We have now moto 3.1.9+ in constraints so we should remove the limit. Fixes: #23576 * Remove fixing cncf.kubernetes provider when generating constraints (#23994) When we yanked cncf.kubernetes provider, we pinned 3.1.2 temporarily for provider generation. This removes the pinning as we are already at 4.0.2 version * Add better diagnostics capabilities for pre-commits run via CI image (#23980) The pre-commits that require CI image run docker command under the hood that is highly optimized for performance (only mounts files that are necessary to be mounted) - in order to improve performance on Mac OS and make sure that artifacts are not left in the source code of Airflow. 
However, that makes the commands slightly more difficult to debug because they dynamically generate the docker command used, including the volumes that should be mounted when the docker command is run. This PR adds better diagnostics to the pre-commit scripts allowing VERBOSE="true" and DRY_RUN="true" variables that can help with diagnosing problems such as running the scripts on WSL2. It also fixes a few documentation bugs that were missed after changing the names of the image-related static checks, and thanks to separating the common code into a utility function it allows setting the SKIP_IMAGE_PRE_COMMITS variable to true, which will skip running all pre-commit checks that require the breeze image to be available locally. * Disable fail-fast on pushing images to docker cache (#24005) There is an issue with pushing cache to the docker registry that is connected to a containerd bug but started to appear more frequently recently (as evidenced for example by https://github.community/t/buildx-failed-with-error-cannot-reuse-body-request-must-be-retried/253178 ). The issue is still open in containerd: https://github.com/containerd/containerd/issues/5978. Until it is fixed, we disable fail-fast on pushing cache so that even if it happens, we just have to re-run the single python version that actually failed. Currently there is a much lower chance of success because all 4 builds have to succeed. * Add automated retries on retryable condition for building images in CI (#24006) There is a flakiness in pushing cache images to ghcr.io, therefore we want to add automated retries when the images fail intermittently. The root cause of the problem is tracked in containerd: https://github.com/containerd/containerd/issues/5978 * Ensure @contextmanager decorates generator func (#23103) * Revert "Add automated retries on retryable condition for building images in CI (#24006)" (#24016) This reverts commit 7cf0e43b70eb1c57a90ee7e2ff14b03487ffb018.
* Cleanup `BranchDayOfWeekOperator` example dag (#24007) * Cleanup BranchDayOfWeekOperator example dag There is no need for `dag=dag` when using context manager. * Added missing project_id to the wait_for_job (#24020) * Only run separate per-platform build when preparing build cache (#24023) Apparently pushing multi-platform images when building cache on CI has had some problems recently, connected with ghcr.io being more vulnerable to the race condition described in this issue: https://github.com/containerd/containerd/issues/5978 Apparently when two different platform layers are pushed at about the same time to ghcr.io, the error "cannot reuse body, request must be retried" is generated. However, we actually do not even need to build the multi-platform latest images because as of recently we have a separate cache for each platform, and the ghcr.io/:latest images are not used any more, not even for docker builds. We always build images rather than pull, and we use --from-cache for that - specific per platform. The only image pulling we do is when we pull the :COMMIT_HASH images in CI - but those are single-platform images (amd64) and even if we add tests for arm, they will have a different tag. Hopefully we can still build release images without causing the race condition too frequently - this is more likely because when we build images for cache we use machines with different performance characteristics and the same layers are pushed at different times from different platforms. * Preparing buildx cache is allowed without --push-image flag (#24028) The previous version of buildx cache preparation implied the --push-image flag, but now this is completely separated (we do not push the image, we just prepare the cache), so when multi-platform buildx preparation is run we should also allow the cache to run without the --push-image flag.
* Add partition related methods to GlueCatalogHook: (#23857)
* "get_partition" to retrieve a Partition
* "create_partition" to create a Partition
* Adds foldable CI group for command output (#24026)
* Add foldable groups in CI outputs in commands that need it (#24035)
This is a follow-up to #24026, which added the capability of selectively deciding, for each breeze command, whether the output of the command should be a "foldable" group. All CI output has been reviewed, and the commands which "need" it were identified. This also fixes a problem introduced there - that the command itself was not a "foldable" group.
* Increase size of ARM build instance (#24036)
Our ARM cache builds started to hang recently at the yarn prod step. The most likely reason is the limited resources we had for the ARM instance to run the docker build - it was a rather small instance with 2GB RAM, which is likely not nearly enough to cope with recent changes related to Grid View, where we likely need much more memory during the yarn build step. This change increases the instance memory to 8 GB (c6g.xlarge). This instance type also gives a 70% cost saving and has a very low probability of being evicted (it's not in high demand in the Ohio Region of AWS). The AMI used is also refreshed with the latest software (docker).
* Remove unused [github_enterprise] from ref docs (#24033)
* Add enum validation for [webserver]analytics_tool (#24032)
* Support impersonation service account parameter for Dataflow runner (#23961)
* Fix closing connection dbapi.get_pandas_df (#23452)
* Light Refactor and Clean-up AWS Provider (#23907)
* Removing magic numbers from exceptions (#23997)
* Removing magic numbers from exceptions
* Running pre-commit
* Upgrade to pip 22.1.2 (#24043)
Pip has been upgraded to version 22.1.2 12 minutes ago. Time to catch up.
* Shaves-off about 3 minutes from usage of ARM instances on CI (#24052)
Preparing airflow packages and provider packages does not need to be done on ARM, and the ARM instance is actually idle while they are prepared during cache building. This change moves preparation of the packages to before the ARM instance is started, which saves about 3 minutes of ARM instance time.
* SSL Bucket, Light Logic Refactor and Docstring Update for Alibaba Provider (#23891)
* Use KubernetesHook to create api client in KubernetesPodOperator (#20578)
Add support for k8s hook in KPO; use it always (even when no conn id); continue to consider the core k8s settings that KPO already takes into account but emit a deprecation warning about them. KPO historically takes into account a few settings from core airflow cfg (e.g. verify ssl, tcp keepalive, context, config file, and in_cluster). So to use the hook to generate the client, somehow the hook has to take these settings into account. But we don't want the hook to consider these settings in general. So we read them in KPO and if necessary patch the hook and warn.
* Re-add --force-build flag (#24061)
After #24052 we also need to add the --force-build flag, as for Python 3.7 rebuilding the CI cache would have been silently ignored because no image building would be needed.
* Fix grid view for mapped tasks (#24059)
* Fix StatD timing metric units (#21106)
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Co-authored-by: Tzu-ping Chung <tp@astronomer.io>
* Drop Python 3.6 compatibility objects/modules (#24048)
* Remove hack from BigQuery DTS hook (#23887)
* Spanner assets & system tests migration (AIP-47) (#23957)
* Run the `check_migration` loop at least once (#24068)
This has been broken since 2.3.0: if a user specifies a migration_timeout of 0, then no migration is run at all.
* Bump eventsource from 1.0.7 to 1.1.1 in /airflow/ui (#24062)
Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](https://github.com/EventSource/eventsource/compare/v1.0.7...v1.1.1)
---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Remove certifi limitations from eager upgrade limits (#23995)
The certifi limitation was introduced to keep snowflake happy while performing eager upgrade because it added limits on certifi. However, it seems it is no longer a limitation in the latest versions of the snowflake python connector, so we can safely remove it from here. The only remaining limit is dill but this one still holds.
* fix style of example block (#24078)
* Handle occasional deadlocks in trigger with retries (#24071)
Fixes: #23639
* Adds Pura Scents, edits The Dyrt (#24086)
* Migrate Yandex example DAGs to new design AIP-47 (#24082)
closes: #22470
* set color to operators in cloud_sql.py (#24000)
* Migrate HTTP example DAGs to new design AIP-47 (#23991)
closes: #22448 , #22431
* Make expand() error vague so it's not misleading (#24018)
* Use github for postgres chart index (#24089)
Bitnami's CloudFront CDN is seemingly having issues, so point at github direct instead until it is resolved.
* Fix the link to google workplace (#24080)
* Bring MappedOperator members in sync with BaseOperator (#24034)
* Add note about Docker volume remount issues in WSL 2 (#24094)
* Convert Athena Sample DAG to System Test (#24058)
* Self-update pre-commit to latest versions (#24106)
* Temporarily fix bitnami index problem (#24112)
We started to experience "Internal Error" when installing the Helm chart, and apparently bitnami "solved" the problem by removing software older than 6 months(!) from their index. This makes our CI fail, but it is much worse.
This renders all our charts useless for people to install. This is terribly wrong, and I raised this in the issue here: https://github.com/bitnami/charts/issues/10539#issuecomment-1144869092
* Fix small typos in static code checks doc (#24113)
- Trivial typo fix in the command to run static checks on the last commit
- Update "run all tests" to "run all checks" where applicable for consistency
* Really workaround bitnami chart problem (#24115)
The original fix in #24112 did not work due to:
* not updated lock
* EOL characters at the end of multiline long URL
This PR fixes it.
* Reduce grid view API calls (#24083)
* Reduce API calls from /grid
- Separate /grid_data from /grid
- Remove need for formatData
- Increase default query stale time to prevent extra fetches
- Fix useTask query keys
* consolidate grid data functions
* fix www tests test grid_data instead of /grid
* Removing magic status code numbers from api_connexion (#24050)
* Do not support MSSQL less than v2017 in code (#24095)
Our experimental support for MSSQL starts from v2017 (in README.md) but we still support 2000 & 2005 in code. This PR removes this support, allowing us to use mssql.DATETIME2 in all MSSQL DB.
* Rename Permissions to Permission Pairs. (#24065)
* Note that yarn dev needs webserver in debug mode (#24119)
* Note that yarn dev needs webserver -d
* Update CONTRIBUTING.rst
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
* Use -D
* Revert "Use -D"
This reverts commit 94d63adcf36aac13f5d94c2d4cd651907d833794.
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * fixing SSHHook bug when using allow_host_key_change param (#24116) * Adds mssql volumes to "all" backends selection (#24123) The "stop" command of Breeze uses "all" backend to remove all volumes - but mssql has special approach where the volumes defined depend on the filesystem used and we need to add the specific docker-compose files to list of files used when we use stop command. * Breeze must create `hooks\` and `dags\` directories for bind mounts (#24122) Now that breeze uses --mount instead of --volume (the former of which does not create missing mount dirs like the latter does see docs here: https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior) we need to create these directories explicitly. * AIP-47 | Migrate Trino example DAGs to new design (#24118) Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> Co-authored-by: Michael Peteuil <michael.peteuil@gmail.com> Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com> Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com> Co-authored-by: Brent Bovenzi <brent.bovenzi@gmail.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Edith Puclla <58795858+edithturn@users.noreply.github.com> Co-authored-by: nsAstro <102520074+nsAstro@users.noreply.github.com> Co-authored-by: ishiis <ishii.shunichi@gmail.com> Co-authored-by: Harpreet Singh <singhharpreet.chadha@gmail.com> Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> Co-authored-by: rahulgoyal2987 <rahulgoyal338@gmail.com> Co-authored-by: raphaelauv <raphaelauv@users.noreply.github.com> Co-authored-by: mhenc <mhenc@google.com> Co-authored-by: Jakub Novák <kubus.novak@gmail.com> Co-authored-by: muhua <microhuang@live.com> Co-authored-by: Ruben Laguna <ruben.laguna@gmail.com> Co-authored-by: humit 
<jhjang1005@naver.com> Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com> Co-authored-by: Gabriel Machado <gabriel.ms1@hotmail.com> Co-authored-by: Kanthi <subkanthi@gmail.com> Co-authored-by: pankajastro <98807258+pankajastro@users.noreply.github.com> Co-authored-by: Sebastian Chamena <43488475+schattian@users.noreply.github.com> Co-authored-by: Ping Zhang <pingzh@umich.edu> Co-authored-by: ishiis <shunichi.ishii@smarthr.co.jp> Co-authored-by: Bartłomiej Hirsz <bartek.hirsz@gmail.com> Co-authored-by: akolar-db <72745279+akolar-db@users.noreply.github.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com> Co-authored-by: Niko <onikolas@amazon.com> Co-authored-by: Wojciech Januszek <wjanuszek@sigma.ug.edu.pl> Co-authored-by: Wojciech Januszek <januszek@google.com> Co-authored-by: David Caron <dcaron05@gmail.com> Co-authored-by: Ross Lawley <ross.lawley@gmail.com> Co-authored-by: Charles Machalow <csm10495@gmail.com> Co-authored-by: Chris Redekop <32752154+repl-chris@users.noreply.github.com> Co-authored-by: John Bampton <jbampton@users.noreply.github.com> Co-authored-by: Ryan Hatter <25823361+RNHTTR@users.noreply.github.com> Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> Co-authored-by: Jian Yuan Lee <jianyuan@gmail.com> Co-authored-by: D. 
Ferruzzi <ferruzzi@amazon.com> Co-authored-by: Gonzalo Peci <pecigonzalo@users.noreply.github.com> Co-authored-by: Dmytro Kazanzhy <dkazanzhy@gmail.com> Co-authored-by: Ian Buss <ianbuss@users.noreply.github.com> Co-authored-by: Xiao Fu <xiao.xfu24@gmail.com> Co-authored-by: Joel Ossher <73489824+joelossher@users.noreply.github.com> Co-authored-by: Mike Kravtsov <61209278+mkravtsov-fetchrewards@users.noreply.github.com> Co-authored-by: Ash Berlin-Taylor <ash@apache.org> Co-authored-by: Guilherme Martins Crocetti <24530683+gmcrocetti@users.noreply.github.com> Co-authored-by: 서재권(Data Platform) <90180644+jaegwonseo@users.noreply.github.com> Co-authored-by: Sandeep <sandeep.kadyan@gmail.com> Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com> Co-authored-by: Eugene Karimov <13220923+eskarimov@users.noreply.github.com> Co-authored-by: Vedant Bhamare <55763604+Dark-Knight11@users.noreply.github.com> Co-authored-by: sanjayp <sanjaypillai11@gmail.com> Co-authored-by: Tzu-ping Chung <tp@astronomer.io> Co-authored-by: Dev232001 <thedevhooda@gmail.com> Co-authored-by: Alex Kruchkov <36231027+alexkruc@users.noreply.github.com> Co-authored-by: alexkru <alexkru@wix.com> Co-authored-by: Sumit Maheshwari <msumit@users.noreply.github.com> Co-authored-by: Mark Norman Francis <norm@201created.com> Co-authored-by: Vincent Koc <koconder@users.noreply.github.com> Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> Co-authored-by: Igor Tavares <igorborgest@gmail.com> Co-authored-by: Marty Jackson <mfjackson2008@gmail.com> Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> Co-authored-by: Kengo Seki <sekikn@apache.org> Co-authored-by: John Green <nhojjohn@users.noreply.github.com> Co-authored-by: David Skoda <dskoda1@binghamton.edu> Co-authored-by: Łukasz Wyszomirski <wyszomirski@google.com> Co-authored-by: Hubert Pietroń <94397721+hubert-pietron@users.noreply.github.com> Co-authored-by: Bernardo Couto <35502483+bernardocouto@users.noreply.github.com> 
Co-authored-by: viktorvia <86823020+viktorvia@users.noreply.github.com> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> Co-authored-by: henriqueribeiro <henriqueribeiro@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chenglong Yan <alanx.yan@gmail.com> Co-authored-by: François de Metz <francois@2metz.fr> Co-authored-by: Paul Williams <pdw@udel.edu> Co-authored-by: James Timmins <james@astronomer.io> Co-authored-by: chethanuk-plutoflume <chethanuk@outlook.com>
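
As an illustration of the "Handle occasional deadlocks in trigger with retries" change listed above (#24071), here is a minimal sketch of the general pattern: retry a database operation when the driver reports a deadlock, since the error message itself suggests restarting the transaction. This is not Airflow's actual implementation. `DeadlockError`, `retry_on_deadlock`, and `flaky_update` are hypothetical names made up for this example; a real version would presumably catch the driver's own error (such as the `OperationalError` with code 1213 in the traceback) and roll back the session before each retry.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error such as MySQL error 1213."""


def retry_on_deadlock(max_attempts=3, base_delay=0.0):
    """Retry the wrapped callable when it raises DeadlockError."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the failure
                    # Wait a little longer after each failed attempt.
                    time.sleep(base_delay * attempt)

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_attempts=3)
def flaky_update():
    """Simulated DB write that deadlocks twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"


result = flaky_update()
print(result, calls["count"])
```

The retry count is bounded so that a persistent deadlock still raises instead of looping forever; the delay between attempts is a tunable assumption, not a value taken from the fix.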

* Doc: Add column names for DB Migration Reference (apache#23853)
Before the automation: https://airflow.apache.org/docs/apache-airflow/2.2.5/migrations-ref.html
Currently (with missing column names): https://airflow.apache.org/docs/apache-airflow/2.3.0/migrations-ref.html
* Fix exception trying to display moved table warnings (apache#23837)
If you still have an old dangling table from the 2.2 migration this would fail. Make it more resilient and cope with both styles of moved table name.
* Update sample dag and doc for RDS (apache#23651)
* Fix DataprocJobBaseOperator not being compatible with dotted names (apache#23439). (apache#23791)
* job_name parameter is now sanitized, replacing dots by underscores.
* Upgrade `pip` to 22.1.1 version (just released) (apache#23854)
* Add better feedback to Breeze users about expected action timing (apache#23827)
There are a few actions in Breeze that might take more or less time when invoked. This is mostly when you need to upgrade Breeze or update to the latest version of the image because some dependencies were added or the image was modified. While we have significantly improved the waiting time involved (and caching problems have been fixed to make it as fast as possible), there are still a few situations where you need good connectivity and a little time to run the upgrade, which is often not something you would like to lose your time on when you need to do things fast. Usually Breeze does not force the user to perform such long actions - it allows continuing without doing them (either by timeout or by letting the user answer "no" to the question asked). Previously Breeze did not inform the user about the expected time of running such an operation, but with this change it tells what the expected delay is - thus allowing the user to make an informed decision whether they want to run the upgrade or not.
* Fix UnboundLocalError when sql is empty list in DbApiHook (apache#23816)
* Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (apache#23815)
* Add number of node params only for single-node cluster in RedshiftCreateClusterOperator (apache#23839)
* Sql to gcs with exclude columns (apache#23695)
* Add support for associating custom tags to job runs submitted via EmrContainerOperator (apache#23769)
Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com>
* Add Deferrable Databricks operators (apache#19736)
* Fix Amazon EKS example DAG raises warning during Imports (apache#23849)
Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com>
* Fix databricks tests (apache#23856)
* Add __wrapped__ property to _TaskDecorator (apache#23830)
Co-authored-by: Sanjay Pillai <sanjaypillai11 [at] gmail.com>
* Highlight task states by hovering on legend row (apache#23678)
* Rework the legend row and add the hover effect.
* Move hoveredTaskState to state and fix merge conflicts.
* Add tests.
* Order of item in the LegendRow, add no_status support
* Clean up f-strings in logging calls (apache#23597)
* update K8S-KIND to 0.14.0 (apache#23859)
* Replaced all days_ago functions with datetime functions (apache#23237)
Co-authored-by: Dev232001 <thedevhooda@gmail.com>
* Add clear DagRun endpoint. (apache#23451)
* Ignore the DeprecationWarning in test_days_ago (apache#23875)
Co-authored-by: alexkru <alexkru@wix.com>
* Speed up Breeze experience on Mac OS (apache#23866)
This change should significantly speed up the Breeze experience (and especially iterating over a change in Breeze) for MacOS users, regardless of whether you are using x86 or arm architecture. The problem with MacOS with docker is the particularly slow filesystem used to map sources from Host to Docker VM. It is particularly bad when there are multiple small files involved.
The improvement comes from two areas:
* removing duplicate pycache cleaning
* moving MyPy cache to docker volume
When entering breeze we are - just in case - cleaning .pyc and __pycache__ files potentially generated outside of the docker container - this is particularly useful if you use a local IDE and you do not have bytecode generation disabled (we have it disabled in Breeze). Generating python bytecode might lead to various problems when you are switching branches and Python versions, so for Breeze development where the files change often anyway, disabling them and removing them when they are found is important. This happens when entering breeze and it might take a second or two depending on whether you have locally generated files. It could happen that the __init script was called twice (depending on which script was called), so the time could be double what was actually needed. Also, if you ever generated provider packages, the time could be much longer, because node_modules generated in provider sources were not excluded from searching (and on MacOS it takes a LOT of time). This also led to duplicate time on exit, as the initialization code installed traps that were also run twice. The traps however were rather fast so had no negative influence on performance. The change adds a guard so that initialization is only ever executed once.
The second part of the change is moving the mypy cache to a docker volume rather than using it from the local source folder (the default when complete sources are mounted). We were already using selective mounts to make sure MacOS filesystem slowness affects us minimally - but with this change, the cache will be stored in a docker volume that does not suffer from the same problems as mounting volumes from the host. The Docker volume is preserved until the `docker stop` command is run - which means that iterating over a change should be WAY faster now - observed speed-ups were around 5x for the MyPy pre-commit.
* Add default task retry delay config (apache#23861)
* Move MappedOperator tests to mirror code location (apache#23884)
At some point during the development of AIP-42 we moved the code for MappedOperator out of baseoperator.py to mappedoperator.py, but we didn't move the tests at the same time.
* Enable clicking on DAG owner in autocomplete dropdown (apache#23804)
PR#18991 introduced directly navigating to a DAG when selecting one from the typeahead search results. Unfortunately, the search results also include DAG owner names, and selecting one of those navigates to a DAG with that name, which almost certainly doesn't exist. This extends the autocompletion endpoint to return the type of result, and adjusts the typeahead selection to use this to know which way to navigate.
* Document LocalKubernetesExecutor support in chart (apache#23876)
* Avoid extra questions in `breeze build image` command. (apache#23898)
Fixes: apache#23867
* Update INTHEWILD.md (apache#23892)
* Split contributor's quick start into separate guides. (apache#23762)
The foldable parts were not good. They made links not work, and they were not very discoverable.
Fixes: apache#23174
* Avoid printing exception when exiting tests command (apache#23897)
Fixes: apache#23868
* Move string arg evals to `execute()` in `EksCreateClusterOperator` (apache#23877)
Currently there are string-value evaluations of `compute`, `nodegroup_role_arn`, and `fargate_pod_execution_role_arn` args in the constructor of `EksCreateClusterOperator`. These args are all listed as template fields so it's entirely possible that the value(s) passed in to the operator is a Jinja expression or an `XComArg`. Either of these value types could cause a false-negative `ValueError` (in the case of unsupported `compute` values) or a false-positive (in the cases of explicit checks for the *arn values) since the values themselves have not been rendered. This PR moves the evaluations of these args to the `execute()` scope.
* Update .readthedocs.yml (apache#23903)
String instead of Int see https://docs.readthedocs.io/en/stable/config-file/v2.html
* Make --file command in static-checks autocomplete file name (apache#23896)
The --verbose and --dry-run commands caused the --files command to fail and the flag was "artificial" - it was equivalent to a bool flag; the actual files were taken from arguments. This PR fixes it by turning the arguments into multiple ``--file`` commands - each with its own completion for local files.
* Chart: Update default airflow version to `2.3.1` (apache#23913)
* Fix Breeze documentation typo (apache#23919)
* Update environments documentation links (apache#23920)
* `2.3.1` has been released (apache#23912)
* Make CI and PROD image builds consistent (apache#23841)
Simple refactoring to make the jobs more consistent.
* Alphabetizes two tables (apache#23923)
The rest of the page has consistently alphabetized tables. This commit fixes three `extras` that were not alphabetized.
* Use "remote" pod when patching KPO pod as "checked" (apache#23676)
When patching as "checked", we have to use the current version of the pod, otherwise we may get an error when trying to patch it, e.g.:
```
Operation cannot be fulfilled on pods \"test-kubernetes-pod-db9eedb7885c40099dd40cd4edc62415\": the object has been modified; please apply your changes to the latest version and try again"
```
This error would not cause a failure of the task, since errors in `cleanup` are suppressed. However, it would fail to patch. I believe one scenario when the pod may be updated is when retrieving xcom, since the sidecar is terminated after extracting the value. Concerning some changes in the tests re the "already_checked" label, it was added to a few "expected pods" recently, when we changed it to patch even in the case of a successful pod. Since we are changing the "patch" code to patch with the latest read on the pod that we have (i.e. using the `remote_pod` variable), and no longer the pod object stored on `k.pod`, the label no longer shows up in those tests (that's because `k.pod` isn't actually a read of the remote pod, but just happens to get mutated in the patch function before it is used to actually patch the pod). Further, since the `remote_pod` is a local variable, we can't observe it in tests. So we have to read the pod using k8s api. _But_, our "find pod" function excludes "already checked" pods! So we have to make this configurable. So, now we have a proper integration test for the "already_checked" behavior (there was already a unit test).
* Clarify manual merging of PR in release doc (apache#23928)
It was not clear to me what this really means
* Fix broken main (apache#23940)
main breaks with `Traceback:
/usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/providers/amazon/aws/hooks/test_cloud_formation.py:31: in <module>
    class TestCloudFormationHook(unittest.TestCase):
tests/providers/amazon/aws/hooks/test_cloud_formation.py:67: in TestCloudFormationHook
    @mock_cloudformation
/usr/local/lib/python3.7/site-packages/moto/__init__.py:30: in f
    module = importlib.import_module(module_name, "moto")
/usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/usr/local/lib/python3.7/site-packages/moto/cloudformation/__init__.py:1: in <module>
    from .models import cloudformation_backends
/usr/local/lib/python3.7/site-packages/moto/cloudformation/models.py:18: in <module>
    from .parsing import ResourceMap, OutputMap
/usr/local/lib/python3.7/site-packages/moto/cloudformation/parsing.py:17: in <module>
    from moto.apigateway import models  # noqa # pylint: disable=all
/usr/local/lib/python3.7/site-packages/moto/apigateway/__init__.py:1: in <module>
    from .models import apigateway_backends
/usr/local/lib/python3.7/site-packages/moto/apigateway/models.py:9: in <module>
    from openapi_spec_validator import validate_spec
E   ModuleNotFoundError: No module named 'openapi_spec_validator'
`
The fix is already in place in moto (getmoto/moto#5165) but version 3.1.11 hasn't been released yet.
* Update INSTALL_PROVIDERS_FROM_SOURCES instructions. (apache#23938)
* Add typing to Azure Cosmos Client Hook (apache#23941)
New release of Azure Cosmos library has added typing information and it broke main builds with mypy verification.
* Remove redundant register exit signals in `dag-processor` command (apache#23886)
* Disable rebase workflow (apache#23943)
The change of the release workflow in apache#23928 removed the reason why we should have the rebase workflow possible. We only needed to rebase when we merged the test branch into the stable branch, and since we are doing it manually, there is no more reason to have it in the GitHub UI.
* Prevent UI from crashing if grid task instances are null (apache#23939)
* UI fix for null task instances
* improve tests without global vars
* fix test data
* Grid fix details button truncated and small UI tweaks (apache#23934)
* Show details button and wrap on LegendRow.
* Update following brent review
* Fix display on small width
* Rotate icon for a 'ReadLess' effect
* Fix and speed up grid view (apache#23947)
This fetches all TIs for a given task across dag runs, leading to significantly faster response times. It also fixes a bug where Nones were being passed to the UI when a new task was added to a DAG with existing runs.
* Removes duplicate code block (apache#23952)
There are two code blocks with identical text in the helm-chart docs. This commit removes one of them.
* Update dep for databricks apache#23917 (apache#23927)
* Use '--subdir' argument value for standalone dag processor. (apache#23864)
* Revert "Add limit for JPype1 (apache#23847)" (apache#23953)
This turned out to be a mistake in manual submission. Fixed on JPype1 side.
This reverts commit 3699be4.
* Faster grid view (apache#23951)
* Disallow calling expand with no arguments (apache#23463)
* [FEATURE] KPO use K8S hook (apache#22086)
* Add cascade to `dag_tag` to `dag` foreignkey (apache#23444)
Bulk delete does not work if the cascade behaviour of a foreign key is set on the python side (relationship configuration). To allow bulk delete of dags we need to set up cascade deletion in the DB. The warning on query.delete at https://docs.sqlalchemy.org/en/14/orm/session_basics.html#selecting-a-synchronization-strategy states that:
The operations do not offer in-Python cascading of relationships - it is assumed that ON UPDATE CASCADE and/or ON DELETE CASCADE is configured for any foreign key references which require it, otherwise the database may emit an integrity violation if foreign key references are being enforced.
Another alternative is avoiding bulk delete of dags but I prefer we support bulk deletes. This will break offline sql generation for mssql (already broken before now :) ). Also, since there's only one foreign key in the `dag_tag` table, I assume that the foreign key would be named `dag_tag_ibfk_1` in `mysql`. This avoided having to query the db for the name. The foreign key is explicitly named now, which will make future upgrades easier.
* DagFileProcessorManager: Start a new process group only if current process not a session leader (apache#23872)
* Introduce `flake8-implicit-str-concat` plugin to static checks (apache#23873)
* Fix UnboundLocalError when sql is empty list in ExasolHook (apache#23812)
* Fix inverted section levels in best-practices.rst (apache#23968)
This PR fixes inverted levels in the sections added to the "Best Practices" document in apache#21879.
* Add support to specify language name in PapermillOperator (apache#23916)
* Add support to specify language name in PapermillOperator
* Replace getattr() with simple attribute access
* [23945] Icons in grid view for different dag types (apache#23970)
* Helm logo no longer a link (apache#23977)
* Fix links in documentation (apache#23975)
* fix links
* added right link to breeze
* Add TaskInstance State 'REMOVED' to finished states and success states (apache#23797)
Now that we support dynamic task mapping, we should have the 'REMOVED' state of task instances as a finished state because for dynamic tasks with a removed task instance, the dagrun would be stuck in running state if 'REMOVED' state is not in finished states.
* Remove `xcom_push` from `DockerOperator` (apache#23981)
* Fix missing shorthand for docker buildx rm -f (apache#23984)
Latest version of buildx removed -f as shorthand for --force flag.
* use explicit --mount with types of mounts rather than --volume flags (apache#23982)
The --volume flag is an old style of specifying mounts used by docker; the newer and more explicit version is --mount, where you have to specify type, source, destination in the form of key/value pairs. This is more explicit and avoids some guesswork when volumes are mounted (for example, it seems that on WSL2 a volume name might be wrongly guessed as a path). The change explicitly specifies which of the mounts are bind mounts and which are volume mounts. Another nice side effect of this change is that when source is missing, docker will not automatically create directories with the missing name but it will fail. This is nicer because before it led to creating directories when they were missing (for example .bash_aliases and similar). This allows us to avoid some cleanups to account for those files being created - instead we simply skip those mounts if the file/folder does not exist.
* Force colors in yarn test output in CI (apache#23986)
* Fix breeze failures when there is no buildx installed on Mac (apache#23988)
If you have no buildx plugin installed on Mac (for example when you use colima instead of Docker Desktop) the breeze check was failing - but buildx in fact is not needed to run typical breeze commands, and breeze already has support for it - it was just wrongly handled.
* Replace generation of docker volumes to be done from python (apache#23985)
The pre-commit to generate docker volumes in docker compose file is now written in Python and it also uses the newer "volume:" syntax to define the volumes mounted in the docker-compose.
* Replace `use_task_execution_date` with `use_task_logical_date` (apache#23983)
* Replace `use_task_execution_date` with `use_task_logical_date`
We have some operators/sensors that use `*_execution_date` as the class parameters. This PR deprecates the usage of these parameters and replaces them with `logical_date`. There is no change in functionality; under the hood the functionality already uses `logical_date`. This is just about the parameter names as exposed to the users.
* Remove pinning for xmltodict (apache#23992)
We have now moto 3.1.9+ in constraints so we should remove the limit.
Fixes: apache#23576
* Remove fixing cncf.kubernetes provider when generating constraints (apache#23994)
When we yanked cncf.kubernetes provider, we pinned 3.1.2 temporarily for provider generation. This removes the pinning as we are already at 4.0.2 version.
* Add better diagnostics capabilities for pre-commits run via CI image (apache#23980)
The pre-commits that require CI image run docker command under the hood that is highly optimized for performance (only mounts files that are necessary to be mounted) - in order to improve performance on Mac OS and make sure that artifacts are not left in the source code of Airflow.
However that makes the command slightly more difficult to debug because they generate dynamically the docker command used, including the volumens that should be mounted when the docker command is run. This PR adds better diagnostics to the pre-commit scripts allowing VERBOSE="true" and DRY_RUN="true" variables that can help with diagnosing problems such as running the scripts on WSL2. It also fixes a few documentation bugs that have been missed after changing names of the image-related static checks and thanks to separating the common code to utility function it allows to set SKIP_IMAGE_PRE_COMMITS variable to true which will skip running all pre-commit checks that require breeze image to be available locally. * Disable fail-fast on pushing images to docker cache (apache#24005) There is an issue with pushing cache to docker registry that is connected to containerd bug but started to appear more frequently recently (as evidenced for example by https://github.saobby.my.eu.orgmunity/t/buildx-failed-with-error-cannot-reuse-body-request-must-be-retried/253178 ). The issue is still open in containerd: containerd/containerd#5978. Until it if fixed, we disable fail-fast on pushing cache so that even if it happens, we just have to re-run that single python version that actually failed. Currently there is a much lower chance of success because all 4 build have to succeed. * Add automated retries on retryable condition for building images in CI (apache#24006) There is a flakiness in pushing cache images to ghcr.io, therefore we want to add automated retries when the images fail intermittently. The root cause of the problem is tracked in containerd: containerd/containerd#5978 * Ensure @contextmanager decorates generator func (apache#23103) * Revert "Add automated retries on retryable condition for building images in CI (apache#24006)" (apache#24016) This reverts commit 7cf0e43. 
* Cleanup `BranchDayOfWeekOperator` example dag (apache#24007) * Cleanup BranchDayOfWeekOperator example dag There is no need for `dag=dag` when using context manager. * Added missing project_id to the wait_for_job (apache#24020) * Only run separate per-platform build when preparing build cache (apache#24023) Apparently pushing multi-platform images when building cache on CI has some problems recently, connected with ghcr.io being more vulnerable to race condition described in this issue: containerd/containerd#5978 Apparently when two, different platform layers are pushed about the same time to ghcr.io, the error "cannot reuse body, request must be retried" is generated. However we actually do not even need to build the multiplatform latest images because as of recently we have separate cache for each platform, and the ghcr.io/:latest images are not used any more not even for docker builds. We we always build images rather than pull and we use --from-cache for that - specific per platform. The only image pulling we do is when we pull the :COMMIT_HASH images in CI- but those are single-platform images (amd64) and even if we add tests for arm, they will have different tag. Hopefully we can still build release images without causing the race condition too frequently - this is more likely because when we build images for cache we use machines with different performance characteristics and the same layers are pushed at different times from different platforms. * Preparing buildx cache is allowed without --push-image flag (apache#24028) The previous version of buildx cache preparation implied --push-image flag, but now this is completely separated (we do not push image, we just prepare cache), so when mutli-platform buildx preparation is run we should also allow the cache to run without --push-image flag. 
* Add partition related methods to GlueCatalogHook: (apache#23857) * "get_partition" to retrieve a Partition * "create_partition" to create a Partition * Adds foldable CI group for command output (apache#24026) * Add foldable groups in CI outputs in commands that need it (apache#24035) This is follow-up after apache#24026 which added capability of selectively deciding for each breeze command, whether the output of the command should be "foldable" group. All CI output has been reviewed, and the commands which "need" it were identified. This also fixes a problem introduced there - that the command itself was not "foldable" group itself. * Increase size of ARM build instance (apache#24036) Our ARM cache builds started to hang recently at yarn prod step. The most likely reason are limited resources we had for the ARM instance to run the docker build - it was rather small instance with 2GB RAM and it is likely not nearly enought to cope with recent changes related to Grid View where we likely need much more memory during the yarn build step. This change increases the instance memory to 8 GB (c6g.xlarge). Also this instance type gives 70% cost saving and has very low probability of being evicted (it's not in high demand in Ohio Region of AWS. Also the AMI used is refreshed with latest software (docker) * Remove unused [github_enterprise] from ref docs (apache#24033) * Add enum validation for [webserver]analytics_tool (apache#24032) * Support impersonation service account parameter for Dataflow runner (apache#23961) * Fix closing connection dbapi.get_pandas_df (apache#23452) * Light Refactor and Clean-up AWS Provider (apache#23907) * Removing magic numbers from exceptions (apache#23997) * Removing magic numbers from exceptions * Running pre-commit * Upgrade to pip 22.1.2 (apache#24043) Pip has been upgraded to version 22.1.2 12 minutes ago. Time to catch up. 
* Shaves-off about 3 minutes from usage of ARM instances on CI (apache#24052) Preparing airflow packages and provider packages does not need to be done on ARM and actually the ARM instance is idle while they are prepared during cache building. This change moves preparation of the packages to before the ARM instance is started which saves about 3 minutes of ARM instance time. * SSL Bucket, Light Logic Refactor and Docstring Update for Alibaba Provider (apache#23891) * Use KubernetesHook to create api client in KubernetesPodOperator (apache#20578) Add support for k8s hook in KPO; use it always (even when no conn id); continue to consider the core k8s settings that KPO already takes into account but emit deprecation warning about them. KPO historically takes into account a few settings from core airflow cfg (e.g. verify ssl, tcp keepalive, context, config file, and in_cluster). So to use the hook to generate the client, somehow the hook has to take these settings into account. But we don't want the hook to consider these settings in general. So we read them in KPO and if necessary patch the hook and warn. * Re-add --force-build flag (apache#24061) After apache#24052 we also need to add --force-build flag as for Python 3.7 rebuilding CI cache would have been silently ignored as no image building would be needed * Fix grid view for mapped tasks (apache#24059) * Fix StatD timing metric units (apache#21106) Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> Co-authored-by: Tzu-ping Chung <tp@astronomer.io> * Drop Python 3.6 compatibility objects/modules (apache#24048) * Remove hack from BigQuery DTS hook (apache#23887) * Spanner assets & system tests migration (AIP-47) (apache#23957) * Run the `check_migration` loop at least once (apache#24068) This is broken since 2.3.0. that's if a user specifies a migration_timeout of 0 then no migration is run at all. 
* Bump eventsource from 1.0.7 to 1.1.1 in /airflow/ui (apache#24062) Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1. - [Release notes](https://github.com/EventSource/eventsource/releases) - [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md) - [Commits](EventSource/eventsource@v1.0.7...v1.1.1) --- updated-dependencies: - dependency-name: eventsource dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Remove certifi limitations from eager upgrade limits (apache#23995) The certifi limitation was introduced to keep snowflake happy while performing eager upgrade because it added limits on certifi. However seems like it is not limitation any more in latest versions of snowflake python connector, so we can safely remove it from here. The only remaining limit is dill but this one still holds. * fix style of example block (apache#24078) * Handle occasional deadlocks in trigger with retries (apache#24071) Fixes: apache#23639 * Adds Pura Scents, edits The Dyrt (apache#24086) * Migrate Yandex example DAGs to new design AIP-47 (apache#24082) closes: apache#22470 * set color to operators in cloud_sql.py (apache#24000) * Migrate HTTP example DAGs to new design AIP-47 (apache#23991) closes: apache#22448 , apache#22431 * Make expand() error vague so it's not misleading (apache#24018) * Use github for postgres chart index (apache#24089) Bitnami's CloudFront CDN is seemingly having issues, so point at github direct instead until it is resolved. 
* Fix the link to google workplace (apache#24080) * Bring MappedOperator members in sync with BaseOperator (apache#24034) * Add note about Docker volume remount issues in WSL 2 (apache#24094) * Convert Athena Sample DAG to System Test (apache#24058) * Self-update pre-commit to latest versions (apache#24106) * Temporarily fix bitnami index problem (apache#24112) We started to experience "Internal Error" when installing Helm chart and apperently bitnami "solved" the problem by removing from their index software older than 6 months(!). This makes our CI fail but It is much worse. This renders all our charts useless for people to install This is terribly wrong, and I raised this in the issue here: bitnami/charts#10539 (comment) * Fix small typos in static code checks doc (apache#24113) - Trivial typo fix in the command to run static checks on the last commit - Update "run all tests" to "run all checks" where applicable for consistency * Really workaround bitnami chart problem (apache#24115) The original fix in apache#24112 did not work due to: * not updated lock * EOL characters at the end of multiline long URL This PR fixes it. * Reduce grid view API calls (apache#24083) * Reduce API calls from /grid - Separate /grid_data from /grid - Remove need for formatData - Increase default query stale time to prevent extra fetches - Fix useTask query keys * consolidate grid data functions * fix www tests test grid_data instead of /grid * Removing magic status code numbers from api_connecxion (apache#24050) * Do not support MSSQL less than v2017 in code (apache#24095) Our experimental support for MSSQL starts from v2017(in README.md) but we still support 2000 & 2005 in code. This PR removes this support, allowing us to use mssql.DATETIME2 in all MSSQL DB. * Rename Permissions to Permission Pairs. 
(apache#24065) * Note that yarn dev needs webserver in debug mode (apache#24119) * Note that yarn dev needs webserver -d * Update CONTRIBUTING.rst Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * Use -D * Revert "Use -D" This reverts commit 94d63ad. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * fixing SSHHook bug when using allow_host_key_change param (apache#24116) * Adds mssql volumes to "all" backends selection (apache#24123) The "stop" command of Breeze uses "all" backend to remove all volumes - but mssql has special approach where the volumes defined depend on the filesystem used and we need to add the specific docker-compose files to list of files used when we use stop command. * Breeze must create `hooks\` and `dags\` directories for bind mounts (apache#24122) Now that breeze uses --mount instead of --volume (the former of which does not create missing mount dirs like the latter does see docs here: https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior) we need to create these directories explicitly. * AIP-47 | Migrate Trino example DAGs to new design (apache#24118) * Update production-deployment.rst (apache#24121) The sql_alchemy_conn option is in the database section, not the core section. Simple typo fix. 
* Migrate Zendesk example DAGs to new design apache#22471 (apache#24129) * Migrate JDBC example DAGs to new design apache#22450 (apache#24137) * Migrate Jenkins example DAGs to new design apache#22451 (apache#24138) * Migrate Microsoft example DAGs to new design apache#22452 - mssql (apache#24139) * Migrate MySQL example DAGs to new design apache#22453 (apache#24142) * Migrate Opsgenie example DAGs to new design apache#22455 (apache#24144) * Migrate Presto example DAGs to new design apache#22459 (apache#24145) * Migrate Plexus example DAGs to new design apache#22457 (apache#24147) * Migrate SQLite example DAGs to new design apache#22461 (apache#24150) * Migrate Telegram example DAGs to new design apache#22468 (apache#24126) * AIP-47 - Migrate Tableau DAGs to new design (apache#24125) * Migrate Salesforce example DAGs to new design apache#22463 (apache#24127) * Update credentials when using ADC in Compute Engine (apache#23773) * Improve Windows development compatibility for breeze (apache#24098) * Migrate Asana example DAGs to new design apache#22440 (apache#24131) * Migrate Neo4j example DAGs to new design apache#22454 (apache#24143) * Workflows assets & system tests migration (AIP-47) (apache#24105) * Workflows assets & system tests migration (AIP-47) Co-authored-by: Wojciech Januszek <januszek@google.com> * Add disabled_algorithms as an extra parameter for SSH connections (apache#24090) * Migrate Postgres example DAGs to new design apache#22458 (apache#24148) * Migrate Postgres example DAGs to new design apache#22458 * Fix static checks * Migrate Snowflake system tests to new design apache#22434 (apache#24151) * Migrate Snowflake system tests to new design apache#22434 * Fix flake8 * Migrate Qubole example DAGs to new design apache#22460 (apache#24149) * Migrate Qubole example DAGs to new design apache#22460 * Migrate Microsoft example DAGs to new design apache#22452 - azure (apache#24141) * Migrate Microsoft example DAGs to new design apache#22452 - azure * 
Migrate Microsoft example DAGs to new design apache#22452 - winrm (apache#24140) * Migrate Microsoft example DAGs to new design apache#22452 - winrm * Fix static checks * Migrate Influx example DAGs to new design apache#22449 (apache#24136) * Migrate Influx example DAGs to new design apache#22449 * Fix static checks * Migrate DingTalk example DAGs to new design apache#22443 (apache#24133) * Migrate DingTalk example DAGs to new design apache#22443 * Migrate Cncf.Kubernetes example DAGs to new design apache#22441 (apache#24132) * Migrate Cncf.Kubernetes example DAGs to new design apache#22441 * Migrate Alibaba example DAGs to new design apache#22437 (apache#24130) * Migrate Alibaba example DAGs to new design apache#22437 * Pass connection extra parameters to wasb BlobServiceClient (apache#24154) * fix BigQueryInsertJobOperator (apache#24165) * Migrate Singularity example DAGs to new design apache#22464 (apache#24128) * Better summary of status of AIP-47 (apache#24169) Result is here: apache#24168 * Remove old Athena Sample DAG (apache#24170) * removed old files (apache#24172) * Chart: Default to Airflow 2.3.2 (apache#24184) * Update 'rich' to latest version across the board. (apache#24186) That Also includes regenerating the breeze output images. 
* Fix BigQuery system tests (apache#24013) * Change execution_date to data_interval_start in BigQueryInsertJobOperator job_id Change-Id: Ie1f3bba701169ceb2b39d693da320564de145c0c * Change jinja template path to relative path Change-Id: I6cced215124f69e9f4edf8ac08bb71d3ec3c8afc Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> * `2.3.2` has been released (apache#24182) * Add verification step to image release process (apache#24177) * Added impersonation_chain for DataflowStartFlexTemplateOperator and DataflowStartSqlJobOperator (apache#24046) * Add key_secret_project_id parameter which specifies a project with KeyFile (apache#23930) * Add built-in Extrenal Link for ExternalTaskMarker operator (apache#23964) * fix: DatabricksSubmitRunOperator and DatabricksRunNowOperator cannot define .json as template_ext (apache#23622) (apache#23641) * fix: StepFunctionHook ignores explicit set `region_name` (apache#23976) * Remove `GithubOperator` use in `GithubSensor.__init__()`` (apache#24214) The constructor for `GithubSensor` was instantiating `GitHubOperator` to use its `execute()` method as the driver for the result of the sensor's `poke()` logic. However, this could yield a `DuplicateTaskIdFound` when used in DAGs. This PR updates the `GithubSensor` to use the `GithubHook` instead. 
* Mac M1 postgress and doc fix (apache#24200) * AIP-47 - Migrate dbt DAGs to new design apache#22472 (apache#24202) * AIP-47 - Migrate databricks DAGs to new design apache#22442 (apache#24203) * AIP-47 - Migrate hive DAGs to new design apache#22439 (apache#24204) * AIP-47 - Migrate kylin DAGs to new design apache#22439 (apache#24205) * AIP-47 - Migrate drill DAGs to new design apache#22439 (apache#24206) * AIP-47 - Migrate druid DAGs to new design apache#22439 (apache#24207) * AIP-47 - Migrate cassandra DAGs to new design apache#22439 (apache#24209) * AIP-47 - Migrate spark DAGs to new design apache#22439 (apache#24210) * AIP-47 - Migrate apache pig DAGs to new design apache#22439 (apache#24212) * Migrate GitHub example DAGs to new design apache#22446 (apache#24134) * Remove warnings when starting breeze (apache#24183) Breeze when started produced three warnings that were harmless, but we should fix them to remove "false positives". * AIP-47 - Migrate livy DAGs to new design apache#22439 (apache#24208) * Remove escaping which is wrong in latest rich version (apache#24217) Latest rich makes escaping not needed for extra `[` needed in Markdown URLs. * Parse error for task added to multiple groups (apache#23071) This raises an exception if a task already belonging to a task group (including added to a DAG, since such task is automatically added to the DAG's root task group). Also, according to the issue response, manually calling TaskGroup.add() is not considered a supported way to add a task to group. So a meta-marker is added to the function docstring to prevent it from showing up in documentation and users from trying to use it. 
* Fix xfail test in test_scheduler.py (apache#23731) * Migrate Papermill example DAGs to new design apache#22456 (apache#24146) * Migrate Asana system tests to new design AIP-47 (apache#24226) closes: apache#22428 related: apache#22440 * Migrate Microsoft system tests to new design AIP-47 (apache#24225) closes: apache#22432 related: apache#22452 * Migrate CNCF system tests to new design AIP-47 (apache#24224) closes: apache#22429 related: apache#22441 * Migrate Postgres system tests to new design (apache#24223) closes: apache#22433 related: apache#22458 * AIP-47 - Migrate beam DAGs to new design apache#22439 (apache#24211) * AIP-47 - Migrate beam DAGs to new design apache#22439 * Add explanatory note for contributors about updating Changelog (apache#24229) * Fix backwards-compatibility introduced by fixing mypy problems (apache#24230) There was a backwards-incompatibility introduced by apache#23716 in two providers by using get_mandatory_value config method. This PR corrects that backwards compatibility and updates 2.1 compatibility pre-commit to check for forbidden usage of get_mandatory_value. * Bump moto version (apache#24222) * Bump moto version version 3.1.10 broke main but the issue was fixed since in moto related: getmoto/moto#5165 * fix moto * Add `PrestoToSlackOperator` (apache#23979) * Add `PrestoToSlackOperator` Adding the funcitonality to run a single query against presto and send the result as slack message. 
Similar to `SnowflakeToSlackOperator` * Fix BigQuery Sensors system test (apache#24245) Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> * adding AWS_DEFAULT_REGION to the docs, boto3 expects this to be in the env variables (apache#24181) * Unify return_code interface for task runner (apache#24093) * Update dbt.py (apache#24218) * Fix GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix (apache#24039) * Adding fnmatch type regex to SFTPSensor (apache#24084) * docs: amazon-provider retry modes (apache#23906) * Cloud Storage assets & StorageLink update (apache#23865) Co-authored-by: Wojciech Januszek <januszek@google.com> * Fix useTasks crash on error (apache#24152) * Prevent UI from crashing on Get API error * add test * don't show API errors in test logs * use setMinutes inline * Refactor GlueJobHook get_or_create_glue_job method. (apache#24215) When invoked, create_job takes into account the provided 'Command' argument instead of having it hardcoded. 
* Fix delete_cluster no use TriggerRule.ALL_DONE (apache#24213) related: apache#24082 * docker new system test (apache#23167) * chore: Refactoring and Cleaning Apache Providers (apache#24219) * Fix await_container_completion condition (apache#23883) * Migrate Apache Beam system tests to new design AIP-47 (apache#24256) closes: apache#22427 * Migrate Apache Beam system tests to new design apache#22427 (apache#24241) * Migrate Google leveldb system tests to new design AIP-47 (apache#24255) related: apache#22447, apache#22430 * Add param docs to KubernetesHook and KubernetesPodOperator (apache#23955) (apache#24054) * Enable dbt Cloud provider to interact with single tenant instances (apache#24264) * Enable provider to interact with single tenant * Define single tenant arg on Operator * Add test for single tenant endpoint * Enable provider to interact with single tenant * Define single tenant arg on Operator * Add test for single tenant endpoint * Code linting from black * Code linting from black * Pass tenant to dbtCloudHook in DbtCloudGetJobRunArtifactOperator class * Make Tenant a connection-level setting * Remove tenant arg from Operator * Make tenant connection-level param that defaults to 'cloud' * Remove tenant param from sensor * Remove leftover param string from hook * Update airflow/providers/dbt/cloud/hooks/dbt.py Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> * Parameterize test_init_hook to test single and multi tenant connections * Integrate test simplification suggestion * Add connection to TestDbtCloudJobRunSesnor Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> * Apply per-run log templates to log handlers (apache#24153) * AIP-47 - Migrate google leveldb DAGs to new design #apache#22447 (apache#24233) * Fix choosing backend versions in breeze's command line (apache#24228) Choosing version of backend were broken when command line switches were used. 
The _VERSION variables were "hard-coded" to defaults rather than taken from command line. This is a remnant of initial implementation and converting the parameters to "cacheable" ones. While looking at the versions we also found that PARAM_NAME_FLAG is not used any more so we took the opportunity to remove it. * Fix link broken after apache#24082 (apache#24276) apache#24082 * Add command to regenerate breeze command output images (apache#24216) * Make numpy effectively an optional dependency for Oracle provider (apache#24272) Better fix to apache#23132 * Add SMAP Energy to list of companies using Airflow (apache#24268) * fix command and typo (apache#24282) * Update doc and sample dag for EMR Containers (apache#24087) * scheduleinterval nullable true added in openapi (apache#24253) * Check that edge nodes actually exist (apache#24166) * Prepare docs for May 2022 provider's release (apache#24231) This documentation update also (following the rule agreed in https://github.com/apache/airflow/blob/main/README.md#support-for-providers) bumps mininimum supported version of Airflow for all providers to 2.2 and it constitutes a breaking change and major version bump for all providers. * pydocstyle D202 added (apache#24221) * Update provider templates for new Airflow 2.2+ req (apache#24291) I imagine we could update this somewhat programmatically and/or add this update to instructions somewhere. Let me know what you think. 
* Update package description to remove double min-airflow specification (apache#24292) * Airflow UI fix vulnerabilities - Prototype Pollution (apache#24201) * Mention context variables and logging (apache#24304) * Mention context variables and logging * Fix static checks * Remove limit of presto-python-client version (apache#24305) * Fix langauge override in papermill operator (apache#24301) * Also mention airflow 2 only in readme template (apache#24296) * Fix permission issue for dag that has dot in name (apache#23510) How we determine if a DAG is a subdag in airflow.security.permissions.resource_name_for_dag is not right. If a dag_id contains a dot, the permission is not recorded correctly. The current solution makes a query every time we check for permission for dags that has a dot in the name. Not that I like it but I think it's better than other options I considered such as changing how we name dags for subdag. That's not good in UX. Another option I considered was making a query when parsing, that's not good and it's avoided by passing root_dag to resource_name_for_dag Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> * Check bag DAG schedule_interval match tiemtable (apache#23113) This guards against the DAG's timetable or schedule_interval from being changed after it's created. Validation is done by creating a timetable and check its summary matches schedule_interval. The logic is not bullet-proof, especially if a custom timetable does not provide a useful summary. But this is the best we can do. * fix: patches apache#24215. Won't raise KeyError when 'create_job_kwargs' contains the 'Command' key. 
(apache#24308) * Fix D202 issue (apache#24322) * Check for run_id for grid group summaries (apache#24327) * Workaround job race bug on biguery to gcs transfer (apache#24330) Fixes: apache#24277 * Update release notes for RC2 release of Providers for May 2022 (apache#24307) Also updates links to example dags to work properly following apache#24331 * feat(README): add a custom README (#1) * feat(README): add a custom README * fix(README): change to add the custom readme content above the original readme Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> Co-authored-by: Ash Berlin-Taylor <ash@apache.org> Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com> Co-authored-by: Guilherme Martins Crocetti <24530683+gmcrocetti@users.noreply.github.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Dmytro Kazanzhy <dkazanzhy@gmail.com> Co-authored-by: pankajastro <98807258+pankajastro@users.noreply.github.com> Co-authored-by: 서재권(Data Platform) <90180644+jaegwonseo@users.noreply.github.com> Co-authored-by: Sandeep <sandeep.kadyan@gmail.com> Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com> Co-authored-by: Eugene Karimov <13220923+eskarimov@users.noreply.github.com> Co-authored-by: Vedant Bhamare <55763604+Dark-Knight11@users.noreply.github.com> Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com> Co-authored-by: sanjayp <sanjaypillai11@gmail.com> Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> Co-authored-by: raphaelauv <raphaelauv@users.noreply.github.com> Co-authored-by: Tzu-ping Chung <tp@astronomer.io> Co-authored-by: Dev232001 <thedevhooda@gmail.com> Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com> Co-authored-by: Alex Kruchkov <36231027+alexkruc@users.noreply.github.com> Co-authored-by: alexkru <alexkru@wix.com> Co-authored-by: Sumit Maheshwari <msumit@users.noreply.github.com> Co-authored-by: Mark Norman Francis <norm@201created.com> Co-authored-by: 
Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Vincent Koc <koconder@users.noreply.github.com> Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> Co-authored-by: Igor Tavares <igorborgest@gmail.com> Co-authored-by: Marty Jackson <mfjackson2008@gmail.com> Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com> Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> Co-authored-by: Brent Bovenzi <brent.bovenzi@gmail.com> Co-authored-by: mhenc <mhenc@google.com> Co-authored-by: Kengo Seki <sekikn@apache.org> Co-authored-by: John Green <nhojjohn@users.noreply.github.com> Co-authored-by: David Skoda <dskoda1@binghamton.edu> Co-authored-by: Edith Puclla <58795858+edithturn@users.noreply.github.com> Co-authored-by: Łukasz Wyszomirski <wyszomirski@google.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> Co-authored-by: Hubert Pietroń <94397721+hubert-pietron@users.noreply.github.com> Co-authored-by: Bernardo Couto <35502483+bernardocouto@users.noreply.github.com> Co-authored-by: viktorvia <86823020+viktorvia@users.noreply.github.com> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> Co-authored-by: henriqueribeiro <henriqueribeiro@users.noreply.github.com> Co-authored-by: Wojciech Januszek <wjanuszek@sigma.ug.edu.pl> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ishiis <ishii.shunichi@gmail.com> Co-authored-by: Chenglong Yan <alanx.yan@gmail.com> Co-authored-by: François de Metz <francois@2metz.fr> Co-authored-by: Paul Williams <pdw@udel.edu> Co-authored-by: D. 
Ferruzzi <ferruzzi@amazon.com> Co-authored-by: James Timmins <james@astronomer.io> Co-authored-by: Niko <onikolas@amazon.com> Co-authored-by: chethanuk-plutoflume <chethanuk@outlook.com> Co-authored-by: DataFusion4All <101581331+DataFusion4All@users.noreply.github.com> Co-authored-by: chethanuk-plutoflume <chethan.umesha@tessian.com> Co-authored-by: Maksim <maksimy@google.com> Co-authored-by: Wojciech Januszek <januszek@google.com> Co-authored-by: Paul Williams <pauldalewilliams@gmail.com> Co-authored-by: Tanel Kiis <tanelk@users.noreply.github.com> Co-authored-by: Bowrna <mailbowrna@gmail.com> Co-authored-by: Bartłomiej Hirsz <bartek.hirsz@gmail.com> Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> Co-authored-by: Jonathan Simon Prates <jonathan.simonprates@gmail.com> Co-authored-by: Rafael Carrasco <rafacarrasco07@gmail.com> Co-authored-by: Ping Zhang <pingzh@umich.edu> Co-authored-by: GitStart-AirFlow <101595287+gitstart-airflow@users.noreply.github.com> Co-authored-by: akakakakakaa <akstn3023@naver.com> Co-authored-by: Maria Sumedre <maria.sumedre@3pillarglobal.com> Co-authored-by: Elize Papineau <elizepapineau@gmail.com> Co-authored-by: peter-volkov <peter.r.volkov@yandex.ru> Co-authored-by: Hank Ehly <henry.ehly@gmail.com> Co-authored-by: Malthe Borch <mborch@gmail.com> Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com> Co-authored-by: socar-dini <89070514+socar-dini@users.noreply.github.com>

Fixes: #23639 (cherry picked from commit d86ae09)

potiuk · 2022-07-07T23:15:27Z

@humit0 @andrewdanks @socar-humprey - we have 2.3.3rc3 (see #24863), which is supposed to handle the Trigger deadlocks. Could you test this version to see whether the deadlocks have been fixed by it? It would be great to get confirmation.

humit0 added area:core kind:bug This is a clearly a bug labels May 11, 2022

eladkal added affected_version:2.2 Issues Reported for 2.2 area:async-operators AIP-40: Deferrable ("Async") Operators labels May 12, 2022

potiuk added a commit to potiuk/airflow that referenced this issue Jun 1, 2022

Handle occasional deadlocks in trigger with retries

3e97d17

Fixes: apache#23639
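
For readers unfamiliar with the idea named in the commit title above, the following is a minimal sketch of retrying a transaction when the database reports a deadlock. It is not the code from the linked PR. `DeadlockLikeError` and `run_with_deadlock_retries` are hypothetical names invented for this illustration, and the exception class merely stands in for a driver error such as the `OperationalError` shown in the traceback. The only detail taken from the report is MySQL error code 1213.

```python
import random
import time

# Error code from the traceback in the report: "Deadlock found when trying to get lock".
MYSQL_DEADLOCK_ERRNO = 1213


class DeadlockLikeError(Exception):
    """Hypothetical stand-in for a DB driver error that carries an error number."""

    def __init__(self, errno, message):
        super().__init__(errno, message)
        self.errno = errno


def run_with_deadlock_retries(transaction, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `transaction()` and retry it only when it fails with a deadlock error.

    Any other error, or a deadlock on the final attempt, is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockLikeError as exc:
            if exc.errno != MYSQL_DEADLOCK_ERRNO or attempt == max_attempts:
                raise
            # Exponential backoff with a little jitter, so that two competing
            # transactions are less likely to collide again at the same moment.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

The database has already rolled back the deadlocked transaction by the time the error is raised, so the callable passed in should redo the whole unit of work rather than resume it halfway. The `sleep` parameter is injectable only to keep the sketch easy to exercise without real delays.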

potiuk mentioned this issue Jun 1, 2022

Handle occasional deadlocks in trigger with retries #24071

Merged

potiuk closed this as completed in #24071 Jun 1, 2022

potiuk added a commit that referenced this issue Jun 1, 2022

Handle occasional deadlocks in trigger with retries (#24071)

d86ae09

Fixes: #23639

ephraimbuddy pushed a commit that referenced this issue Jul 5, 2022

Handle occasional deadlocks in trigger with retries (#24071)

4284d03

Fixes: #23639 (cherry picked from commit d86ae09)

ephraimbuddy pushed a commit that referenced this issue Jul 5, 2022

Handle occasional deadlocks in trigger with retries (#24071)

c76b324

Fixes: #23639 (cherry picked from commit d86ae09)

ephraimbuddy mentioned this issue Jul 6, 2022

Status of testing of Apache Airflow 2.3.3rc3 #24863

Closed

74 tasks

NickYadance mentioned this issue Oct 12, 2022

Trigger die with DB deadlock between scheduler #27000

Closed

2 tasks
