Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running airflow db init raises OperationalError on SQL: ALTER TABLE xcom DROP COLUMN id #18457

Closed
2 tasks done
leonarduschen opened this issue Sep 23, 2021 · 12 comments
Closed
2 tasks done
Labels
affected_version:2.1 Issues Reported for 2.1 area:core area:MetaDB Meta Database related issues. kind:bug This is a clearly a bug pending-response stale Stale PRs per the .github/workflows/stale.yml policy file

Comments

@leonarduschen
Copy link

Apache Airflow version

2.1.3

Operating System

Pop!_OS 21.04 x86_64

Versions of Apache Airflow Providers

apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-sqlite==2.0.1

Deployment

Other

Deployment details

Using venv on Python 3.9.5

# pip freeze
alembic==1.7.1
anyio==3.3.0
apache-airflow==2.1.3
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-sqlite==2.0.1
apispec==3.3.2
argcomplete==1.12.3
attrs==20.3.0
autopep8==1.5.7
Babel==2.9.1
blinker==1.4
cattrs==1.5.0
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
click==7.1.2
clickclick==20.10.2
colorama==0.4.4
colorlog==6.4.1
commonmark==0.9.1
croniter==1.0.15
cryptography==3.4.8
cycler==0.10.0
defusedxml==0.7.1
dill==0.3.4
dnspython==2.1.0
docutils==0.16
email-validator==1.1.3
flake8==3.9.2
Flask==1.1.4
Flask-AppBuilder==3.3.2
Flask-Babel==1.0.0
Flask-Caching==1.10.1
Flask-JWT-Extended==3.25.1
Flask-Login==0.4.1
Flask-OpenID==1.2.5
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.14.3
graphviz==0.17
greenlet==1.1.1
gunicorn==20.1.0
h11==0.12.0
httpcore==0.13.6
httpx==0.19.0
idna==3.2
importlib-resources==1.5.0
inflection==0.5.1
iniconfig==1.1.1
iso8601==0.1.16
isodate==0.6.0
itsdangerous==1.1.0
Jinja2==2.11.3
jsonschema==3.2.0
kiwisolver==1.3.2
lazy-object-proxy==1.6.0
lockfile==0.12.2
Mako==1.1.5
Markdown==3.3.4
MarkupSafe==1.1.1
marshmallow==3.13.0
marshmallow-enum==1.5.1
marshmallow-oneofschema==3.0.1
marshmallow-sqlalchemy==0.23.1
matplotlib==3.4.3
mccabe==0.6.1
mypy==0.910
mypy-extensions==0.4.3
mysqlclient==2.0.3
numpy==1.21.2
openapi-schema-validator==0.1.5
openapi-spec-validator==0.3.1
packaging==21.0
pandas==1.3.2
pendulum==2.1.2
Pillow==8.3.2
pluggy==1.0.0
prison==0.2.1
psutil==5.8.0
py==1.10.0
pycodestyle==2.7.0
pycoingecko==2.2.0
pycparser==2.20
pyflakes==2.3.1
Pygments==2.10.0
PyJWT==1.7.1
pyparsing==2.4.7
pyrsistent==0.18.0
pytest==6.2.5
python-daemon==2.3.0
python-dateutil==2.8.2
python-nvd3==0.15.0
python-slugify==4.0.1
python3-openid==3.2.0
pytz==2021.1
pytzdata==2020.1
PyYAML==5.4.1
requests==2.26.0
rfc3986==1.5.0
rich==10.9.0
setproctitle==1.2.2
six==1.16.0
sniffio==1.2.0
SQLAlchemy==1.3.24
SQLAlchemy-JSONField==1.0.0
SQLAlchemy-Utils==0.37.8
swagger-ui-bundle==0.0.8
tabulate==0.8.9
tenacity==6.2.0
termcolor==1.1.0
text-unidecode==1.3
toml==0.10.2
types-docutils==0.17.0
typing-extensions==3.10.0.2
unicodecsv==0.14.1
urllib3==1.26.6
Werkzeug==1.0.1
WTForms==2.3.3

What happened

Running airflow db init gives

DB: mysql+mysqldb://<USERNAME>:***@<ENDPOINT>:<PORT>/airflow_db
[2021-09-23 13:13:07,806] {db.py:702} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl MySQLImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade cf5dc11e79ad -> bbf4a7ad0465, Remove id column from xcom
Traceback (most recent call last):
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (3750, "Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/leonarduschen/src/tracker/venv/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/cli/commands/db_command.py", line 31, in initdb
    db.initdb()
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/utils/db.py", line 573, in initdb
    upgradedb()
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/utils/db.py", line 712, in upgradedb
    command.upgrade(config, 'heads')
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/command.py", line 320, in upgrade
    script.run_env()
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/script/base.py", line 563, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 855, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/migrations/env.py", line 115, in <module>
    run_migrations_online()
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/migrations/env.py", line 106, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/runtime/migration.py", line 612, in run_migrations
    step.migration_fn(**kw)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/airflow/migrations/versions/bbf4a7ad0465_remove_id_column_from_xcom.py", line 48, in upgrade
    bop.create_primary_key('pk_xcom', ['dag_id', 'task_id', 'key', 'execution_date'])
  File "/usr/lib/python3.9/contextlib.py", line 124, in __exit__
    next(self.gen)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/operations/base.py", line 369, in batch_alter_table
    impl.flush()
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/operations/batch.py", line 107, in flush
    fn(*arg, **kw)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/ddl/impl.py", line 333, in drop_column
    self._exec(base.DropColumn(table_name, column, schema=schema))
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/alembic/ddl/impl.py", line 197, in _exec
    return conn.execute(construct, multiparams)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1068, in _execute_ddl
    ret = self._execute_context(
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/home/leonarduschen/src/tracker/venv/lib/python3.9/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (3750, "Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set. Add a primary key to the table or unset this variable to avoid this message. Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.")
[SQL: ALTER TABLE xcom DROP COLUMN id]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

What you expected to happen

Initialization succeeds

How to reproduce

Use MySQL version 8

Set env variables

export AIRFLOW__CORE__SQL_ALCHEMY_CONN="<CONNECTION_STRING>"
export AIRFLOW__CORE__LOAD_EXAMPLES="False"

Create database

CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow_user' IDENTIFIED BY 'airflow_pass';
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user';

Setup my.cnf

!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/

[mysqld]
explicit_defaults_for_timestamp=1

Run airflow db init

Anything else

Every time

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@leonarduschen leonarduschen added area:core kind:bug This is a clearly a bug labels Sep 23, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 23, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@BasPH
Copy link
Contributor

BasPH commented Sep 23, 2021

As the error says "Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set". This setting was apparently introduced in MySQL 8.0.13 and is OFF by default, so I guess you have enabled it.

The XCom table does include a primary key (dag_id, task_id, key, execution_date), but the migration script first removes the (old) id column, which fails in combination with sql_require_primary_key=ON. Is it possible to disable it temporarily, and enable afterwards?

On the Airflow side, would you be willing to add a check for this setting and/or add this information to the docs?

@leonarduschen
Copy link
Author

leonarduschen commented Sep 23, 2021

It works, thanks!

But here's the thing, I was using a managed database from DigitalOcean where the only way to change the config sql_require_primary_key permanently is to reach out their customer support and ask them to change it

If I change it on my own like so:

SET sql_require_primary_key=OFF

It will reset back to ON for new connections - so there is no way to set the variable to OFF when running airflow db init


So what I ended up doing is:

  • Run airflow db init and let it fail
  • Login and run
SET sql_require_primary_key=OFF;
ALTER TABLE xcom DROP COLUMN id;
  • Run airflow db init again

It works but it feels hacky to me


Maybe on the Airflow side, we can check if the setting is ON, and then if it is ON just set it to OFF right before we run ALTER TABLE xcom DROP COLUMN id - and then set it to ON again? I'd be happy to make a PR

@BasPH
Copy link
Contributor

BasPH commented Sep 23, 2021

Seems like more people bumped into this problem: 1 and 2. This post suggests you can set SET SESSION sql_require_primary_key=0 to disable it within a session, but I would want to properly test that first.

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

Seems like more people bumped into this problem: 1 and 2. This post suggests you can set SET SESSION sql_require_primary_key=0 to disable it within a session, but I would want to properly test that first.

Yeah. Good one @BasPH -> I am all for "no user/support involvement" if needed. We already do a couple of things automatically to deal with MySQL and MsSQL "configurability":

  • for MySQL we automatically set "utf8-mb3" encoding for IDs (this is the primary reason why the DB works at all when you have utf8mb4 set - because of index size limits. Here we "transparently" fix that for the user (we have configuration parameter for that but it turned out that not everyone reads documentation so now IDs are automatically set to utf8_mb3 ;)
  • for MsSQL we check if READ_COMMITTED_SNAPSHOT is enabled and fail if not (because it can lead to deadlocks if not set)

In this case I think just enabling/disabling it for this migration only would be best solution (and we can actually test it very easily - we could set that option in https://github.com/apache/airflow/blob/main/scripts/ci/mysql/conf.d/airflow.cnf (providing that it is settable there) and our MySQL tests would create a db with sql_require_primary_key=ON

@leonarduschen
Copy link
Author

Seems like more people bumped into this problem: 1 and 2. This post suggests you can set SET SESSION sql_require_primary_key=0 to disable it within a session, but I would want to properly test that first.

Turns out SET SESSION <variable>=<value> and SET <variable>=<value> do the same thing; both keeps variable within the session/ According to https://dev.mysql.com/doc/refman/8.0/en/set-variable.html all the following do the same thing

SET SESSION sql_mode = 'TRADITIONAL';
SET LOCAL sql_mode = 'TRADITIONAL';
SET @@SESSION.sql_mode = 'TRADITIONAL';
SET @@LOCAL.sql_mode = 'TRADITIONAL';
SET @@sql_mode = 'TRADITIONAL';
SET sql_mode = 'TRADITIONAL';

... we could set that option in https://github.com/apache/airflow/blob/main/scripts/ci/mysql/conf.d/airflow.cnf (providing that it is settable there) and our MySQL tests would create a db with sql_require_primary_key=ON

@potiuk I'll go ahead and give it try

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

@potiuk I'll go ahead and give it try

Just a hint. One good thing to try is to change the setting in MySQL config and see it fail the migration before fixing :). Just to be sure that the setting is effective.

You can very easily test it locally. Just ./breeze stop; ./breeze --db-reset --backend mysql should do the trick - it will wipe out all the local volumes with db and recreate the DB with all migrations

@eladkal
Copy link
Contributor

eladkal commented Dec 19, 2021

Is there an issue here to address? it looks more like a question/answer thread :)

@eladkal eladkal added area:MetaDB Meta Database related issues. pending-response affected_version:2.1 Issues Reported for 2.1 labels Dec 19, 2021
@leonarduschen
Copy link
Author

Ah sorry, completely forgot about it

@eladkal the issue is that the migration doesn't work when the config sql_require_primary_key is set to ON - and some managed database provider (in this case DigitalOcean) doesn't allow changing that config. So what I ended up doing is this #18457 (comment) - which is not really a solution in my opinion

I think this is still a issue, I'll check again and try to make a PR in the next few days if it still is

@potiuk
Copy link
Member

potiuk commented Dec 27, 2021

@leonarduschen - that would be great if you can come back to it.

And the solution with temporary switching off the require_primary_key (when in MYSQL) is actually the only one I can see works (and it is not really hacky - the settings is not "database" setting really. This is runtime in-session check that MySQL adds and it's perfectly OK to disable it if you know what you are doing.

Other than recreating tables with SELECT FROM AS, there is not much you can do. in this case.

@github-actions
Copy link

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 27, 2022
@github-actions
Copy link

github-actions bot commented Feb 3, 2022

This issue has been closed because it has not received response from the issue author.

@github-actions github-actions bot closed this as completed Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.1 Issues Reported for 2.1 area:core area:MetaDB Meta Database related issues. kind:bug This is a clearly a bug pending-response stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

No branches or pull requests

4 participants