In MsSqlHook, SQLAlchemy engine scheme is overridden by the change in #40669 #42664
Comments
Weird, because we are also using MSSQL with ODBC, call the get_sqlalchemy_engine() method as well, and have the same connection config you posted, without any issues.
Are you using
This error is weird, but now I understand why: what you are doing is wrong. In the past this wouldn't have caused any issues, but now it does. Don't do this, as you show above:
Instead, do this, which works and is also the better way of doing it:
Let me explain why your example fails now. As you stated, you use MSSQL through an ODBC connection (and thus the ODBC driver), but in your example you use MsSqlHook instead of OdbcHook. So you are trying to use an ODBC connection with a PyMSSQL-based hook (i.e. MsSqlHook), which of course won't work. It could have worked in the past, but then you were lucky; now it won't, because get_sqlalchemy_engine is now a generic method in DbApiHook and is no longer overridden in MsSqlHook.
Important: it is always good practice not to instantiate the specialized hook yourself. It's better to use the DbApiHook.get_hook(conn_id="conn_id") classmethod, as it returns the specialized hook for the connection type of the connection id you pass. Based on the connection type defined in the connection, Airflow will return the correct hook. So please try my suggestion and let me know.
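To illustrate the idea behind get_hook, here is a minimal, stdlib-only sketch of choosing the hook from a connection's conn_type instead of hard-coding a hook class. The Connection dataclass, HOOK_CLASS_BY_CONN_TYPE and resolve_hook_class are hypothetical stand-ins: in Airflow the lookup is driven by the hooks the installed providers register for each connection type, and it returns a hook instance rather than a class name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Connection:
    """Hypothetical, simplified stand-in for an Airflow connection."""

    conn_id: str
    conn_type: str
    extra: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry for illustration only; Airflow builds its own
# mapping from the providers that are installed.
HOOK_CLASS_BY_CONN_TYPE = {
    "mssql": "MsSqlHook",  # pymssql-based
    "odbc": "OdbcHook",  # pyodbc-based
    "jdbc": "JdbcHook",
}


def resolve_hook_class(conn: Connection) -> str:
    """Pick the hook from the connection type, not from the caller's guess."""
    try:
        return HOOK_CLASS_BY_CONN_TYPE[conn.conn_type]
    except KeyError:
        raise ValueError(f"no hook registered for conn_type {conn.conn_type!r}") from None


if __name__ == "__main__":
    odbc_conn = Connection("conn1", "odbc", {"driver": "ODBC Driver 18 for SQL Server"})
    print(resolve_hook_class(odbc_conn))
```

The design point is that the connection, not the calling code, decides which driver-specific hook is used, so an ODBC connection never ends up in a pymssql-based hook.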
Yes, I also defined "sqlalchemy_scheme": "mssql+pyodbc" in the extra of the connection, as you showed above. The configuration of your connection is correct, but the instantiation of the hook isn't. Please check my answer above.
Tried this:
from airflow.providers.common.sql.hooks.sql import DbApiHook
hook = DbApiHook.get_hook(conn_id='conn1')
engine = hook.get_sqlalchemy_engine()
engine.connect()
It results in the same error as above:
While I understand
As I don't use any other databases, I'm not entirely sure about the benefits of passing the connection to the engine with the
def get_conn(self):
    """Return a connection object."""
    db = self.get_connection(self.get_conn_id())
    return self.connector.connect(host=db.host, port=db.port, username=db.login, schema=db.schema)
From SQLAlchemy
The resulting engine is a weird object with a pyodbc driver and dialect, but with an underlying pymssql connection.
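A stdlib-only model of the mismatch described above, under the assumption (consistent with SQLAlchemy's documentation of the creator argument to create_engine) that the dialect and driver are still chosen from the URL scheme, while every DB-API connection comes from creator. FakeConnection, split_scheme and build_engine_sketch are hypothetical; no database or SQLAlchemy is involved.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FakeConnection:
    """Hypothetical DB-API connection that only records which driver made it."""

    driver: str


def split_scheme(url: str) -> Tuple[str, Optional[str]]:
    """Split 'mssql+pyodbc://...' into ('mssql', 'pyodbc')."""
    scheme = url.split("://", 1)[0]
    dialect, _, driver = scheme.partition("+")
    return dialect, driver or None


def build_engine_sketch(
    url: str, creator: Optional[Callable[[], FakeConnection]] = None
) -> dict:
    # The dialect (and its driver) is always taken from the URL scheme.
    dialect, url_driver = split_scheme(url)
    if creator is not None:
        # creator bypasses the URL's connection settings entirely.
        conn = creator()
    else:
        conn = FakeConnection(driver=url_driver or "default")
    return {"dialect": dialect, "dialect_driver": url_driver, "connection_driver": conn.driver}


if __name__ == "__main__":
    url = "mssql+pyodbc://user:pass@host:1433/master"
    print(build_engine_sketch(url))
    print(build_engine_sketch(url, creator=lambda: FakeConnection("pymssql")))
```

With a creator that returns a pymssql connection, the model ends up with a pyodbc dialect on top of a pymssql connection, which is the "weird object" described in the comment.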
OK, now I'm confused... Are you trying to access MSSQL through ODBC or PyMSSQL? In your initial post you stated ODBC, but now you're referring to PyMSSQL. If you want to connect through ODBC, make sure the connection type is ODBC and not MSSQL, which uses pymssql underneath.
This config is also contradictory:
The connection type is mssql (so pymssql), but in your extras you specify the ODBC driver and the "mssql+pyodbc" sqlalchemy_scheme. Try changing the connection type to ODBC.
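The contradiction pointed out above can be expressed as a small check. This is a hypothetical helper, not part of Airflow, and it assumes that the mssql connection type implies pymssql and the odbc connection type implies pyodbc, which is the maintainer's position in this thread (the reporter disputes it below).

```python
from typing import Any, Dict, Optional

# Assumption taken from this discussion: the DB-API driver each connection type uses.
DRIVER_BY_CONN_TYPE = {"mssql": "pymssql", "odbc": "pyodbc"}


def scheme_conflict(conn: Dict[str, Any]) -> Optional[str]:
    """Return a message if extra.sqlalchemy_scheme names a different driver than conn_type implies."""
    scheme = (conn.get("extra") or {}).get("sqlalchemy_scheme")
    if not scheme or "+" not in scheme:
        return None  # no explicit driver in the scheme, nothing to compare
    driver = scheme.split("+", 1)[1]
    expected = DRIVER_BY_CONN_TYPE.get(conn.get("conn_type", ""))
    if expected is not None and driver != expected:
        return f"conn_type {conn['conn_type']!r} uses {expected}, but sqlalchemy_scheme asks for {driver}"
    return None


if __name__ == "__main__":
    conn = {
        "conn_type": "mssql",
        "extra": {"driver": "ODBC Driver 18 for SQL Server", "sqlalchemy_scheme": "mssql+pyodbc"},
    }
    print(scheme_conflict(conn))
```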
I'm trying to access MSSQL through the Microsoft ODBC driver, using the
It should be possible to change the driver in MSSQL connections without changing the connection type to ODBC. See the MSSQL provider docs:
This was always possible, and according to the documentation it should still be an option, but this change breaks existing connections that use a different driver.
As far as I know, you should always use ODBC as the connection type for ODBC connections, rather than using a native connection type and then overriding it to use ODBC. @potiuk what do you think? Or am I misunderstanding this?
This is also why I'm working on this PR: it introduces a notion of dialects (used by the insert_rows method in DbApiHook) that can be shared across different connection types pointing to the same database. The dialect that generates the replace statement (e.g. MERGE INTO for MSSQL) would then work for both PyMSSQL and ODBC with MSSQL, because the dialect wouldn't be tied to the connection type. Tying it to the connection type wouldn't be feasible for a generic connection type like ODBC anyway: OdbcHook isn't database-aware (only the driver is), whereas MsSqlHook is specific to MSSQL. At the moment the replace statement is only supported in the native MsSqlHook, but this PR will make it work regardless of which connection type you use (native or ODBC).
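To make the dialect idea concrete, here is a hypothetical sketch (not the code from the PR) of a dialect-level function that renders an MSSQL MERGE upsert. Only the parameter placeholder depends on the driver: pyodbc uses the qmark style (?), while pymssql uses %s, so the same dialect logic can serve both connection types.

```python
from typing import Sequence


def mssql_merge_statement(
    table: str,
    columns: Sequence[str],
    key_columns: Sequence[str],
    placeholder: str = "?",
) -> str:
    """Render a T-SQL MERGE that upserts one row into `table`."""
    if not key_columns or not set(key_columns) <= set(columns):
        raise ValueError("key_columns must be a non-empty subset of columns")
    col_list = ", ".join(f"[{c}]" for c in columns)
    values = ", ".join(placeholder for _ in columns)
    on_clause = " AND ".join(f"t.[{c}] = s.[{c}]" for c in key_columns)
    lines = [
        f"MERGE INTO {table} AS t",
        f"USING (VALUES ({values})) AS s ({col_list})",
        f"ON {on_clause}",
    ]
    # Only non-key columns are updated when the row already exists.
    non_keys = [c for c in columns if c not in key_columns]
    if non_keys:
        set_clause = ", ".join(f"[{c}] = s.[{c}]" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    source_cols = ", ".join(f"s.[{c}]" for c in columns)
    # T-SQL requires a MERGE statement to be terminated with a semicolon.
    lines.append(f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_cols});")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mssql_merge_statement("dbo.users", ["id", "name"], ["id"]))  # pyodbc style
    print(mssql_merge_statement("dbo.users", ["id", "name"], ["id"], placeholder="%s"))  # pymssql style
```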
Think about this: I modified the connection to remove the "contradiction":
{
    "conn_type": "mssql",
    "host": "<redacted>",
    "login": "user",
    "password": "pass",
    "schema": "master",
    "port": 1433,
    "extra": {
        "driver": "ODBC Driver 18 for SQL Server",
        "encrypt": "yes"
    }
}
I then created an MSSQL hook, overriding the sqlalchemy_scheme in the hook instantiation instead, which is an allowed option according to the documentation:
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
hook = MsSqlHook(mssql_conn_id='conn1', sqlalchemy_scheme='mssql+pyodbc')
engine = hook.get_sqlalchemy_engine()
engine.connect()
This doesn't work either, which means this change is a breaking one. Using pymssql as an internal connection is fine, but it should respect the choice of
If this is the way MSSQL connections in Airflow will move forward, and will not allow any other driver to be used than
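The expectation in this comment can be written down as a precedence rule for the scheme. This is a hypothetical sketch of what the reporter expects, not MsSqlHook's actual code; DEFAULT_MSSQL_SCHEME = "mssql+pymssql" is an assumption based on this thread's statement that the mssql connection type uses pymssql. The second function shows that the resolved scheme only decides the driver when no creator callable is passed.

```python
from typing import Any, Dict, Optional

DEFAULT_MSSQL_SCHEME = "mssql+pymssql"  # assumption based on this thread


def resolve_scheme(hook_scheme: Optional[str], extra: Dict[str, Any]) -> str:
    """Hook argument wins, then the connection extra, then the default."""
    return hook_scheme or extra.get("sqlalchemy_scheme") or DEFAULT_MSSQL_SCHEME


def engine_driver(
    hook_scheme: Optional[str], extra: Dict[str, Any], creator_driver: Optional[str] = None
) -> str:
    """Driver that actually opens connections in this model."""
    if creator_driver is not None:
        # A creator callable bypasses the URL, so the chosen scheme is ignored.
        return creator_driver
    return resolve_scheme(hook_scheme, extra).partition("+")[2]


if __name__ == "__main__":
    extra = {"driver": "ODBC Driver 18 for SQL Server", "encrypt": "yes"}
    print(resolve_scheme("mssql+pyodbc", extra))
    print(engine_driver("mssql+pyodbc", extra))
    print(engine_driver("mssql+pyodbc", extra, creator_driver="pymssql"))
```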
It should actually be like this:
This is what we are using, and it works.
I've dug a bit deeper into the following line of code:
This was added to support JDBC connections with SQLAlchemy. I could remove it from DbApiHook and override it in JdbcHook instead; you would then still be able to use it as you intended.
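The fix proposed here can be sketched as a simple override pattern: the generic hook builds the engine keyword arguments without creator, and only the JDBC hook adds it. The class and method names below are hypothetical and only model the idea; they are not the actual Airflow classes or the eventual patch.

```python
from typing import Any, Dict, Optional


class DbApiHookSketch:
    """Hypothetical stand-in for a generic DB-API hook (not Airflow's DbApiHook)."""

    def get_conn(self) -> Any:
        return f"connection from {type(self).__name__}"

    def engine_kwargs(self, engine_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Generic case: no creator, so SQLAlchemy connects with the driver
        # named in the URL scheme (e.g. mssql+pyodbc). Copy to avoid mutating input.
        return dict(engine_kwargs or {})


class JdbcHookSketch(DbApiHookSketch):
    """Hypothetical JDBC hook that needs SQLAlchemy to reuse its own connection."""

    def engine_kwargs(self, engine_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        kwargs = super().engine_kwargs(engine_kwargs)
        kwargs.setdefault("creator", self.get_conn)  # only JDBC opts in
        return kwargs


if __name__ == "__main__":
    print(DbApiHookSketch().engine_kwargs({"echo": True}))
    print(JdbcHookSketch().engine_kwargs({"echo": True})["creator"]())
```

Using setdefault means a caller who explicitly passes their own creator still wins, while every non-JDBC hook keeps SQLAlchemy's URL-based connection handling.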
This works, yes (I didn't test beyond the engine connection, but at least there's no failure, and it uses the correct driver and connection type). But that doesn't change the fact that it should work for the mssql connection type as well, and currently it doesn't.
I believe that would be the correct approach. This JDBC-specific change should not break existing connection types.
Indeed, I fully agree that it should only be passed where needed (e.g. JdbcHook).
The PR build succeeded, so it can be merged.
Apache Airflow Provider(s)
common-sql
Versions of Apache Airflow Providers
apache-airflow-providers-common-sql==1.16.0
apache-airflow-providers-microsoft-mssql==3.9.0
apache-airflow-providers-odbc==4.7.0
Apache Airflow version
2.10.2
Operating System
RHEL 8.10
Deployment
Virtualenv installation
Deployment details
No response
What happened
We're using mssql+pyodbc connections in Airflow to connect to our MS SQL Server. We create a hook using MsSqlHook and call .get_sqlalchemy_engine() to get an SQLAlchemy engine object to pass along to our database tasks. Prior to apache-airflow-providers-common-sql 1.15.0, this was working as expected (the last known working version is 1.14.1; please see below for a working example).
With a change introduced in 1.15.0 (#40669), .get_sqlalchemy_engine() passes the creator argument with the value self.get_conn to the .create_engine() function. This makes the engine obtain its connections from the hook's default connection method, which uses pymssql, even though we specifically define the scheme as pyodbc in both the connection and the hook. The resulting engine object becomes a weird combination of the two: it has a mssql+pyodbc scheme in the connection URI, but internally connects using pymssql, which results in fatal errors:
It is as if it's trying to make a pymssql connection, but with a pyodbc URL.
What you think should happen instead
The connection should work, as it did prior to the change.
If we comment out / delete the line
engine_kwargs["creator"] = self.get_conn
in airflow/providers/common/sql/hooks/sql.py (f6c7388#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R246), the connection works as expected.
How to reproduce
This is the connection that's used (values changed for privacy)
Basically, use any MSSQL connection with a mssql+pyodbc scheme.
Try to connect through a hook and SQLAlchemy engine:
This fails on >= 1.15.0
Works on <= 1.14.1
Anything else
No response
Are you willing to submit PR?
Code of Conduct