closing connection chunks in DbApiHook.get_pandas_df #22947
Below is a test case that demonstrates the problem. See also the pandas.read_sql documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html#pandas-read-sql
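```python
import sqlite3
from unittest import mock

from airflow.providers.sqlite.hooks.sqlite import SqliteHook


class TestDbApiHookChunksize:  # test class name assumed; the original shows only the method
    def test_get_pandas_df_chunksize(self):
        class UnitTestSqliteHook(SqliteHook):
            conn_name_attr = 'test_conn_id'
            log = mock.MagicMock()

            def setup_table(self):
                # Fresh in-memory database containing a single row.
                self.conn = sqlite3.connect(":memory:")
                cursor = self.conn.cursor()
                cursor.execute("create table users(id int, name text)")
                cursor.execute("insert into users(id, name) values(1, 'a')")
                cursor.close()

            def get_conn(self):
                self.setup_table()
                return self.conn

        self.db_hook = UnitTestSqliteHook()
        statement = 'select * from users'
        # With chunksize, get_pandas_df returns an iterator of DataFrames;
        # consuming it fails if the connection was closed before iteration.
        df = list(self.db_hook.get_pandas_df(statement, chunksize=1))
        assert df[0].columns[0] == 'id'
        assert df[0].values.tolist()[0][0] == 1
        assert df[0].values.tolist()[0][1] == 'a'
```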
Why don't you attempt to fix it?
If @bauerfranz is not interested in fixing this bug, I would like to work on it.
Assigned you.
Apache Airflow version
2.2.5 (latest released)
What happened
Hi all,
Please be patient with me; this is my first bug report on GitHub ever :)
Affected function: DbApiHook.get_pandas_df
Short description: If I use DbApiHook.get_pandas_df with the "chunksize" parameter, the connection is lost.
Error description
I tried using the DbApiHook.get_pandas_df function instead of pandas.read_sql. Without the "chunksize" parameter, both functions behave the same. But as soon as I pass "chunksize" to get_pandas_df, the connection is lost on the first iteration. This happens when querying both Oracle and MySQL (MariaDB) databases.
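For comparison, here is a minimal sketch of the plain pandas.read_sql behavior with chunksize. It assumes an in-memory SQLite database as a stand-in for the Oracle and MySQL databases mentioned above; the table and rows are made up for illustration. With chunksize set, read_sql returns an iterator of DataFrames, and rows are fetched lazily while the connection is still open.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for Oracle/MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table users(id int, name text)")
conn.executemany(
    "insert into users(id, name) values(?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

# With chunksize, read_sql returns an iterator; each chunk is fetched
# from the open connection only when the loop asks for it.
chunks = pd.read_sql("select * from users order by id", con=conn, chunksize=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [2, 1]

# Close only after every chunk has been consumed.
conn.close()
```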
During my research I found a comment on a closed issue that describes the same problem: #8468
My Airflow version: 2.2.5
I think it has something to do with the "with closing" statement, because when I remove it, the "chunksize" parameter works.
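The "with closing" suspicion can be illustrated without Airflow. The sketch below is a simplified stand-in for the pattern, not Airflow's actual source: it assumes the connection is opened in a `with closing(...)` block and the result of pandas.read_sql is returned from inside that block, and it uses an in-memory SQLite database in place of Oracle/MySQL. `make_conn` and `get_df_closing_then_return` are hypothetical names. Without chunksize the whole result is read before the connection closes; with chunksize, read_sql hands back a lazy iterator whose first fetch runs after the connection is already closed.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def make_conn():
    """Hypothetical helper: fresh in-memory SQLite connection with a small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users(id int, name text)")
    conn.executemany("insert into users(id, name) values(?, ?)", [(1, "a"), (2, "b")])
    return conn


def get_df_closing_then_return(sql, **kwargs):
    # Simplified stand-in (assumption, not Airflow's code): the connection is
    # closed as soon as the with-block exits, i.e. right at `return`.
    with closing(make_conn()) as conn:
        return pd.read_sql(sql, con=conn, **kwargs)


# Without chunksize, the full DataFrame is built before the connection closes.
df = get_df_closing_then_return("select * from users")

# With chunksize, only a lazy iterator is returned; the first fetch happens
# after closing() has closed the connection, so sqlite3 raises an error.
chunks = get_df_closing_then_return("select * from users", chunksize=1)
try:
    next(iter(chunks))
    error = None
except sqlite3.ProgrammingError as exc:  # "Cannot operate on a closed database."
    error = exc

print(len(df), type(error).__name__)
```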
What you think should happen instead
It should return the DataFrame in chunks, as pandas.read_sql does when "chunksize" is set.
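One possible way to get this behavior, sketched under stated assumptions (an in-memory SQLite database in place of the real databases; `make_conn` and `get_df_chunks` are hypothetical names, not Airflow APIs), is to make the chunked variant a generator. Because the `yield from` sits inside the `with closing(...)` block, the connection stays open until the caller has consumed every chunk.

```python
import sqlite3
from contextlib import closing
from typing import Iterator

import pandas as pd


def make_conn() -> sqlite3.Connection:
    """Hypothetical helper: fresh in-memory SQLite connection with a small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users(id int, name text)")
    conn.executemany("insert into users(id, name) values(?, ?)", [(1, "a"), (2, "b")])
    return conn


def get_df_chunks(sql: str, chunksize: int, **kwargs) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: as a generator, this function keeps the with-block
    # (and therefore the connection) open while chunks are being yielded.
    # closing() runs only after the last chunk, or when the generator is closed.
    with closing(make_conn()) as conn:
        yield from pd.read_sql(sql, con=conn, chunksize=chunksize, **kwargs)


chunks = list(get_df_chunks("select * from users order by id", chunksize=1))
print([c["name"].tolist() for c in chunks])  # [['a'], ['b']]
```

A design note: any function containing `yield` always returns a generator, so folding this into get_pandas_df itself would change its return type even when "chunksize" is not passed; a separate chunked method avoids that. If a caller stops iterating early, the connection is closed when the generator is closed or garbage-collected.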
How to reproduce
not working
works
works
Operating System
macOS Monterey
Versions of Apache Airflow Providers
apache-airflow 2.2.5
apache-airflow-providers-ftp 2.1.2
apache-airflow-providers-http 2.1.2
apache-airflow-providers-imap 2.2.3
apache-airflow-providers-microsoft-mssql 2.1.3
apache-airflow-providers-mongo 2.3.3
apache-airflow-providers-mysql 2.2.3
apache-airflow-providers-oracle 2.2.3
apache-airflow-providers-salesforce 3.4.3
apache-airflow-providers-sftp 2.5.2
apache-airflow-providers-sqlite 2.1.3
apache-airflow-providers-ssh 2.4.3
Deployment
Virtualenv installation
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct