databricks.sql.exc.RequestError when inserting more than 255 cells via pandas.to_sql #300

Comments
Possibly related to how the Databricks SQLAlchemy dialect constructs queries, where …
Thanks for your question. As of databricks-sql-connector==3.0.0 there is a major difference in how parameters are handled. This change (and the limit of 255 parameters per query) is documented extensively here.
This is pretty close. SQLAlchemy hasn't changed its approach, but the connector has. There are now two ways the connector can send parameters: native and inline. Native is enabled by default, and the server currently enforces a limit of no more than 255 parameters per query. While the older inline approach doesn't have this limitation, as documented here, the SQLAlchemy dialect in connector version 3 and above will only work with the native approach. Unfortunately, the only workaround for this is to modify your …
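To make the native/inline distinction concrete, here is a minimal sketch using the raw connector (not the SQLAlchemy dialect), assuming the connector's documented use_inline_params option; the connection values and table/column names are placeholders, not anything from this thread:

```python
from databricks import sql

# Placeholder connection values; substitute real workspace details.
connection = sql.connect(
    server_hostname="<server_hostname>",
    http_path="<http_path>",
    access_token="<access_token>",
    use_inline_params=True,  # opt back in to the legacy inline behavior
)

with connection.cursor() as cursor:
    # Inline mode uses pyformat-style markers; values are interpolated
    # client-side rather than bound as server-side parameters, so the
    # 255-parameter server limit does not apply.
    cursor.execute(
        "INSERT INTO my_table (a, b) VALUES (%(a)s, %(b)s)",
        {"a": 1, "b": "x"},
    )

connection.close()
```

Note that this opt-in only exists on the raw connector; as stated above, the 3.x SQLAlchemy dialect always uses native parameters, so pandas.to_sql cannot take this path.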
Thank you for the answer.
Is the limit on the number of parameters a limitation on the Databricks SQL endpoint side? Do you happen to know of any public information, or anything you can share (e.g. an ETA), on this?
Yes, the limitation is at the SQL warehouse. We're still waiting to hear internally when this will be increased.
Hi team. I work at Procore Technologies and am coming across the same issue. I found that I had to set my particular query to a chunksize of 28 rows, which now makes sense given the 255-cell limit. This dramatically slows down the process of writing to Databricks, to the point where our intended use case may not work.

I also haven't seen anywhere in the docs an example of creating a SQLAlchemy engine and using pd.to_sql / pd.read_sql. Our code looks just like the code at the beginning of this thread, and it would be great if that usage were clearly documented somewhere. Alternatively, it would be great if Databricks provided optimized versions of these functions with the connector. As an example, Snowflake offers the following functions with their connector: fetch_pandas_all, fetch_pandas_batches, write_pandas.

We'd like to use SQLAlchemy 2.0+, which I believe requires databricks-sql-connector 3.0+, so here's a +1 to sorting this out quickly. Thanks!
It's part of the SQLAlchemy documentation in this repository: https://github.com/databricks/databricks-sql-python/blob/main/src/databricks/sqlalchemy/README.sqlalchemy.md#usage-with-pandas
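For anyone landing here, the pattern that README describes looks roughly like this. A minimal sketch with placeholder credentials and made-up table names; the databricks:// URL format is the one documented in that README:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder workspace values.
engine = create_engine(
    "databricks://token:<access_token>@<server_hostname>"
    "?http_path=<http_path>&catalog=<catalog>&schema=<schema>"
)

# Round-trip a DataFrame through a SQL warehouse.
df = pd.read_sql("SELECT * FROM source_table LIMIT 100", engine)
df.to_sql("target_table", engine, if_exists="replace", index=False)
```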
Thanks for registering your support for this. Can you send an email to databricks-sql-connector-maintainers@databricks.com so we can follow up with you?
Thanks for the link to the docs! I would recommend linking to that page from the main docs page here, as I assumed the main docs would include everything I needed to know. I'll shoot an email to that address.
Those docs are being updated as we speak :)
Update for you all: we're re-classifying this as a regression in the latest version of databricks-sql-connector and working to implement a fix that doesn't rely on the server supporting more than 255 parameters. More details soon.
Hi team, any progress on this?
Hi, we're also facing the same issue. Could someone please provide an update on this? @susodapop
Hi @akshay-s-ciq, since I'm no longer maintaining this repository, I'd recommend contacting Databricks support directly for the current status.
Original issue description:

The following example works with databricks-sql-connector version 2.9.3, but fails with version 3.0.1. The error is:

OperationalError: (databricks.sql.exc.RequestError) Error during request to server
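The example code itself did not survive the page capture. What follows is a minimal sketch of the kind of reproduction the thread implies; the connection values are placeholders, and random_df with a 16-column shape is an assumption inferred from the chunksize discussion below:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Placeholder workspace values.
engine = create_engine(
    "databricks://token:<access_token>@<server_hostname>"
    "?http_path=<http_path>&catalog=<catalog>&schema=<schema>"
)

# 100 rows x 16 columns: with method="multi" and no chunksize, pandas
# emits one multi-row INSERT binding 1,600 parameters, far over the
# 255-parameter limit at the SQL warehouse. Any chunk of 16+ rows
# (16 x 16 = 256 cells) already exceeds it.
random_df = pd.DataFrame(
    np.random.rand(100, 16), columns=[f"col_{i}" for i in range(16)]
)

# Works on databricks-sql-connector==2.9.3, fails on 3.0.1.
random_df.to_sql(
    "random_table", engine, if_exists="replace", index=False, method="multi"
)
```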
Setting chunksize in random_df.to_sql to any value of 15 or lower (so that fewer than 256 table cells are inserted per statement), the insert runs without issue.

What I have tried

random_df.to_sql(table_name, engine, if_exists="replace", index=False, method="multi")

The observed behavior is the same in all cases.
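For completeness, the chunksize workaround described above would look like this, reusing the engine and random_df from the sketch earlier (still assumptions, not the reporter's actual code):

```python
# 15 rows x 16 columns = 240 bound parameters per INSERT statement,
# which stays under the 255-parameter limit.
random_df.to_sql(
    "random_table",
    engine,
    if_exists="replace",
    index=False,
    method="multi",
    chunksize=15,
)
```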
Versions

Not working: databricks-sql-connector==3.0.1, sqlalchemy==2.0.23, pandas==2.1.4
Working: databricks-sql-connector==2.9.3, sqlalchemy==1.4.50, pandas==2.1.4