Description
I followed the instructions on this page to create a SQLAlchemy engine and used it with the pandas to_sql()
method. It takes around 2 seconds to append a single row to a Delta table in Azure Databricks, and the time seems to scale linearly: a DataFrame with 1 column and 10 rows takes ~20 seconds to push to Azure.
Is there a way to make writing back to Databricks from a local machine faster?
Example code:
from sqlalchemy import create_engine
import pandas as pd
import os
server = 'my_server.azuredatabricks.net'
http_path = "/sql/1.0/warehouses/my_warehouse"
access_token = "MY_TOKEN"
catalog = "my_catalog"
schema = "my_schema"
if "NO_PROXY" in os.environ:
    os.environ["NO_PROXY"] = os.environ["NO_PROXY"] + "," + server
else:
    os.environ["NO_PROXY"] = server
if "no_proxy" in os.environ:
    os.environ["no_proxy"] = os.environ["no_proxy"] + "," + server
else:
    os.environ["no_proxy"] = server
engine = create_engine(f"databricks://token:{access_token}@{server}?" +
f"http_path={http_path}&catalog={catalog}&schema={schema}"
)
df = pd.DataFrame({'name': ['User 1', 'User 2', 'User 3', 'User 4', 'User 5', 'User 6', 'User 7', 'User 8', 'User 9', 'User 10']})
# The next command takes around 20 seconds to complete
df.to_sql(name='my_test_table', con=engine, if_exists='append', index=False)
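For what it's worth, one general mitigation (not specific to the Databricks dialect) is that to_sql() issues one INSERT statement per row by default; passing method='multi' batches all rows of a chunk into a single multi-row INSERT, which cuts the number of round trips to the warehouse. A minimal sketch of the difference, using an in-memory SQLite engine as a stand-in for the databricks:// engine above:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in engine; in the question this would be the databricks:// engine.
engine = create_engine("sqlite://")

df = pd.DataFrame({"name": [f"User {i}" for i in range(1, 11)]})

# method="multi" sends one multi-row INSERT per chunk instead of
# ten single-row statements; chunksize caps rows per statement.
df.to_sql("my_test_table", con=engine, if_exists="append",
          index=False, method="multi", chunksize=1000)

with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM my_test_table")).scalar()
print(count)  # 10
```

Whether this helps against a SQL warehouse depends on how the Databricks dialect compiles the multi-row INSERT, so treat it as something to benchmark rather than a guaranteed fix.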
Local specs:
databricks-sql-connector==3.4.0
pandas==2.2.2
Python 3.10.14
Azure specs:
Databricks SQL warehouse cluster
Runtime==13.3 LTS