PySQL Connector split into core and non core part #444

Open · wants to merge 14 commits into `main`
2 changes: 0 additions & 2 deletions .github/workflows/integration.yml
@@ -55,5 +55,3 @@ jobs:
#----------------------------------------------
- name: Run e2e tests
run: poetry run python -m pytest tests/e2e
- name: Run SQL Alchemy tests
run: poetry run python -m pytest src/databricks/sqlalchemy/test_local
78 changes: 78 additions & 0 deletions .github/workflows/publish-manual.yml
@@ -0,0 +1,78 @@
name: Publish to PyPI Manual [Production]

# Allow manual triggering of the workflow
on:
workflow_dispatch: {}

jobs:
publish:
name: Publish
runs-on: ubuntu-latest

steps:
#----------------------------------------------
# Step 1: Check out the repository code
#----------------------------------------------
- name: Check out repository
uses: actions/checkout@v2 # Check out the repository to access the code

#----------------------------------------------
# Step 2: Set up Python environment
#----------------------------------------------
- name: Set up python
id: setup-python
uses: actions/setup-python@v2
with:
python-version: 3.9 # Specify the Python version to be used

#----------------------------------------------
# Step 3: Install and configure Poetry
#----------------------------------------------
- name: Install Poetry
uses: snok/install-poetry@v1 # Install Poetry, the Python package manager
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true

# #----------------------------------------------
# # Step 4: Load cached virtual environment (if available)
# #----------------------------------------------
# - name: Load cached venv
# id: cached-poetry-dependencies
# uses: actions/cache@v2
# with:
# path: .venv # Path to the virtual environment
# key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
# # Cache key is generated based on OS, Python version, repo name, and the `poetry.lock` file hash

# #----------------------------------------------
# # Step 5: Install dependencies if the cache is not found
# #----------------------------------------------
# - name: Install dependencies
# if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true' # Only run if the cache was not hit
# run: poetry install --no-interaction --no-root # Install dependencies without interaction

# #----------------------------------------------
# # Step 6: Update the version to the manually provided version
# #----------------------------------------------
# - name: Update pyproject.toml with the specified version
# run: poetry version ${{ github.event.inputs.version }} # Use the version provided by the user input

#----------------------------------------------
# Step 7: Build and publish the first package to PyPI
#----------------------------------------------
- name: Build and publish databricks sql connector to PyPI
working-directory: ./databricks_sql_connector
run: |
poetry build
poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
#----------------------------------------------
# Step 8: Build and publish the second package to PyPI
#----------------------------------------------

- name: Build and publish databricks sql connector core to PyPI
working-directory: ./databricks_sql_connector_core
run: |
poetry build
poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
2 changes: 1 addition & 1 deletion .gitignore
@@ -195,7 +195,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

# End of https://www.toptal.com/developers/gitignore/api/python,macos

20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Release History

# 4.0.0

- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect.
- The `pyarrow` dependency is now optional in `databricks-sql-connector`. Users who need Arrow must install `pyarrow` explicitly (see the sketch below).
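
  An illustrative sketch of how application code might keep Arrow optional (assumes the cursor's `fetchall_arrow` API; not part of this release's changes):

  ```python
  try:
      import pyarrow  # present only if installed explicitly or via the [pyarrow] extra
  except ImportError:
      pyarrow = None

  def fetch_rows(cursor):
      # Prefer an Arrow table when pyarrow is available; fall back to plain rows otherwise
      if pyarrow is not None:
          return cursor.fetchall_arrow()
      return cursor.fetchall()
  ```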

# 3.6.0 (2024-10-25)

- Support encryption headers in the cloud fetch request (https://github.com/databricks/databricks-sql-python/pull/460 by @jackyhu-db)

# 3.5.0 (2024-10-18)

- Create a non pyarrow flow to handle small results for the column set (databricks/databricks-sql-python#440 by @jprakash-db)
- Fix: On non-retryable error, ensure PySQL includes useful information in error (databricks/databricks-sql-python#447 by @shivam2680)

# 3.4.0 (2024-08-27)

- Unpin pandas to support v2.2.2 (databricks/databricks-sql-python#416 by @kfollesdal)
- Make OAuth as the default authenticator if no authentication setting is provided (databricks/databricks-sql-python#419 by @jackyhu-db)
- Fix (regression): use SSL options with HTTPS connection pool (databricks/databricks-sql-python#425 by @kravets-levko)

# 3.3.0 (2024-07-18)

- Don't retry requests that fail with HTTP code 401 (databricks/databricks-sql-python#408 by @Hodnebo)
3 changes: 0 additions & 3 deletions CONTRIBUTING.md
@@ -144,9 +144,6 @@ The `PySQLStagingIngestionTestSuite` namespace requires a cluster running DBR ve

The suites marked `[not documented]` require additional configuration which will be documented at a later time.

#### SQLAlchemy dialect tests

See README.tests.md for details.

### Code formatting

23 changes: 20 additions & 3 deletions README.md
@@ -3,9 +3,9 @@
[![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
[![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)

The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).

This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to use these APIs; install it via `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
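
For example, a minimal sketch of fetching a batch as an Arrow table (assuming `pyarrow` is installed and the usual `DATABRICKS_*` environment variables are set):

```python
from databricks import sql
import os

with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM range(1000)")
        # Returns a pyarrow.Table instead of a list of Row objects
        batch = cursor.fetchmany_arrow(100)
        print(batch.num_rows, batch.column_names)
```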

You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](https://help.databricks.com).

@@ -22,7 +22,12 @@ For the latest documentation, see

## Quickstart

Install the library with `pip install databricks-sql-connector`
### Installing the core library
Install using `pip install databricks-sql-connector`

### Installing the core library with PyArrow
Install using `pip install databricks-sql-connector[pyarrow]`


```bash
export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
> to authenticate the target Databricks user account and needs to open the browser for authentication. So it
> can only run on the user's machine.

## SQLAlchemy
Starting with `databricks-sql-connector` version 4.0.0, SQLAlchemy support has been extracted into a new library, `databricks-sqlalchemy`.

- GitHub repository: [databricks-sqlalchemy](https://github.com/databricks/databricks-sqlalchemy)
- PyPI: [databricks-sqlalchemy](https://pypi.org/project/databricks-sqlalchemy/)

### Quick SQLAlchemy guide
Users can now choose between the SQLAlchemy v1 and SQLAlchemy v2 dialects with the connector core (see the sketch after this list):

- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
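
A minimal sketch of connecting through the dialect (the URL shape shown is an assumption; check the `databricks-sqlalchemy` README for the authoritative format):

```python
import os
from sqlalchemy import create_engine, text

# Assumed URL format for the databricks-sqlalchemy dialect
engine = create_engine(
    f"databricks://token:{os.getenv('DATABRICKS_TOKEN')}"
    f"@{os.getenv('DATABRICKS_SERVER_HOSTNAME')}"
    f"?http_path={os.getenv('DATABRICKS_HTTP_PATH')}"
    f"&catalog=main&schema=default"
)

with engine.connect() as connection:
    print(connection.execute(text("SELECT 1 + 1")).scalar())
```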


## Contributing

22 changes: 13 additions & 9 deletions examples/custom_cred_provider.py
@@ -4,23 +4,27 @@
from databricks.sdk.oauth import OAuthClient
import os

oauth_client = OAuthClient(host=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
client_id=os.getenv("DATABRICKS_CLIENT_ID"),
client_secret=os.getenv("DATABRICKS_CLIENT_SECRET"),
redirect_url=os.getenv("APP_REDIRECT_URL"),
scopes=['all-apis', 'offline_access'])
oauth_client = OAuthClient(
host=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
client_id=os.getenv("DATABRICKS_CLIENT_ID"),
client_secret=os.getenv("DATABRICKS_CLIENT_SECRET"),
redirect_url=os.getenv("APP_REDIRECT_URL"),
scopes=["all-apis", "offline_access"],
)

consent = oauth_client.initiate_consent()

creds = consent.launch_external_browser()

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
credentials_provider=creds) as connection:
with sql.connect(
server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
credentials_provider=creds,
) as connection:

for x in range(1, 5):
cursor = connection.cursor()
cursor.execute('SELECT 1+1')
cursor.execute("SELECT 1+1")
result = cursor.fetchall()
for row in result:
print(row)
26 changes: 14 additions & 12 deletions examples/insert_data.py
@@ -1,21 +1,23 @@
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
access_token = os.getenv("DATABRICKS_TOKEN")) as connection:
with sql.connect(
server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:

with connection.cursor() as cursor:
cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")
with connection.cursor() as cursor:
cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")

squares = [(i, i * i) for i in range(100)]
values = ",".join([f"({x}, {y})" for (x, y) in squares])
squares = [(i, i * i) for i in range(100)]
values = ",".join([f"({x}, {y})" for (x, y) in squares])

cursor.execute(f"INSERT INTO squares VALUES {values}")
cursor.execute(f"INSERT INTO squares VALUES {values}")

cursor.execute("SELECT * FROM squares LIMIT 10")
cursor.execute("SELECT * FROM squares LIMIT 10")

result = cursor.fetchall()
result = cursor.fetchall()

for row in result:
print(row)
for row in result:
print(row)
8 changes: 5 additions & 3 deletions examples/interactive_oauth.py
@@ -13,12 +13,14 @@
token across script executions.
"""

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH")) as connection:
with sql.connect(
server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
) as connection:

for x in range(1, 100):
cursor = connection.cursor()
cursor.execute('SELECT 1+1')
cursor.execute("SELECT 1+1")
result = cursor.fetchall()
for row in result:
print(row)
12 changes: 7 additions & 5 deletions examples/m2m_oauth.py
@@ -22,17 +22,19 @@ def credential_provider():
# Service Principal UUID
client_id=os.getenv("DATABRICKS_CLIENT_ID"),
# Service Principal Secret
client_secret=os.getenv("DATABRICKS_CLIENT_SECRET"))
client_secret=os.getenv("DATABRICKS_CLIENT_SECRET"),
)
return oauth_service_principal(config)


with sql.connect(
server_hostname=server_hostname,
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
credentials_provider=credential_provider) as connection:
server_hostname=server_hostname,
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
credentials_provider=credential_provider,
) as connection:
for x in range(1, 100):
cursor = connection.cursor()
cursor.execute('SELECT 1+1')
cursor.execute("SELECT 1+1")
result = cursor.fetchall()
for row in result:
print(row)
47 changes: 27 additions & 20 deletions examples/persistent_oauth.py
@@ -17,37 +17,44 @@
from typing import Optional

from databricks import sql
from databricks.sql.experimental.oauth_persistence import OAuthPersistence, OAuthToken, DevOnlyFilePersistence
from databricks.sql.experimental.oauth_persistence import (
OAuthPersistence,
OAuthToken,
DevOnlyFilePersistence,
)


class SampleOAuthPersistence(OAuthPersistence):
def persist(self, hostname: str, oauth_token: OAuthToken):
"""To be implemented by the end user to persist in the preferred storage medium.
def persist(self, hostname: str, oauth_token: OAuthToken):
"""To be implemented by the end user to persist in the preferred storage medium.

OAuthToken has two properties:
1. OAuthToken.access_token
2. OAuthToken.refresh_token
OAuthToken has two properties:
1. OAuthToken.access_token
2. OAuthToken.refresh_token

Both should be persisted.
"""
pass
Both should be persisted.
"""
pass

def read(self, hostname: str) -> Optional[OAuthToken]:
"""To be implemented by the end user to fetch token from the preferred storage
def read(self, hostname: str) -> Optional[OAuthToken]:
"""To be implemented by the end user to fetch token from the preferred storage

Fetch the access_token and refresh_token for the given hostname.
Return OAuthToken(access_token, refresh_token)
"""
pass
Fetch the access_token and refresh_token for the given hostname.
Return OAuthToken(access_token, refresh_token)
"""
pass

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
auth_type="databricks-oauth",
experimental_oauth_persistence=DevOnlyFilePersistence("./sample.json")) as connection:

with sql.connect(
server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path=os.getenv("DATABRICKS_HTTP_PATH"),
auth_type="databricks-oauth",
experimental_oauth_persistence=DevOnlyFilePersistence("./sample.json"),
) as connection:

for x in range(1, 100):
cursor = connection.cursor()
cursor.execute('SELECT 1+1')
cursor.execute("SELECT 1+1")
result = cursor.fetchall()
for row in result:
print(row)
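
For illustration, a minimal file-backed persistence along the lines suggested by the docstrings above — a non-production sketch that assumes the `OAuthToken(access_token, refresh_token)` constructor:

```python
import json
from typing import Optional

from databricks.sql.experimental.oauth_persistence import OAuthPersistence, OAuthToken


class JsonFileOAuthPersistence(OAuthPersistence):
    """Hypothetical example: one token entry per hostname in a local JSON file."""

    def __init__(self, path: str):
        self._path = path

    def persist(self, hostname: str, oauth_token: OAuthToken):
        # Load any existing entries so tokens for other hostnames are preserved
        try:
            with open(self._path) as f:
                data = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            data = {}
        data[hostname] = {
            "access_token": oauth_token.access_token,
            "refresh_token": oauth_token.refresh_token,
        }
        with open(self._path, "w") as f:
            json.dump(data, f)

    def read(self, hostname: str) -> Optional[OAuthToken]:
        try:
            with open(self._path) as f:
                entry = json.load(f).get(hostname)
        except (FileNotFoundError, json.JSONDecodeError):
            return None
        if entry is None:
            return None
        return OAuthToken(entry["access_token"], entry["refresh_token"])
```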