Skip to content

shawnz42/adx-db-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

What's this?

  • This is a SQLAlchemy dialect for ADX (Azure Data Explorer), which wraps azure-kusto-data api to sqlalchemy interface.

  • Another enhancement is supporting transformation from sql to kql(Kusto query language), for example , it can transform

select * 
from storm
where col = 'a' 
limit 10

to

storm 
| where col == 'a'
| limit 10
  • Helpful for data scientist, see integrate with pandas

Enjoy it!

How to install this package?

  • first prepare your python environment and install necessary packages
mkdir adx-db-api-demo
cd adx-db-api-demo
virtualenv venv -p `which python3`
source venv/bin/activate
pip install sqlalchemy
pip install azure-kusto-data==2.3.1
  • then install adx-db-api:
cd adx-db-api
python setup.py install
  • if you want to clean the asset of install:
cd adx-db-api
python setup.py clean

How to use it?

  • first you may read this page to create client_id, client_secret and authority_id for your adx and get knowledge of your cluster url, then you can get data using sqlachemy interface.
  • create engine:
from sqlalchemy import create_engine

cluster = "<insert here your cluster name>"
client_id = "<insert here your AAD application id>"
client_secret = "<insert here your AAD application key>"
authority_id = "<insert here your AAD tenant id>"

engine = create_engine('adx://client_id:client_secret@cluster/authority_id/database')
  • support kql(Kusto query language)
result = engine.execute("table |  limit 3")

for row in result:
    print(row)
  • also support sql
result = engine.execute("select * from table limit 3")

for row in result:
    print(row)

transform sql to kql

from adx_db import get_kql
print(get_kql('select a, count(b) as cnt from t group by a'))
# t | summarize cnt=count(b) by a

you may view SQL to Kusto cheat sheet or the test cases.

How to integrate with pandas?

import pandas as pd
sql = """table |  limit 3"""  
# sql = """select * from table limit 3"""  # both kql or sql are ok

df = pd.read_sql(sql, con=engine)

print(df.head())

How to test it?

cd adx-db-api
virtualenv venv -p `which python3`
source venv/bin/activate
pip install -r requirements.txt

then execute the test cases.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages