The primary purpose of Norm is to make it easier to generate SQL.
In [1]: from norm.norm_sqlite3 import SQLI_SELECT as SELECT
In [2]: s = (SELECT('val')
...: .FROM('foos'))
In [3]: print(s.query)
SELECT val
FROM foos;
In [4]: s2 = s.WHERE(val = 5)
In [5]: s2 = s2.SELECT('foos_id')
In [6]: print(s2.query)
SELECT val,
foos_id
FROM foos
WHERE val = :val_bind_0;
In [7]: print(s.query)
SELECT val
FROM foos;
Bind parameters are handled automatically, for example in .WHERE(val=5):
In [8]: print(s2.binds)
{'val_bind_0': 5}
Using .query and .binds separately lets you use norm wherever you can execute SQL. For example, with a SQLAlchemy Session object:
res = Session.execute(s.query, s.binds)
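The same pair works with a plain DB-API connection as well; a minimal sketch using sqlite3 (the foos table and its single row are created here purely for illustration):
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foos (foos_id INTEGER, val INTEGER)')
conn.execute('INSERT INTO foos VALUES (1, 5)')

cur = conn.cursor()
# s2.query uses named parameters (:val_bind_0), which sqlite3 accepts as a dict
cur.execute(s2.query, s2.binds)
print(cur.fetchall())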
In addition to the simple, static queries above, it is possible to add query clauses.
In [9]: print(s.query)
SELECT val
FROM foos;
In [10]: s = s.WHERE('val * 2 = 4')
In [11]: print(s.query)
SELECT val
FROM foos
WHERE val * 2 = 4;
In [12]: s = s.JOIN('bars', ON='foos.val = bars.bar_id')
In [13]: print(s.query)
SELECT val
FROM foos
JOIN bars
ON foos.val = bars.bar_id
WHERE val * 2 = 4;
Of course you can put it all together:
In [14]: s = (SELECT('val')
...: .FROM('foos')
...: .JOIN('bars', ON='foos.val = bars.bar_id')
...: .WHERE(val=5)
...: .WHERE('val * 2 = 4'))
In [15]: print(s.query)
SELECT val
FROM foos
JOIN bars
ON foos.val = bars.bar_id
WHERE val = :val_bind_0 AND
val * 2 = 4;
Or you can evolve queries dynamically:
def get_users(cursor, user_ids, only_girls=False, minimum_age=0):
    s = (SELECT('first_name', 'age')
         .FROM('people')
         .WHERE('user_ids IN :user_ids')
         .bind(user_ids=user_ids))
    if only_girls:
        s = s.WHERE(gender='f')
    if minimum_age:
        s = (s.WHERE('age >= :minimum_age')
             .bind(minimum_age=minimum_age))
    return cursor.run_query(s)
You can also add SELECT columns dynamically.
def get_users(cursor, user_ids, has_dog=False):
    s = (SELECT('p.first_name', 'p.age')
         .FROM('people p')
         .WHERE('user_ids IN :user_ids')
         .bind(user_ids=user_ids))
    if has_dog:
        s = (s.SELECT('d.dog_name', 'd.breed')
             .JOIN('dogs as d', ON='p.person_id = d.owner_id'))
    return cursor.run_query(s)
Calling methods on a query object does not change the object; it returns a new query object.
In other words, query objects are immutable. This means it is always safe to create a base query and add clauses to it without modifying the original.
_user_query = (SELECT('first_name', 'last_name')
               .FROM('people'))

def get_old_people(conn, min_age=65):
    old_people_query = _user_query.WHERE('age > :min_age').bind(min_age=min_age)
    return conn.run_query(old_people_query)

def get_karls(conn):
    karl_query = _user_query.WHERE(first_name='Karl')
    return conn.run_query(karl_query)
UPDATE and DELETE work in basically the same way as SELECT.
In [1]: from norm import UPDATE
In [2]: from norm import SELECT
In [3]: fix_karls = (UPDATE('people')
...: .SET(first_name='Karl')
...: .WHERE(first_name='karl'))
In [4]: print(fix_karls.query)
UPDATE people
SET first_name = %(first_name_bind)s
WHERE first_name = %(first_name_bind_1)s;
In [5]: print(fix_karls.binds)
{'first_name_bind': 'Karl', 'first_name_bind_1': 'karl'}
In [8]: from norm import DELETE
In [9]: remove_karls = (DELETE('people')
...: .WHERE(first_name='Karl'))
In [10]: print(remove_karls.query)
DELETE FROM people
WHERE first_name = %(first_name_bind_0)s;
In [11]: print(remove_karls.binds)
{'first_name_bind_0': 'Karl'}
Inserts just take dictionaries and treat them like rows.
All the rows are inserted as one large INSERT statement with many bind parameters. This means that if your database or library doesn't support large numbers of bind parameters, you may have to break the rows you wish to insert into several batches (a simple batching sketch follows the example below).
from norm import INSERT

rows = [dict(first_name='bob', last_name='dobs', age=132),
        dict(first_name='bill', last_name='gates', age=67),
        dict(first_name='steve', last_name='jobs', age=60),
        dict(first_name='bob', last_name='jones'),
        dict(first_name='mike', last_name='jones', age=15)]

i = INSERT('people', rows)
conn.execute(i)
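If you do need to batch, a plain chunking loop over the rows is enough; a minimal sketch (the batch size is arbitrary):

BATCH_SIZE = 500  # arbitrary; stay below your database's bind-parameter limit

for start in range(0, len(rows), BATCH_SIZE):
    batch = rows[start:start + BATCH_SIZE]
    conn.execute(INSERT('people', batch))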
The behavior for missing keys depends on which norm backend (database/library) you are using. For psycopg2/postgres it will fill in missing keys with DEFAULT. For most databases, which do not provide an AsIs DBAPI wrapper, the default for missing keys is None (NULL). This can be overridden:
i = INSERT('people', rows, default=AsIs('DEFAULT'))
This should not be used with a value like 5; it is meant as a way to specify the DEFAULT keyword for the library/database you are using. For psycopg2/postgresql, norm fills in DEFAULT automatically, using psycopg2.extensions.AsIs (http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.AsIs). For inferior databases there may not be a defined way to do this safely. To allow literal SQL to be included as part of an insert, there is norm.NormAsIs.
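A heavily hedged sketch of what that might look like, assuming a NormAsIs value used in a row is emitted verbatim rather than becoming a bind parameter (the created_at column and now() expression are purely illustrative, not from the norm docs):

from norm import INSERT, NormAsIs

# assumption: NormAsIs('now()') ends up in the generated SQL as the literal now()
rows = [dict(first_name='bob', created_at=NormAsIs('now()'))]
i = INSERT('people', rows)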
For WITH queries (CTEs), a WITH object can be used to wrap other queries into CTE tables. The final query in the CTE is provided by calling the WITH instance.
In [1]: from norm import WITH
In [2]: from norm import SELECT
In [3]: all_active_players = (SELECT('player_id')
...: .FROM('players')
...: .WHERE(status='ACTIVE'))
In [4]: career_runs_scored = (SELECT('player_id',
...: 'SUM(runs_scored) AS total_runs')
...: .FROM('games')
...: .GROUP_BY('player_id'))
In [5]: w = WITH(all_active_players=all_active_players,
...: career_runs_scored=career_runs_scored)
In [6]: active_players_total_runs = (SELECT('crs.player_id AS player_id',
...: 'crs.total_runs AS total_runs')
...: .FROM('all_active_players aap')
...: .JOIN('career_runs_scored crs',
...: ON='crs.player_id = aap.player_id'))
In [7]: w = w(active_players_total_runs)
In [8]: print(w.query)
WITH all_active_players AS
(SELECT player_id
FROM players
WHERE status = %(status_bind_0)s),
career_runs_scored AS
(SELECT player_id,
SUM(runs_scored) AS total_runs
FROM games
GROUP BY player_id)
SELECT crs.player_id AS player_id,
crs.total_runs AS total_runs
FROM all_active_players aap
JOIN career_runs_scored crs
ON crs.player_id = aap.player_id;
In [9]: outer_w = WITH(active_players_total_runs=w)
In [10]: outer_w = outer_w(SELECT('aptr.player_id')
...: .FROM('active_players_total_runs aptr')
...: .WHERE('aptr.total_runs > 500'))
In [11]: print(outer_w.query)
WITH active_players_total_runs AS
(WITH all_active_players AS
(SELECT player_id
FROM players
WHERE status = %(status_bind_0)s),
career_runs_scored AS
(SELECT player_id,
SUM(runs_scored) AS total_runs
FROM games
GROUP BY player_id)
SELECT crs.player_id AS player_id,
crs.total_runs AS total_runs
FROM all_active_players aap
JOIN career_runs_scored crs
ON crs.player_id = aap.player_id)
SELECT aptr.player_id
FROM active_players_total_runs aptr
WHERE aptr.total_runs > 500;
# This example is a little contrived; there are obviously
# better ways to do this query
In [1]: from norm import SELECT
In [2]: s = (SELECT('FirstName as first_name',
...: 'LastName as last_name')
...: .FROM('people')
...: .LIMIT(1))
In [3]: print(s.query)
SELECT FirstName as first_name,
LastName as last_name
FROM people
LIMIT 1;
In [4]: s = s.LIMIT('%(my_limit)s')
In [5]: s = s.bind(my_limit=5)
In [6]: print(s.query)
SELECT FirstName as first_name,
LastName as last_name
FROM people
LIMIT %(my_limit)s;
In [7]: print(s.binds)
{'my_limit': 5}
LIMIT and OFFSET can only appear once in a given SQL statement. Rather than being built up like the SELECT columns, WHERE clauses, etc., the Norm query will take the final value or expression provided.
In [2]: s = (SELECT('FirstName as first_name',
...: 'LastName as last_name')
...: .FROM('people')
...: .LIMIT('%(my_limit)s'))
In [3]: s = s.LIMIT(250)
In [4]: print(s.query)
SELECT FirstName as first_name,
LastName as last_name
FROM people
LIMIT 250;
In [5]: s = s.OFFSET(10)
In [6]: print(s.query)
SELECT FirstName as first_name,
LastName as last_name
FROM people
LIMIT 250
OFFSET 10;
In [7]: s = s.OFFSET(99)
In [8]: print(s.query)
SELECT FirstName as first_name,
LastName as last_name
FROM people
LIMIT 250
OFFSET 99;
While many SQL flavors prefer LIMIT, MS SQL Server favors TOP.
In [1]: from norm.norm_pymssql import PYMSSQL_SELECT as SELECT
In [2]: s = (SELECT('FirstName as first_name',
...: 'LastName as last_name')
...: .FROM('people')
...: .TOP(1))
In [3]: print(s.query)
SELECT TOP 1
FirstName as first_name,
LastName as last_name
FROM people;
Methods such as GROUP_BY, HAVING, and ORDER_BY work much like WHERE in that they can be stacked.
In [4]: s = SELECT('u.user_id', 'u.first_name').FROM('users u')
In [5]: print(s.query)
SELECT u.user_id,
u.first_name
FROM users u;
In [6]: s = SELECT('u.first_name').FROM('users u')
In [7]: print(s.query)
SELECT u.first_name
FROM users u;
In [8]: s = s.GROUP_BY('u.first_name')
In [9]: s = s.SELECT('COUNT(*) AS cnt')
In [10]: print(s.query)
SELECT u.first_name,
COUNT(*) AS cnt
FROM users u
GROUP BY u.first_name;
In [11]: s = s.HAVING('COUNT(*) > 3')
In [12]: print(s.query)
SELECT u.first_name,
COUNT(*) AS cnt
FROM users u
GROUP BY u.first_name
HAVING COUNT(*) > 3;
In [13]: s = s.ORDER_BY('COUNT(*)')
In [14]: print(s.query)
SELECT u.first_name,
COUNT(*) AS cnt
FROM users u
GROUP BY u.first_name
HAVING COUNT(*) > 3
ORDER BY COUNT(*);
In [15]: s = s.ORDER_BY('u.first_name')
In [16]: print(s.query)
SELECT u.first_name,
COUNT(*) AS cnt
FROM users u
GROUP BY u.first_name
HAVING COUNT(*) > 3
ORDER BY COUNT(*),
u.first_name;
In [17]: u = UPDATE('users').SET(first_name='Bob').WHERE(first_name='Robert')
In [18]: print(u.query)
UPDATE users
SET first_name = %(first_name_bind)s
WHERE first_name = %(first_name_bind_1)s;
In [19]: print(u.binds)
{'first_name_bind': 'Bob', 'first_name_bind_1': 'Robert'}
In [20]: u = u.RETURNING('u.user_id')
In [21]: print(u.query)
UPDATE users
SET first_name = %(first_name_bind)s
WHERE first_name = %(first_name_bind_1)s
RETURNING u.user_id;
The norm connection factory is a helper to produce norm.connection.ConnectionProxy wrapped DB-API connection objects.
import sqlite3
# notice that there is a specific class for each type of database
from norm.norm_sqlite3 import SQLI_ConnectionFactory as ConnectionFactory
def _make_connection():
    conn = sqlite3.connect(':memory:')
    return conn
my_connection_factory = ConnectionFactory(_make_connection)
# now we can get connections and use them
conn = my_connection_factory()
row = conn.run_queryone('SELECT 1 AS test_column')
# row == {'test_column': 1}
A norm connection factory will return connection objects that look a lot like whatever DB-API connection object you are used to from the library you use to create connections (psycopg2, pymssql, sqlite3, etc.), but with some important differences. While it passes on any method call to the actual connection object, it intercepts .cursor. Additionally, it adds .run_query, .run_queryone, and .execute.
The .cursor(...) method passes all arguments to the .cursor method of the actual connection object; however, it wraps the returned cursor in a CursorProxy object.
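A minimal sketch of what that looks like in use, assuming the proxy simply forwards standard DB-API calls such as execute and fetchall to the wrapped cursor:

cur = conn.cursor()                      # returns a CursorProxy wrapping the real cursor
cur.execute('SELECT 1 AS test_column')   # assumed to be forwarded to the real cursor
print(cur.fetchall())
cur.close()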
Calling .execute on a ConnectionProxy creates a cursor, executes the sql provided, and then closes the cursor. It is meant as a convenience to avoid creating a cursor for queries where you do not care about any data returned.
Calling .run_query returns a generator-like object which produces rows in the form of dictionaries.
import sqlite3
from norm.norm_sqlite3 import SQLI_ConnectionFactory as ConnectionFactory

def conn_maker():
    conn = sqlite3.connect(':memory:')
    conn.execute(
        '''CREATE TABLE users (
               user_id INTEGER PRIMARY KEY AUTOINCREMENT,
               first_name VARCHAR(64)
           )''')
    conn.commit()
    return conn

cf = ConnectionFactory(conn_maker)
conn = cf()

conn.execute(
    '''CREATE TABLE foos (
           val VARCHAR(64)
       )''')

for val in range(10):
    conn.execute('INSERT INTO foos VALUES (:value)',
                 dict(value=val))

rows = conn.run_query('SELECT val FROM foos')
print(rows)
# prints: <norm.rows.RowsProxy object at 0x7f0ae07191d0>
print(list(rows))
# prints: [{'val': '0'}, {'val': '1'}, {'val': '2'}, ...]
see benchmarks/shuttle_data_example.py
A run on my desktop produced:
Moved 1000000 rows in 20.09 seconds
Streaming memory usage peak was 0.05MB
Moved 1000000 rows in 16.58 seconds
Fetchall memory usage peak was 212.13MB
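The difference between the two runs is whether rows are consumed lazily or materialized all at once; a rough sketch (the shuttle_data table and handle function are illustrative, not from the benchmark script):

# streaming: iterate the RowsProxy directly; only one row is held at a time
for row in conn.run_query('SELECT * FROM shuttle_data'):
    handle(row)

# fetchall-style: pull every row into memory before doing any work
all_rows = list(conn.run_query('SELECT * FROM shuttle_data'))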
see benchmarks/benchmark.py
When you are inserting many rows, most Python DB-API libraries will produce one INSERT statement for each row. Many libraries provide a cursor.executemany method, but this usually still produces a separate INSERT statement per row inserted, and simply executes them in a loop.
Let's say you are inserting 1000 rows into a table. With Norm you can batch these into a single INSERT object and execute it, producing one large SQL statement. Back-and-forth trips to the database: 1.
Let's say you use SQLAlchemy, or even use one of the SQL libraries directly. Even if you use executemany, this will produce 1000 back-and-forth trips to the database.
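A sketch of the two approaches side by side (the people table and rows list are carried over from the earlier INSERT example; the %(name)s placeholder style assumes a psycopg2-like driver, and cursor/conn are assumed to already exist):

# executemany: convenient, but typically still one round trip per row
cursor.executemany(
    'INSERT INTO people (first_name, last_name) VALUES (%(first_name)s, %(last_name)s)',
    rows)

# norm: one INSERT statement with many bind parameters, one round trip
conn.execute(INSERT('people', rows))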
In a datacenter environment, ping times to the database will be short; let's estimate the time to send a query to the database, plus the time to parse a short INSERT statement, at 1ms. For 1000 rows, just the round trips to the database will add 1 second to the execution time.
A run on my desktop, writing to a local PostgreSQL database (latency is effectively zero), showed the Norm INSERT still completing in a quarter of the time. Adding network latency will make the improvement more dramatic.
*** Begin sqlalchemy_insert_bench
Elapsed Time: 3.4474
Faster than SQLA factor: 1.0000
*** Begin norm_insert_bench
Elapsed Time: 0.7697
Faster than SQLA factor: 4.4790