Skip to content

Tips and Tricks by Database Platform

Gord Thompson edited this page Aug 6, 2024 · 34 revisions

The following is a collection of tips for working with specific database platforms. For additional tips that apply to pyodbc as a whole, see Features beyond the DB API.

Microsoft Access  |  Microsoft SQL Server  |  Sybase

Microsoft Access

Databases containing ODBC Linked Tables

pyodbc can work with Access databases that contain ODBC Linked Tables if we disable connection pooling before connecting to the Access database:

import pyodbc
pyodbc.pooling = False
cnxn = pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ= ... ")

Creating a new database file

Access DDL does not support CREATE DATABASE (or similar). If you need to create a new, empty database file you can use the (free) third-party msaccessdb module.

Limitations of the Access ODBC driver

The Access ODBC driver is not able to fully support some of the "complex" column types like "(multi-valued) Lookup" fields and "Attachment" fields. In some cases the ODBC driver can read values from such a column, but is unable to perform INSERT/UPDATE/DELETE operations. If you have to perform such tasks then you may need to use Access DAO (Data Access Objects) along with pywin32.

There is a list of more general Microsoft Access specifications and limitations here.

UnicodeDecodeError when calling Cursor.columns

There is an issue with the Access ODBC driver that can cause a UnicodeDecodeError when trying to use the Cursor.columns method if the table definition in Access includes optional column "Description" information:

Field Name  Data Type   Description
----------  ----------  --------------------
ID          AutoNumber  identity primary key
LastName    Text        Family name
FirstName   Text        Given name(s)
DOB         Date/Time   Date of Birth

For a discussion of the issue and possible workarounds, see issue #328.

Microsoft SQL Server

Authenticate using an Access Token from Azure Active Directory

The ODBC Driver for SQL Server, in addition to the authentication methods built-in to the driver such as username/password, Windows Integrated (Kerberos), Azure Active Directory Password, Managed Service Identity, etc., also supports authenticating to Azure SQL and DW directly using an access token obtained from any of the authentication methods provided by Azure Active Directory; since pyODBC 4.0.24, this can be specified as the driver-specific pre-connect attribute 1256 using the following example code to supply the token in the correct format:

(Python 2.x)

token = "eyJ0eXAiOi..."
exptoken = ""
for i in token:
    exptoken += i
    exptoken += chr(0)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
conn = pyodbc.connect(connstr, attrs_before = { 1256:bytearray(tokenstruct) })

(Python 3.x)

token = b"eyJ0eXAiOi..."
exptoken = b"".join(bytes([i, 0]) for i in token)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
conn = pyodbc.connect(connstr, attrs_before = { 1256:tokenstruct })

(See also: original issue where this feature was requested)

Stored Procedures with output parameters and/or return values

See the Calling Stored Procedures page for an example of how to use a bit of T-SQL to retrieve these values.

Connecting to a named instance of SQL Server from a Linux client

Microsoft's SQL Server ODBC Driver for Linux is unable to resolve SQL Server instance names. However, if the SQL Browser service is running on the target machine we can use the (free) third-party sqlserverport module to look up the TCP port based on the instance name.

DATETIMEOFFSET columns (e.g., "ODBC SQL type -155 is not yet supported")

Use an Output Converter function to retrieve such values. See the examples on the Using an Output Converter function wiki page.

Query parameters for DATETIMEOFFSET columns currently must be sent as strings. Note that SQL Server is rather fussy about the format of the string:

# sample data
dto = datetime(2018, 8, 2, 0, 28, 12, 123456, tzinfo=timezone(timedelta(hours=-6)))

dto_string = dto.strftime("%Y-%m-%d %H:%M:%S.%f %z")  # 2018-08-02 00:28:12.123456 -0600
# Trying to use the above will fail with
#   "Conversion failed when converting date and/or time from character string."
# We need to add the colon for SQL Server to accept it
dto_string = dto_string[:30] + ":" + dto_string[30:]  # 2018-08-02 00:28:12.123456 -06:00

TIME columns

Due to legacy considerations, pyodbc uses the ODBC TIME_STRUCT structure for datetime.time query parameters. TIME_STRUCT does not understand fractional seconds, so datetime.time values have their fractional seconds truncated when passed to SQL Server.

crsr.execute("CREATE TABLE #tmp (id INT, t TIME)")
t = datetime.time(hour=12, minute=23, second=34, microsecond=567890)
crsr.execute("INSERT INTO #tmp (id, t) VALUES (1, ?)", t)
rtn = crsr.execute("SELECT CAST(t AS VARCHAR) FROM #tmp WHERE id=1").fetchval()
print(rtn)  # 12:23:34.0000000

The workaround is to pass the query parameter as a string

crsr.execute("INSERT INTO #tmp (id, t) VALUES (1, ?)", str(t))
rtn = crsr.execute("SELECT CAST(t AS VARCHAR) FROM #tmp WHERE id=1").fetchval()
print(rtn)  # 12:23:34.5678900

Note that TIME columns retrieved by pyodbc have their microseconds intact

rtn = crsr.execute("SELECT t FROM #tmp WHERE id=1").fetchval()
print(repr(rtn))  # datetime.time(12, 23, 34, 567890)

SQL Server Numeric Precision vs. Python Decimal Precision

Python's decimal.Decimal type can represent floating point numbers with greater than 35 digits of precision, which is the maximum supported by SQL server. Binding parameters that exceed this precision will result in an invalid precision error from the driver ("HY104 [Microsoft][...]Invalid precision value").

Using fast_executemany with a #temporary table

fast_executemany can have difficulty identifying the column types of a local #temporary table under some circumstances (#295).

-- Workaround 1: ODBC Driver 17 for SQL Server and UseFMTONLY=Yes or ColumnEncryption=Enabled

Use "ODBC Driver 17 for SQL Server" (or newer) and include UseFMTONLY=Yes or ColumnEncryption=Enabled in the connection string, e.g.,

cnxn_str = (
    "Driver=ODBC Driver 17 for SQL Server;"
    "Server=192.168.1.144,49242;"
    "UID=sa;PWD=_whatever_;"
    "Database=myDb;"
    "UseFMTONLY=Yes;"
)

or

cnxn_str = (
    "Driver=ODBC Driver 17 for SQL Server;"
    "Server=192.168.1.144,49242;"
    "UID=sa;PWD=_whatever_;"
    "Database=myDb;"
    "ColumnEncryption=Enabled;"
)

-- Workaround 2: pyodbc 4.0.24 and Cursor.setinputsizes

Upgrade to pyodbc 4.0.24 (or newer) and use setinputsizes to specify the parameter type, etc..

crsr.execute("""\
CREATE TABLE #issue295 (
    id INT IDENTITY PRIMARY KEY, 
    txt NVARCHAR(50), 
    dec DECIMAL(18,4)
    )""")
sql = "INSERT INTO #issue295 (txt, dec) VALUES (?, ?)"
params = [('Ώπα', 3.141)]
# explicitly set parameter type/size/precision
crsr.setinputsizes([(pyodbc.SQL_WVARCHAR, 50, 0), (pyodbc.SQL_DECIMAL, 18, 4)])
crsr.fast_executemany = True
crsr.executemany(sql, params)

-- Workaround 3: Use a global ##temporary table

If neither of the previous workarounds is feasible, simply use a global ##temporary table instead of a local #temporary table.

Always Encrypted

SQL Server 2016 and later support client-side encryption and decryption of data, when used with the ODBC Driver 13.1 for SQL Server or later. Retrieving data from encrypted columns through pyODBC has no special considerations; however, inserting data into encrypted columns is subject to some limitations of the ODBC driver itself, the two most important in pyODBC usage being:

  • Data to be inserted into or compared with encrypted columns must be passed through a parameterised query; literals in the T-SQL directly will not work.

  • Due to the lack of server-side type conversion, it may be necessary to use the setinputsizes() function extension (see https://github.com/mkleehammer/pyodbc/pull/280 for details) on specific columns to cause pyODBC to send the correct data type to the server for insertion or comparison with encrypted data. If you receive 'Operand type clash' errors, this is likely to be the case.

Non-numeric float values

SQL Server float columns cannot store non-numeric floating point values like "Infinity". Attempting to do so …

crsr.execute("CREATE TABLE #tmp (id int primary key, f float)")
crsr.execute("INSERT INTO #tmp (id, f) VALUES (?, ?)", (1, float("Infinity")))

… results in an error like this:

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 5 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')

This is not a deficiency of pyodbc or ODBC; the same error occurs if we try to save float.PositiveInfinity to a float column using C# and System.Data.SqlClient.

Passing row-oriented parameter data as a JSON string

With SQL Server 2016+ we can use OPENJSON() and pass row-oriented parameter data as a JSON string instead of passing a list of tuples. Not only can this be considerably faster but it can also work around issues with tuple-based TVP data like the one discussed here.

For example, given a user-defined table type

CREATE TYPE dbo.my_table_type AS TABLE 
(
    id int NOT NULL, 
    txt nvarchar(50) NULL, 
    PRIMARY KEY (id)
)

and a stored procedure that consumes it

CREATE PROCEDURE echo_tvp 
    @tvp dbo.my_table_type READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT * FROM @tvp;
END

this works (no None in the first row)

tvp_data = [(1, "Alfa"), (2, "Bravo")]
results = crsr.execute("EXEC echo_tvp ?", (tvp_data,)).fetchall()
print(results)
# [(1, 'Alfa'), (2, 'Bravo')]

but this fails

tvp_data = [(1, None), (2, "Bravo")]
results = crsr.execute("EXEC echo_tvp ?", (tvp_data,)).fetchall()
# Invalid SQL data type (0) (SQLBindParameter)

However, with SQL Server 2016+ this works

tvp_data = [(1, None), (2, "Bravo")]
tvp_json = [dict(zip(["id", "txt"], row)) for row in tvp_data]

sql = """\
SET NOCOUNT ON;
DECLARE @tvp dbo.my_table_type;
INSERT INTO @tvp (id, txt)
SELECT id, txt FROM OPENJSON(?)
WITH (
    id int '$.id',
    txt nvarchar(50) '$.txt'
);
EXEC echo_tvp @tvp
"""
results = crsr.execute(sql, json.dumps(tvp_json, default=str)).fetchall()
print(results)
# [(1, None), (2, 'Bravo')]

Speeding up filtering of indexed varchar columns

By default, pyodbc sends string parameter values as Unicode. Therefore

    results = crsr.execute(
        "SELECT id FROM million_rows WHERE varchar_col = ?", "row012345"
    ).fetchall()

sends the equivalent of

SELECT id FROM million_rows WHERE varchar_col = N'row012345'

Unfortunately, this causes SQL Server to perform an "index scan" because it cannot compare Unicode (nvarchar) and SBCS (varchar) characters directly.

SQL Server will be able to perform and "index seek" (much faster) if we use setinputsizes() to tell pyodbc to send the string parameter values as varchar

    crsr.setinputsizes([(pyodbc.SQL_VARCHAR, 255)])
    results = crsr.execute(
        "SELECT id FROM million_rows WHERE varchar_col = ?", "row012345"
    ).fetchall()

which sends the equivalent of

SELECT id FROM million_rows WHERE varchar_col = 'row012345'

unixODBC connection pooling

The most recent (as of August 2024) updates to both unixODBC and pyodbc have changed the behaviour of connection pooling under unixODBC. With libmsodbcsql-18.4.so.1.1 (ODBC Driver 18 for SQL Server) and pyodbc >= 5.1.0:

unixODBC <= 2.3.9 — Connection pooling does not work at all. pyodbc uses wide (Unicode) connection functions and unixODBC did not start supporting those for connection pooling until 2.3.10.

unixODBC == 2.3.11 — pyodbc defaults to pyodbc.pooling = True but unixODBC ignores that setting, so connection pooling is disabled by default. Adding Pooling = Yes to the [ODBC] section of odbcinst.ini enables connection pooling. Details in the unixODBC documentation here.

unixODBC >= 2.3.12 — unixODBC now respects the pyodbc.pooling setting, so connection pooling is enabled by default. Pooling can be completely disabled by setting pyodbc.pooling = False before the first connection. Pooling can be disabled for a specific driver by adding CPTimeout = -1 (not 0, as currently documented) to the driver's section of odbcinst.ini. Note that pyodbc.pooling = True takes precedence over the Pooling = Yes/No setting in odbcinst.ini.

For more detailed discussion, see Issue 774.

Sybase

Parameterized INSERTs and @@IDENTITY

Sybase ASE treats parameterized queries as LightWeight Procedures, or LWPs, which are basically stored procedures with a cached query plan. Hence, the session-global @@IDENTITY value is reset to zero when the LWP exits:

cnxn = pyodbc.connect('DSN=mysybasedsn', autocommit=True)
crsr = cnxn.cursor()
crsr.execute("INSERT INTO dbo.mytable (foo) VALUES (?)", 'bar')
crsr.execute("SELECT @@IDENTITY AS id")
id = crsr.fetchone()[0]  # always returns zero

To get the @@IDENTITY value, add the query after the INSERT, separated by a newline character:

cnxn = pyodbc.connect('DSN=mysybasedsn', autocommit=True)
crsr = cnxn.cursor()
crsr.execute("INSERT INTO dbo.mytable (foo) VALUES (?)\nSELECT @@IDENTITY", 'bar')
id = crsr.fetchone()[0]

Ref: https://github.com/FreeTDS/freetds/issues/337#issuecomment-640070962

Clone this wiki locally