
String right truncation pyodbc.ProgrammingError when using fast_executemany #380

Closed
seanfdnn opened this issue Apr 27, 2018 · 9 comments
@seanfdnn

Environment

To diagnose, we usually need to know the following, including version numbers. On Windows, be
sure to specify 32-bit Python or 64-bit:

  • Python: 3.6.3
  • pyodbc: 4.0.23
  • OS: Debian Stretch
  • DB: SQL Server
  • driver: Microsoft ODBC Driver 17 for SQL Server libmsodbcsql-17.0.so.1.1

Issue

When executing a parameterized "UPDATE" statement with fast_executemany, the following exception is thrown.

ERROR - encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)              
pyodbc.ProgrammingError: ('String data, right truncation: length 666 buffer 510', 'HY000')

The column widths in the target database have more than enough size to hold the data.
This seems equivalent to issue #337, except that I'm using pyodbc 4.0.23 and MS ODBC Driver 17, the combination that fixed the issue in that post, yet the problem persists for me.

I've trimmed down the data to just two rows and can consistently reproduce the problem with just those two rows.

I'm using SQLAlchemy to construct the statements and the connection.execute(stmt, [row1, row2]) method to execute them.

Turning fast_executemany off fixes the problem, but unacceptably impacts performance.

@seanfdnn
Author

Some more details:
The column causing the issue is VARCHAR(2000)

After reading #295, I considered that, even though a TEMP table is not the target of the insert, one is used in the statement. Changing to a global temp table (## instead of #) fixes it in this specific instance, but the same error then recurs on a different table and column with a different set of data. The TEMP table contains only one integer column and no varchars, and every column in the DB is VARCHAR, not NVARCHAR.

SET NOCOUNT ON;
BEGIN TRY
    BEGIN TRANSACTION;
    -- Execute statement here
    -- THROW requires an error number >= 50000 and a state argument
    IF @@ROWCOUNT = 0 THROW 51000, 'Failed to update or delete record', 1;
    COMMIT;
END TRY
BEGIN CATCH
    ROLLBACK;
    -- Statement for inserting index of rejected row into temporary table
END CATCH

@gordthompson
Collaborator

Can you try adding ColumnEncryption=Enabled to your connection string (ref: wiki)?

@seanfdnn
Author

seanfdnn commented May 1, 2018

ColumnEncryption made no difference, unfortunately.

@seanfdnn
Author

seanfdnn commented Jun 7, 2018

A common thread seems to be that the buffer is capped at 510 bytes, which is 2 x 255 -- suggesting fast_executemany sometimes assumes a maximum string length of 255 characters (510 bytes).
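That arithmetic can be checked directly: the tracebacks show pyodbc encoding string parameters as UTF-16LE, which uses two bytes per character, so a 255-character buffer is exactly 510 bytes:

```python
# UTF-16LE (the encoding named in the error) uses 2 bytes per character,
# so a 255-character default buffer comes out to exactly 510 bytes.
default_chars = 255
buffer_bytes = len(('a' * default_chars).encode('utf-16-le'))
print(buffer_bytes)  # prints 510
```

A 256-character value therefore needs 512 bytes, matching the "length 512 buffer 510" in the error message.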

@seanfdnn
Author

seanfdnn commented Jun 8, 2018

I've narrowed this down to be a variation of #280

When the SQL statement contains more than one statement (i.e. more complex transactions, try catch blocks, etc...) and fast_executemany=True, the SQLDescribeParam call doesn't work and thus VARCHARs are allocated a default buffer size of 510 bytes. This seems to be a known behaviour of SQLDescribeParam:

When processing a batch of Transact-SQL statements, the driver also does not support calling SQLDescribeParam for parameter markers in statements after the first statement in the batch.

The following code will reproduce the error (note the "print" statement in the SQL, but this could be TRY/CATCH blocks, etc.):

import pyodbc
import sys

print('Python version ' + sys.version)
print('pyodbc version ' + pyodbc.version)

driver = 'ODBC Driver 17 for SQL Server'
server = 'localhost'
database = 'db'
user = 'sa'
password = 'password'

conn_string = f'DRIVER={{{driver}}};' \
                   f'PORT=1433;SERVER={server};' \
                   f'DATABASE={database};' \
                   f'UID={user};' \
                   f'PWD={password};' \
                   f'ColumnEncryption=Enabled;' \
                   f'TrustServerCertificate=yes;'

cnxn = pyodbc.connect(conn_string, autocommit=True)

crsr = cnxn.cursor()

crsr.fast_executemany = True

crsr.execute('DROP TABLE IF EXISTS tmp')
crsr.execute('CREATE TABLE tmp (txt VARCHAR(1024))')

# 256 characters: one more than the 255-character default that
# fast_executemany falls back to when SQLDescribeParam fails
value_to_insert = 'a' * 256

params = [(value_to_insert,), (value_to_insert,)]

# The leading "print" makes this a multi-statement batch, so
# SQLDescribeParam cannot describe the INSERT's parameter marker
sql = """
print 'do nothing';
INSERT INTO tmp (txt) VALUES (?);
"""

crsr.setinputsizes((512,))  # ignored by fast_executemany in 4.0.23
crsr.executemany(sql, params)  # raises the truncation error

print(crsr.execute('SELECT txt from tmp').fetchall())

(note that in the actual application, the SQL is significantly more complicated; this is the simplest case that will reproduce the error)

This produces the output with errors:

Python version 3.6.3 (default, Nov  6 2017, 10:55:10)
[GCC 6.3.0 20170516]
pyodbc version 4.0.23
pyodbc.ProgrammingError: ('String data, right truncation: length 512 buffer 510', 'HY000')

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pypirls/memerrortest.py", line 40, in <module>
    crsr.executemany(sql, params)
SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

However, changing any one of the below will avoid the exception:

  • Removing the "print" statement from the SQL
  • Setting the number of 'a' characters in the inserted varchar to 255 (fits in buffer size)
  • Setting fast_executemany to False

Note that unlike the solution in the wiki, a temporary table is not being used, and ColumnEncryption=Enabled is set.

It's unclear to me why this problem happens when fast_executemany=True but not when it's False; I would have thought SQLDescribeParam would behave the same in either case, but I'm going to guess it's because in the latter case a memory array doesn't need to be pre-allocated.

I'm looking at @v-chojas's setinputsizes PR, but as far as I can tell it hasn't been merged into master, and it doesn't look like it affects the array allocation in params.cpp::ExecuteMulti. Any ideas for a workaround?

@seanfdnn
Author

seanfdnn commented Jun 8, 2018

Having read through params.cpp, I think I understand now what's happening.
As @mkleehammer mentions in #280, there are essentially two different parameter-binding code paths. The one for fast_executemany doesn't check whether any input sizes have been overridden via setinputsizes (but the execute version, BindParam, does).

I think the solution would be to add the setinputsizes override into ExecMulti -- ideally all the binding logic would live in one place, but in the short term I'm in a bit of a ... bind 🤣 ... so I'm going to start on a PR.

Although I'm still a little unclear on why it works with fast_executemany=False -- I would have expected the same problem with SQLDescribeParam

@v-chojas
Contributor

SQLDescribeParam is not used when not using Fast Executemany, since each cell is bound using only its Python type to determine the binding parameters. When using Fast Executemany, parameters are bound as an array and SQLDescribeParam is used to determine the parameters as well as the size of the array.
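To picture the difference (a toy model, explicitly not pyodbc's real internals): array binding reserves one fixed-width buffer per column shared by all rows, so a single value whose UTF-16LE encoding exceeds that width overflows it, while per-cell binding sizes each value individually:

```python
# Toy model of array binding: every row shares one fixed-width column buffer.
def array_bind(values, column_width_chars=255):
    """Raise if any value exceeds the fixed buffer, mimicking the default
    width used when SQLDescribeParam cannot describe the parameter."""
    width_bytes = column_width_chars * 2  # UTF-16LE: 2 bytes per character
    for v in values:
        encoded = v.encode('utf-16-le')
        if len(encoded) > width_bytes:
            raise ValueError(
                f"String data, right truncation: "
                f"length {len(encoded)} buffer {width_bytes}")

array_bind(['a' * 255])  # fits: 510 bytes in a 510-byte buffer
try:
    array_bind(['a' * 256])  # 512 bytes overflow the 510-byte buffer
except ValueError as e:
    print(e)  # prints: String data, right truncation: length 512 buffer 510
```

This reproduces the exact lengths seen in the reported error.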

@seanfdnn
Author

Thanks, @v-chojas, that clears things up. The problem with fast_executemany, then, is that when SQLDescribeParam fails due to the limitations on where it's supported, the default action (allocating a 255-character buffer) is not always sufficient.

Options I see:

  • Use setinputsizes to override (in PR) although this doesn't help set the parameter types
  • Iterate through all rows of bound parameters to infer the Python type of each column. Because a None value doesn't reveal the column type, iterating through all rows would be necessary.
  • At least in my particular case, I could prepare a statement that I know would succeed on SQLDescribeParam, then make some additions to the list of parameters, and pass that parameter list into another statement, although I think this would be very particular for my use case.
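For the first option, a sketch of what the override could look like (hypothetical until the PR is merged; SQL_VARCHAR is defined locally so the snippet runs without a database, and crsr/sql/params are assumed from the earlier repro):

```python
# Sketch of the setinputsizes workaround; an assumption, not released behavior.
# SQL_VARCHAR stands in for pyodbc.SQL_VARCHAR (ODBC type code 12).
SQL_VARCHAR = 12

# One (sql_type, column_size, decimal_digits) tuple per parameter marker,
# sized to the target VARCHAR(2000) column so the 255-character default
# buffer is never used.
sizes = [(SQL_VARCHAR, 2000, 0)]
# crsr.setinputsizes(sizes)
# crsr.executemany(sql, params)
```

This addresses the buffer width but, as noted above, would not by itself fix parameter type inference.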

mkleehammer pushed a commit that referenced this issue Aug 8, 2018
* enabling setinputsizes to also work with fast_executemany. Issue #380

* Refactored the copied code into a method to override input sizes in slow and fast mode.
Extended the existing code, so that it is now possible to also override the column types, not only the sizes.
Fixed some warnings.

* Updated doc.
@mkleehammer
Owner

I'm going to close this now that #415 has been released in 4.0.24.

However, I do want to make the binding behavior of the fast and regular paths the same. If someone sets an encoding or other configuration, they will expect it to work.
