
String right truncation pyodbc.ProgrammingError when using fast_executemany #380

Closed
seanfdnn opened this issue Apr 27, 2018 · 9 comments
@seanfdnn

Environment

To diagnose, we usually need to know the following, including version numbers. On Windows, be
sure to specify 32-bit Python or 64-bit:

  • Python: 3.6.3
  • pyodbc: 4.0.23
  • OS: Debian Stretch
  • DB: SQL Server
  • driver: Microsoft ODBC Driver 17 for SQL Server libmsodbcsql-17.0.so.1.1

Issue

When executing a parameterized "UPDATE" statement with fast_executemany, the following exception is thrown.

ERROR - encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)              
pyodbc.ProgrammingError: ('String data, right truncation: length 666 buffer 510', 'HY000')

The column widths in the target database have more than enough size to hold the data.
This seems equivalent to issue #337, except that I'm using pyodbc 4.0.23 and MS ODBC Driver 17, the combination that fixed the issue in that post, yet the problem persists for me.

I've trimmed down the data to just two rows and can consistently reproduce the problem with just those two rows.

I'm using SQLAlchemy to construct the statements and the connection.execute(stmt, [row1, row2]) method to execute them.

Turning fast_executemany off fixes the problem, but unacceptably impacts performance.

@seanfdnn
Author

Some more details:
The column causing the issue is VARCHAR(2000)

After reading #295, I considered that, even though a TEMP table is not the target of the insert, one is used in the statement. Changing to a global temp table (## instead of #) fixes it in this specific instance, but the same error then recurs on a different table and column with a different set of data. The TEMP table contains only one integer column and no varchars, and every column in the DB is VARCHAR, not NVARCHAR.

SET NOCOUNT ON;
BEGIN TRY
    BEGIN TRANSACTION;
    -- Execute statement here
    -- THROW requires an error number >= 50000 and a state argument
    IF @@ROWCOUNT = 0 THROW 51000, 'Failed to update or delete record', 1;
    COMMIT;
END TRY
BEGIN CATCH
    ROLLBACK;
    -- Statement for inserting index of rejected row into temporary table
END CATCH

@gordthompson
Collaborator

Can you try adding ColumnEncryption=Enabled to your connection string (ref: wiki)?

@seanfdnn
Author

seanfdnn commented May 1, 2018

ColumnEncryption made no difference, unfortunately.

@seanfdnn
Author

seanfdnn commented Jun 7, 2018

A common thread seems to be that the buffer is capped at 510 bytes, which is 2 x 255 -- suggesting fast_executemany sometimes assumes a maximum string length of 255 characters (510 bytes).
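That arithmetic can be checked directly: the tracebacks show pyodbc encoding string parameters as UTF-16LE, which uses two bytes per character, so a 255-character buffer is exactly 510 bytes:

```python
# UTF-16LE (the encoding named in the error) uses 2 bytes per character,
# so a 255-character default buffer comes out to exactly 510 bytes.
default_chars = 255
buffer_bytes = len(('a' * default_chars).encode('utf-16-le'))
print(buffer_bytes)  # prints 510
```

A 256-character value therefore needs 512 bytes, matching the "length 512 buffer 510" in the error message.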

@seanfdnn
Author

seanfdnn commented Jun 8, 2018

I've narrowed this down to be a variation of #280

When the SQL statement contains more than one statement (i.e. more complex transactions, try catch blocks, etc...) and fast_executemany=True, the SQLDescribeParam call doesn't work and thus VARCHARs are allocated a default buffer size of 510 bytes. This seems to be a known behaviour of SQLDescribeParam:

When processing a batch of Transact-SQL statements, the driver also does not support calling SQLDescribeParam for parameter markers in statements after the first statement in the batch.

The following code will reproduce the error (note the "print" statement in the SQL, but this could be TRY/CATCH blocks, etc.):

import pyodbc
import sys

print('Python version ' + sys.version)
print('pyodbc version ' + pyodbc.version)

driver = 'ODBC Driver 17 for SQL Server'
server = 'localhost'
database = 'db'
user = 'sa'
password = 'password'

conn_string = f'DRIVER={{{driver}}};' \
                   f'PORT=1433;SERVER={server};' \
                   f'DATABASE={database};' \
                   f'UID={user};' \
                   f'PWD={password};' \
                   f'ColumnEncryption=Enabled;' \
                   f'TrustServerCertificate=yes;'

cnxn = pyodbc.connect(conn_string, autocommit=True)

crsr = cnxn.cursor()

crsr.fast_executemany = True

crsr.execute('DROP TABLE IF EXISTS tmp')
crsr.execute('CREATE TABLE tmp (txt VARCHAR(1024))')

# 256 characters: one more than the 255-character default that
# fast_executemany falls back to when SQLDescribeParam fails
value_to_insert = 'a' * 256

params = [(value_to_insert,), (value_to_insert,)]

# The leading "print" makes this a multi-statement batch, so
# SQLDescribeParam cannot describe the INSERT's parameter marker
sql = """
print 'do nothing';
INSERT INTO tmp (txt) VALUES (?);
"""

crsr.setinputsizes((512,))  # ignored by fast_executemany in 4.0.23
crsr.executemany(sql, params)  # raises the truncation error

print(crsr.execute('SELECT txt from tmp').fetchall())

(note that in the actual application, the SQL is significantly more complicated; this is the simplest case that will reproduce the error)

This produces the output with errors:

Python version 3.6.3 (default, Nov  6 2017, 10:55:10)
[GCC 6.3.0 20170516]
pyodbc version 4.0.23
pyodbc.ProgrammingError: ('String data, right truncation: length 512 buffer 510', 'HY000')

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pypirls/memerrortest.py", line 40, in <module>
    crsr.executemany(sql, params)
SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

However, changing any one of the below will avoid the exception:

  • Removing the "print" statement from the SQL
  • Setting the number of 'a' characters in the inserted varchar to 255 (fits in buffer size)
  • Setting fast_executemany to False

Note that unlike the solution in the wiki, a temporary table is not being used, and ColumnEncryption=Enabled is set.

It's unclear to me why this problem happens when fast_executemany=True but not when it's False; I would have thought SQLDescribeParam would behave the same in either case, but I'm going to guess it's because in the latter case a memory array doesn't need to be pre-allocated.

I'm looking at @v-chojas's setinputsizes PR, but as far as I can tell it hasn't been merged into master, and it doesn't look like it affects the array allocation in params.cpp::ExecuteMulti. Any ideas for a workaround?

@seanfdnn
Author

seanfdnn commented Jun 8, 2018

Having read through params.cpp, I think I understand now what's happening.
As @mkleehammer mentions in #280, there are essentially two different parameter-binding code paths. The one for fast_executemany doesn't check whether any input sizes have been overridden via setinputsizes (but the execute version, BindParam, does).

I think the solution would be to add the setinputsizes override into ExecMulti -- ideally all the binding logic would live in one place, but in the short term I'm in a bit of a ... bind 🤣 ... so I'm going to start on a PR.

Although I'm still a little unclear on why it works with fast_executemany=False -- I would have expected the same problem with SQLDescribeParam

@v-chojas
Contributor

SQLDescribeParam is not used when not using Fast Executemany, since each cell is bound using only its Python type to determine the binding parameters. When using Fast Executemany, parameters are bound as an array and SQLDescribeParam is used to determine the parameters as well as the size of the array.
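To picture the difference (a toy model, explicitly not pyodbc's real internals): array binding reserves one fixed-width buffer per column shared by all rows, so a single value whose UTF-16LE encoding exceeds that width overflows it, while per-cell binding sizes each value individually:

```python
# Toy model of array binding: every row shares one fixed-width column buffer.
def array_bind(values, column_width_chars=255):
    """Raise if any value exceeds the fixed buffer, mimicking the default
    width used when SQLDescribeParam cannot describe the parameter."""
    width_bytes = column_width_chars * 2  # UTF-16LE: 2 bytes per character
    for v in values:
        encoded = v.encode('utf-16-le')
        if len(encoded) > width_bytes:
            raise ValueError(
                f"String data, right truncation: "
                f"length {len(encoded)} buffer {width_bytes}")

array_bind(['a' * 255])  # fits: 510 bytes in a 510-byte buffer
try:
    array_bind(['a' * 256])  # 512 bytes overflow the 510-byte buffer
except ValueError as e:
    print(e)  # prints: String data, right truncation: length 512 buffer 510
```

This reproduces the exact lengths seen in the reported error.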

@seanfdnn
Author

Thanks, @v-chojas, that clears things up. The problem with fast_executemany, then, is that when SQLDescribeParam fails due to the limitations on where it's supported, the default action (allocating a 255-character buffer) is not always sufficient.

Options I see:

  • Use setinputsizes to override (in PR) although this doesn't help set the parameter types
  • Iterate through all rows of bound parameters to infer the Python type of each column. Because a None value doesn't reveal the column type, iterating through all rows would be necessary.
  • At least in my particular case, I could prepare a statement that I know would succeed on SQLDescribeParam, then make some additions to the list of parameters, and pass that parameter list into another statement, although I think this would be very particular for my use case.
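For the first option, a sketch of what the override could look like (hypothetical until the PR is merged; SQL_VARCHAR is defined locally so the snippet runs without a database, and crsr/sql/params are assumed from the earlier repro):

```python
# Sketch of the setinputsizes workaround; an assumption, not released behavior.
# SQL_VARCHAR stands in for pyodbc.SQL_VARCHAR (ODBC type code 12).
SQL_VARCHAR = 12

# One (sql_type, column_size, decimal_digits) tuple per parameter marker,
# sized to the target VARCHAR(2000) column so the 255-character default
# buffer is never used.
sizes = [(SQL_VARCHAR, 2000, 0)]
# crsr.setinputsizes(sizes)
# crsr.executemany(sql, params)
```

This addresses the buffer width but, as noted above, would not by itself fix parameter type inference.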

mkleehammer pushed a commit that referenced this issue Aug 8, 2018
* enabling setinputsizes to also work with fast_executemany. Issue #380

* Refactored the copied code into a method to override input sizes in slow and fast mode.
Extended the existing code, so that it is now possible to also override the column types, not only the sizes.
Fixed some warnings.

* Updated doc.
@mkleehammer
Owner

I'm going to close this now that #415 has been released in 4.0.24.

However, I do want to make the binding behavior of the fast and regular paths the same. If someone sets an encoding or other configuration, they will expect it to work.
