Unicode issues with Impala #291

Closed
Arkoniak opened this issue Jun 4, 2020 · 4 comments

Arkoniak commented Jun 4, 2020

I am trying to execute a simple query in Impala and get weird errors.

using ODBC
ODBC.adddriver("Impala", "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so")
cstr = "DRIVER=Impala;HOST=<host>;PORT=<port>;UID=<uid>;PWD=<password>;AuthMech=3;SSL=0;"
conn = ODBC.Connection(cstr)
DBInterface.execute(conn, "SELECT 42")

ERROR: 42000: [Cloudera][ImpalaODBC] (360) Syntax error occurred during query execution: [HY000] : AnalysisException: Syntax error in line 1:
�����罝
^
Encountered: Unexpected character
Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH

CAUSED BY: Exception: Syntax error
P, EXP

I am using the same connection string in pyodbc and it works just fine, so it seems that Python and Julia strings differ in some way that affects how the driver interprets them. I was thinking about using StringEncodings.jl, but its encode function returns a Vector{UInt8} and I wasn't able to figure out what to do next.

OS: Ubuntu 18.04.4 LTS
Impala driver: 2.6.4

Member

quinnj commented Jun 5, 2020

Looking into this; I don't see any issues on OSX, so I'm going to try on Linux now.

Member

quinnj commented Jun 5, 2020

Ok, I tracked down the issue (for me at least) on Linux:
[Screenshot: Screen Shot 2020-06-05 at 12 03 37 PM]

The docs mention that the driver is configurable between UTF-32 and UTF-16, with the default being UTF-32 for some reason (even though default unixODBC is UTF-16). After editing /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini and changing the line to DriverManagerEncoding=UTF-16, everything works fine.
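
To illustrate why a UTF-16/UTF-32 mismatch could turn a plain query into garbage like the error above, here is a minimal sketch. It assumes a little-endian machine and only mimics the reinterpretation of bytes; it does not call the driver, and the variable names are made up for illustration.

```julia
# Encode the query as UTF-16 code units, as a UTF-16 driver manager would pass it.
query = "SELECT 42"
utf16 = transcode(UInt16, query)
push!(utf16, 0x0000)  # terminating NUL; also makes the unit count even (10)

# Read the same bytes as 32-bit units, as a driver expecting UTF-32 would.
as_utf32 = collect(reinterpret(UInt32, utf16))

# Any unit above the largest Unicode code point cannot be a valid character.
max_codepoint = 0x0010FFFF
invalid_units = count(>(max_codepoint), as_utf32)

println("32-bit units seen by the driver: ", string.(as_utf32, base = 16))
println("units that are not valid code points: ", invalid_units, " of ", length(as_utf32))
```

Each pair of ASCII characters gets fused into one 32-bit value, so the SQL parser on the other side receives something it cannot read as text, which is consistent with the "Unexpected character" error.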

These kinds of driver-specific .ini files are annoying because they act as hidden configuration that we can't control and aren't aware of. I'm going to start a "troubleshooting" section in the docs where we can collect general strategies for debugging these kinds of issues and list specific cases and what to do.

Author

Arkoniak commented Jun 5, 2020

Thank you very much! This is amazing work, I can't even guess how you were able to figure it out.

Only one question: is it possible to override this behaviour without changing the "cloudera.impalaodbc.ini" file? I've tried adding DriverManagerEncoding to the connection string, but it has no effect. I ask because it is not always possible to edit files under "/opt", and the change is easy to forget after a driver upgrade.

Member

quinnj commented Jun 6, 2020

Hmmmm, I tried all the tricks I know, but it doesn't seem like there's a way to override it w/o directly editing the file. Sorry. Some of these drivers can be such a pain.
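
Since the setting lives only in the file, one way to avoid forgetting it after a driver upgrade is to check the file before connecting. The following is a rough sketch, not part of ODBC.jl: `driver_manager_encoding` is a hypothetical helper, and the path is the one mentioned in this thread, which may differ on other installations.

```julia
# Hypothetical helper: return the DriverManagerEncoding value from a driver
# .ini file, or `nothing` if the key is not set.
function driver_manager_encoding(path::AbstractString)
    for line in eachline(path)
        m = match(r"^\s*DriverManagerEncoding\s*=\s*(\S+)", line)
        m === nothing || return String(m.captures[1])
    end
    return nothing
end

# Path taken from this thread; adjust it for your installation.
ini = "/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini"
if isfile(ini)
    enc = driver_manager_encoding(ini)
    # Warn early instead of failing later with an unreadable syntax error.
    enc == "UTF-16" || @warn "DriverManagerEncoding is not UTF-16" ini enc
end
```

This only reads the file, so it works without write access to /opt; it cannot change the driver's behaviour, it just makes the misconfiguration visible.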
