Skip to content

to_sql double quoting in create index #9393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dashesy opened this issue Feb 1, 2015 · 10 comments · Fixed by #9411
Closed

to_sql double quoting in create index #9393

dashesy opened this issue Feb 1, 2015 · 10 comments · Fixed by #9411
Labels
IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@dashesy
Copy link
Contributor

dashesy commented Feb 1, 2015

I have table names that include _, . and -, up until Pandas .14 I used to quote the table name like this:

    def quote_identifier(s, errors="ignore"):
        encodable = s.encode("utf-8", errors).decode("utf-8")

        nul_index = encodable.find("\x00")

        if nul_index >= 0:
            error = UnicodeEncodeError("NUL-terminated utf-8", encodable,
                                       nul_index, nul_index + 1, "NUL not allowed")
            error_handler = codecs.lookup_error(errors)
            replacement, _ = error_handler(error)
            encodable = encodable.replace("\x00", replacement)

        return "\"" + encodable.replace('"', '""') + "\""

then call:

    pd_sql.to_sql(group, quote_identifier(name), con, if_exists='replace')

But since 0.15 if I do not quote CREATE TABLE (or DROP TABLE) will give me an error:

    OperationalError: near "-": syntax error

And if I do quote CREATE INDEX will give me an error:

    OperationalError: near ""_b.m_test_01-30"": syntax error

The problem is that in CREATE INDEX ix_{tbl}_{cnames} ON {tbl} ({cnames_br}) CREATE INDEX thinks it is the only one trying to prepend a namespace with _ and thus does not check if {tbl} is already quoted.

The proper fix should do what quote_identifier does with " and escape them, or just use quote_identifier and then prepend ix_.

This is the query that it tries and gives error:

CREATE INDEX ix_"_b.m_test_01-30"_index ON "_b.m_test_01-30" ([index])
@dashesy
Copy link
Contributor Author

dashesy commented Feb 1, 2015

Here is the patch I am now using in my monkey patch and works fine:

    if len(ix_cols):
        cnames = "_".join(ix_cols)
        cnames_br = ",".join([br_l + c + br_r for c in ix_cols])
        idx_name = quote_identifier("ix_{tbl}_{cnames}".format(
            tbl=self.name, cnames=cnames
        ))
        create_stmts.append(
            "CREATE INDEX {idx_name} ON {tbl} ({cnames_br})".format(
            idx_name=idx_name, tbl=self.name, cnames_br=cnames_br))

I will be glad to submit proper PR, but wanted to know if you think it might make even more sense to apply the quote automatically so that one does not have to quote table name beforehand and call to_sql?

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Feb 1, 2015
@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Feb 1, 2015
@jorisvandenbossche
Copy link
Member

This does indeed looks like a bug.

But, for the record:

cc @artemyk

@dashesy
Copy link
Contributor Author

dashesy commented Feb 2, 2015

This is sqlite3.
I tried the master branch '0.15.2-146-gef48c6f' and I do not even need quote_identifier anymore!
However if I do use quote_identifier then the quotations themselves will end up used in table name, which is a little weird to have a table that its name starts with ".
However I think this issue is resolved, I cannot wait to get this in the release, thanks!

This is a simple test case btw:

    import pandas as pd
    import sqlite3 as db
    from pandas.io import sql as pd_sql

    dbname = '/home/dashesy/test.sqlite'
    con = db.connect(dbname)
    df = pd.DataFrame(np.ones(3))
    pd_sql.to_sql(df, '_b.m_go_test_01-30', con, if_exists='replace')

    pd_sql.to_sql(df, quote_identifier('_b.m2_go_test_01-30'), con, if_exists='replace')

I think this would be still an issue if one has a table created in 0.14 and wants to access it in the fixed 0.15 and still quotes the table name, but this is a little too stretched.

@jorisvandenbossche
Copy link
Member

@dashesy Thanks for testing!

We can add your test case (or if you want to do that in a PR, certainly welcome!)

@dashesy
Copy link
Contributor Author

dashesy commented Feb 4, 2015

For PR I did not find something related so added a new module. This is my first PR here, so please let me know if I need to revise for coding style or anything.
I think the docs should also mention that there is no need to quote the table name if there are special characters in it.

@jorisvandenbossche
Copy link
Member

@dashesy I pointed you to the correct place for sql tests

About the docs, you can certainly add somewhere a note that table names do not have to be quoted

@jreback
Copy link
Contributor

jreback commented Feb 4, 2015

@jorisvandenbossche did you mean to close this? (as the PR is still open/not merged)

@jorisvandenbossche
Copy link
Member

ah no, that button ..

@dashesy
Copy link
Contributor Author

dashesy commented Feb 4, 2015

I think this one is the right place, I get many errors on AttributeError: '_TestSQLApi' object has no attribute 'connect' is there anyway to silence nose about the base class? I also added a line to the docs, again let me know if that was the right place.

@artemyk
Copy link
Contributor

artemyk commented Feb 5, 2015

Great to hear that #8986 fixed this pre-emptively ;-) Yes, you can have table names with quotes since those appear to be legal identifier names in sqlite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants