Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic parametrised query generation (WIP) #201

Merged
merged 7 commits into from
Jan 15, 2019

Conversation

grigi
Copy link
Contributor

@grigi grigi commented Jan 7, 2019

This PR adds parametrized query building by adding a Parameter(':1') term that will be the parameter placeholder.
Due to placeholders in SQL for parametrized queries being so varied (even between drivers or setup for the same dialect) that I decided to leave implementation as explicit as possible.
Whatever is indicated as the string is included verbatim into the SQL.

An example of this madness:
PostgreSQL → $number OR %s + :name (depending on driver)
MySQL → %s
SQLite → ?
Vertica → :name
Oracle → :number + :name
MSSQL → %(name)s OR :name + :number (depending on driver)

This PR is a work-in-progress:

  • Added Python3.7 and PyPy3.5 to Travis builds.
  • Added some tests of when to NOT parametrize.
  • Added some parameter tests.
  • Implemented a Parameter() term, and added to root __init__.py for convenience
  • Removed __str__() where the parent has the same variant implemented.
  • Update documentation

Fixes #113, potential workaround for some of #3

@grigi grigi requested review from twheys and a team as code owners January 7, 2019 21:35
@coveralls
Copy link

coveralls commented Jan 7, 2019

Coverage Status

Coverage increased (+0.07%) to 98.088% when pulling 6ce8072 on grigi:parametrized_query into 38f1d6d on kayak:master.

@twheys
Copy link
Contributor

twheys commented Jan 8, 2019

Hi, thanks for the PR. This feature definitely a big ask from a lot of people, so appreciate you contributing it. So far the change looks good, let me know if you need anything in order to check off the remaining boxes.

@twheys twheys self-assigned this Jan 8, 2019
@grigi
Copy link
Contributor Author

grigi commented Jan 9, 2019

I did some more digging on the 110 tests that fail due to kwargs['params'] not existing is all lower level unit tests.
Which would mean either rewriting all 110 tests or leaving the test-specific hack in:
https://github.com/kayak/pypika/pull/201/files#diff-425db19c78eea4c6ae89f8b70c621969R244

I started changing some tests, (I estimate ~500loc diff just to change each of these tests) and found that negation (from test_negation.py) will now generate -? which I'm not sure is valid SQL.
Another concern is that doing str() on anything that is not a QueryBuilder may be an illegal operation now, and it is used everywhere in the tests.

Please advise what you think is the best course of action re this?
I feel removing __str__() from anything but QueryBuilder is the safe thing to do, but could break backwards compatibility. But leaving a condition that should only be triggerd by tests is not right either. This issue could also be outside the scope of this PR and should instead be considered a deprecation issue?

@grigi
Copy link
Contributor Author

grigi commented Jan 9, 2019

Other than API breakage, removing __str__() from all terms is only 132 changes, but in there it generates a default param of quote_char='"'. Which begs the question, can we not also just add params and then do string interpolation? But then what is the difference between leaving the hack in as is?

@twheys
Copy link
Contributor

twheys commented Jan 9, 2019

I think it would be really helpful if you would start with some basic tests covering the syntax you would like to use for parameters so it is clear how the pypika query should be built up and the how the SQL generated by it should look.

Using How to use variables in SQL statement in Python? as a reference, it appears that different db vendors use different markers for parameters, so should aim for something that gives the user control over that.

I'm not sure why you would need a second get_sql function to generate these queries. To support paramterized queries, you basically need a query element that serializes as the marker in the query like ?, %s, or :1. Perhaps, instead of mixing this logic in with ValueWrapper, it would make sense to define it as it's own class that can be used by the end user directly in expressions.

I think a syntax like the following would work real well and be easy to implement without needing to change so much of the internals:

customers = Table('customers')
q = Query.from_(customers)\
    .select(customers.id, 
            customers.fname, 
            customers.lname, 
            customers.phone)\
    .where(customers.age >= Parameter('?'))

would generate SQL like

SELECT id,fname,lname,phone FROM customers WHERE age>=?

Then the user can decide which marker to use for each variable, can freely reuse them, and use them directly with the python db API.

cursor.execute(q, 18)

@grigi
Copy link
Contributor Author

grigi commented Jan 9, 2019

Yes, having a Parameter() function would allow us to differentiate between something static and something parametrized.

The question is, right now (whilst I was writing tests) it seems that evaluation happens in the order one does calls, but will this always be the case?
If there isn't a guarantee, one would need an option to identify the order of the parameters.
In that case we would need a way to determine its order, e.g. by naming the parameters or some other way of identifying order.

@grigi
Copy link
Contributor Author

grigi commented Jan 10, 2019

So far I found:
PostgreSQL → $number OR %s + :name (depending on driver)
MySQL → %s
SQLite → ?
Vertica → :name
Oracle → :number + :name
MSSQL → %(name)s OR :name + :number (depending on driver)

That is a lot of variance as to how the parameters get defined.
I don't think one can realistically expect that PyPika can generate a valid value, especially when the only option is named parameters...

Maybe your suggestion of just expecting a string value with the right value is the most sane one?

@twheys
Copy link
Contributor

twheys commented Jan 10, 2019

I would hesitate to try and generate which parameter is which because you ultimately can never know how the user will want to use it. For example, what if they want to reference the same parameter in two places within the query. Giving them control over which reference is used for each parameter will 1) force them to use the right one for their database platform and 2) give them full control over how the parameters are used in the query.

To answer your question, I had it in mind to refactor the way the SQL is generated at some point since that is by a long shot the most complex part of this library, but I don't think that would change the order of how constituents of the query are serialized. You can probably safely assume, for example, that if multiple WHERE-clauses are used, they will be serialized in the SQL in the exact same order.

@grigi
Copy link
Contributor Author

grigi commented Jan 11, 2019

I was thinking the same. I had to do some research to get data, and two things make it a bad idea to generate inside PyPika:

  1. different syntax per driver/driver-config (for the same dialect)
  2. named parameters

So I think a very simple Parameter('&2') is actually the best solution.
I'll redo the PR, and it will be a lot less code. Maybe default it to '?'

@grigi
Copy link
Contributor Author

grigi commented Jan 14, 2019

@twheys Please review.

@twheys twheys merged commit ac939d9 into kayak:master Jan 15, 2019
@twheys twheys added the 0.21.0 label Jan 15, 2019
@twheys
Copy link
Contributor

twheys commented Jan 15, 2019

Looks great. Approved, merged, and published. Appreciate you added py37 as well. Thanks 👍

@imbolc
Copy link

imbolc commented May 14, 2020

I see the point of having explicit Parameter term. But it's still so annoying, that I ended up mostly avoiding it with something like:

q = Query.from_(table).select("*").where(table.id == "?")
sql = str(q).replace("'?'", "?")

Maybe we add some kind of hook to the end of get_sql, which will trigger a globally defined function? So instead of adding replace to each query, it would be just:

pipika.get_sql_patch = lambda s: s.replace(...)

This way people themselves could decide how to parse placeholders for their particular driver, with regex or something. What do you think?

@mvanderlee
Copy link
Contributor

mvanderlee commented Apr 8, 2021

The current implementation does not work for us. We have a complex where clause builder that would need to be completely rewritten to support the current parameterization.
The first commit that modified the ValueWrapper was on the right track in my opinion. It just needs some minor changes to still be able to support different parameter formats.

We solved our issue by patching the ValueWrapper as follows:

from typing import Any, List, Optional
from pypika.terms import ValueWrapper
from pypika.utils import format_alias_sql

class ParameterizedValueWrapper(ValueWrapper):
    
    def get_sql(self, quote_char: Optional[str] = None, secondary_quote_char: str = "'", param_wrapper: ParameterWrapper = None, **kwargs: Any) -> str:
        if param_wrapper is None:
            sql = self.get_value_sql(quote_char=quote_char, secondary_quote_char=secondary_quote_char, **kwargs)
            return format_alias_sql(sql, self.alias, quote_char=quote_char, **kwargs)
        else:
            value_sql = self.get_value_sql(quote_char=quote_char, **kwargs)
            param_sql = param_wrapper.get_sql(**kwargs)
            param_wrapper.update_parameters(param_key=param_sql, param_value=value_sql, **kwargs)

            return format_alias_sql(param_sql, self.alias, quote_char=quote_char, **kwargs)


class ParameterWrapper:
  
    def update_parameters(self, param_key: Any, param_value: Any, **kwargs):
        pass

    def get_sql(self, **kwargs):
        pass


class QParameterWrapper(ParameterWrapper):
    def __init__(self, parameters: List[Any]):
        self.parameters = parameters

    def update_parameters(self, param_key: Any, param_value: Any, **kwargs):
        self.parameters.append(param_value)

    def get_sql(self, **kwargs):
        return '?'


class NamedParameterWrapper(ParameterWrapper):
    def __init__(self, parameters: Dict[str, Any]):
        self.parameters = parameters

    def update_parameters(self, param_key: Any, param_value: Any, **kwargs):
        self.parameters[param_key[1:]] = param_value

    def get_sql(self, **kwargs):
        return f':param{len(self.parameters) + 1}'

Usage:

pypika.terms.ValueWrapper = ParameterizedValueWrapper
q = pypika.Query.from_('test').select('*').where((pypika.Field('name') == 'foobar') & (pypika.Field('foo') != 'bar'))

parameters = []
sql = q.get_sql(param_wrapper=QParameterWrapper(parameters))
print(sql)
print(parameters)

parameters = {}
sql = q.get_sql(param_wrapper=NamedParameterWrapper(parameters))
print(sql)
print(parameters)

Output:

SELECT * FROM "test" WHERE "name"=? AND "foo"<>?
['foobar', 'bar']
SELECT * FROM "test" WHERE "name"=:param1 AND "foo"<>:param2
{'param1': 'foobar', 'param2': 'bar'}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for parametrized queries
5 participants