Extend query (filter) language to include substring comparison operators #42

Closed
sauliusg opened this issue Jun 15, 2018 · 7 comments
Labels
status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. topic/filtering-language Issue discussing changes and improvements to the query and filtering language type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.

Comments

@sauliusg
Contributor

Let's add the following operators to the filter language:

  • string_property LIKE "value" # as in SQL: SELECT string_property FROM data WHERE string_property LIKE '%value%'

  • string_property STARTS WITH "value" # as in SQL: SELECT string_property FROM data WHERE string_property LIKE 'value%'

  • string_property ENDS WITH "value" # as in SQL: SELECT string_property FROM data WHERE string_property LIKE '%value'

  • string_property UNLIKE "value" # as in SQL: SELECT string_property FROM data WHERE string_property NOT LIKE '%value%'
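
For illustration, the intended semantics of the four operators can be sketched as plain substring tests. This is a minimal sketch in Python, not part of the proposal: the function name `matches` is hypothetical, matching is assumed to be case-sensitive, and `%` or `_` inside the value are treated as literal characters rather than SQL wildcards.

```python
# Hypothetical helper (not from the specification): evaluates one of the
# four proposed operators against a single string value.
def matches(operator: str, string_property: str, value: str) -> bool:
    if operator == "LIKE":            # SQL: LIKE '%value%'
        return value in string_property
    if operator == "STARTS WITH":     # SQL: LIKE 'value%'
        return string_property.startswith(value)
    if operator == "ENDS WITH":       # SQL: LIKE '%value'
        return string_property.endswith(value)
    if operator == "UNLIKE":          # SQL: NOT LIKE '%value%'
        return value not in string_property
    raise ValueError(f"unknown operator: {operator!r}")
```

Under these assumptions, `matches("STARTS WITH", "NaCl", "Na")` holds while `matches("ENDS WITH", "NaCl", "Na")` does not.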

@dwinston
Contributor

I agree that the filter language should be extended to support substring comparison. As an alternative to the above (or perhaps alongside it, with the above kept as syntactic sugar), I propose adding a single operator, REGEX, to the filter language. The four cases above would map to the following equivalents:

  • string_property REGEX "value"
  • string_property REGEX "^value"
  • string_property REGEX "value$"
  • NOT string_property REGEX "value"

Furthermore, REGEX enables filters more powerful than substring comparison. I propose that the server interpret the value of this operator as a Perl-compatible regular expression (PCRE), version 8.39, with UTF-8 support.
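
The mapping from the four substring operators to a single regex operator can be sketched as follows. This is only an illustration: it uses Python's built-in `re` module, which is not PCRE 8.39 and differs from it in some details, and the names `to_regex` and `regex_matches` are hypothetical. The value is escaped so that characters such as `.` keep their literal meaning.

```python
import re

# Hypothetical translation of a substring operator into (pattern, negate).
def to_regex(operator: str, value: str) -> tuple[str, bool]:
    literal = re.escape(value)  # treat the value as literal text
    if operator == "LIKE":
        return literal, False
    if operator == "STARTS WITH":
        return "^" + literal, False
    if operator == "ENDS WITH":
        # Note: in Python, "$" also matches just before a trailing newline.
        return literal + "$", False
    if operator == "UNLIKE":
        return literal, True    # corresponds to NOT ... REGEX "value"
    raise ValueError(f"unknown operator: {operator!r}")

def regex_matches(operator: str, string_property: str, value: str) -> bool:
    pattern, negate = to_regex(operator, value)
    found = re.search(pattern, string_property) is not None
    return found != negate
```

Escaping is a design choice of this sketch: without it, a value such as `a.b` would also match `axb`, which is not what the substring operators mean.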

@dwinston added the labels status/has-concrete-suggestion, type/proposal, and topic/filtering-language on Jun 15, 2018 (topic/query-string was added and then removed).
@rartino
Contributor

rartino commented Jun 21, 2018

@dwinston, what do you suggest an API implementation should do if the underlying backend does not support string queries using specifically PCRE version 8.39 with UTF-8 support, but, say, some other regex format? I imagine this would be the typical case, and it seems nasty for essentially all OPTIMaDe implementations to have to do some form of regex translation.

@dwinston
Contributor

I suggest an optional attribute "operator_notes" returned by the base URL info endpoint. This attribute is a dictionary whose keys are operators and whose values are short notes of interest to a client. For the REGEX operator, an OPTIMaDe implementation that uses PCRE 8.39 with UTF-8 support MAY provide the field. However, an implementation that interprets a regex differently MUST include "operator_notes.regex" with a corresponding value understandable to a human.

I don't see a way around having a default spec for what a regex is (in this case, the one supported by MongoDB 3.2+). If we insist that implementors be able to deviate from that default and still claim to support the operator of the same name, they need to provide metadata in /info to inform client implementations.
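
A client-side check for such a field might look like the sketch below. The layout of the info response is an assumption made for the example (a plain dictionary of attributes with an optional "operator_notes" entry), and the names `DEFAULT_REGEX_DIALECT` and `regex_dialect` are hypothetical.

```python
# Assumed default when an implementation says nothing about its regex dialect.
DEFAULT_REGEX_DIALECT = "PCRE 8.39 with UTF-8 support"

def regex_dialect(info_attributes: dict) -> str:
    """Return the human-readable regex note, or the default if none is given.

    `info_attributes` stands in for the attributes returned by the base URL
    info endpoint; its exact structure is an assumption of this sketch.
    """
    notes = info_attributes.get("operator_notes", {})
    return notes.get("regex", DEFAULT_REGEX_DIALECT)
```

For example, `regex_dialect({})` falls back to the default, whereas `regex_dialect({"operator_notes": {"regex": "POSIX ERE"}})` returns the note supplied by the implementation. Since the note is free text meant for humans, a client could display it but could not reliably act on it automatically.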

@rartino
Contributor

rartino commented Jun 23, 2018

What you suggest seems problematic from a user perspective. Someone who wants to send a regex-type query to many databases would now need to deal with these differences in support manually. They would have to go through all target databases to check compatibility, and possibly translate between different regex formats by hand. A user who isn't careful could easily end up with a mix of data resulting from different interpretations of their regex.

Individual databases can already support an extended filtering language on a query parameter like _exmpl_filter=..... Maybe it is reasonable to stay with the simpler substring operators in the standard filter language (which hopefully can be supported as specified in any reasonable backend), and defer full support for REGEX to database-specific extended filtering?
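
To make the split concrete, a request combining the standard filter with a database-specific extension could be assembled as in the sketch below. Everything here is illustrative: the function name `build_query_url`, the base URL, and the syntax of the extended filter value are assumptions, and only the parameter name `_exmpl_filter` comes from the comment above.

```python
from urllib.parse import urlencode

def build_query_url(base_url, standard_filter, extended_filter=None,
                    extension_param="_exmpl_filter"):
    # The standard filter is sent to every database unchanged.
    params = {"filter": standard_filter}
    # The extended filter is only added for a database known to support it.
    if extended_filter is not None:
        params[extension_param] = extended_filter
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and property name, for demonstration only.
url = build_query_url(
    "https://example.org/structures",
    'string_property STARTS WITH "Na"',
)
```

With this separation, the portable part of the query stays identical across databases, and any regex-style condition is confined to the parameter that a specific database documents.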

@dwinston
Contributor

Okay, I agree that the lack of a unified REGEX interpretation makes adding it to the filter syntax too complex at this time. For now, I drop my advocacy for adding it to the standard filter language. I am for adding the operators as @sauliusg proposed.

@giovannipizzi
Contributor

This is partially addressed by #69; what remains to be done are the "LIKE" operators.

@sauliusg sauliusg mentioned this issue Jun 12, 2019
@merkys
Member

merkys commented Jun 26, 2019

Closing this issue as what remains of it is fully covered in #87.
