Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chaining FilterExpressions #280

Open
developer992 opened this issue Jul 8, 2024 · 12 comments
Open

Chaining FilterExpressions #280

developer992 opened this issue Jul 8, 2024 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@developer992
Copy link

developer992 commented Jul 8, 2024

Hello,

Trying to follow this PR: #113

love this change, very good implementation but am having some problems chaining AND and OR operators together.

item = {
    "id": 1,
    "name": 'item name',
    "description": "foo-bar",
    "city": "1000",
}

# Query should look like this:
term="foo"      # search by string term on two fields - OR on name and description
city=1000       # AND search only those with city = 1000


get_entities().filter(Q(name__contains="foo") | Q(description__contains="foo"), city="1000")

# It does this:
{'$filter': ["city eq '1000' and (substringof('foo', name) eq true) or (substringof('foo', description) eq true)"]}

But it needs another ( ) around substring filters to fetch correct items.

Is this possible using these filters?

Many thanks!

@developer992
Copy link
Author

developer992 commented Jul 8, 2024

wait, I GOT IT

this works:

get_entities().filter(Q(name__contains="foo") | Q(description__contains="foo"), Q(city="1000"))

Not sure if it's a bug or just documentation wrong.

Anyway, thanks!

Edit: No, problem persist

@developer992
Copy link
Author

developer992 commented Jul 11, 2024

Actually, no

I think this is a bug:

The problem is in this function:

def _combine_expressions(self, expressions):
    return ' and '.join(expressions)

This function gets called several times during the process and the problem is only in the last call when it calls it with all expressions.

I've improved it like so:

Basically it wraps expressions in "( )"

    def _combine_expressions(self, expressions):
        if len(expressions) > 1:
            combined = ""
            for counter, expr in enumerate(expressions):
                combined += f"{' and ' if counter > 0 else ''}({expr})"

            return combined
        
        return expressions[0]

it then calls the service with such url:

/v2/service/odata-service/odata-table?%24top=100&%24skip=0&%24filter=%28plant+eq+%271000%27%29+and+%28h5+eq+%279%27+or+h5+eq+%27VSA113211%27+or+h5+eq+%27EVL110022%27%29+and+%28mr+eq+%27N40%27%29&%24inlinecount=allpages

which looks like so when parsed:

(plant eq '1000') and (h5 eq '9' or h5 eq 'VSA113211' or h5 eq 'EVL110022') and (mr eq 'N40')

which seems to return the correct items.

I hope i haven't broke anything else though.

@developer992 developer992 reopened this Jul 11, 2024
@phanak-sap
Copy link
Contributor

phanak-sap commented Jul 14, 2024

Hi @developer992 - seems you put lot of time and effort into this.

I must say I am not that much familiar with the part of the code from the PR #113. Could you modify the comment to a pull request with new tests, reproducing your problem? Then the fix from part "I've improved it like so:" would be in the same PR and either all passes or maybe "you broke anything else" :)

Is the problem ONLY about "chaining AND and OR operators " or it would be reproducible by chaining any combinations of filter expressions in this ORM style?

Also, I see it is practically all there in the comment, but I would like you to take credit for the bugfix in git history :)

@phanak-sap phanak-sap added the bug Something isn't working label Jul 15, 2024
@developer992
Copy link
Author

developer992 commented Jul 15, 2024

ok cool :) i will try to run test and create a proper PR, thanks!

yes i think the problem is only when combining AND and OR filters, consider this simple example, if you wanted to chain 3 or 4 filters

filter1 eq 'foo' and filter2 eq true or filter3 eq 'bar'
filter1 eq 'foo' and ( filter2 eq true or filter3 eq 'bar' )

The first one will take items with filter3 eq 'bar' OR items with filter1 and filter2
which is different set of items than 2nd example, because it will take items with filter1 AND one of filter2 or filter3

but we'll know more after running the tests :)

@phanak-sap
Copy link
Contributor

@developer992
Copy link
Author

Hej thanks .. i haven't forgotten, was just busy with project, am still planing on preparing the PR :)

Have some trouble understanding a few things tho, not sure where to ask, perhaps just here?

i wanted to write a test on master branch code which would prove the existence of said bug. Then, after fixing the code, i'd write another test or improve existing one which would show the fix actually works.

But i don't quite understand these existing tests. For instance, responses library records/caches responses so the subsequent test runs would run against cached responses, correct? Am still navigating through their documentation to clear things up a bit, how to register a new remote url and re-cache the responses...

and 2nd thing, take this test for instance, tests/test_service_v2.py::test_count_with_chainable_filter_multiple

it filters some Employees and verifies the count of returned items.
The metadata that the tests use, is this tests/metadata.xml but how does it know there are in fact 3 employees that fit the criteria ? In fact, pretty much all tests verify that the count == 3 but i can't find if this is true or just hardcoded value?

i figured, if i wanted to write a test that would prove filtering is wrong, then i need to know for certain how many items there are and that way, write filters that would return the wrong count, but if its just hardcoded value, then this becomes hard to prove on these test services.

@phanak-sap
Copy link
Contributor

phanak-sap commented Aug 1, 2024

Hi @developer992

  • understandable, no rush. I would probably somewhere in the future dig into that myself, but also did not have enough time yet :)
  • responses library provide IMHO just mock HTTP response per test, as used in the reference module by @responses.activate. You can just copy-paste the test_count_with_chainable_filter_multiple test with new name and modify the responses.add section. In your case is the problem how the URL should look like, correct?
  • not sure about the caching between subsequent test runs, you got me here, my understanding is that they are idempotent both each test and each test run. But it should not be anyway a problem if there is stored the cached mock.. since the mocked HTTP response (URL, payload) should be always same in multiple test runs, when you modify the code of the service. And if you modify code of the mock, it should invalidate the cache anyway, if there is any usage of such caching mechanism.
  • the question about count of employees - yes it is hardcoded. The fixture with metadata provide the structure of the odata service. But there is no actual real odata service with database that would provide data for specific request. So this is part of the responses mock, in your example test in line . The focus is no on the data itself (it could be anything, we do not have database), but if the URL is correct for the pythonic employees.filter(ID=23, NickName="Steve").count() and if relevant pyodata fields are filled from the mock odata/http response as expected.
  • in your bug of wrong filter chaining, the point is primary that the URL should be as you expected, not as (currently) pyodata generates. So if you copy and modify the test for the example you provided here in comment, it should fail because nothing will try reach the expected mock URL. "how many items there are" is not the point here. It would be basically problem of database content, which we ignore in this case entirely. We do not actually care if there are 3 or 5 or 100 "Steves". That would be actually test that would be covering the specific Odata service backend (are for correct URL returned expected data). We are focusing with pyodata solely on the client side (is for this python code generated by the library expected HTTP request and parsed correctly mocked response).

@phanak-sap
Copy link
Contributor

different point so different comment
"i wanted to write a test on master branch code which would prove the existence of said bug. Then, after fixing the code, i'd write another test or improve existing one which would show the fix actually works."

Not sure if I understand this. But you should create a branch in your fork, create a new test(s) that will cover the scenario you described in the comment. Existing tests are not covering this obviously.

When you run pytest, it will fail, since there is your bug in pyodata. You fix the bug in the service.py module, so your test(s) for filterExpression chaining will start passing. You commit the tests and fix together in same branch and create one PR.

@developer992
Copy link
Author

Ah ok, I see,.. so these tests basically just verify that the generated query url is correct (the premise being, if we query with the correct url we should receive back the correct items ) and not that the service actually returns the right items .... I misunderstood what the responses library does ... i was comparing it to pytest-vcr library ( https://pytest-vcr.readthedocs.io/en/latest/ ) which i am more familiar with for testing remote services

ok, so here's the pickle .... my "fix" changes how the url is constructed:

before: startswith(NameFirst, 'Steve') eq true and ID eq 23 or ID eq 25 or ID eq 28
after: (startswith(NameFirst, 'Steve') eq true) and (ID eq 23 or ID eq 25 or ID eq 28)

This would require changing urls in pretty much all the tests though and we still wouldn't be certain if it works, we would just know the urls include brackets now but not how the underlying odata services handles these.

the one I use appears to apply them correctly but i do not know enough about odata protocol to claim this is generic enough for all odata services

@phanak-sap
Copy link
Contributor

phanak-sap commented Aug 8, 2024

Hi @developer992

  1. Could you provide an url for what you have so far? e.g. branch in your fork, with new test reproducing the problem, perhaps with the current idea of a fix? I see nothing here on github.com

  2. Resources for odata protocol related to this issue:

Odata V2: https://www.odata.org/documentation/odata-version-2-0/uri-conventions/
section 4.5. Filter System Query Option ($filter)

Odata V4: - which IMHO for $filter section does not have backward incompatible changes to V2, just option to be more extended. So valid also for the problem of how the URL should be constructed.
http://docs.oasis-open.org/odata/odata/v4.0/errata03/os/complete/part2-url-conventions/odata-v4.0-errata03-os-part2-url-conventions-complete.html#_Toc453752358

  1. I expect that as a workaround you should be able use the .filter() directly with grouping operator inside a string, without the ORM style.

e,g:
.filter("filter1 eq 'foo' and ( filter2 eq true or filter3 eq 'bar' )")

This does not mean that this issue is not a valid, just pointing this option out.
Maybe you will settle with just this, compared to what seems to be more complex problem to cover :)

  1. The grouping operator seems not to be supported at all. In fact, most of the operators and functions from the section 4.5 are missing in the
    OPERATORS = [

But we do not have any kind of telemetry of the code usage, other than the issues. And it seems this is the first bug report related to the ORM style of filters. Maybe it is not used that much at all, maybe due its limiting factors people tend to write the query directly as in point 2.

  1. last is the MOST IMPORTANT point - IMHO you cannot with your fix automatically put "(" and ")" in the query string. You cannot be sure what query the developer wants to execute. In your simpler comment example, should it be :

filter1 eq 'foo' and ( filter2 eq true or filter3 eq 'bar' )

or

(filter1 eq 'foo' and filter2 eq true) or filter3 eq 'bar'

both should most probably return different set of data in the response.

Correct me if I am wrong, but my understanding of the _combine_expressions() from comment (and no other code I can check) does exactly that,

This IMHO leads into the question about if all URLs should be fixed - e.g. (startswith , but this not always necessary. So definitely not. On the contrary, every existing URL you would like to change in the tests is valid Odata request URL - when no grouping operator is even used in the query. Examples are in the odata specification and years of working of pyodata for such simpler queries correctly. There should be IMHO only missing complex urls generation, as pointed out by the original problem of chaining AND and OR operators together

@developer992
Copy link
Author

Well I apologize, This is my first project with OData services and SAP cloud services so i am a bit noobie ...

This library doesn't support v4 services so that's out of the question even though that was my first try.

It still seems to me, the current implementation doesn't work with __in operator and i cannot write a test reproducing the problem because we don't have integration tests. Just comparing URLs won't prove anything.

Here's is the method i am using:

    # pylint: disable=no-self-use
    def _combine_expressions(self, expressions):
        if len(expressions) > 1:
            combined = ""
            for counter, expr in enumerate(expressions):
                combined += f"{' and ' if counter > 0 else ''}({expr})"

            return combined

        return expressions[0]

And here the test which doesn't work due to URL mismatch:

@responses.activate
def test_chaining_fix(service):
    from pyodata.v2.service import FilterExpression as Q

    url = f"{service.url}/Employees?%24filter=startswith%28NameFirst%2C+%27Steve%27%29+eq+true+and+ID+eq+23+or+ID+eq+25+or+ID+eq+28"
    
    responses.add(
        responses.GET,
        url,
        json=3,
        status=200)
    
    employees = service.entity_sets.Employees.get_entities()
    request = employees.filter(Q(NameFirst__startswith="Steve"), Q(ID__in=[23, 25, 28]))

    assert isinstance(request, pyodata.v2.service.GetEntitySetRequest)

    assert request.execute() == 3

I tried pushing my branch but I don't have write permissions.

If you have ideas for a better fix, please go ahead, SAP is by all means big enough to afford fixing this on their own :)

Thanks

@phanak-sap
Copy link
Contributor

phanak-sap commented Aug 23, 2024

Hi @developer992
I reopened this because it is a valid bug/enhancement request - regardless of who will actually do the change.

There is no problem that somebody is noobie with odata services. I am not much expert of everything in the odata specification either. If you wish to continue this even as first issue on github, it is actually a good one.

Perhaps as a first small step to familiarize with the workflow, I would recommend first just add support for one or few filter operators that are missing. For example "not" operator, since we are here with "and" and "or", or the arithmetic part from https://www.odata.org/documentation/odata-version-2-0/uri-conventions/. Writing new test for those will be simpler. Grouping is as we explored a bit more complicated.

"I tried pushing my branch but I don't have write permissions." - you are not allowed to push directly to this repository. The way you do it is to Fork it (this create your own github repository), push your branch there and create pull request. This is usually a standard on github/Open-source, so good to learn. Refere to Github docs or ton of videos - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo + https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork

"SAP is by all means big enough to afford fixing this on their own" - yes, but this is not my primary work assignment. We just open-sourced to community something we wrote and used as inner-source for writing integration test and found that every existing python odata libraries does not cover our needs. In other words, this has not same level of support as a customer facing products.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants