Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.count() can return a different result when search has already been executed #1900

Open
knoebber opened this issue Sep 3, 2024 · 2 comments
Labels

Comments

@knoebber
Copy link

knoebber commented Sep 3, 2024

Reproduction:

>>> from elasticsearch_dsl import Search
>>> search_with_post_filter = Search(index='product').post_filter('terms', categories=['B'])
>>> search_with_post_filter.count()
2818
>>> search_with_post_filter.execute()
<Response: [...]>
>>> search_with_post_filter.count()
1

The first count() call does a normal count request, which does not respect post filters. The second .count() call uses the cached response's total hits property, which does respect post filters.

I see that this is somewhat intentional based on the comments from #518 . But, In my opinion, this behavior is a bug because the value of a method call changes depending on hidden internal state, which doesn't align with this libraries immutable API design.

I'm not sure what the ideal behavior would be - maybe never used the cached result for .count() if the search has a post filter?

@miguelgrinberg
Copy link
Collaborator

Thanks, I'll think about what can be done here. I actually think the best solution was the way it was before #518, which is to not allow count() to work if there is a post filter. That is the most consistent behavior.

Why don't you just separate the two queries? That solves your problem I think?

>>> from elasticsearch_dsl import Search
>>> search = Search(index='product')
>>> search_with_post_filter = search.post_filter('terms', categories=['B'])
>>> search_with_post_filter.execute()
<Response: [...]>
>>> search.count()
2818

This is the best solution. Both queries are cached, and both return accurate results no matter how many times you ask and no matter the order of execution.

@knoebber
Copy link
Author

knoebber commented Sep 4, 2024

Yeah, I agree it's easy to work around, and it's not causing any problems on the code base that I'm working on. That snippet of code was just meant to highlight the issue.

Thanks for the response!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants