Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan with filtering stopping on an empty page #180

Closed
nyh opened this issue May 29, 2024 · 9 comments · Fixed by #225
Closed

Scan with filtering stopping on an empty page #180

nyh opened this issue May 29, 2024 · 9 comments · Fixed by #225
Assignees
Labels

Comments

@nyh
Copy link

nyh commented May 29, 2024

A user in stackoverflow, https://stackoverflow.com/questions/78544921/gocql-allow-filtering-returns-empty-records-with-page-size reported a bug when using Scylla and gocql, and scanning an entire partition with filtering (ALLOW FILTERING): If an entire page of results did not match the filter, then Scylla returns an empty page and gocql incorrectly stops the iteration immediately instead of continuing for the next page. His report has example code he used. I have not verified this report myself (I never used gocql myself).

If this report is true, I suspect it is a gocql bug, not a Scylla bug. We used to have this bug in the Python driver but it was fixed years ago (see scylladb/scylladb#8203, and the fix datastax/python-driver@1d9077d), and in the Java driver (apache/cassandra-java-driver#1544) and now I suspect we have the same bug in gocql.

We have a test in Scylla's test suite, test/cql-pytest/test_filtering.py::test_filtering_contiguous_nonmatching_partition_range and test_filtering_contiguous_nonmatching_single_partition, using the fixed Python driver, which checks exactly this situation: a filtered scan of a table or partition with a large number of rows, only the very last of them actually matching the filter, so the first pages of results are empty. This is why I'm guessing the bug is in gocql, not in Scylla itself. By the way, if you want you can translate one of these tests into Go and see if it fails with gocql and later use it as a gocql unit test.

By the way, after we solved this bug in the Python and Java drivers, we started to rely more and more on sending back empty pages. For example, we now send back an empty page also if we find ourselves scanning a very long list of tombstones, to avoid timeouts. Our drivers need to handle this case correctly.

@mykaul
Copy link

mykaul commented Jun 16, 2024

@sylwiaszunejko - did we reproduce this? Do we have an existing test in other drivers (Python and friends) that we can 'translate' it?

@sylwiaszunejko
Copy link
Collaborator

@mykaul I haven't started working on this issue yet, but I see that there is a test for empty pages in python-driver added datastax/python-driver@1d9077d

@moksh-samespace
Copy link

Any update on the above issue?

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 7, 2024

Update:

  1. Implemented test for it in Have a test to make sure paging feature works properly with allow filtering queries #212
  2. Behavior that is mentioned on stackoverflow is not observed
  3. But I have found what could have probably confused the user, which is iterator can produce empty result, which is expected behavior in a sence that since it is ALLOW FILTERING query, every shard executes it and if there is no such record it returns empty result, which driver pass down to the user.

Quick fix for user: update iteration logic to stop iterating only when len(iter.PageState()) == 0

On our side we can probably find a way to automatically retry on iterators with no results, or just mention the fact of posibility of having empty results.

I have quickly investigated posibility of automatically retry on empty iterators:
In order to do that we will need to somehow pass iter.PageState() to a Query and reexecute it.
Since it is manual pagination case, it should be done only by user and doing implicitly will confuse code and probably user, so I strongly believe only way to properly handly is to mention posibility of having iterators with no results in the documentation.

@nyh
Copy link
Author

nyh commented Jul 7, 2024

Is the iterator in the Go API for pages or for individual rows?
If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 7, 2024

Is the iterator in the Go API for pages or for individual rows? If it's for rows, it can't return nothing, it needs to return a row. Even if jt needs to perform multiple queries.

I have given it my best shot here, please take a look.
It still can produces empty iterator, but only for last page.
Otherwise it makes sure that Query.Iter() does not give away empty result.

@dkropachev
Copy link
Collaborator

Fixed in #212

@sylwiaszunejko
Copy link
Collaborator

@nyh Did you see @dkropachev PR? Are you satisfied with the outcome?

@roydahan
Copy link
Collaborator

@dkropachev please check if we want to add some documentation about previous behavior vs new.
Then, we can close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants