time_modified pagination can fail if modifications occur during traversal #972

Closed
ElectricNroff opened this issue Dec 21, 2022 · 3 comments · Fixed by #987

@ElectricNroff
Contributor

The pagination implementation for the GET /cve-id and GET /cve endpoints, in the time_modified case, can produce incomplete data if an item is modified during traversal across the multiple pages of results. For example, if the user selects a bounded date range such as time_modified.gt=date1 and time_modified.lt=date2, where date2 is earlier than the time that traversal started, it is still possible for an item to exist such that time_modified.lt=date2 was true for page=N but false for page=N+1. In other words, an item can move forward in time such that it is outside of the bounded interval.

As a specific example, the production server was queried 20 times, starting at about 1800Z, for GET /cve?time_modified.gt=2022-10-09T01:00:00.000Z&time_modified.lt=2022-12-21T16:44:00.000Z. Each set of queries needed to traverse through 14 pages. The first three sets had "totalCount":6633 and the last seventeen had "totalCount":6632. This occurred because CVE-2022-3691 has:

  • "dateUpdated":"2022-12-21T18:12:09.217Z"
  • "datePublished":"2022-11-21T00:00:00"

In other words, it jumped out of the bounded range because of the modification at 18:12:09.217Z.
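As a quick check of the arithmetic, the range test can be reproduced with the timestamps quoted above. This is only a sketch: `in_range` and `FMT` are illustrative names, not part of the CVE Services code, and the strict comparisons are an assumption about how `gt` and `lt` are evaluated.

```python
from datetime import datetime

# Timestamp layout used by the query parameters and dateUpdated values quoted above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def in_range(modified: str, gt: str, lt: str) -> bool:
    """Return True if gt < modified < lt (strict bounds, assumed)."""
    m, lower, upper = (datetime.strptime(s, FMT) for s in (modified, gt, lt))
    return lower < m < upper

GT = "2022-10-09T01:00:00.000Z"
LT = "2022-12-21T16:44:00.000Z"

# dateUpdated of CVE-2022-3691 after the modification at 18:12:09.217Z
still_matches = in_range("2022-12-21T18:12:09.217Z", GT, LT)
print(still_matches)
```

The new dateUpdated value is later than the time_modified.lt bound, so the record no longer matches the query and totalCount drops by one.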

For most of the later sets of queries, the page layout was correct but different: specifically, CVE-2022-39957 moved from the top of page 9 to the bottom of page 8. However, for the third set of queries, CVE-2022-39957 was simply never found, although it was always in the bounded interval with no recent changes; it has:

  • "datePublished":"2022-09-20T00:00:00"
  • "dateUpdated":"2022-11-14T00:00:00"

This happened because, during the third set of queries, the traversal captured page 8 of the old page layout and page 9 of the new page layout, neither of which included CVE-2022-39957.
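The failure mode can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the server's actual query logic: `Record`, `fetch_page`, the integer timestamps, and the page size of 2 are all made up for illustration, and results are assumed to be sorted by CVE ID with offset-based pages.

```python
from dataclasses import dataclass

@dataclass
class Record:
    cve_id: str
    modified: int  # simplified timestamp (hypothetical units)

def fetch_page(records, gt, lt, page, page_size):
    """Offset pagination over records filtered by modified time, sorted by ID."""
    matching = sorted(
        (r for r in records if gt < r.modified < lt),
        key=lambda r: r.cve_id,
    )
    start = (page - 1) * page_size
    return [r.cve_id for r in matching[start:start + page_size]]

# Six records, all inside the bounded interval (0, 100).
records = [Record(f"CVE-{i:04d}", modified=10) for i in range(1, 7)]

seen = []
seen += fetch_page(records, 0, 100, page=1, page_size=2)
seen += fetch_page(records, 0, 100, page=2, page_size=2)

# Between two page requests, the first record is modified and leaves the interval.
records[0].modified = 200

seen += fetch_page(records, 0, 100, page=3, page_size=2)

# Records that are still in range but were never returned to the client.
expected = [r.cve_id for r in records if 0 < r.modified < 100]
missing = [cve_id for cve_id in expected if cve_id not in seen]
print(missing)
```

Here CVE-0005 is never modified, yet it is skipped: removing CVE-0001 from the result set shifts CVE-0005 from the top of page 3 to the bottom of page 2, and the client had already fetched the old page 2.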

I have not investigated what fix options are feasible, but (untested) possibilities include:

  • If the client is using time_modified.gt, then the sort order must be the modification time, oldest to newest. It is not acceptable for the sort order to be the CVE ID string in alphabetical order. This should guarantee that newly modified items occur on the last page or last pages, which may be sufficient for some client use cases (e.g., the client avoids using time_modified.lt to prevent any item from jumping outside the range, and adds client-side code to ignore the newly modified items).
  • The client would be expected to provide time_modified.lt, but the server would use this value in a different way. Specifically, time_modified.lt would be ignored when the server retrieves items from the database. However, when the server determines that the page requested is the last page, it would inspect the data before sending the response to the client, and either send a subset of the last page or return an error code. If some (but not all) of the items on the last page have timestamps after the time_modified.lt value, then those items would be removed from the page before sending the response to the client. If all of the items on the last page have timestamps after the time_modified.lt value, then the integrity of the previous page is also in question, and the server would therefore provide a 4xx HTTP status. (In a REST API, the server obviously cannot tell the client that it has a revised response to an earlier request.) In other words, client designers would need to know that, if there is a 4xx response for any of the pages, then the entire collection of returned data is erroneous, and the client would need to repeat the complete set of paginated queries. In practice, a client that chooses recent time_modified.lt values would eventually "get lucky" and complete its set of paginated queries at a time when no modifications are occurring.
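A rough sketch of the second possibility, combined with the oldest-to-newest sort from the first, might look like the following. It is untested against the real service and follows the steps above literally. `Record`, `build_page`, the integer timestamps, and the choice of 409 as the 4xx status are all assumptions made for illustration.

```python
from typing import NamedTuple

class Record(NamedTuple):
    cve_id: str
    modified: int  # simplified timestamp (hypothetical units)

def build_page(records, gt, lt, page, page_size):
    """Return (status, cve_ids) for one page, applying lt only on the last page."""
    # lt is deliberately ignored here; sort by modification time, oldest first.
    ordered = sorted(
        (r for r in records if r.modified > gt),
        key=lambda r: (r.modified, r.cve_id),
    )
    start = (page - 1) * page_size
    items = ordered[start:start + page_size]
    is_last_page = start + page_size >= len(ordered)
    if not is_last_page:
        return 200, [r.cve_id for r in items]
    in_range = [r for r in items if r.modified < lt]
    if items and not in_range:
        # Every item on the last page is past lt, so earlier pages are suspect.
        # 409 is an arbitrary 4xx choice for this sketch.
        return 409, []
    return 200, [r.cve_id for r in in_range]

stable = [Record("CVE-A", 10), Record("CVE-B", 20),
          Record("CVE-C", 30), Record("CVE-D", 150)]
trimmed = build_page(stable, gt=0, lt=100, page=2, page_size=2)

churned = [Record("CVE-E", 10), Record("CVE-F", 20),
           Record("CVE-G", 150), Record("CVE-H", 160)]
rejected = build_page(churned, gt=0, lt=100, page=2, page_size=2)
print(trimmed, rejected)
```

In the first call the last page is trimmed to the one item still below the lt bound. In the second, every item on the last page is past the bound, so the client receives the error status and would need to restart the whole traversal.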
@brettp brettp self-assigned this Dec 28, 2022
brettp added a commit that referenced this issue Jan 12, 2023
…nexpect sort order and possible race condition
brettp added a commit that referenced this issue Jan 12, 2023
…nexpect sort order and possible race condition
jdaigneau5 added a commit that referenced this issue Jan 12, 2023
Resolves #972 Sort /cve and /cve-id by time.created to mitigate unexpected order and possible race condition
@slubar
Contributor

slubar commented Jan 17, 2023

Re-opening for further discussion

@jdaigneau5
Collaborator

Related to #1012 and #1050

@jdaigneau5
Collaborator

Resolved by #1050
