Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-389319: Wait and retry for GET_UPDATES_IN_PROGRESS #5617

Merged
merged 1 commit into from
May 8, 2024

Conversation

minglumlu
Copy link
Member

The query on HTTP endpoint /updates will return the available updates in JSON format. Prior to the changes in this commit, if a query arrives when another query is being handled, a "GET_UPDATES_IN_PROGRESS" error will be returned immediately. This behaviour is not friendly to GUI client XenCenter.

In this commit, the behaviour is changed to wait and retry in handling the query in xapi since the "*_IN_PROGGRESS" error is a transient failure. Tolerating it in xapi (server) side avoids error handling in client side.

@lindig
Copy link
Contributor

lindig commented May 7, 2024

Where is the waiting part the title talks about? It also looks like this is not single retry - so could this end up in a loop?

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't a delay be added to avoid consuming CPU while waiting?

@minglumlu
Copy link
Member Author

Where is the waiting part the title talks about? It also looks like this is not single retry - so could this end up in a loop?

  let state = queue_thread (fun () -> Policy.wait ~__context state e) in
  (loop [@tailcall]) state

It's the policy parameter of let retry ~__context ~doc ?(policy = Policy.standard) f.
It will go into the non Early_wakeup.wait branch:

( match e with
| Api_errors.Server_error (code, [cls; objref])
  when code = Api_errors.other_operation_in_progress ->
    Early_wakeup.wait (cls, objref) this_timeout
| _ ->
    Thread.delay this_timeout
) ;

So it will wait up to 20 seconds for a single delay in maximum.

Eventually, it will exit from the loop by:

if state.wait_so_far >= state.max_total_wait then raise e ;

@minglumlu
Copy link
Member Author

minglumlu commented May 7, 2024

Shouldn't a delay be added to avoid consuming CPU while waiting?

Yes, there is a delay indeed in module Repeat_with_uniform_backoff

    Thread.delay this_timeout

Copy link
Contributor

@Vincent-lau Vincent-lau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this a race condition when two clients are trying to request updates at the same time? Looking at the ticket it looks like this is not an intermittent failure but reproducible whenever two clients send get_updates request. Are these requests sent periodically, and how long does it take for xapi to handle one request?

when code = Api_errors.other_operation_in_progress
|| code = Api_errors.get_updates_in_progress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding a special case here, could we move the following in xapi_pool_helpers.ml

  ; (`get_updates, Api_errors.get_updates_in_progress)

from the blocking_ops list to wait_ops? Then we can also delete the definition of Api_errors.get_updates_in_progress completely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I oversaw the wait_ops and Helpers.Early_wakeup.broadcast in with_pool_operation.
On the other hand, indeed, it would be odd if the Api_errors.get_updates_in_progress would not be reported to users but still an error.

The query on HTTP endpoint /updates will return the available updates in
JSON format. Prior to the changes in this commit, if a query arrives
when another query is being handled, a "GET_UPDATES_IN_PROGRESS" error
will be returned immediately. This behaviour is not friendly to GUI
client XenCenter.

In this commit, the behaviour is changed to wait and retry in handling
the query in xapi since the "*_IN_PROGGRESS" error is a transient
failure. Tolerating it in xapi (server) side avoids error handling in
client side.

With the change, the "GET_UPDATES_IN_PROGRESS" will not be an error
exposed to users any more. Therefore it is removed.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
@minglumlu
Copy link
Member Author

So is this a race condition when two clients are trying to request updates at the same time? Looking at the ticket it looks like this is not an intermittent failure but reproducible whenever two clients send get_updates request. Are these requests sent periodically, and how long does it take for xapi to handle one request?

There is in-memory cache in xapi for the result. So for the first query after the xapi starting up, it would take a few seconds (depending on the number of hosts) to fill up the cache. After that, until the next xapi restart, the queries will return by reading from the cache quickly. But anyway, it may hit the race condition. So the choice is either returning a "*_IN_PROGRESS" error immediately, or waiting for retry. Since this is a transient failure, so it is changed to waiting for retry in this commit.

@robhoes robhoes merged commit bdbf705 into xapi-project:master May 8, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants