-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SnapshotIT testCreateSnapshot fails possibly due to simultaneous deletions #37581
Comments
Pinging @elastic/es-distributed |
@jaymode this looks like some actual bug in snapshotting, there shouldn't be any concurrent actions on this file. nevermind figured out where this is coming from, no it :) |
* The repo id was determined wrong when the delete picked up on an in progress snapshot * NOTE: This solution is still a best-effort fix and there's a slight chance of running into concurrency issues here when multiple create and delete requests for the same snapshot name are happening concurrently, but these require a sequence of multiple cluster state updates between the changed method reading the genId and submitting its cluster state update task * Added test reproduced the issue reliably in about 50% of runs * Closes elastic#37581
Fix incoming in #37612 |
* The repo id was determined wrong when the delete picked up on an in progress snapshot * NOTE: This solution is still a best-effort fix and there's a slight chance of running into concurrency issues here when multiple create and delete requests for the same snapshot name are happening concurrently, but these require a sequence of multiple cluster state updates between the changed method reading the genId and submitting its cluster state update task * Added test reproduced the issue reliably in about 50% of runs * Closes #37581
This failed again today in a PR build: https://gradle-enterprise.elastic.co/s/67kanac5qy4ok but this time with a slightly different error:
I was unable to reproduce this with
@original-brownbear I think this is something you've fixed before, does it just need retries? |
@dakrone huh, this funny enough is a completely unrelated bug to the one initially fixed here :) The problem is that we're looping over all the files in a directory and then reading their sizes one by one when doing -> I'll open a PR to fix that in a bit. |
If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes elastic#37581
#49142 Should do the trick here |
…lastic#49142) * Make FsBlobContainer Listing Resilient to Concurrent Modifications If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes elastic#37581
…lastic#49142) * Make FsBlobContainer Listing Resilient to Concurrent Modifications If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes elastic#37581
…49142) (#49176) * Make FsBlobContainer Listing Resilient to Concurrent Modifications If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes #37581
…49142) (#49177) * Make FsBlobContainer Listing Resilient to Concurrent Modifications If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes #37581
I hit this failure locally multiple times but it does not reproduce.
Reproduce line:
Test runner output with test ordering:
Log from integ test cluster:
The text was updated successfully, but these errors were encountered: