
Fail writing snapshot data to filesystem repo if space running out #67790

Open
DaveCTurner opened this issue Jan 20, 2021 · 22 comments
Assignees: ywangd
Labels: :Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete)

Comments

@DaveCTurner (Contributor) commented Jan 20, 2021

A user reported running out of disk space in their shared filesystem repository, which left it completely stuck, unable to take any further action: everything that might delete existing data (even repository cleanup, AFAICT) starts by writing another metadata file to the repository before proceeding, and there wasn't even enough space to do that.

Perhaps we should refuse to write data blobs (but not metadata blobs) to a shared filesystem repository when it is nearly full, leaving at least a few MB of wiggle room for cleanup and recovery from filling up the disk.
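
To illustrate, the guard I have in mind would look something like the sketch below, written against plain java.nio rather than the actual FsBlobContainer code; the class and method names and the 100MiB margin are all illustrative, not a real implementation.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: refuse to write *data* blobs when the filesystem holding the
// repository is nearly full, but always allow the small metadata writes that deletes and
// repository cleanup need in order to make progress.
public class GuardedFsBlobWriter {

    // Wiggle room to keep free for metadata writes; a guess, not a tuned value.
    private static final long RESERVED_BYTES = 100L * 1024 * 1024;

    private final Path repoRoot;

    public GuardedFsBlobWriter(Path repoRoot) {
        this.repoRoot = repoRoot;
    }

    // Data blobs are only written if enough usable space would remain afterwards.
    public void writeDataBlob(String blobName, InputStream data, long blobSize) throws IOException {
        FileStore store = Files.getFileStore(repoRoot);
        long usable = store.getUsableSpace(); // best-effort figure; other writers may race with us
        if (usable - blobSize < RESERVED_BYTES) {
            throw new IOException("refusing to write data blob [" + blobName + "]: repository filesystem is nearly full");
        }
        Files.copy(data, repoRoot.resolve(blobName), StandardCopyOption.REPLACE_EXISTING);
    }

    // Metadata blobs are written unconditionally so that snapshot deletion still works.
    public void writeMetadataBlob(String blobName, InputStream data) throws IOException {
        Files.copy(data, repoRoot.resolve(blobName), StandardCopyOption.REPLACE_EXISTING);
    }
}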


Workaround

  1. When space runs out:
    a. disable SLM
    b. ensure there are no ongoing snapshots
    c. extend the filesystem that contains the repo by 100MiB or so
    d. delete some snapshots to free up space
    e. shrink the filesystem to its original size.

Alternative workaround

  1. Ahead of time, create a ~100MiB file in the same filesystem as the repo to reserve some space (see the sketch after this list).
  2. When space runs out:
    a. disable SLM
    b. ensure there are no ongoing snapshots
    c. delete the reserved-space file created in step 1
    d. delete some snapshots to free up space
    e. create another ~100MiB reserved-space file
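
For step 1 of the alternative workaround, any tool that writes a real ~100MiB file will do. As a sketch, in Java it might look like this (the path is just an example; note that a sparse file created with setLength() would not actually reserve any space, which is why real bytes are written):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: pre-allocate a ~100MiB "reserved-space" file next to the repository
// so it can be deleted later to free room for the metadata writes that snapshot deletion needs.
public class ReserveSpaceFile {
    public static void main(String[] args) throws IOException {
        // Example path only; put it in the same filesystem as the repository.
        Path reserve = Path.of("/mnt/es-repo/reserved-space");
        byte[] chunk = new byte[1024 * 1024]; // 1 MiB of zeros, written out for real (not a sparse file)
        try (OutputStream out = Files.newOutputStream(reserve)) {
            for (int i = 0; i < 100; i++) {
                out.write(chunk); // on filesystems with transparent compression this may reserve less than 100MiB
            }
        }
    }
}
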
DaveCTurner added the >enhancement and :Distributed Coordination/Snapshot/Restore labels on Jan 20, 2021
elasticmachine added the Team:Distributed (Obsolete) label on Jan 20, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear (Member) commented:

It's a little tricky to estimate the wiggle room here because we'd effectively need to make sure that we have enough space to write a new index-N at the root as well as a new index-uuid blob for each shard, the combined size of which can vary widely, but IMO a best-effort guess of, say, 100MB should cover pretty much all cases.

@DaveCTurner (Contributor, Author) commented:

Relates #26730 and associated ERs

@DaveCTurner (Contributor, Author) commented:

Another user reported the same issue.

@DaveCTurner (Contributor, Author) commented:

Thinking about this some more, it's very unlikely that a successful snapshot would leave the repo too full to do a delete, and it's pretty likely that we ran out of space in the middle of writing some data blobs. Today, I think, if that happens we don't clean the dangling blobs up, we just leave everything in place; if we did clean them up on failure, there'd be a pretty good chance of freeing enough space to delete some snapshots even if the repo completely runs out of space.

@lost2 commented Mar 28, 2022

While waiting for a fix for this problem, I create a 500MB "dummy-file-to-delete" in the repository directory; it can be manually deleted to free up space, allowing "Delete snapshot" to work again.

@jkervine commented:

Just encountered this on one of my non-production systems today. Because this was not production, I did the following:

  • From the snapshot volume/filesystem, moved files from the indices directory whose modification dates were in ancient history to another filesystem, to make room
  • Deleted some snapshots (through the snapshot API) which were much newer than that ancient history
  • Moved those ancient files back to their original place
  • Deleted some more snapshots, now from ancient history

Seemed to work, no errors in logs at least and new snapshots are created and restored ok... am I in for a lot of surprises in the future?

@DaveCTurner (Contributor, Author) commented:

Seemed to work, no errors in logs at least and new snapshots are created and restored ok... am I in for a lot of surprises in the future?

Maybe. I don't think we can give a more confident answer than that - this is definitely not a supported or tested workflow, and "seemed to work" is a very weak indicator of repository integrity. As the docs say:

Don’t modify anything within the repository or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the repository then future snapshot or restore operations may fail, reporting corruption or other data inconsistencies, or may appear to succeed having silently lost some of your data.

It'd be better if we had an option to do dry-run restores (#54940) or other kinds of integrity checks (#52622) but these proposed features are not under active development right now.

Leaf-Lin added the good first issue label on Jun 2, 2022
Leaf-Lin removed the good first issue label on Jun 15, 2022
@lost2 commented Sep 20, 2022

Any news on this "enhancement" request? We're on v8 by now and snapshots keep filling up the filesystem. Thanks

@maggieghamry commented:

@DaveCTurner is there a known workaround for this situation?

@DaveCTurner (Contributor, Author) commented:

I added some notes on ahead-of-time protection to the OP.

Once you reach this situation then the only workaround is to extend the filesystem (temporarily) and delete some snapshots.

@jerrac commented May 26, 2023

Is there any current movement on this? For various reasons I wasn't able to address my snapshot disk space issues before they hit 100%. Extending the disk is currently not an option. Right now I can't even get the list of snapshots to load via Kibana or curl. I'm working on getting more space, but for now I'm stuck.

I've used a lot of systems that make how much space to reserve an option. I'd say a simple way to partially fix this is to just add an option for that. Then, before taking a snapshot, check whether the threshold is met; if it is, just don't take the snapshot. I'd think that'd be fairly straightforward to implement.

@DaveCTurner (Contributor, Author) commented:

It's not on anyone's roadmap right now, but it sounds like you're volunteering to take it on @jerrac? If so, a PR would be very welcome. IMO it'd be better to focus on cleaning up dangling blobs after a failed snapshot, as per my earlier comment, but if you'd prefer to try the reserved-space route then we'd appreciate that too.

@jerrac commented May 30, 2023

@DaveCTurner Er, Java is not really my forte. Haven't actually touched it since an internship 10+ years ago... I did go poke around and found this section of code:

private static void validate(String repositoryName, String snapshotName, ClusterState state) {
    RepositoriesMetadata repositoriesMetadata = state.getMetadata().custom(RepositoriesMetadata.TYPE, RepositoriesMetadata.EMPTY);
    if (repositoriesMetadata.repository(repositoryName) == null) {
        throw new RepositoryMissingException(repositoryName);
    }
    validate(repositoryName, snapshotName);
}

private static void validate(final String repositoryName, final String snapshotName) {
    if (Strings.hasLength(snapshotName) == false) {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "cannot be empty");
    }
    if (snapshotName.contains(" ")) {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "must not contain whitespace");
    }
    if (snapshotName.contains(",")) {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "must not contain ','");
    }
    if (snapshotName.contains("#")) {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "must not contain '#'");
    }
    if (snapshotName.charAt(0) == '_') {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "must not start with '_'");
    }
    if (snapshotName.toLowerCase(Locale.ROOT).equals(snapshotName) == false) {
        throw new InvalidSnapshotNameException(repositoryName, snapshotName, "must be lowercase");
    }
    if (Strings.validFileName(snapshotName) == false) {
        throw new InvalidSnapshotNameException(
            repositoryName,
            snapshotName,
            "must not contain the following characters " + Strings.INVALID_FILENAME_CHARS
        );
    }
}

After I got over the oddness that is having the same method defined twice in the same class, that is where I'd imagine it might (emphasis on might) make sense to add a disk space check.

Maybe add a "RepositoryFullException" class, and then somehow throw one when the disk is getting full.

No idea how I'd actually add an option to repository creation that would let users set the threshold.
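
Very roughly, something like this is what I'm picturing (RepositoryFullException, the threshold parameter, and the way the repository path gets passed in are all made up here, not real Elasticsearch code):

import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical exception type; nothing like this exists in Elasticsearch today.
class RepositoryFullException extends RuntimeException {
    RepositoryFullException(String repositoryName, String message) {
        super("[" + repositoryName + "] " + message);
    }
}

class SnapshotPreflightCheck {
    // Refuse to start a snapshot if the filesystem backing the repository has less than
    // minFreeBytes of usable space left. In a real change the path and the threshold would
    // come from the repository settings; here they are plain parameters.
    static void ensureEnoughFreeSpace(String repositoryName, Path repositoryPath, long minFreeBytes) {
        try {
            FileStore store = Files.getFileStore(repositoryPath);
            long usable = store.getUsableSpace(); // best-effort figure reported by the filesystem
            if (usable < minFreeBytes) {
                throw new RepositoryFullException(
                    repositoryName,
                    "only " + usable + " bytes free, need at least " + minFreeBytes + "; refusing to start snapshot"
                );
            }
        } catch (IOException e) {
            // If we can't even query the free space, fail the check rather than risk filling the disk.
            throw new RepositoryFullException(repositoryName, "could not determine free space: " + e.getMessage());
        }
    }
}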

Anyway, I might poke at setting up an actual dev environment, but I'm not sure I'll have time very soon. :\

Hopefully someone else will have time, and skill, to jump on this soon. :)

ywangd self-assigned this on Sep 19, 2023
ywangd added a commit to ywangd/elasticsearch that referenced this issue Sep 20, 2023
When shard level data files fail to write for snapshots, these data
files become dangling and will not be referenced anywhere. Today we
leave them as is. If the failure is disk full, the repository will
become unusable because even the delete snapshot operation requires
writing a small metadata file first.

This PR adds a cleanup step so that the shard data files are removed if
they fail to snapshot.

Resolves: elastic#67790
ywangd added a commit that referenced this issue Sep 27, 2023
Relates: #67790

Co-authored-by: David Turner <david.turner@elastic.co>
@DaveCTurner (Contributor, Author) commented Sep 27, 2023

#99694 should effectively solve this in practice with high probability, since it's very likely that you hit the disk-full situation while writing data blobs, which are now cleaned up on failure, leaving enough space in the repository for the metadata operations needed to delete snapshots.

That leaves open the (much more remote) possibility that the disk fills up when writing metadata. However, most repository implementations do not have a meaningful notion of "space running out", so it turns out to be fairly tricky to implement the idea suggested in the OP in a general-purpose way:

Perhaps we should refuse to write data blobs (but not metadata blobs) to a shared filesystem repository when it is nearly full, leaving at least a few MB of wiggle room for cleanup and recovery from filling up the disk.

Instead maybe we should consider extending #81352 to allow storing data and metadata in wholly different locations (e.g. different filesystems or S3 buckets) so that we can be sure to have space for metadata operations even if the data location is full. It would also likely help to do #75623 and #100115, and #52622 would also be able to identify dangling data.

Since we're not currently planning to address the remaining possibility that the disk fills up when writing metadata, and there are other open issues to track alternative ideas, I'm closing this.

@jerrac commented Sep 27, 2023

I have to admit I'm confused. Why not just actually check that there is enough space before even starting a snapshot?

The proposed solution is that a failed write due to full disk will somehow fix our problems because the blob that failed to write will get deleted, which might leave enough space for metadata to be written.

That's relying on something breaking in order to stop something else from breaking.

If airlines relied on planes failing to take off to determine if the plane had too much stuff in it, would that be an acceptable solution?

Shouldn't it be that we try to stop something from breaking in the first place?

I mean, Elasticsearch has that kind of logic in other places. It limits the total number of shards and will stop allocating indices to filesystems that are almost full. That's all to prevent a problem before it occurs. Right?

@DaveCTurner (Contributor, Author) commented:

Why not just actually check that there is enough space before even starting a snapshot?

The word "just" is load-bearing in that question :) We can't accurately determine the space needed up front, or at least it would take significant extra computation, because of having to account for deduplication. And then filesystems don't really guarantee that the free space they report means we can actually write that many bytes, because of overheads lost to incomplete blocks and so on. And then there are other users of the same filesystem consuming or freeing space too. And finally none of the cloud repo APIs have a way to even query the available free space.

If airlines had a way to handle failed-to-take-off as gracefully as we now handle disk-full in a repository then I expect they would indeed use that rather than all the effort and procedures (and capacity lost to safety margins) they have today.

@jerrac commented Sep 27, 2023

I can get that calculating how much space you need beforehand is not feasible.

But, just like with storing indices, you can check for a percentage of free space and then refuse to start a snapshot if that percentage isn't available.

Would it really matter for S3 APIs? I thought the whole point of that kind of storage was to not have to deal with running out of space; you just keep paying for more as you use more.

Anyway, I'll leave it at that. I'm probably not going to bother with snapshots in the future anyway. This issue, plus the fact they require snapshotting live data and can't be limited to just old data you want to archive (at least as far as I can tell...), means they don't do the job I want.

@DaveCTurner (Contributor, Author) commented:

But, just like with storing indices, you can check for a percentage of free space and then refuse to start a snapshot if that percentage isn't available.

We do this with indices because the consequences of hitting disk-full while writing indices are rather severe. If we could safely do so, we'd run disks up to capacity in this area too.

Would it really matter for s3 apis? I thought the whole point of that kind of storage was to not have to deal with running out of space, you just keep paying for more as you use more.

Very much so: a substantial fraction of users store their snapshots in on-prem storage which claims some level of S3-compatibility, but none of those on-prem systems correctly emulate S3's lack of space constraints. (Whether they should be doing this is a whole other question, but unfortunately not one whose answer really matters in practice.)

@lost2 commented Sep 30, 2023

Having reported this back in Jan 2021 (18 versions ago, v7.10 -> v8.10), I'm glad to know this issue is finally being addressed.

Also, I'm surprised that this can even happen: "When shard level data files fail to write for snapshots, these data files become dangling and will not be referenced anywhere. Today we leave them as is". What's causing this "files fail to write" anyway?
Thx

@DaveCTurner (Contributor, Author) commented:

What's causing this "files fail to write" anyway?

It could be anything really (you might be amazed how flaky some users' storage is) but in the context of this issue the problem that matters most is running out of disk space.

piergm pushed a commit to piergm/elasticsearch that referenced this issue Oct 2, 2023

Relates: elastic#67790

Co-authored-by: David Turner <david.turner@elastic.co>
@Sergi-GC commented Apr 6, 2024

Added the workarounds mentioned here in KB article https://support.elastic.co/knowledge/b1186c52
