-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extension/storage/filestorage] Default fsync to true #31201
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the component is in beta, I am in support of this change because it makes the default configuration much safer. However, I would like to hear other opinions as well. cc: @open-telemetry/collector-contrib-approvers
If we do make the change, we'll need to roll it out with a feature gate. (Example)
I'm in support of this change.
I fully agree this is a breaking change, but I'm not convinced that flipping a feature gate is easier/more feasible than changing the configuration value? |
I think the benefit is to give users warning that the default will change, but maybe it's just more hassle then it's worth. |
I think we should clearly list it as a breaking change in the changelog, but I don't think we need to feature flag it. It won't break anything, just potentially slow things down slightly. |
I'd like to see some benchmarks before we merge this. The storage extension is used by the persistent queue in core, which writes each request to the queue. Enabling or disabling fsync - which in this case only protects against hardware failure and similar events - should be a decision the user makes consciously, as it has potentially major performance implications depending on the storage medium in question. |
@swiatekm-sumo you can see some benchmarks on the original file storage PR: #3087 |
Based on this comment, we observed a 3x difference in performance. We had a longer discussion about the risks of this setting on that thread as well. Based on rereading, it's not clear to me that this change should happen. Perhaps we should consider a third option which is to add more detailed documentation about the parameter and remove the default value, forcing users to make a decision about it. I will say that in the absence of a clear consensus I think we need to default to no breaking change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm blocking this until we establish consensus. I'll call attention to this discussion at next week's SIG meeting.
cc: @open-telemetry/collector-contrib-approvers in case there are additional opinions.
Part of the problem is that the value of this setting should likely depend on what the extension is used for. The persistent queue use case is performance-sensitive, but use cases where we just periodically store some relatively small state, like in filelog receiver, would probably be better off with fsync on. There's also some effort to make the persistent queue less taxing on the storage, so once that lands we could revisit whether enabling fsync remains painful there. |
It's a shame that bbolt panics on corruption instead of allowing a graceful way to truncate the database |
We could also consider other storage engines. Our API is very simple and maps readily to any embedded transactional key-value store. I once ran some benchmarks across a variety of them and didn't need to do much work to replace bbolt. Bbbolt also has some other properties which are undesirable from our perspective, like not dealing with a full disk very gracefully. |
@swiatekm-sumo, I'd be interested to see your findings on other storage engines. I seem to recall writing up a comparison vs other options when first implementing the extension but can't seem to find it now. |
@djaglowski I'll see if I can dig it up. Even if I can't, I don't think it'll be difficult to reproduce. |
It was pointed out in SIG today that the performance impact of enabling fsync was original stated as "3 orders of magnitude", rather than the 3x I stated above. In any case, we agreed that given the tradeoffs here, we should serious consider switching away from bbolt. I suggest we should wait until @swiatekm-sumo has a chance to look for his previous analysis. Otherwise, if anyone has potential alternatives in mind, please feel free to call them out. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description:
See #31200, setting fsync to false by default is dangerous and will lead to data loss.
Link to tracking Issue: #31200