storage disk: Fix badger.ErrTxnTooBig when too many partition/path a… #5937
Conversation
Force-pushed from 09ab206 to 2032325
The core of the Truncate work is that it happens in only one specific, mutex-protected code path, and it side-steps the existing database triggers. It's the only place where we can commit and renew a transaction, because there are no triggers set up for the commit on the new store. After the data has been written to the new store, it replaces the old one. So I don't think that the change proposed here is the way to go. It's similar to 528836a, which was consciously removed in 516dd47#diff-c91ae2bef36ff9b230ca0dd967684cea2c38dc920b32fe67faf49395579d96d7L170 because of the trigger callback issue.
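For illustration only (this is not OPA's actual trigger API, just a toy sketch): the concern is that splitting one logical write into several commits is observable, because each commit fires the registered trigger callbacks, so listeners see multiple partial events where they previously saw one atomic one.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a storage layer that fires
// callbacks on every commit.
type store struct{ triggers []func(changed []string) }

func (s *store) onCommit(f func(changed []string)) { s.triggers = append(s.triggers, f) }

func (s *store) commit(changed []string) {
	for _, f := range s.triggers {
		f(changed)
	}
}

func main() {
	s := &store{}
	s.onCommit(func(changed []string) { fmt.Println("trigger fired for:", changed) })

	// One logical write, split into two commits to dodge a size limit,
	// now fires the trigger twice instead of once:
	s.commit([]string{"/users/0 ... /users/79999"})
	s.commit([]string{"/users/80000 ... /users/159999"})
}
```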
Force-pushed from 2032325 to cee5ef3
Hi @srenatus, thanks for the quick reply. Based on what I understand of your comment and the source code:
This is the simplest solution I have found. If this solution is better than the previous one and gets validated:
I have tested this solution with my bundle (160,000 partitions) and it works.
I'm still not sure that this is the right approach... I think you might be fighting against a fundamental limitation of how Badger works in OPA. Have you tried other things, notably fiddling with the partitions to reduce the txn size?
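For context, partitions are configured when the disk store is created. A minimal sketch, assuming the storage/disk Go API as it existed around the time of this PR (disk.New and disk.Options are real; the directory path and the /users partition are made up for this example). Partitioning /users means each user document is written under its own key instead of inside one huge value, which keeps individual Badger transactions small:

```go
package main

import (
	"context"

	"github.com/open-policy-agent/opa/logging"
	"github.com/open-policy-agent/opa/storage"
	"github.com/open-policy-agent/opa/storage/disk"
)

func newStore(ctx context.Context) (storage.Store, error) {
	return disk.New(ctx, logging.NewNoOpLogger(), nil, disk.Options{
		Dir: "/var/opa", // hypothetical data directory
		Partitions: []storage.Path{
			storage.MustParsePath("/users"),
		},
	})
}
```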
Hi @srenatus, I see what you mean. Thank you for your response, it helped me a lot. I have given up on this error and continued my investigation, and I have 3 questions. I discovered that when lazy mode is enabled, the error no longer occurs. Lazy mode is enabled (only?) when the bundle is downloaded! That is very good news (because we use long polling in production!). I also discovered that with lazy mode, the RAM and CPU usage with the disk store became acceptable for sidecar usage, except during bundle activation.
(3) I don't have very deep knowledge of …
@floriangasc let me try to answer some of your queries
The purpose of lazy mode is to avoid deserializing data, as it can be memory consuming, and especially when disk storage is enabled you don't want memory to be a bottleneck. We started to apply this first when OPA downloads bundles as it seemed like a good start, and we can use it in more places in the future where it makes sense to do so. Exposing this via config could also be considered if there is a valid use-case for it.

On 3), it's possible that this was a temp spike before gc happened, but it depends on your environment, data/policy etc. So you would have to experiment a little and allocate resources accordingly. If you suspect there is a leak, feel free to open an issue and provide steps to repro.

I'm closing this PR as your use-case seems to be addressed and, as Stephan has previously explained, these changes don't seem necessary. Also, if you have more questions, please use OPA Slack or start a GitHub Discussion. Thanks!
Thanks a lot for your work and for taking the time to answer my questions. I will continue to investigate.
storage/disk/txn.go
Why the changes in this PR are needed?
This change is to avoid badger.ErrTxnTooBig when loading "too" many partitions.
What are the changes in this PR?
I handle badger.ErrTxnTooBig as described in the official docs: https://dgraph.io/docs/badger/get-started/#transactions
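For reference, the pattern from the Badger docs looks roughly like this. A minimal sketch, assuming a *badger.DB and a hypothetical makeEntry helper; this is the generic Badger idiom, not the exact diff in this PR:

```go
package main

import (
	badger "github.com/dgraph-io/badger/v3"
)

// writeAll writes n entries, committing and reopening the transaction
// whenever Badger reports the transaction has grown too big.
// makeEntry is a hypothetical helper producing a key/value pair.
func writeAll(db *badger.DB, n int, makeEntry func(i int) ([]byte, []byte)) error {
	txn := db.NewTransaction(true) // read-write transaction
	defer func() { txn.Discard() }()
	for i := 0; i < n; i++ {
		key, val := makeEntry(i)
		if err := txn.Set(key, val); err == badger.ErrTxnTooBig {
			// Commit what we have so far and start a fresh transaction,
			// then retry the write that didn't fit.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			if err := txn.Set(key, val); err != nil {
				return err
			}
		} else if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```

Note that, as discussed above, committing mid-write like this is exactly what bypasses the disk store's trigger machinery, which is why the maintainers consider it safe only in the Truncate path.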
Notes to assist PR review:
I am not sure this is the best solution because of #5721, and there is also the Truncate method/concept. It is the most naive/simple solution according to the Badger docs. If you confirm the solution (implementation) is OK, I will add tests. (I plan to write tests like those for the previous issue #3879: generate enough temp data at runtime to prove it works.)
For my use case, it happens at around every ~30,000 writes. I have a bundle with 160,000 JSON files, one per user. Each file is small, ~1 KB.
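A rough sketch of the kind of test-data generation mentioned above, with hypothetical names and document shape (one small JSON-able document per user, enough of them to exercise the transaction-size limit):

```go
package main

import "fmt"

// genUsers builds a bundle-like map with n small documents under /users.
// Writing this through the disk store should be enough to trigger the
// transaction-size limit when n is large (e.g. 160_000).
func genUsers(n int) map[string]interface{} {
	users := make(map[string]interface{}, n)
	for i := 0; i < n; i++ {
		users[fmt.Sprintf("user-%06d", i)] = map[string]interface{}{
			"id":    i,
			"email": fmt.Sprintf("user-%06d@example.com", i),
		}
	}
	return map[string]interface{}{"users": users}
}
```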