Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream envelopes to file directly #1001

Closed
Swatinem opened this issue Jun 3, 2024 · 3 comments · Fixed by #1021
Closed

Stream envelopes to file directly #1001

Swatinem opened this issue Jun 3, 2024 · 3 comments · Fixed by #1021
Assignees
Labels
area: core enhancement New feature or request

Comments

@Swatinem
Copy link
Member

Swatinem commented Jun 3, 2024

The sentry_envelope_write_to_path was always built on top of sentry_envelope_serialize purely for convenience, and has always had a comment suggesting it should stream directly to the output file:

sentry_envelope_write_to_path(
const sentry_envelope_t *envelope, const sentry_path_t *path)
{
// TODO: This currently builds the whole buffer in-memory.
// It would be nice to actually stream this to a file.
size_t buf_len = 0;
char *buf = sentry_envelope_serialize(envelope, &buf_len);

One customer is now running into the problem that the SDK writes invalid envelopes to disk on OOM, most likely because it is unable to allocate memory itself.

Such envelopes may have item headers (with correct length), but no item contents, most likely because writing the contents would grow the buffer which fails, so the serialization code then just silently ignores that error and continues writing the next item header.

In another case, the envelope contains attachments, but no event. Most likely because the code to builds envelopes currently reads attachments into memory first, before constructing the event datastructure, which also increases memory pressure.


Likely the most reasonable effort change would be to resolve that TODO which has existed ever since the code was written to truely stream envelope writes to disk, which should halve the memory requirement for the whole operation.

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

@supervacuus
Copy link
Collaborator

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

What would be the observable side-effect (you mean to users, right?) besides deferred/lower memory usage? We do not expose envelope items to clients except for events or transactions. It could make a difference if we decide to implement something like this: #978

@Swatinem
Copy link
Member Author

Swatinem commented Jun 3, 2024

The "observable" side-effect would be related to the timing of filesystem access. Already we advertise that users should flush any attachments in the on-crash hook, because thats when attachments are being read.
If this would happen at some point afterwards, somewhere in the transport code, that could come as a surprise.

@supervacuus
Copy link
Collaborator

#1021 fixed

Likely the most reasonable effort change would be to resolve that TODO which has existed ever since the code was written to truely stream envelope writes to disk, which should halve the memory requirement for the whole operation.

but we probably should keep

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

on the agenda since attachments typically pale the effects of all other envelope items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: core enhancement New feature or request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants