Proposal: store CPU samples in nanoseconds #1602

Closed

petethepig opened this issue Oct 5, 2022 · 0 comments
Labels
backend Mostly go code

Comments


petethepig commented Oct 5, 2022

This came up during the discussion in #1589 and vasi-stripe#1.

@kolesnikovae mentioned that there's a flaw in our storage engine:

The sample rate is stored per segment (series) and gets overridden by the value ingested last. This could be a problem if there are two profiles with distinct time units (e.g., seconds and milliseconds).

@vasi-stripe proposed storing CPU samples in nanoseconds instead:

Silly idea: What if we always chose nanosecond sampling rate? Would that be worse in any way?


I think this is a great idea. Before we commit to it, we need to figure out two things:

  • space requirements
  • how to handle old data already stored on disk

Space requirements

I imagine this new way of storing data will be less efficient, because we use varints when we serialize integers to disk, so storing 1,000,000,000 takes more space than storing 100 (5 bytes instead of 1). Without experimentation it's hard to say how much less efficient it would be in practice. We do compress data once it lands on disk, so in theory this shouldn't be too much of a problem, but we should still take some measurements before we commit to this.
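To get a concrete sense of the per-value cost, here's a quick check using Go's encoding/binary varint encoding; the exact on-disk numbers depend on our serializer, but the variable-length scheme is the same:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	// 100 samples at 100 Hz vs. the same duration (1 s) in nanoseconds.
	for _, v := range []uint64{100, 1_000_000_000} {
		n := binary.PutUvarint(buf, v)
		fmt.Printf("%d -> %d varint byte(s)\n", v, n)
	}
	// Output:
	// 100 -> 1 varint byte(s)
	// 1000000000 -> 5 varint byte(s)
}
```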

Also, we might want to consider alternative units, e.g.:

  • microseconds
  • milliseconds

Milliseconds are only one decimal digit less efficient than storing sample counts at 100 Hz, and I imagine they are precise enough for most of our integrations.
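For example, a single 10 ms sample (one tick at 100 Hz) would be stored as 1 (sample count), 10 (ms), 10,000 (µs), or 10,000,000 (ns) — roughly 1, 1, 2, and 4 varint bytes respectively under the same encoding as above.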

How to handle old data already stored on disk

I think the easiest approach would be to introduce a new format version for trees: new data will be written with nanosecond precision, and old data read in the v1 format will be normalized to nanoseconds during deserialization.
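As a rough sketch of what the read path could look like (the version constants, function names, and deserializer shape below are hypothetical, not our actual storage code):

```go
package storage

import "time"

// Tree format versions; illustrative names only.
const (
	treeFormatV1 = 1 // values are sample counts at a per-segment sample rate
	treeFormatV2 = 2 // values are already in nanoseconds
)

// normalizeToNanos converts a v1 sample count to nanoseconds using the
// segment's sample rate. Assumes sampleRateHz > 0; may overflow for
// extremely large sample counts, which a real implementation would guard.
func normalizeToNanos(samples uint64, sampleRateHz uint32) uint64 {
	return samples * uint64(time.Second) / uint64(sampleRateHz)
}

// readValue returns a node value in nanoseconds regardless of the
// on-disk format version.
func readValue(version int, raw uint64, sampleRateHz uint32) uint64 {
	if version == treeFormatV1 {
		return normalizeToNanos(raw, sampleRateHz)
	}
	return raw // v2 data is already in nanoseconds
}
```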
