fix(metrics): Prevent NaNs in sampling from occurring #270

Merged
merged 1 commit into from
May 5, 2023

cvonelm
Member

@cvonelm cvonelm commented Apr 6, 2023

A code path could lead to a NaN if diff_running is 0 as this would result in a division-by-zero.

This commit solves this problem by just deleting that code path, as there is no evidence that the bug it addresses has existed in recent times, if ever.

This fixes #267

Member

@bmario bmario left a comment

The commit might fix the issue, but the reasoning and commit message need to be sound. A division by zero by itself doesn't produce NaN values. To get a NaN here, diff_value must also be zero, since the product of plus or minus infinity and zero is NaN.
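
The floating-point behaviour described in this comment can be checked with a small sketch. It assumes IEEE 754 doubles and a scaling formula of the form (diff_enabled / diff_running) * diff_value; that formula and the names `scale_unchecked` and `demo_unchecked` are illustrative assumptions, not code taken from lo2s.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// The results below are only guaranteed for IEEE 754 doubles.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 doubles assumed");

// Hypothetical scaling formula (an assumption, not copied from lo2s):
// extrapolate a counter delta by the ratio of enabled time to running time,
// without any guard against diff_running == 0.
inline double scale_unchecked(std::uint64_t diff_value, std::uint64_t diff_enabled,
                              std::uint64_t diff_running)
{
    const double ratio =
        static_cast<double>(diff_enabled) / static_cast<double>(diff_running);
    return ratio * static_cast<double>(diff_value);
}

// Walks through the two cases distinguished in the review comment.
inline void demo_unchecked()
{
    // diff_running == 0, diff_value != 0: 100 / 0 is +infinity,
    // and infinity * 5 is still infinity, not NaN.
    assert(std::isinf(scale_unchecked(5, 100, 0)));

    // diff_running == 0, diff_value == 0: infinity * 0 is NaN.
    assert(std::isnan(scale_unchecked(0, 100, 0)));
}
```

Calling `demo_unchecked()` in a build with assertions enabled passes both checks, which matches the distinction made above: the division alone yields infinity, and the NaN only appears once that infinity is multiplied by a zero diff_value.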

@cvonelm cvonelm force-pushed the issue-267-nan-collection branch from cbda419 to 06f7025 Compare April 27, 2023 10:33
@bmario
Member

bmario commented Apr 27, 2023

I don't want to be that picky, but the commit message could still use some touch-ups. But let's set that aside for a moment.

What I'm wondering is this:

  • There is forgotten knowledge passed down from the ancients of the SVN galaxy that there might have been a perf bug.
  • The bug is: diff_enabled and diff_running might be switched
  • The comment in the current code says: diff_enabled is always smaller than diff_running
  • This implies that the bug is present if diff_running is not smaller than diff_enabled
  • And presumably, we are going into that code path (otherwise, removing it wouldn't do anything)
  • So, we are hitting the bug from the tale of the ancients? Which you claim never happened.

I don't understand it.

Another thing: can it happen that diff_running is zero, but neither diff_enabled nor diff_value is?
The current implementation would result in infinity. The current patch would change that to zero. What should be the desired outcome? Why don't we handle the case that diff_running is zero in the first if condition? The outcome would then be diff_value.

@cvonelm
Member Author

cvonelm commented Apr 27, 2023

There is forgotten knowledge passed down from the ancients of the SVN galaxy that there might have been a perf bug.

A bug for which there is zero information of it ever existing: not in kernel changelogs, not in Robert and Thomas' knowledge, and not even in the ancient scriptures of the lo2s SVN repository. It also did not occur during trial runs of lo2s.

What should be the desired outcome?

Zero, I think. I don't see a case where a metric that was running 0% of the time during that sample interval should produce anything other than a diff_value of 0 occurrences.
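
A guarded variant that implements the "zero" policy argued for here could look roughly like the following. This is a minimal sketch under assumptions: the function names, the order of the checks, and the scaling formula are hypothetical and are not the code that was merged into counter_buffer.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guarded scaling (an assumption, not the merged lo2s code).
inline double scale_guarded(std::uint64_t diff_value, std::uint64_t diff_enabled,
                            std::uint64_t diff_running)
{
    // The counter did not run at all in this interval: report zero
    // occurrences instead of dividing by zero.
    if (diff_running == 0)
    {
        return 0.0;
    }

    // The counter ran for the whole time it was enabled: no scaling needed.
    if (diff_enabled == diff_running)
    {
        return static_cast<double>(diff_value);
    }

    // Otherwise extrapolate by the ratio of enabled time to running time.
    const double ratio =
        static_cast<double>(diff_enabled) / static_cast<double>(diff_running);
    return ratio * static_cast<double>(diff_value);
}

// Exercises each branch once.
inline void demo_guarded()
{
    assert(scale_guarded(5, 100, 0) == 0.0);    // never ran: zero, not infinity
    assert(scale_guarded(0, 100, 0) == 0.0);    // never ran: zero, not NaN
    assert(scale_guarded(7, 50, 50) == 7.0);    // ran the whole time: unscaled
    assert(scale_guarded(10, 200, 100) == 20.0); // ran half the time: doubled
}
```

The alternative raised in the review, returning diff_value when diff_running is zero, would only change the first branch. Returning zero follows the reasoning that a counter which ran 0% of the interval cannot have counted anything.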

@tilsche
Member

tilsche commented May 4, 2023

Is anything blocking this? I kinda need it :/

@cvonelm
Member Author

cvonelm commented May 4, 2023

Someone should have a final look at the explanatory comment I have included in counter_buffer.hpp to make sure that it doesn't contain obvious bullshit.

Other than that I think we can ship it.

@cvonelm cvonelm force-pushed the issue-267-nan-collection branch 3 times, most recently from 6f1eecf to b265e55 Compare May 5, 2023 08:34
A code path in the metric readout code could lead to the generation of
a metric value of "NaN" (see counter_buffer.hpp explanatory
comment for more details).

As metrics are recorded in an accumulated fashion, a NaN metric value
will break the metric for the rest of the recording.

This commit deletes that code path, as there is no record of the bug it
tries to fix ever existing.

This commit further improves the resilience of the metric readout code by
introducing further checks for values of diff_running/enabled/value that
can generate a NaN.
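
Why a single NaN breaks an accumulated metric for the rest of the recording can be illustrated with a short sketch. The helper `running_totals` is a hypothetical stand-in for the accumulation step, not the lo2s implementation, and it assumes IEEE 754 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical accumulator (an assumption, not lo2s code): turns per-sample
// deltas into the running totals that would be written to the trace.
inline std::vector<double> running_totals(const std::vector<double>& deltas)
{
    std::vector<double> totals;
    totals.reserve(deltas.size());
    double total = 0.0;
    for (double delta : deltas)
    {
        total += delta; // NaN + x is NaN, so one bad delta poisons the sum
        totals.push_back(total);
    }
    return totals;
}

// Shows that every total after the NaN sample is NaN as well.
inline void demo_accumulation()
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const std::vector<double> totals = running_totals({ 1.0, nan, 2.0, 3.0 });

    assert(totals[0] == 1.0);       // still fine before the bad sample
    assert(std::isnan(totals[1])); // the bad sample itself
    assert(std::isnan(totals[2])); // later, valid deltas cannot recover the sum
    assert(std::isnan(totals[3]));
}
```

Because any arithmetic with NaN yields NaN, the total never recovers once a bad sample has been added, which is why the checks have to happen before the delta is accumulated.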
@cvonelm cvonelm merged commit 810a5cf into master May 5, 2023
@cvonelm cvonelm deleted the issue-267-nan-collection branch May 5, 2023 08:35
Development

Successfully merging this pull request may close these issues.

Nan collection in counter buffer
3 participants