Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nan collection in counter buffer #267

Closed
tilsche opened this issue Apr 5, 2023 · 2 comments · Fixed by #270
Closed

Nan collection in counter buffer #267

tilsche opened this issue Apr 5, 2023 · 2 comments · Fixed by #270

Comments

@tilsche
Copy link
Member

tilsche commented Apr 5, 2023

I encountered a trace in which a collected metric uncore_clock/clockticks/ became NaN at some point and never reverted to a valid state.

After a brief look at the code, it appears that CounterBuffer's state could become NaN in case diff_enabled > diff_running and diff_running == 0. We need to avoid those cases. Note that this happend on a Kernel, 5.19.1. Not sure what is special with time_enabled / time_runing for PMU counters - and not sure if that "swap" bug is still in effect. Maybe we can also simplify the code now and only consider non-broken kernels? In any case there should be logging / handling of cases that result in NaN.

@cvonelm
Copy link
Member

cvonelm commented Apr 5, 2023

As far as I'm concerned, we can delete that code path. Neither Google nor a search in the Linux kernel code finds anything regarding such an issue ever existing and git blame on the comment about the swap bug points to the initial import from svn.

@tilsche
Copy link
Member Author

tilsche commented Apr 5, 2023

Given that all information points to it being fixed for at least 7 years (the git-preserved history of lo2s), I tend to agree.

But at the same time I have to admit that I am utterly curious what it is/was.

cvonelm added a commit that referenced this issue Apr 27, 2023
A code path could lead to a NaN if diff_running is 0 is multiplying 0
with +/- infinity results in a NaN.

This commit solves this problem by just deleting that code path, as there is no evidence that the bug it addresses has existed in recent times, if ever.

This fixes #267
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants