-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CycleClock: Add support for Alpha architecture #1753
Conversation
src/cycleclock.h
Outdated
// which means it overflows every ~4sec depending on CPU frequency. This | ||
// tick count will only be reliable for short intervals. This implementation | ||
// is the same as what fftw uses. | ||
uint64_t cc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two thoughts: if we know it's 32-bits why do we read into a 64-bit and mask?
also if it's only 4 seconds i'm not sure it's going to be very useful. is there no alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does gettimeofday()
(or an alternative) work on that arch? There's precedent for using that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two thoughts: if we know it's 32-bits why do we read into a 64-bit and mask?
also if it's only 4 seconds i'm not sure it's going to be very useful. is there no alternative?
Hi, I helped work on this. It's read into a 64-bit because the register is 64-bits wide, the lower 32 bits are the actual counter, while the high 32 bits are "operating system dependent". Linux does not do anything with them today.
Does
gettimeofday()
(or an alternative) work on that arch? There's precedent for using that.
It works, the problem is that clock precision is too low. You can see this for example in the python bindings:
In [2]: time.clock_getres(time.CLOCK_MONOTONIC)
Out[2]: 0.000976563
This has been the cause of many a PR of mine, as on alpha two calls to gettimeofday()
less than ~1ms apart will return the same timestamp.
That said, the rpcc
technique seems to sufficient to pass the test suite. I could include both options and make it configurable at compile-time perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to make this configurable.
It should just work right (by the definition of "right",
that is expected by the users of that architecture),
or not at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to make this configurable. It should just work right (by the definition of "right", that is expected by the users of that architecture), or not at all.
Then rpcc
is the way to go. There isn't any method of measuring cycle counts for arbitrary periods on the architecture, so this is the best compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test suite is optimised to run quickly so that's not surprising.
i don't know if we have real world users that have per iteration timings of >4s but it wouldn't surprise me if we did. i think until there's a 64-bit high precision timer on arch, gettimeofday is going to be the better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/per-iteration/per-repetition/, which is much worse
src/sysinfo.cc
Outdated
@@ -550,7 +550,9 @@ int GetNumCPUs() { | |||
"CPU ID assignments in /proc/cpuinfo seem messed up." | |||
" This is usually caused by a bad BIOS.\n"); | |||
} | |||
return num_cpus; | |||
// If we are unable to identify the number of CPUs, default to one rather | |||
// than zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be a global fix-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate what you mean? It should continue to default to zero and there should be code to parse this nonstandard /proc/cpuinfo
? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably no other architecture so far has needed this fix-up,
so if this fix-up is needed for alpha, it should not be
retroactively added to other architectures at the same time.
If /proc/cpuinfo
is different there, then you certainly should actually parse it in an alpha-specific way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably no other architecture so far has needed this fix-up, so if this fix-up is needed for alpha, it should not be retroactively added to other architectures at the same time.
If
/proc/cpuinfo
is different there, then you certainly should actually parse it in an alpha-specific way.
I believe there should be some sort of error condition then, rather than returning zero as if it was a valid number of CPUs. The calling code does not seem to consider this an error condition, and just blindly keeps going. For example, when returning zero CPUs, later code subtracts 1 from this (resulting in -1), then tries to cast to a size_t
and allocate a vector of beyond SIZE_MAX
length. So, not being able to parse /proc/cpuinfo
does not seem to have been considered as a valid error condition.
Would you be okay with having this throw a runtime error if /proc/cpuinfo
is unable to be parsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, does that happen on Alpha, or on an other architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't support exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested, and returning -1 results in the same issues as returning 0. Here is an example of what it looks like with -1:
Running /var/tmp/portage/dev-cpp/benchmark-1.8.3/work/benchmark-1.8.3_build/test/benchmark_test
Run on (-1 X 1987.32 MHz CPU )
Load Average: 1.09, 1.04, 0.90
terminate called after throwing an instance of 'std::length_error'
what(): cannot create std::vector larger than max_size()
<end of output>
Test time = 0.05 sec
----------------------------------------------------------
Test Failed.
I see that it's used immediately in CPUInfo
constructor without error checking. I'll have to experiment and see what is a reasonable way to propagate such an error up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, personally, i just wouldn't bother.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then how about std::abort()
(or, I guess the abort handler stuff)? It looks like there's just no good way for callers to handle errors here and there are no circumstances under which you can meaningfully work with zero CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we could.. the CHECK
macros we have get compiled out in release builds but maybe we should have one that doesn't get compiled out (or just CallAbortHandler
directly).
src/sysinfo.cc
Outdated
@@ -510,10 +510,13 @@ int GetNumCPUs() { | |||
int max_id = -1; | |||
std::ifstream f("/proc/cpuinfo"); | |||
if (!f.is_open()) { | |||
std::cerr << "failed to open /proc/cpuinfo\n"; | |||
return -1; | |||
PrintErrorAndDie("Failed to open /proc/cpuinfo"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there, could you please cleanup rest of this function, Ctrl+F for stderr
?
(generally looks fine to me, thank you) |
src/sysinfo.cc
Outdated
if (GetSysctl("hw.ncpu", &num_cpu)) return num_cpu; | ||
fprintf(stderr, "Err: %s\n", strerror(errno)); | ||
std::exit(EXIT_FAILURE); | ||
PrintErrorAndDie("Err: ", strerror(errno)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This becoming too messy :)
Please split this PR into two:
- The actual changes to support the architecture
- All of the PrintErrorAndDie stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, a little confused, isn't that what you asked for above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess i thought it would look nicer?
Concretely, i'm getting stuck on how asymmetrical this all is,
sometimes there is a check for ncpu=0, sometimes there is not.
My next comment would be to extract the if ncpu<1 abort
logic
into a wrapper function.
I just don't think it makes sense to do all that cleanup in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will dropping the last commit be sufficient for this PR then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sysinfo: support parsing /proc/cpuinfo on Alpha" also has some PrintErrorAndDie
stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO that is immediately and directly relevant to this PR. I spent a lot of time investigating and trying to understand why the tests were failing, looking for issues with the cycle timer, when it just turned out that failing to parse /proc/cpuinfo
silently returns garbage data that results in all sorts of cryptic errors down the line later. I believe handling this in a sensible way is super important in order to make porting to this and other platforms a reasonable process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, i'm only talking about how to get this diff to the state where i'd be happy to merge it,
and splitting it into two separate chunks will do that.
We spent more time arguing about that than it would have taken to get both of those PR's merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(please send that second PR please :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
As documented, the real cycle counter is unsafe to use here, because it is a 32-bit integer which wraps every ~4s. Use gettimeofday instead, which has a limitation of a low-precision real-time-clock (~1ms), but no wrapping. Passes test suite. Signed-off-by: Sam James <sam@gentoo.org>
And also, bail out if unable to parse /proc/cpuinfo. This will preemptively alert users on platforms that need custom code for parsing /proc/cpuinfo. Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
@thesamesam thank you! |
Thank you all! |
Defines a wrapper function, CheckNumCPUs, which enforces that GetNumCPUs never returns fewer than one CPU. There is no reasonable way to continue if we are unable to identify the number of CPUs. This is a followup to google#1753. Signed-off-by: Sam James <sam@gentoo.org>
Accepted upstream already. See: google/benchmark#1753 See: google/benchmark#1756 Bug: https://bugs.gentoo.org/922877 Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk>
Accepted upstream already. See: google/benchmark#1753 See: google/benchmark#1756 Bug: https://bugs.gentoo.org/922877 Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk> Closes: #35729 Signed-off-by: Sam James <sam@gentoo.org>
This uses the native RPCC instruction, which comes with a caveat. The
PCC (Program Cycle Register) tick counter on Alpha is a 32-bit unsigned
integer, which means that it overflows every ~4-ish seconds, depending
on clock speed. Therefore this is only safe to use for small
measurements. This is the same implementation used in fftw.