Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CycleClock: Add support for Alpha architecture #1753

Merged
merged 3 commits into from
Feb 13, 2024
Merged

Conversation

thesamesam
Copy link
Contributor

This uses the native RPCC instruction, which comes with a caveat. The
PCC (Program Cycle Register) tick counter on Alpha is a 32-bit unsigned
integer, which means that it overflows every ~4-ish seconds, depending
on clock speed. Therefore this is only safe to use for small
measurements. This is the same implementation used in fftw.

src/cycleclock.h Outdated
// which means it overflows every ~4sec depending on CPU frequency. This
// tick count will only be reliable for short intervals. This implementation
// is the same as what fftw uses.
uint64_t cc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two thoughts: if we know it's 32-bits why do we read into a 64-bit and mask?

also if it's only 4 seconds i'm not sure it's going to be very useful. is there no alternative?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does gettimeofday() (or an alternative) work on that arch? There's precedent for using that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two thoughts: if we know it's 32-bits why do we read into a 64-bit and mask?

also if it's only 4 seconds i'm not sure it's going to be very useful. is there no alternative?

Hi, I helped work on this. It's read into a 64-bit because the register is 64-bits wide, the lower 32 bits are the actual counter, while the high 32 bits are "operating system dependent". Linux does not do anything with them today.

Does gettimeofday() (or an alternative) work on that arch? There's precedent for using that.

It works, the problem is that clock precision is too low. You can see this for example in the python bindings:

In [2]: time.clock_getres(time.CLOCK_MONOTONIC)
Out[2]: 0.000976563

This has been the cause of many a PR of mine, as on alpha two calls to gettimeofday() less than ~1ms apart will return the same timestamp.

That said, the rpcc technique seems to sufficient to pass the test suite. I could include both options and make it configurable at compile-time perhaps?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to make this configurable.
It should just work right (by the definition of "right",
that is expected by the users of that architecture),
or not at all.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to make this configurable. It should just work right (by the definition of "right", that is expected by the users of that architecture), or not at all.

Then rpcc is the way to go. There isn't any method of measuring cycle counts for arbitrary periods on the architecture, so this is the best compromise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test suite is optimised to run quickly so that's not surprising.

i don't know if we have real world users that have per iteration timings of >4s but it wouldn't surprise me if we did. i think until there's a 64-bit high precision timer on arch, gettimeofday is going to be the better option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/per-iteration/per-repetition/, which is much worse

src/sysinfo.cc Outdated
@@ -550,7 +550,9 @@ int GetNumCPUs() {
"CPU ID assignments in /proc/cpuinfo seem messed up."
" This is usually caused by a bad BIOS.\n");
}
return num_cpus;
// If we are unable to identify the number of CPUs, default to one rather
// than zero.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a global fix-up.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate what you mean? It should continue to default to zero and there should be code to parse this nonstandard /proc/cpuinfo? Or something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably no other architecture so far has needed this fix-up,
so if this fix-up is needed for alpha, it should not be
retroactively added to other architectures at the same time.

If /proc/cpuinfo is different there, then you certainly should actually parse it in an alpha-specific way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably no other architecture so far has needed this fix-up, so if this fix-up is needed for alpha, it should not be retroactively added to other architectures at the same time.

If /proc/cpuinfo is different there, then you certainly should actually parse it in an alpha-specific way.

I believe there should be some sort of error condition then, rather than returning zero as if it was a valid number of CPUs. The calling code does not seem to consider this an error condition, and just blindly keeps going. For example, when returning zero CPUs, later code subtracts 1 from this (resulting in -1), then tries to cast to a size_t and allocate a vector of beyond SIZE_MAX length. So, not being able to parse /proc/cpuinfo does not seem to have been considered as a valid error condition.

Would you be okay with having this throw a runtime error if /proc/cpuinfo is unable to be parsed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, does that happen on Alpha, or on an other architecture?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support exceptions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested, and returning -1 results in the same issues as returning 0. Here is an example of what it looks like with -1:

Running /var/tmp/portage/dev-cpp/benchmark-1.8.3/work/benchmark-1.8.3_build/test/benchmark_test
Run on (-1 X 1987.32 MHz CPU )
Load Average: 1.09, 1.04, 0.90
terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()
<end of output>
Test time =   0.05 sec
----------------------------------------------------------
Test Failed.

I see that it's used immediately in CPUInfo constructor without error checking. I'll have to experiment and see what is a reasonable way to propagate such an error up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, personally, i just wouldn't bother.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then how about std::abort() (or, I guess the abort handler stuff)? It looks like there's just no good way for callers to handle errors here and there are no circumstances under which you can meaningfully work with zero CPUs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we could.. the CHECK macros we have get compiled out in release builds but maybe we should have one that doesn't get compiled out (or just CallAbortHandler directly).

src/sysinfo.cc Outdated
@@ -510,10 +510,13 @@ int GetNumCPUs() {
int max_id = -1;
std::ifstream f("/proc/cpuinfo");
if (!f.is_open()) {
std::cerr << "failed to open /proc/cpuinfo\n";
return -1;
PrintErrorAndDie("Failed to open /proc/cpuinfo");
Copy link
Collaborator

@LebedevRI LebedevRI Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there, could you please cleanup rest of this function, Ctrl+F for stderr?

@LebedevRI
Copy link
Collaborator

(generally looks fine to me, thank you)

src/sysinfo.cc Outdated
Comment on lines 480 to 481
if (GetSysctl("hw.ncpu", &num_cpu)) return num_cpu;
fprintf(stderr, "Err: %s\n", strerror(errno));
std::exit(EXIT_FAILURE);
PrintErrorAndDie("Err: ", strerror(errno));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becoming too messy :)
Please split this PR into two:

  1. The actual changes to support the architecture
  2. All of the PrintErrorAndDie stuff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, a little confused, isn't that what you asked for above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess i thought it would look nicer?
Concretely, i'm getting stuck on how asymmetrical this all is,
sometimes there is a check for ncpu=0, sometimes there is not.

My next comment would be to extract the if ncpu<1 abort logic
into a wrapper function.

I just don't think it makes sense to do all that cleanup in this PR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will dropping the last commit be sufficient for this PR then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sysinfo: support parsing /proc/cpuinfo on Alpha" also has some PrintErrorAndDie stuff.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO that is immediately and directly relevant to this PR. I spent a lot of time investigating and trying to understand why the tests were failing, looking for issues with the cycle timer, when it just turned out that failing to parse /proc/cpuinfo silently returns garbage data that results in all sorts of cryptic errors down the line later. I believe handling this in a sensible way is super important in order to make porting to this and other platforms a reasonable process.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, i'm only talking about how to get this diff to the state where i'd be happy to merge it,
and splitting it into two separate chunks will do that.
We spent more time arguing about that than it would have taken to get both of those PR's merged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(please send that second PR please :))

Copy link
Collaborator

@LebedevRI LebedevRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

As documented, the real cycle counter is unsafe to use here, because it
is a 32-bit integer which wraps every ~4s.  Use gettimeofday instead,
which has a limitation of a low-precision real-time-clock (~1ms), but no
wrapping.  Passes test suite.

Signed-off-by: Sam James <sam@gentoo.org>
And also, bail out if unable to parse /proc/cpuinfo.  This will
preemptively alert users on platforms that need custom code for parsing
/proc/cpuinfo.

Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
@LebedevRI LebedevRI merged commit 385033b into google:main Feb 13, 2024
80 checks passed
@LebedevRI
Copy link
Collaborator

@thesamesam thank you!

@thesamesam
Copy link
Contributor Author

Thank you all!

@thesamesam thesamesam deleted the alpha branch February 13, 2024 23:00
thesamesam added a commit to thesamesam/benchmark that referenced this pull request Feb 14, 2024
Defines a wrapper function, CheckNumCPUs, which enforces that GetNumCPUs
never returns fewer than one CPU.  There is no reasonable way to
continue if we are unable to identify the number of CPUs.

This is a followup to google#1753.

Signed-off-by: Sam James <sam@gentoo.org>
matoro added a commit to matoro/gentoo that referenced this pull request Mar 12, 2024
Accepted upstream already.

See: google/benchmark#1753
See: google/benchmark#1756
Bug: https://bugs.gentoo.org/922877
Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this pull request Mar 21, 2024
Accepted upstream already.

See: google/benchmark#1753
See: google/benchmark#1756
Bug: https://bugs.gentoo.org/922877
Signed-off-by: Matoro Mahri <matoro_gentoo@matoro.tk>
Closes: #35729
Signed-off-by: Sam James <sam@gentoo.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants