Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mfence vs lfence vs cpuid #32

Open
oreparaz opened this issue Jan 12, 2024 · 1 comment
Open

mfence vs lfence vs cpuid #32

oreparaz opened this issue Jan 12, 2024 · 1 comment

Comments

@oreparaz
Copy link
Owner

oreparaz commented Jan 12, 2024

We've been a bit lazy on how we're using RDTSC. The original piece of code (probably about 10 years ago) had this comment:

Intel actually recommends calling CPUID to serialize the execution flow
 and reduce variance in measurement due to out-of-order execution.
 We don't do that here yet.
 see §3.2.1 http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html

That link is gone, but the paper can be found in mirrors. It's a good resource and has the following advice. We should probably just follow it:

1 2

Resources:

uint64_t rdtsc() {
  uint64_t a, d;
  asm volatile ("mfence");
  asm volatile ("rdtsc" : "=a" (a), "=d" (d));
  a = (d<<32) | a;
  asm volatile ("mfence");
  return a;
}
long long ticks(void)
{
  unsigned long long result;
  asm volatile(".byte 15;.byte 49;shlq $32,%%rdx;orq %%rdx,%%rax"
    : "=a"(result) :: "%rdx");
  return result;
}

The test programs use the serializing instruction CPUID before and after reading the time stamp counter in order to prevent out-of-order execution to interfere with the measurements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants