Why does getarch report L1d as 8k when lscpu reports 32k? #1232

Closed
wangshankun opened this issue Jul 11, 2017 · 12 comments · Fixed by #1236

Comments

@wangshankun

wangshankun commented Jul 11, 2017

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Stepping: 3
CPU MHz: 816.664
BogoMIPS: 6816.07
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
hd@WellOcean12:~/OpenBLAS$ ./getarch 1
#define HASWELL
#define L1_CODE_SIZE 16384
#define L1_CODE_ASSOCIATIVE 4
#define L1_CODE_LINESIZE 64
#define L1_DATA_SIZE 8192
#define L1_DATA_ASSOCIATIVE 4
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64

hd@WellOcean12:~/OpenBLAS$ more /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
stepping : 3
microcode : 0x84
cpu MHz : 800.062
cache size : 8192 KB

@martin-frbg
Collaborator

Could be a mistake or omission in the get_cacheinfo() function of cpuid_x86.c. Is this with the current "develop" branch or with one of the older release versions?

@wangshankun
Author

develop
git remote -v
https://github.com/xianyi/OpenBLAS.git

@martin-frbg
Collaborator

As far as I can tell, the entries in the big switch{} statement around line 340 of cpuid_x86.c appear to match the enumeration at http://www.sandpile.org/x86/cpuid.htm#level_0000_0002h from which it is derived (at least as far as entries for "data L1 cache" are concerned). Perhaps it would help to know what info[i] contains in your case.

@martin-frbg
Collaborator

Heh. It seems there is a "break" missing after line 645 of cpuid_x86.c, causing the 0x63 case to fall through into the unrelated 0x66 case, which sets LD1 to 8k. Can you check whether that fixes your problem, or would you like me to supply the modified file?
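
The fall-through described above can be reproduced in isolation. The following is a minimal sketch, not the code from cpuid_x86.c: the struct, the function names and the TLB value stored for 0x63 are hypothetical stand-ins, and only the two descriptor values and the 8k result come from the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the records that the real code fills in. */
typedef struct {
    int l1d_size_kb;   /* L1 data cache size in KB, 0 = not detected */
    int dtlb_entries;  /* placeholder field for a TLB descriptor */
} cache_summary;

/* Buggy variant: case 0x63 has no break, so control continues into
   case 0x66 and an 8 KB L1 data cache is recorded as a side effect. */
void apply_descriptor_buggy(unsigned char desc, cache_summary *out) {
    assert(out != NULL);
    switch (desc) {
    case 0x63:
        out->dtlb_entries = 4;  /* illustrative value only */
        /* missing break: execution continues in the next case */
    case 0x66:
        out->l1d_size_kb = 8;
        break;
    default:
        break;
    }
}

/* Fixed variant: each descriptor only touches its own field. */
void apply_descriptor_fixed(unsigned char desc, cache_summary *out) {
    assert(out != NULL);
    switch (desc) {
    case 0x63:
        out->dtlb_entries = 4;  /* illustrative value only */
        break;
    case 0x66:
        out->l1d_size_kb = 8;
        break;
    default:
        break;
    }
}
```

With the break in place, descriptor 0x63 no longer writes the L1 field, so that field stays at zero unless some other descriptor sets it.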

@martin-frbg
Collaborator

Patch committed as obvious (famous last words...), please check if this actually fixes your problem.

@MigMuc

MigMuc commented Jul 11, 2017

I applied your patch and now I do not get any L1 cache information at all. My CPU is Kaby Lake.
[mig@madrid OpenBLAS-develop]$ ./getarch 1
#define HASWELL
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64
#define ITB_SIZE 2097152
#define ITB_ASSOCIATIVE 0
#define ITB_ENTRIES 8
#define DTB_SIZE 4096
#define DTB_ASSOCIATIVE 4
#define DTB_DEFAULT_ENTRIES 64
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define HAVE_FMA3
#define HAVE_CFLUSH
#define NUM_SHAREDCACHE 1
#define NUM_CORES 1
#define CORE_HASWELL
#define CHAR_CORENAME "HASWELL"

With lscpu I get:
[mig@madrid ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Stepping: 9
CPU MHz: 399.957
CPU max MHz: 3100.0000
CPU min MHz: 400.0000
BogoMIPS: 5426.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3

@wangshankun
Author

Me too ...
$ ./getarch 1
#define HASWELL
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64
#define ITB_SIZE 2097152
#define ITB_ASSOCIATIVE 0
#define ITB_ENTRIES 8
#define DTB_SIZE 4096
#define DTB_ASSOCIATIVE 4
#define DTB_DEFAULT_ENTRIES 64
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define HAVE_FMA3
#define HAVE_CFLUSH
#define NUM_SHAREDCACHE 1
#define NUM_CORES 1
#define CORE_HASWELL
#define CHAR_CORENAME "HASWELL"

@martin-frbg
Collaborator

So there must be more to this. I have reverted my "quick fix" now, sorry for that. I will try to get a dump of the info array from get_cacheinfo on Kaby Lake later today to see if there are any unhandled or conflicting returns.

@martin-frbg
Collaborator

martin-frbg commented Jul 12, 2017

On Kaby Lake the non-zero elements of the info array turn out to be 0x63 0x03 0x76 0xFF 0xB5 0xF0 0xC3. Of these, the unhandled 0xB5 is supposed to set the code TLB values (4/8/64), and 0xFF means "use cpuid(4,...) to query cache configuration". So it appears the inadvertent fall-through from 0x63 to 0x66 just masks this by supplying a wrong but passable value for the L1 size.
Update: actually it seems we need to use cpuid(0x80000005) to query the L1 size. The current code has similar provisions to read L2 from 0x80000006 on Intel in case it could not be determined, and for querying cache sizes on AMD and Centaur chips. I'll try to come up with an improved PR, but I am wondering if this might be too far-reaching to go into the upcoming 0.2.20.
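
For readers who want to see how such a descriptor list comes out of cpuid leaf 2, here is a minimal sketch. It assumes the layout Intel documents for that leaf: the low byte of EAX is a call count rather than a descriptor, and a register with bit 31 set carries no valid descriptors. The function names are hypothetical, the real get_cacheinfo() may unpack the registers differently, and the register values in the usage note are chosen only to reproduce the byte sequence quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the non-zero descriptor bytes from the four registers
   (EAX, EBX, ECX, EDX) returned by cpuid leaf 2.
   Returns the number of bytes written to out (at most cap). */
size_t unpack_leaf2(const uint32_t regs[4], uint8_t *out, size_t cap) {
    assert(regs != NULL && out != NULL);
    size_t n = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u)
            continue;  /* bit 31 set: no valid descriptors in this register */
        for (int b = 0; b < 4; b++) {
            if (r == 0 && b == 0)
                continue;  /* low byte of EAX is a call count, not a descriptor */
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0 && n < cap)
                out[n++] = d;
        }
    }
    return n;
}

/* Return 1 if the descriptor list contains the given byte, else 0. */
int has_descriptor(const uint8_t *desc, size_t n, uint8_t wanted) {
    assert(desc != NULL);
    for (size_t i = 0; i < n; i++)
        if (desc[i] == wanted)
            return 1;
    return 0;
}
```

Called with the assumed register values {0x76036301, 0x00F0B5FF, 0x00000000, 0x00C30000}, unpack_leaf2 yields the seven bytes 0x63 0x03 0x76 0xFF 0xB5 0xF0 0xC3, and has_descriptor(..., 0xFF) reports that leaf 4 has to be consulted for the cache configuration.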

@martin-frbg
Collaborator

Trouble is that the extended cpuid call 0x8000_0005 does not appear to return anything non-zero on Kaby Lake (probably on Intel CPUs in general?), and standard 0x8000_0004 does not seem to provide the cache size (unless it is implicit in one of the items). I wonder if it makes sense to just fall back to a hardcoded 32k L1 code/data cache size for "modern" Intel processors if the value could not be read?
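
As an illustration of a size being "implicit in one of the items": for the cpuid(4,...) query that the 0xFF descriptor points to, Intel documents the cache size as the product of ways, partitions, line size and sets, each stored minus one. The sketch below combines that computation with the hardcoded 32k fallback floated above. The function and macro names are hypothetical, and the thread does not say whether #1236 takes this approach.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback suggested in the comment above: 32k when nothing could be read. */
#define FALLBACK_L1_BYTES 32768u

/* Cache size in bytes from the EBX/ECX values of one cpuid(4, subleaf) call.
   Each field is stored as (value - 1). */
uint64_t leaf4_cache_bytes(uint32_t ebx, uint32_t ecx) {
    uint64_t line_size  = (ebx & 0xFFFu) + 1u;          /* EBX bits 11:0  */
    uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1u;  /* EBX bits 21:12 */
    uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1u;  /* EBX bits 31:22 */
    uint64_t sets       = (uint64_t)ecx + 1u;
    uint64_t bytes = ways * partitions * line_size * sets;
    assert(bytes > 0);  /* every factor is at least 1 */
    return bytes;
}

/* L1 data cache size for one subleaf result, or the fallback when the
   subleaf does not describe a level-1 data cache. */
uint64_t l1d_bytes_or_fallback(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    uint32_t type  = eax & 0x1Fu;        /* 0 = none, 1 = data, 2 = code, 3 = unified */
    uint32_t level = (eax >> 5) & 0x7u;  /* cache level, starting at 1 */
    if (type == 1u && level == 1u)
        return leaf4_cache_bytes(ebx, ecx);
    return FALLBACK_L1_BYTES;
}
```

For example, an 8-way cache with 64-byte lines, one partition and 64 sets is encoded as EBX = 0x01C0003F and ECX = 63, which gives 8 * 1 * 64 * 64 = 32768 bytes.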

@martin-frbg
Collaborator

Can either of you test with #1236, please? (I assume it will work on Linux and Windows; if anything, there may be problems on OS X or other x86-based operating systems that I cannot test.)

@MigMuc

MigMuc commented Jul 14, 2017

The patch works for me. Now I get:
#define HASWELL
#define L1_CODE_SIZE 32768
#define L1_CODE_ASSOCIATIVE 8
#define L1_CODE_LINESIZE 64
#define L1_DATA_SIZE 32768
#define L1_DATA_ASSOCIATIVE 8
#define L1_DATA_LINESIZE 64
#define L2_SIZE 262144
#define L2_ASSOCIATIVE 8
#define L2_LINESIZE 64
#define ITB_SIZE 2097152
#define ITB_ASSOCIATIVE 0
#define ITB_ENTRIES 8
#define DTB_SIZE 4096
#define DTB_ASSOCIATIVE 4
#define DTB_DEFAULT_ENTRIES 64
#define HAVE_CMOV
#define HAVE_MMX
#define HAVE_SSE
#define HAVE_SSE2
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4_1
#define HAVE_SSE4_2
#define HAVE_AVX
#define HAVE_FMA3
#define HAVE_CFLUSH
#define NUM_SHAREDCACHE 1
#define NUM_CORES 1
#define CORE_HASWELL
#define CHAR_CORENAME "HASWELL"
