Apple M1 CPU not identified as hybrid #454
We recently discussed this in the comments at the bottom of https://cpufun.substack.com/p/more-m1-fun-hardware-information/comments#comment-1284317. We're looking for a way to identify the CPU cores, but sysctl doesn't help. And macOS doesn't support binding, so we cannot bind to each core to execute some ARM-specific instruction for detecting features.
OK, thanks for the info! For my MPI-based app, forcing it to use 4 of the 8 M1 cores made it twice as fast as using all 8 cores, which is what my hwloc-based CPU detection code selects by default. For the moment that will be my workaround, until the community determines how to detect this programmatically. For reference, a crude workaround in CMake can be derived from cmake_host_system_information(RESULT sys_info QUERY OS_NAME OS_PLATFORM) that will contain list
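A minimal C sketch of the same crude rule, for readers not using CMake. It assumes that `uname(2)` reports "Darwin" and "arm64" on Apple Silicon Macs, and it hardcodes the M1's 4 fast cores mentioned in this thread. The names `default_worker_count`, `default_worker_count_here`, and `M1_FAST_CORES` are hypothetical and are not part of hwloc or CMake.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical constant: performance cores on the original M1 (4 of 8).
 * It must be updated by hand for other Apple Silicon models. */
#define M1_FAST_CORES 4

/* Hypothetical helper, kept pure so it can be tested on any platform.
 * sysname/machine are the uname(2) fields. */
int default_worker_count(const char *sysname, const char *machine,
                         int physical_cores)
{
    if (strcmp(sysname, "Darwin") == 0 && strcmp(machine, "arm64") == 0)
        return M1_FAST_CORES;   /* crude: assume every arm64 Mac is an M1 */
    return physical_cores;      /* e.g. the physical-core count from hwloc */
}

/* Hypothetical wrapper that queries the running system. */
int default_worker_count_here(int physical_cores)
{
    struct utsname u;
    if (uname(&u) < 0)          /* POSIX: negative return means failure */
        return physical_cores;
    return default_worker_count(u.sysname, u.machine, physical_cores);
}
```

This has the same weakness as the CMake workaround: a future chip with a different number of fast cores is silently miscounted.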
It looks like there are 11MB for OpenCL (unless the unit is kB, in which case this would be 11GB): Co-Processor(OpenCL) L#0 (Backend=OpenCL OpenCLDeviceType=GPU GPUVendor=Apple GPUModel="Apple M1" OpenCLPlatformIndex=0 OpenCLPlatformName=Apple OpenCLPlatformDeviceIndex=0 OpenCLComputeUnits=8 OpenCLGlobalMemorySize=11184816) "opencl0d0"
MemorySize info attributes are indeed in kB (unlike the NUMA node integer attribute, which is in bytes). That's a mistake dating from the very first (likely CUDA) GPU info attributes, and it's hard to fix now.
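To make the unit mismatch concrete: hwloc info attributes are strings, and per the comment above the `...MemorySize` values are in kB. The sketch below converts such a value to bytes. Applied to `OpenCLGlobalMemorySize=11184816`, it gives 11453251584 bytes, about 11.5 GB (10.7 GiB), which matches the "11GB" reading. The helper name `info_kb_to_bytes` is hypothetical and not an hwloc function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a hwloc "...MemorySize" info value
 * (a decimal string in kB) to bytes.
 * Returns 0 on success, -1 if the string is not a plain number or overflows. */
int info_kb_to_bytes(const char *value, uint64_t *bytes)
{
    char *end = NULL;
    errno = 0;
    unsigned long long kb = strtoull(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;              /* empty, trailing junk such as "MB", or ERANGE */
    if (kb > UINT64_MAX / 1024)
        return -1;              /* would overflow when scaled to bytes */
    *bytes = (uint64_t)kb * 1024u;
    return 0;
}
```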
Information of interest: hw.memsize: 17179869184 could
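For comparison, sysctl reports `hw.memsize` in bytes, so 17179869184 is exactly 16 GiB (16 × 2^30). Below is a minimal sketch that reads it with `sysctlbyname(3)` on macOS. The call is guarded so the file still compiles on other systems. The names `read_hw_memsize` and `bytes_to_gib` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: read hw.memsize (in bytes).
 * Returns 0 on success, -1 if unavailable. */
int read_hw_memsize(uint64_t *bytes)
{
#ifdef __APPLE__
    uint64_t value = 0;
    size_t len = sizeof value;
    if (sysctlbyname("hw.memsize", &value, &len, NULL, 0) != 0)
        return -1;
    *bytes = value;
    return 0;
#else
    (void)bytes;                /* hw.memsize is a macOS sysctl */
    return -1;
#endif
}

/* Hypothetical helper: bytes to GiB (1 GiB = 2^30 bytes). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (double)(1ULL << 30);
}
```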
We're supposed to have the cache line size and page size already (check the cache object attributes, and the NUMA node page_type attribute in the XML output of "lstopo -.xml"). Regarding the core description, I have been thinking about this for a while. We'd basically need to hardwire the list of common ARM core numbers like lscpu does (in https://github.com/karelzak/util-linux/blob/master/sys-utils/lscpu-arm.c, but it looks like they don't have the M1 yet). But that list is hard to keep up to date given how many ARM CPU vendors exist :/
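Here is a rough idea of what "hardwiring" means. On ARM, the `MIDR_EL1` register encodes the implementer in bits [31:24] and the part number in bits [15:4], and a tool maps (implementer, part) pairs to names through static tables. The sketch uses a tiny illustrative subset of Arm Ltd (0x41) parts. It is not lscpu's code, and the names are hypothetical. It also shows the maintenance burden: every new core needs a new row.

```c
#include <stddef.h>
#include <stdint.h>

struct arm_part { uint32_t implementer; uint32_t part; const char *name; };

/* Illustrative subset only; real tables have many rows per vendor. */
static const struct arm_part known_parts[] = {
    { 0x41, 0xd03, "Cortex-A53" },
    { 0x41, 0xd07, "Cortex-A57" },
    { 0x41, 0xd08, "Cortex-A72" },
};
#define NPARTS (sizeof known_parts / sizeof known_parts[0])

/* MIDR_EL1 fields: implementer [31:24], part number [15:4]. */
static uint32_t midr_implementer(uint32_t midr) { return (midr >> 24) & 0xff; }
static uint32_t midr_part(uint32_t midr)        { return (midr >> 4) & 0xfff; }

/* Hypothetical lookup: table row index for a MIDR value, or -1 if unknown. */
int arm_core_index(uint32_t midr)
{
    uint32_t impl = midr_implementer(midr), part = midr_part(midr);
    for (size_t i = 0; i < NPARTS; i++)
        if (known_parts[i].implementer == impl && known_parts[i].part == part)
            return (int)i;
    return -1;
}

/* Hypothetical lookup: core name, or NULL if the core is not in the table. */
const char *arm_core_name(uint32_t midr)
{
    int i = arm_core_index(midr);
    return i < 0 ? NULL : known_parts[i].name;
}
```

On macOS this approach does not help directly: user code cannot pin itself to each core to read per-core identification, which is the binding limitation raised earlier in the thread.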
IORegistryExplorer from "Additional Tools for Xcode" (https://developer.apple.com/download/more/) gives device-tree information that allows capturing what is necessary. I am going to check whether I can make a PoC to get that information for hwloc.
This looks like the ARM device tree on Linux (unfortunately, there we had no way to know which device-tree CPU corresponds to which Linux CPU). At least the icestorm/firestorm line differs between the two cpukinds.
The following code extracts the relevant cluster and CPU information. I assume a hwloc-savvy developer can leverage it to add better Apple M1 support ;-)
I get the same result with the code above, though I had to build it with: clang hw.c -framework Foundation -framework IOKit. For now I'm using the workaround I mentioned above (hardcoding the number of CPUs when Mac ARM64 is detected), but with newer Mac ARM64s seemingly imminent, I'm looking forward to a more robust solution. I put fozog's code into a Gist to make it easier to use: https://gist.github.com/scivision/4abc01e731105228272f74fb6d112232
In anticipation of new Apple Silicon models, I made a standalone CMake example that programmatically counts the number of "fast" Apple Silicon cores. As new Apple Silicon CPUs become public, it can be modified to accommodate them. This workaround doesn't use hwloc: https://github.com/scivision/cmake-apple-silicon-count. For my MPI projects, I programmatically count the physical CPU cores in CMake, which generates configuration files used at runtime. For other architectures, I use hwloc from CMake to do this.
Hello. I haven't had a chance to look at this yet, mostly because I don't have access to the hardware. I'd like to look at it for 2.6 during the summer. One thing to clarify: you said earlier "For my MPI-based app, forcing it to use 4 of 8 M1 cores ...". What do you mean? Do you just ask for 4 cores in mpirun and the OS manages to use the 4 fast ones? Do you have a way to say you want fast cores? Or do you manually select every fast core? We currently have no way to bind tasks to individual CPUs on Mac OS X, so we don't care which cores are marked as performance or efficiency. Your code identifies the performance and efficiency cores anyway, but do we only care about the number of each kind?
It seems macOS defaults to using the fast cores for intensive tasks. So I identify how many fast cores are on a system (in this case, I count the number of "firestorm" CPU cores), and then do ... In effect, I have CMake build and run a little C program that does that and writes the fast CPU count to a generated file used by the user programs. For non-Apple-Silicon systems, I use hwloc to count the physical CPU cores and generate the same file. When new Apple Silicon comes out, it may use a different name for the fast cores. I'll detect that and count that name instead for those CPUs.
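The counting step can be sketched as follows. The sketch assumes the per-CPU `compatible` strings have already been extracted (for example by the IORegistry code discussed above) into an array of C strings such as "apple,firestorm;ARM,v8". Both the function name and the idea of passing the core-type name as a parameter are hypothetical. Passing the name makes it easy to count a different name for future chips.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count CPUs whose compatible string starts with the
 * entry core_type (e.g. "apple,firestorm" for M1 performance cores).
 * core_type must be the whole first entry, i.e. it is followed by the end of
 * the string or by ';'. NULL entries (property missing) are skipped. */
size_t count_cores_of_type(const char *const *compatible, size_t ncpus,
                           const char *core_type)
{
    size_t n = 0, len = strlen(core_type);
    for (size_t i = 0; i < ncpus; i++) {
        const char *c = compatible[i];
        if (c != NULL && strncmp(c, core_type, len) == 0 &&
            (c[len] == '\0' || c[len] == ';'))
            n++;
    }
    return n;
}
```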
Thanks. Do you have an easy way to enable this code only on recent Mac platforms? The code works on old x86 Macs (it just needs cpu.compatible initialized to NULL), but it's useless there.
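One possible approach is a compile-time guard. Clang defines `__APPLE__` together with `__arm64__`/`__aarch64__` when targeting Apple Silicon, so the detection code can be compiled in only there. A NULL `compatible` string, as on old x86 Macs, then disables it at runtime as well. The macro `APPLE_CPUKINDS_ENABLED` and the function `use_apple_cpukinds` are hypothetical names, not hwloc's.

```c
#include <string.h>

/* Hypothetical guard: only Apple Silicon builds use the device-tree data. */
#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
#define APPLE_CPUKINDS_ENABLED 1
#else
#define APPLE_CPUKINDS_ENABLED 0
#endif

/* Hypothetical helper: decide whether to build Apple cpukinds from a CPU's
 * compatible string. A missing property (NULL) disables it. */
int use_apple_cpukinds(const char *compatible)
{
    if (!APPLE_CPUKINDS_ENABLED || compatible == NULL)
        return 0;
    return strncmp(compatible, "apple,", 6) == 0;
}
```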
The original code from opensource.apple.com that @fozog modified is under the APSL license. It looks like it cannot be relicensed under hwloc's BSD-3 license. We'd need to find a BSD-compatible implementation or somebody to rewrite it from scratch.
Here's a completely different implementation that I wrote from scratch specifically for this use case. Can you test it? My x86 VMs have none of the required attributes, so I used "name" for very basic testing of this code.
Seems to work fine; I am going to integrate it into hwloc. I think I will combine all strings from "compatible" into a single one, in case the useful one is not the first one someday.
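In the device tree, a `compatible` property is a list of NUL-terminated strings stored back to back, here "apple,icestorm" followed by "ARM,v8". A minimal sketch of the combining idea joins all entries with `;` and yields "apple,icestorm;ARM,v8", the format quoted in the commit message of this fix. The function name is hypothetical, and this is not hwloc's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join a NUL-separated property blob (the raw
 * "compatible" value) into one ';'-separated C string in out.
 * Empty entries are skipped. Returns the length written (excluding the
 * final NUL), or -1 if out is too small. */
long join_compatible(const char *blob, size_t blob_len,
                     char *out, size_t out_size)
{
    size_t used = 0, i = 0;
    if (out_size == 0)
        return -1;
    while (i < blob_len) {
        /* Each entry ends at the next NUL, or at the end of the blob. */
        const char *nul = memchr(blob + i, '\0', blob_len - i);
        size_t n = nul ? (size_t)(nul - (blob + i)) : blob_len - i;
        if (n > 0) {
            size_t sep = used > 0 ? 1 : 0;
            if (used + sep + n + 1 > out_size)
                return -1;
            if (sep)
                out[used++] = ';';
            memcpy(out + used, blob + i, n);
            used += n;
        }
        i += n + 1;             /* skip past the terminating NUL */
    }
    out[used] = '\0';
    return (long)used;
}
```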
OK, thanks. I will correct the license for the demo project code from @fozog (or perhaps replace it with yours). Thank you!
Try a hwloc tarball from https://ci.inria.fr/hwloc/view/all/job/bgoglin/407/
@bgoglin your code works nicely on my system too.
We read the 'cluster-type' ('E' for energy and 'P' for performance) and the 'compatible' string (either "apple,icestorm;ARM,v8" or "apple,firestorm;ARM,v8" on M1 processor) to build two cpukinds. Thanks to Michael Hirsch and Francois Ozog for the help. Closes: open-mpi#454 Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
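The grouping described in this commit message can be sketched as follows. Each CPU contributes a (cluster-type, compatible) pair, and CPUs with identical pairs form one cpukind. This sketch is not hwloc's implementation. It uses a plain 64-bit mask instead of an hwloc bitmap, and all names are hypothetical. With four 'E' icestorm CPUs and four 'P' firestorm CPUs, it yields two kinds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KINDS 8             /* hypothetical limit */

/* Hypothetical types. cluster_type is 'E' or 'P'; compatible must be non-NULL. */
struct cpu_desc { char cluster_type; const char *compatible; };
struct cpu_kind { char cluster_type; const char *compatible; uint64_t cpumask; };

/* Group CPUs (CPU i -> bit i, so at most 64 CPUs) into kinds keyed by
 * (cluster_type, compatible). kinds must hold MAX_KINDS entries.
 * Returns the number of kinds, or -1 if a limit is exceeded. */
int build_cpukinds(const struct cpu_desc *cpus, size_t ncpus,
                   struct cpu_kind *kinds)
{
    int nkinds = 0;
    if (ncpus > 64)
        return -1;
    for (size_t i = 0; i < ncpus; i++) {
        int k;
        for (k = 0; k < nkinds; k++)
            if (kinds[k].cluster_type == cpus[i].cluster_type &&
                strcmp(kinds[k].compatible, cpus[i].compatible) == 0)
                break;
        if (k == nkinds) {      /* first CPU of a new kind */
            if (nkinds == MAX_KINDS)
                return -1;
            kinds[k].cluster_type = cpus[i].cluster_type;
            kinds[k].compatible = cpus[i].compatible;
            kinds[k].cpumask = 0;
            nkinds++;
        }
        kinds[k].cpumask |= UINT64_C(1) << i;
    }
    return nkinds;
}
```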
Tarballs of the pull request are available at https://ci.inria.fr/hwloc/job/basic/view/change-requests/job/PR-475/
On old Macs without a hybrid CPU, there are no cpukinds at all. So if you get 0 from hwloc-calc, try again with
By the way, we could add information about core frequency in hwloc like we do on Linux. Can you check whether some fields in the IO Registry differ between cores? For instance, do you have something like "00 36 6e 01" in clock-frequency and fixed-frequency for all cores? Also, the cache configuration reported by hwloc looks wrong compared to what Wikipedia says (L2 should be shared per cluster, and the sizes are different).
OK, thanks. It looks like we won't do anything with frequencies. The cache values are surprising, since icestorm is supposed to have a 4MB cache and firestorm a 12MB one. Not sure how those would map to 00 00 40 00 and 00 00 80 00 :) OK, I'll forget about this for now (we had issues with cache sizes and sharing reported by Mac in the past) and merge the PR next week unless somebody complains.
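To decode the byte strings quoted above, the sketch assumes they are little-endian 32-bit integers. That is an assumption about the IO Registry encoding, not something confirmed in this thread. Under that reading, "00 36 6e 01" is 24000000 (24 MHz if the unit is Hz). "00 00 40 00" is 4194304 (4 MiB), consistent with the icestorm figure. "00 00 80 00" is 8388608 (8 MiB), which does not match the 12MB quoted for firestorm. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: interpret 4 raw property bytes, in the order
 * IORegistryExplorer displays them, as a little-endian uint32. */
uint32_t le32_from_bytes(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```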
Using hwloc 2.4.1 or 2.5.0, lstopo gives incorrect output:
This is a Mac Mini M1 (2020) with 8 GB RAM.
Attached: sysctl_hw.txt