Apple M1 CPU not identified as hybrid #454

Closed
scivision opened this issue Mar 1, 2021 · 24 comments


scivision commented Mar 1, 2021

Using hwloc 2.4.1 or 2.5.0

$ uname -a

Darwin rigs-mac-mini.lan 20.5.0 Darwin Kernel Version 20.5.0: Sat May  8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101 arm64

gives incorrect lstopo:

Machine (3491MB total)
  Package L#0
    NUMANode L#0 (P#0 3491MB)
    L2 L#0 (4096KB) + L1d L#0 (64KB) + L1i L#0 (128KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (4096KB) + L1d L#1 (64KB) + L1i L#1 (128KB) + Core L#1 + PU L#1 (P#1)
    L2 L#2 (4096KB) + L1d L#2 (64KB) + L1i L#2 (128KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (4096KB) + L1d L#3 (64KB) + L1i L#3 (128KB) + Core L#3 + PU L#3 (P#3)
    L2 L#4 (4096KB) + L1d L#4 (64KB) + L1i L#4 (128KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (4096KB) + L1d L#5 (64KB) + L1i L#5 (128KB) + Core L#5 + PU L#5 (P#5)
    L2 L#6 (4096KB) + L1d L#6 (64KB) + L1i L#6 (128KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (4096KB) + L1d L#7 (64KB) + L1i L#7 (128KB) + Core L#7 + PU L#7 (P#7)
  CoProc(OpenCL) "opencl0d0"

This is a Mac Mini M1 (2020) with 8 GB RAM.

  • it looks like total RAM is not correct
  • it looks like all CPUs are identified as identical--however, it looks like sysctl says the same thing.

sysctl_hw.txt


bgoglin commented Mar 1, 2021

We recently discussed this in the comments at the bottom of https://cpufun.substack.com/p/more-m1-fun-hardware-information/comments#comment-1284317
The missing RAM is likely partitioned and given to your GPU, you should see it in the OpenCL device with lstopo -v.

We're looking for a way to identify CPUs but sysctl doesn't help. And macOS doesn't support binding, hence we cannot bind to each core to execute some ARM-specific instruction for detecting features.


scivision commented Mar 1, 2021

OK thanks for the info!

For my MPI-based app, forcing it to use 4 of 8 M1 cores made the MPI app twice as fast as using all 8 cores, as my hwloc-based CPU detection code does normally.

I think for the immediate moment that will be my workaround till the community determines how to detect this programmatically.

For reference, a crude workaround in CMake can be derived from

cmake_host_system_information(RESULT sys_info QUERY OS_NAME OS_PLATFORM)

which yields the list "macOS;arm64". That can be used to enable whatever workaround one desires, such as using 4 of the 8 CPUs for CTest MPI runs.

@bgoglin bgoglin changed the title Apple M1 misidentifies CPUs Apple M1 CPU not identified as hybrid Mar 1, 2021

fozog commented Apr 12, 2021

Looks like there are 11MB (unless the unit is kB, in which case this would be 11GB) for OpenCL:

Co-Processor(OpenCL) L#0 (Backend=OpenCL OpenCLDeviceType=GPU GPUVendor=Apple GPUModel="Apple M1" OpenCLPlatformIndex=0 OpenCLPlatformName=Apple OpenCLPlatformDeviceIndex=0 OpenCLComputeUnits=8 OpenCLGlobalMemorySize=11184816) "opencl0d0"
depth 0: 1 Machine (type #0)
depth 1: 1 Package (type #1)
depth 2: 8 L2Cache (type #5)
depth 3: 8 L1dCache (type #4)
depth 4: 8 L1iCache (type #9)
depth 5: 8 Core (type #2)
depth 6: 8 PU (type #3)
Special depth -3: 1 NUMANode (type #13)
Special depth -6: 1 OSDev (type #16)


bgoglin commented Apr 12, 2021

MemorySize values in info attributes are indeed in kB (unlike the NUMA node integer attribute, which is in bytes). That's a mistake dating back to the very first (likely CUDA) GPU info attributes, and it's hard to fix now.
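
Taking the kB unit at face value, the OpenCLGlobalMemorySize=11184816 quoted above works out to 11453251584 bytes, or about 10.67 GiB, rather than 11 MB. A minimal sketch of that arithmetic, assuming 1 kB means 1024 bytes here; the helper name is made up for illustration and is not part of hwloc:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a "...MemorySize" info attribute,
 * expressed in kB, to bytes. Assumes 1 kB = 1024 bytes. */
static uint64_t info_kb_to_bytes(uint64_t kb)
{
    assert(kb <= UINT64_MAX / 1024);  /* refuse values that would overflow */
    return kb * 1024;
}
```

Dividing the result by 1073741824 (2^30) gives the GiB figure.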


fozog commented Apr 12, 2021

information of interest:

hw.memsize: 17179869184
hw.cachelinesize: 128 // key for high-performance networking such as DPDK, non 64B cache line need to produce some warning for performance for instance
hw.pagesize: 16384 // important for alignment and other optimized slab allocators.

could
hw.cpufamily: 458787763
hw.cpusubfamily: 2
be used to populate proper core description?
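
To make the alignment remark above concrete: a buffer padded to 64 bytes, or a mapping rounded to 4096 bytes, would be under-aligned with the hw.cachelinesize and hw.pagesize values quoted here. A minimal sketch of rounding up to the reported sizes; the helper name is made up for illustration, and the sizes are meant to be passed in from whatever reports them rather than hardcoded:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align. align must be a power
 * of two, which holds for both 128 and 16384. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With the values above, align_up(100, 128) gives 128 and align_up(16385, 16384) gives 32768.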


bgoglin commented Apr 12, 2021

We're supposed to have cache line size and page size already (check in the cache object attributes, and in the numa node page_type attribute in the XML output from "lstopo -.xml").

Regarding the core description, I have been thinking about this for a while. We'd basically need to hardwire the list of common ARM core numbers like lscpu does (in https://github.com/karelzak/util-linux/blob/master/sys-utils/lscpu-arm.c but it looks like they don't have M1 yet). But it's not easy to keep up to date given how many ARM CPU vendors exist :/


fozog commented Apr 13, 2021

IORegistryExplorer from "Additional tools for Xcode": https://developer.apple.com/download/more/

This gives device tree information that allows capturing what is necessary:

[Two IORegistryExplorer screenshots of the device tree]

I am going to check if I can make a PoC to get that information for hwloc


bgoglin commented Apr 13, 2021

This looks like the ARM device-tree on Linux (unfortunately we had no way to know which device-tree CPU corresponds to which Linux CPU). At least the icestorm/firestorm line differs between the 2 cpukinds.


fozog commented Apr 14, 2021

The following code makes it possible to extract the relevant cluster and CPU information:
cpu0@0(E): apple,icestorm
cpu1@0(E): apple,icestorm
cpu2@0(E): apple,icestorm
cpu3@0(E): apple,icestorm
cpu4@1(P): apple,firestorm
cpu5@1(P): apple,firestorm
cpu6@1(P): apple,firestorm
cpu7@1(P): apple,firestorm

I assume a hwloc-savvy developer can leverage it to add better Apple M1 support ;-)

// cluster types: https://developer.apple.com/news/?id=vk3m204o
// E=Efficiency, P=Performance
// derived from https://opensource.apple.com/source/IOKitTools/IOKitTools-56/ioreg.tproj/ioreg.c.auto.html

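#include <stdbool.h>                                  // (bool, true, false)
#include <stdio.h>
#include <stdlib.h>                                   // (exit, malloc, free)
#include <string.h>                                   // (strcmp, strlen, strncpy, strdup)
#include <IOKit/IOKitLib.h>                           // (IOMasterPort, ...)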

#define DT_PLANE "IODeviceTree"

static void assertion(int condition, char * message)
{
    if (condition == 0)
    {
        fprintf(stderr, "ioreg: error: %s.\n", message);
        exit(1);
    }
}

struct cpu {
    int cluster;
    char cluster_type;
    int logical_id;
    char* compatible;
};

static char* get_first_string(CFDataRef object)
{
    const UInt8 * bytes;
    CFIndex       index;
    CFIndex       length;
    
    length = CFDataGetLength(object);
    bytes  = CFDataGetBytePtr(object);
    
    for (index = 0; index < length; index++)  // (scan for ascii string/strings)
    {
        if (bytes[index] == 0)
        {
            // can return a zero length string '\0'
            char* value = malloc(index+1);
            if (value != NULL) strncpy(value, (char*)bytes, index+1);
            return value;
        }
    }

    return NULL;

}

static char* get_string(CFStringRef object)
{
    const char * c = CFStringGetCStringPtr(object, kCFStringEncodingMacRoman);

    if (c)
        return strdup(c);
    else
    {
        CFIndex bufferSize = CFStringGetLength(object) + 1;
        char *  buffer     = malloc(bufferSize);

        if (buffer)
        {
            if ( CFStringGetCString(
                    /* string     */ object,
                    /* buffer     */ buffer,
                    /* bufferSize */ bufferSize,
                    /* encoding   */ kCFStringEncodingMacRoman ) )
                return buffer;

            free(buffer); // conversion failed; don't leak the buffer
        }
        return NULL;
    }
}

static int get_number(CFNumberRef object, long long* result)
{
    return CFNumberGetValue(object, kCFNumberLongLongType, result);
}

static void get_cpu(const void * key, const void * value, void * parameter)
{
    struct cpu* cpu = (struct cpu*)parameter;
    char* name = get_string(key);
    if (name == NULL) // key could not be converted; nothing to match
        return;
    if (strcmp(name, "compatible") == 0) {
        cpu->compatible = get_first_string(value);
    }
    else if (strcmp(name, "logical-cluster-id") == 0) {
        long long number = -1;
        if (get_number(value, &number))
             cpu->cluster = (int)number;
    }
    else if (strcmp(name, "logical-cpu-id") == 0) {
        long long number = -1;
        if (get_number(value, &number))
             cpu->logical_id = (int)number;
    }
    else if (strcmp(name, "cluster-type") == 0) {
        char* type = get_first_string(value);
        if (type != NULL && strlen(type)==1)
            cpu->cluster_type = type[0];
        else
            cpu->cluster_type = '?';
        free(type);
    }
    free(name);
}

bool has_cpu_started = false;
bool last_cpu_seen = false;

static void scan( io_registry_entry_t service,
                  Boolean             serviceHasMoreSiblings,
                  UInt32              serviceDepth,
                  UInt64              stackOfBits )
{
    io_registry_entry_t child       = 0; // (needs release)
    io_registry_entry_t childUpNext = 0; // (don't release)
    io_iterator_t       children    = 0; // (needs release)
    kern_return_t       status      = KERN_SUCCESS;
    
    io_name_t       name;           // (don't release)
    
    // Obtain the service's children.

    status = IORegistryEntryGetChildIterator(service, DT_PLANE, &children);
    assertion(status == KERN_SUCCESS, "can't obtain children");

    childUpNext = IOIteratorNext(children);

    // Save has-more-siblings state into stackOfBits for this depth.

    if (serviceHasMoreSiblings)
        stackOfBits |=  (1 << serviceDepth);
    else
        stackOfBits &= ~(1 << serviceDepth);

    // Save has-children state into stackOfBits for this depth.

    if (childUpNext)
        stackOfBits |=  (2 << serviceDepth);
    else
        stackOfBits &= ~(2 << serviceDepth);

    // Print out the relevant service information.
    status = IORegistryEntryGetNameInPlane(service, DT_PLANE, name);
    assertion(status == KERN_SUCCESS, "can't obtain name");

    // Traverse over the children of this service.
    if (has_cpu_started)
    {
            // handle cpu
        CFMutableDictionaryRef properties = 0; // (needs release)
        // Initialize every field: an entry may lack some of the properties.
        struct cpu cpu = { -1, '?', -1, NULL };

        status = IORegistryEntryCreateCFProperties(service,
                                                   &properties,
                                                   kCFAllocatorDefault,
                                                   kNilOptions);
        assertion(status == KERN_SUCCESS, "can't obtain properties");
        assertion(CFGetTypeID(properties) == CFDictionaryGetTypeID(), "properties is not a dictionary");
        
        CFDictionaryApplyFunction(properties, get_cpu, &cpu);
        
        printf("%s@%d(%c): %s\n", name, cpu.cluster, cpu.cluster_type,
               cpu.compatible ? cpu.compatible : "(none)");
        
        free(cpu.compatible); // allocated by get_first_string
        CFRelease(properties);
        
    }
    else if (strcmp(name, "cpus") == 0)
    {
        has_cpu_started = true;
    }
    
    while (childUpNext && !last_cpu_seen)
    {
        child       = childUpNext;
        childUpNext = IOIteratorNext(children);

        scan( /* service                */ child,
              /* serviceHasMoreSiblings */ (childUpNext) ? TRUE : FALSE,
              /* serviceDepth           */ serviceDepth + 1,
              /* stackOfBits            */ stackOfBits);

        IOObjectRelease(child); child = 0;
    }

    IOObjectRelease(children); children = 0;
    
    if (has_cpu_started && strcmp(name, "cpus") == 0)
    {
        last_cpu_seen = true;
    }
}

int main(int argc, const char * argv[]) {

    io_registry_entry_t service   = 0; // (needs release)

    service = IORegistryGetRootEntry(kIOMasterPortDefault);
    assertion(service, "can't obtain I/O Kit's root service");
    
    scan( /* service                */ service,
          /* serviceHasMoreSiblings */ FALSE,
          /* serviceDepth           */ 0,
          /* stackOfBits            */ 0
         );

    IOObjectRelease(service); service = 0;

    return 0;

}



scivision commented Jun 1, 2021

I get the same result with the code above, having to add the line #include <CoreFoundation/CoreFoundation.h> and building as file hw.c with:

clang hw.c -framework Foundation -framework IOKit

For now I'm using the workaround I mentioned above (hardcoding number of CPU when Mac ARM64 detected) but with newer Mac ARM64's seeming imminent, looking forward to getting a more robust solution.

I put fozog's code into a Gist to make it easier to use: https://gist.github.com/scivision/4abc01e731105228272f74fb6d112232


scivision commented Jun 15, 2021

In anticipation of new Apple Silicon models, I made a CMake standalone example that programmatically counts the number of "fast" Apple Silicon cores. As new Apple Silicon CPUs become public, this can be modified to accommodate them. This workaround doesn't use hwloc.

https://github.com/scivision/cmake-apple-silicon-count

For my MPI projects, I programmatically count physical CPU cores in CMake, which generates configuration files used at runtime. For other architectures, I use hwloc from CMake to do this.
I have found that using both the slow and fast Apple Silicon cores is slower than using the fast cores only. I.e. better to use 4 fast cores rather than 4 fast + 4 slow, for my MPI application at least.


bgoglin commented Jun 16, 2021

Hello. I haven't had a chance to look at this yet, mostly because I don't have access to the hardware. I'd like to look at it for 2.6 during summer.

One thing to clarify: You said earlier "For my MPI-based app, forcing it to use 4 of 8 M1 cores ...", what do you mean? Do you just ask for 4 cores in mpirun and the OS manages to use the 4 fast ones? Or do you have a way to say you want fast cores? Or do you manually select each fast core?

We currently have no way to bind tasks to individual CPUs on Mac OS X. Hence we don't care which cores are marked as performance or efficiency. It looks like your code identifies those performance or efficient cores anyway, but do we only care about the number of each kind?


scivision commented Jun 16, 2021

It seems MacOS defaults to use the fast cores for intensive tasks. So if I identify how many fast cores are on a system--in this case, I count the number of "firestorm" CPU cores, and then do mpiexec -n 4 say, MacOS seems to automatically use those four fast firestorm cores.

In effect, I have CMake build and run a little C program doing that, and write the fast CPU count to a generated file that's used by the user programs.

For non Apple Silicon, I use hwloc to count the physical CPU cores and generate the same file.

So when new Apple Silicon comes out, maybe it will have a different name for the fast cores, and I'll detect that and count that name instead for those CPUs.
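
The counting step described above can be sketched as follows, assuming the helper program's output has the "cpuN@cluster(type): compatible" shape shown earlier in this thread. The function name is made up for illustration; passing the compatible string as a parameter is one way to cope with future chips using a different name for the fast cores:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines of text whose value after ": " is exactly compatible,
 * e.g. "apple,firestorm". Lines without a ": " separator are skipped. */
static int count_cores_named(const char *text, const char *compatible)
{
    size_t want;
    int count = 0;

    assert(text != NULL && compatible != NULL);
    want = strlen(compatible);

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        const char *sep = memchr(text, ':', len);

        /* The compatible string follows ": " and runs to the end of line. */
        if (sep && (size_t)(text + len - sep) >= 2 && sep[1] == ' ') {
            const char *val = sep + 2;
            size_t vlen = (size_t)(text + len - val);
            if (vlen == want && memcmp(val, compatible, want) == 0)
                count++;
        }
        if (!eol)
            break;
        text = eol + 1;
    }
    return count;
}
```

Fed the eight-line M1 dump shown earlier, count_cores_named(dump, "apple,firestorm") returns 4.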


bgoglin commented Jun 17, 2021

Thanks. Do you have an easy way to enable this code only on recent Mac platforms? The code works on old x86 macs (just need to initialize cpu.compatible to NULL) but it's useless there.
Should we detect "ARM" at build or runtime? Or detect the OSX release at runtime?


bgoglin commented Jun 19, 2021

The original code from opensource.apple.com that @fozog modified is under the APSL licence. It looks like this cannot be merged into hwloc's BSD3-licensed code. We'd need to find a BSD-compatible implementation or somebody to rewrite it from scratch.
Based on the IORegistryExplorer output above, what we need is much simpler. The IOKit part just needs to scan entries in the cpus directory under root. The attribute code for parsing these 4-5 values will hopefully remain simple too.


bgoglin commented Jun 19, 2021

Here's a completely different implementation that I wrote from scratch specifically for this use case. Can you test it? My x86 VMs have none of the required attributes, hence I used "name" for very basic testing of this code.

#include <stdio.h>
#include <stdlib.h>
#include <IOKit/IOKitLib.h>
#include <CoreFoundation/CoreFoundation.h>

#define DT_PLANE "IODeviceTree"

int main()
{
  io_registry_entry_t cpus_root;
  io_iterator_t cpus_iter;
  io_registry_entry_t cpus_child;
  kern_return_t kret;

  cpus_root = IORegistryEntryFromPath(kIOMasterPortDefault, DT_PLANE ":/cpus");
  if (!cpus_root) {
    fprintf(stderr, "failed to find IODeviceTree:/cpus\n");
    exit(EXIT_FAILURE);
  }

  kret = IORegistryEntryGetChildIterator(cpus_root, DT_PLANE, &cpus_iter);
  if (kret != KERN_SUCCESS) {
    fprintf(stderr, "failed to create iterator\n");
    exit(EXIT_FAILURE);
  }

  while ((cpus_child = IOIteratorNext(cpus_iter)) != 0) {
    io_name_t name;
    CFTypeRef ref;

    kret = IORegistryEntryGetNameInPlane(cpus_child, DT_PLANE, name);
    if (kret != KERN_SUCCESS)
      continue;
    printf("looking at %s\n", name);

    ref = IORegistryEntrySearchCFProperty(cpus_child, DT_PLANE, CFSTR("logical-cpu-id"), kCFAllocatorDefault, kNilOptions);
    if (!ref) {
      fprintf(stderr, "failed to find logical-cpu-id\n");
    } else {
      printf("logical-cpu-id type is: %s\n",
             CFStringGetCStringPtr(CFCopyTypeIDDescription(CFGetTypeID(ref)), kCFStringEncodingUTF8));
      if (CFGetTypeID(ref) == CFNumberGetTypeID()) {
        long long value;
        if (CFNumberGetValue(ref, kCFNumberLongLongType, &value))
          printf("got logical-cpu-id %lld\n", value);
        else
          printf("failed to get logical-cpu-id\n");
      }
      CFRelease(ref); /* release the property, as the other blocks below do */
    }

    ref = IORegistryEntrySearchCFProperty(cpus_child, DT_PLANE, CFSTR("logical-cluster-id"), kCFAllocatorDefault, kNilOptions);
    if (!ref) {
      fprintf(stderr, "failed to find logical-cluster-id\n");
    } else {
      printf("logical-cluster-id type is: %s\n",
             CFStringGetCStringPtr(CFCopyTypeIDDescription(CFGetTypeID(ref)), kCFStringEncodingUTF8));
      if (CFGetTypeID(ref) == CFNumberGetTypeID()) {
        long long value;
        if (CFNumberGetValue(ref, kCFNumberLongLongType, &value))
          printf("got logical-cluster-id %lld\n", value);
        else
          printf("failed to get logical-cluster-id\n");
      }
      CFRelease(ref);
    }

    ref = IORegistryEntrySearchCFProperty(cpus_child, DT_PLANE, CFSTR("cluster-type"), kCFAllocatorDefault, kNilOptions);
    if (!ref) {
      fprintf(stderr, "failed to find cluster-type\n");
    } else {
      printf("cluster-type type is: %s\n",
             CFStringGetCStringPtr(CFCopyTypeIDDescription(CFGetTypeID(ref)), kCFStringEncodingUTF8));
      if (CFGetTypeID(ref) == CFDataGetTypeID()) {
        if (CFDataGetLength(ref) >= 2) {
          UInt8 value[2];
          CFDataGetBytes(ref, CFRangeMake(0, 2), value);
          if (value[1] == 0)
            printf("got cluster-type %c\n", value[0]);
          else
            printf("got more than one character in cluster-type data %c%c...\n", value[0], value[1]);
        } else
          printf("got only %ld bytes in cluster-type data \n", CFDataGetLength(ref));
      }
      CFRelease(ref);
    }

    ref = IORegistryEntrySearchCFProperty(cpus_child, DT_PLANE, CFSTR("compatible"), kCFAllocatorDefault, kNilOptions);
    if (!ref) {
      fprintf(stderr, "failed to find compatible\n");
    } else {
      printf("compatible type is: %s\n",
             CFStringGetCStringPtr(CFCopyTypeIDDescription(CFGetTypeID(ref)), kCFStringEncodingUTF8));
      if (CFGetTypeID(ref) == CFDataGetTypeID()) {
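#define HWLOC_DARWIN_COMPATIBLE_MAX 64
        UInt8 value[HWLOC_DARWIN_COMPATIBLE_MAX+1];
        /* never ask for more bytes than the data actually holds */
        CFIndex length = CFDataGetLength(ref);
        if (length > HWLOC_DARWIN_COMPATIBLE_MAX)
          length = HWLOC_DARWIN_COMPATIBLE_MAX;
        CFDataGetBytes(ref, CFRangeMake(0, length), value);
        value[length] = 0;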
        if (value[0])
          printf("got compatible %s\n", value);
        else
          printf("compatible is empty\n");
      }
      CFRelease(ref);
    }

    ref = IORegistryEntrySearchCFProperty(cpus_child, DT_PLANE, CFSTR("name"), kCFAllocatorDefault, kNilOptions);
    if (!ref) {
      fprintf(stderr, "failed to find name\n");
    } else {
      printf("name type is: %s\n",
             CFStringGetCStringPtr(CFCopyTypeIDDescription(CFGetTypeID(ref)), kCFStringEncodingUTF8));
      if (CFGetTypeID(ref) == CFDataGetTypeID()) {
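#define HWLOC_DARWIN_NAME_MAX 64
        UInt8 value[HWLOC_DARWIN_NAME_MAX+1];
        /* never ask for more bytes than the data actually holds */
        CFIndex length = CFDataGetLength(ref);
        if (length > HWLOC_DARWIN_NAME_MAX)
          length = HWLOC_DARWIN_NAME_MAX;
        CFDataGetBytes(ref, CFRangeMake(0, length), value);
        value[length] = 0;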
        if (value[0])
          printf("got name %s\n", value);
        else
          printf("name is empty\n");
      }
      CFRelease(ref);
    }

    IOObjectRelease(cpus_child);
  }
  IOObjectRelease(cpus_iter);
  IOObjectRelease(cpus_root);

  exit(EXIT_SUCCESS);
}


bgoglin commented Jun 22, 2021

Seems to work fine, I am going to integrate it in hwloc. I think I will combine all strings from "compatible" into a single one in case the useful ones aren't first one day.
It's not clear to me whether we need to support "logical-cpu-id" and "logical-cluster-id" not being CFNumber (unlikely), or "cluster-type" and "compatible" not being CFData (CFString or CFArray might work too). For now, I'll emit a warning if that ever happens.
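
Combining the "compatible" strings could look roughly like the sketch below. It assumes the property bytes are a sequence of NUL-terminated strings (the usual device-tree layout, but an assumption here) and joins them with ';', which matches the "apple,icestorm;ARM,v8" form quoted in the commit message later in this thread. The function name is made up for illustration and this is not the code that went into hwloc:

```c
#include <assert.h>
#include <stddef.h>

/* Join the NUL-separated strings stored in data[0..len) with ';'.
 * Writes a NUL-terminated result into out (capacity outsz) and returns
 * its length. Input that does not fit is truncated. */
static size_t join_compatible(const unsigned char *data, size_t len,
                              char *out, size_t outsz)
{
    size_t i, n = 0;

    assert(data != NULL && out != NULL && outsz > 0);

    /* Drop trailing NUL terminators so the result has no trailing ';'. */
    while (len > 0 && data[len - 1] == 0)
        len--;

    for (i = 0; i < len && n + 1 < outsz; i++)
        out[n++] = data[i] ? (char)data[i] : ';';
    out[n] = '\0';
    return n;
}
```

In the IOKit code above, data and len would come from CFDataGetBytes and CFDataGetLength on the "compatible" property.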

@scivision

OK thanks--I will correct the license for the demo project code from @fozog (or perhaps replace with yours). Thank you!


bgoglin commented Jun 22, 2021

Try a hwloc tarball from https://ci.inria.fr/hwloc/view/all/job/bgoglin/407/
You should see 2 cpukinds in "lstopo --cpukinds".
To get the number of slow cores do "hwloc-calc --cpukind 0 -N core all". For fast cores, do "--cpukind 1" (cores are ranked by efficiency first).


fozog commented Jun 22, 2021

@bgoglin your code works nicely on my system too.
Here is the dump:
looking at cpu0
logical-cpu-id type is: (null)
got logical-cpu-id 0
logical-cluster-id type is: (null)
got logical-cluster-id 0
cluster-type type is: (null)
got cluster-type E
compatible type is: (null)
got compatible apple,icestorm
name type is: (null)
got name cpu0
looking at cpu1
logical-cpu-id type is: (null)
got logical-cpu-id 1
logical-cluster-id type is: (null)
got logical-cluster-id 0
cluster-type type is: (null)
got cluster-type E
compatible type is: (null)
got compatible apple,icestorm
name type is: (null)
got name cpu1
looking at cpu2
logical-cpu-id type is: (null)
got logical-cpu-id 2
logical-cluster-id type is: (null)
got logical-cluster-id 0
cluster-type type is: (null)
got cluster-type E
compatible type is: (null)
got compatible apple,icestorm
name type is: (null)
got name cpu2
looking at cpu3
logical-cpu-id type is: (null)
got logical-cpu-id 3
logical-cluster-id type is: (null)
got logical-cluster-id 0
cluster-type type is: (null)
got cluster-type E
compatible type is: (null)
got compatible apple,icestorm
name type is: (null)
got name cpu3
looking at cpu4
logical-cpu-id type is: (null)
got logical-cpu-id 4
logical-cluster-id type is: (null)
got logical-cluster-id 1
cluster-type type is: (null)
got cluster-type P
compatible type is: (null)
got compatible apple,firestorm
name type is: (null)
got name cpu4
looking at cpu5
logical-cpu-id type is: (null)
got logical-cpu-id 5
logical-cluster-id type is: (null)
got logical-cluster-id 1
cluster-type type is: (null)
got cluster-type P
compatible type is: (null)
got compatible apple,firestorm
name type is: (null)
got name cpu5
looking at cpu6
logical-cpu-id type is: (null)
got logical-cpu-id 6
logical-cluster-id type is: (null)
got logical-cluster-id 1
cluster-type type is: (null)
got cluster-type P
compatible type is: (null)
got compatible apple,firestorm
name type is: (null)
got name cpu6
looking at cpu7
logical-cpu-id type is: (null)
got logical-cpu-id 7
logical-cluster-id type is: (null)
got logical-cluster-id 1
cluster-type type is: (null)
got cluster-type P
compatible type is: (null)
got compatible apple,firestorm
name type is: (null)
got name cpu7
Program ended with exit code: 0

bgoglin added a commit to bgoglin/hwloc that referenced this issue Jun 23, 2021
We read the 'cluster-type' ('E' for energy and 'P' for performance)
and the 'compatible' string (either "apple,icestorm;ARM,v8"
or "apple,firestorm;ARM,v8" on M1 processor) to build two cpukinds.

Thanks to Michael Hirsch and Francois Ozog for the help.

Closes: open-mpi#454

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

bgoglin commented Jun 23, 2021

Tarballs of the pull request are available at https://ci.inria.fr/hwloc/job/basic/view/change-requests/job/PR-475/
You're supposed to see something like:

$ lstopo --cpukinds
CPU kind #0 efficiency 0 cpuset 0x0000000f
  DarwinCompatible = apple,icestorm;ARM,v8
CPU kind #1 efficiency 1 cpuset 0x000000f0
  DarwinCompatible = apple,firestorm;ARM,v8

$ utils/hwloc/hwloc-calc --cpukind 1 -N core all
4

On old Macs without a hybrid CPU, there is no cpukind at all. So if you get 0 from hwloc-calc, try again with hwloc-calc -N core all to get the total number of cores.
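
As a cross-check on the output above, the number of PUs in each kind is the number of set bits in its cpuset, which on this machine (one PU per core in the lstopo output) is also the core count. A minimal sketch, assuming a cpuset that fits in a single 64-bit hexadecimal word; larger sets are printed differently and this made-up helper does not handle them:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return the number of set bits in a cpuset given as one hexadecimal
 * word such as "0x000000f0", or -1 if the string is not in that form. */
static int cpuset_weight(const char *hex)
{
    char *end;
    uint64_t mask;
    int bits = 0;

    assert(hex != NULL);
    mask = strtoull(hex, &end, 16);   /* base 16 accepts an optional "0x" */
    if (end == hex || *end != '\0')
        return -1;
    for (; mask; mask &= mask - 1)    /* clear the lowest set bit each round */
        bits++;
    return bits;
}
```

Both cpusets above, 0x0000000f and 0x000000f0, have a weight of 4.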

bgoglin added a commit to bgoglin/hwloc that referenced this issue Jun 24, 2021

bgoglin commented Jun 24, 2021

By the way, we could add information about core frequency in hwloc like we do on Linux. Can you check if some fields in the IO Registry are different between cores? For instance, do you have something like "00 36 6e 01" in clock-frequency and fixed-frequency for all cores?

Also the cache configuration reported by hwloc looks wrong compared to what Wikipedia says (L2 should be shared per cluster, and the sizes are different).


fozog commented Jun 25, 2021

All nodes have clock-frequency set to 00 36 6e 01.
cpu0 has an additional fixed-frequency 00 36 6e 01 00 00 00 00

l2-cache-id is 0 for all cores
l2-cache-size is 00 00 40 00 for icestorm
l2-cache-size is 00 00 80 00 for firestorm
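
If these property bytes are read as little-endian integers (an assumption, though it is the usual byte order on arm64), 00 36 6e 01 decodes to 24000000, and the two l2-cache-size values decode to 4194304 and 8388608 bytes, i.e. 4 MiB and 8 MiB. A minimal sketch of that decoding, with a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode up to 8 bytes, least significant byte first, into an integer. */
static uint64_t decode_le(const unsigned char *bytes, size_t len)
{
    uint64_t value = 0;
    size_t i;

    assert(bytes != NULL && len <= 8);
    for (i = 0; i < len; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}
```

The 8-byte fixed-frequency value on cpu0 decodes to the same 24000000 under this reading.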

[screenshot]


bgoglin commented Jun 25, 2021

Ok thanks. Looks like we won't do anything with frequencies. Cache is surprising since icestorm is supposed to have a 4MB cache and firestorm a 12MB. Not sure how those would map to 00 00 40 00 and 00 00 80 00 :)

Ok I'll forget about this for now (we had issues with cache sizes and sharing reported by Mac in the past) and merge the PR next week unless somebody complains.
