Can Quest run on Apple M1 ? #301

keithyau · 2021-09-27T04:10:28Z

Wondering whether LLVM/Clang on Apple M1 can be supported.

TysonRayJones · 2021-09-27T04:34:54Z

Hi there,

I don't have an M1 handy to test, but certainly there's nothing special in the QuEST architecture to preclude it.
I would confidently assume that serial QuEST is supported by whatever the M1 compiling chain is.

For multithreading, QuEST supports OpenMP versions 2.0 (in develop; the master branch temporarily requires 3.1) through to OpenMP 5.0 (the latest). It has not yet been tested with 5.1, but is expected to be compatible. Mature releases of Clang support OpenMP (e.g. OpenMP 4.5 in Clang 13). If the M1 compiler toolchain fully supports Clang, then I expect QuEST to compile fine.

But one never knows until they test!

keithyau · 2021-11-12T01:58:21Z

Thank you!

mmoelle1 · 2021-12-16T07:23:21Z

Hi there,

I tried compiling QuEST on an M1 and it works. However, it needs some modification of the CMakeLists.txt file.

Original (same for C++ compiler):

# TODO standardize
# set C compiler flags based on compiler type
if ("${CMAKE_C_COMPILER_ID}" STREQUAL "Clang")
  # using Clang
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} \
    -mavx -Wall"
  )
elseif ("${CMAKE_C_COMPILER_ID}" STREQUAL "GNU")
  # using GCC
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} \
    -mavx -Wall"
  )
elseif ("${CMAKE_C_COMPILER_ID}" STREQUAL "Intel")
  # using Intel
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} \
    -fprotect-parens -Wall -xAVX -axCORE-AVX2 -diag-disable cpu-dispatch"
  )
elseif ("${CMAKE_C_COMPILER_ID}" STREQUAL "MSVC")
  # using Visual Studio
  string(REGEX REPLACE "/W3" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})
  string(REGEX REPLACE "-W3" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} \
    -w"
  )
endif()

Apple's default compiler reports itself as AppleClang, so, by accident, none of these branches match and the -mavx flag (which does not work on M1) is never set. However, when you install a true GCC (e.g. using Homebrew), the code above detects a GNU compiler and sets -mavx, which leads to a compiler error. The same problem happens on any non-x86_64 architecture (ARM/ARM64, PPC).

As a quick fix, I'd suggest wrapping the entire if()...endif() block in if (CMAKE_SYSTEM_PROCESSOR MATCHES "(x86)|(X86)|(amd64)|(AMD64)") ... endif(), which will disable it for any non-x86_64 architecture.

TysonRayJones · 2021-12-24T05:25:15Z

Hi Matthias,
That's really useful to know, thanks very much!
I've been meaning to test whether QuEST can meaningfully utilise auto-vectorisation for a while, so I'll add that to my backlog and update the build afterward (or remove the flag entirely). @rrmeister who has a better understanding of the CMake build may also be interested.
Thanks again!

ekapit · 2022-02-15T19:51:40Z

I just got a new M1 Max laptop and am trying out QuEST on it. Naively, it should be extremely fast: this CPU has 10 cores and 200+ GB/s of usable memory bandwidth, higher than most Xeons, and since memory bandwidth is the primary bottleneck it should be very quick. I was able to get Apple clang to link to OpenMP correctly, so it is multithreaded. However, in practice it ends up being much slower than on Intel chips. I tried setting "march=apple-m1" as a compiler flag to make sure it's compiling native code, but that didn't seem to change anything. I strongly suspect this is a compiler issue, though I'm not sure what to try next.

Has anyone gotten QuEST to perform well on Apple Silicon?

TysonRayJones · 2022-02-19T05:12:31Z

Hi ekapit,

Hmm that's quite puzzling. I've created a very simple MWE below which modifies a complex array much like QuEST's backend CPU code.

Let's first test if your laptop is performing as expected for a serial simulation.
Can you copy the code below into a file (e.g. github_issue.c), and compile it serially using -O3 optimisation, and whatever additional arguments you need to target M1?

On my 13-inch MacBook, I compiled via

clang github_issue.c -O3 -o test

using clang-10. It ran (./test) in 12s.

How long does it take to run on your M1 laptop?

MWE

/* compile as...
 *  serial:
 *      clang github_issue.c -O3 -o test
 *  multithreaded (the OpenMP flag is -fopenmp; Apple clang typically
 *  needs -Xpreprocessor -fopenmp -lomp with libomp installed):
 *      clang github_issue.c -O3 -fopenmp -o test
 *
 * run as...
 *     export OMP_NUM_THREADS=1
 *     ./test
 *
 * Memory cost = 16 * 2^numQb (bytes)
 *      20 qubits = 16 MiB
 *      28 qubits = 4 GiB
 *
 * Serial simulation of 28 qubits on my 13-inch MacBook Pro,
 * compiled with clang-1000.10.44.2:
 *      12.133904 (s)
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>
#include <sys/time.h>

#ifdef _OPENMP
#include <omp.h>
#endif

#define START_TIMING() \
    struct timeval tval_before, tval_after, tval_result; \
    gettimeofday(&tval_before, NULL);
    
#define STOP_TIMING() \
    gettimeofday(&tval_after, NULL); \
    timersub(&tval_after, &tval_before, &tval_result); \
    printf("%ld.%06ld (s)\n", \
        (long int) tval_result.tv_sec, \
        (long int) tval_result.tv_usec);

typedef long long unsigned int INDEX;

typedef double complex AMP;

void applyGate(AMP* amps, int t, int numQb) {
    
    const double fac = 1/sqrt(2);
    const INDEX iNum = (1ULL << numQb) >> 1;
    
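    // NOTE: i, j, j0k, j1k, a1 and a2 are declared inside the loop below,
    // so GCC rejects the private(...) clause; when building with GCC, delete
    // that clause (and the trailing backslash on the line before it)
#ifdef _OPENMP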
#pragma omp parallel \
    default  (none) \
    shared   (amps,t,numQb, fac,iNum) \
    private  (i,j,j0k,j1k,a1,a2)
#endif
    {
#ifdef _OPENMP
#pragma omp for schedule (static)
#endif
        for (INDEX i=0; i<iNum; i++) {
            
            // |0>|i> -> |j>|0>|k>, |j>|1>|k>
            INDEX j = (i >> t) << t;
            INDEX j0k = (j << 1ULL) ^ (i - j);
            INDEX j1k = j0k ^ (1ULL << t);
                    
            AMP a1 = amps[j0k];
            AMP a2 = amps[j1k];
            amps[j0k] = fac*a1 + fac*a2;
            amps[j1k] = fac*a1 - fac*a2;
        }
    }
}



int main() {
    
    int numQb = 28;
    
    INDEX numAmp = (1ULL<<numQb);
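    AMP* amps = malloc(numAmp * sizeof *amps);
    if (amps == NULL) {
        fprintf(stderr, "failed to allocate %llu amplitudes\n", numAmp);
        return 1;
    }
    // use i+1 so that the first amplitude does not divide by zero
    for (INDEX i=0; i<numAmp; i++)
        amps[i] = 1./(i+1) + 2.*I/(i+1);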
    
    START_TIMING()
    
    for (int t=0; t<numQb; t++)
        applyGate(amps, t, numQb);
        
    STOP_TIMING()
    
    free(amps);
    return 0;
}

mmoelle1 · 2022-02-20T18:51:37Z

Hi Tyson,

I tried your code on my Apple M1 (MacBook Pro), not an M1 Max or Pro like the one discussed above.

Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin21.3.0

Serial

❯ clang github_issue.c -O3 -o test
8.559273 (s)

OpenMP

❯ clang github_issue.c -O3 -openmp -o test
7.743996 (s) OMP_NUM_THREADS=1
4.227490 (s) OMP_NUM_THREADS=2
4.195969 (s) OMP_NUM_THREADS=4
4.211792 (s) OMP_NUM_THREADS=8

GCC 11.2.0.3 (from Homebrew)

Serial

7.596629 (s)

OpenMP

7.835089 (s) OMP_NUM_THREADS=1
5.348674 (s) OMP_NUM_THREADS=2
5.083343 (s) OMP_NUM_THREADS=4
5.096947 (s) OMP_NUM_THREADS=8

For GCC the line private (i,j,j0k,j1k,a1,a2) needs to be removed.

TysonRayJones · 2022-02-24T10:42:07Z

Thanks very much Matthias! (and oops regarding GCC; I forgot we have to pre-declare our OpenMP variables there like filthy animals).

Those are encouraging times, which to me confirm ekapit's performance issues are indeed related to build parameters, as we discussed above. Or maybe we're comparing to some very impressive Intel chips! :)

fieldofnodes · 2023-01-13T13:01:09Z

Hi, I have an M1 Max MacBook Pro, and I just linked #346 to this issue because I cannot get QuEST to build (make) for testing.

TysonRayJones · 2024-08-21T11:21:12Z

Confirming QuEST v4 (due for release mid-September) runs fine on an M1 Mac (which is now my main development machine!), with a naive build. We'll make sure our revised CMake build avoids the above issues.

TysonRayJones added the enhancement label Oct 13, 2021

rrmeister mentioned this issue Jan 13, 2023

Failed make. CATCH_BREAK_INTO_DEBUGGER(); unrecognized instruction mnemonic, did you mean: bit, cnt, hint, ins, not? #346

Closed

TysonRayJones closed this as completed Nov 26, 2024
