LLVM aarch64 relocation overflow #1421

Closed
3 of 4 tasks
Kenny-Heitritter opened this issue Mar 20, 2024 · 10 comments · Fixed by #1444 or #2504
Labels: bug Something isn't working
@Kenny-Heitritter

Required prerequisites

  • Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

When running VQEs that require larger amounts of memory from within the CUDA Quantum Docker container (v0.6.0) on an NVIDIA GH200, the chance of hitting the following error increases:

python: /llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:514: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.

Steps to reproduce the bug

# The following has been adapted from Marwa Farag's code at https://github.com/marwafar/QChem-cudaq/blob/main/LiH-full-space/Full-space-cudaq.py
# To reproduce the error, run this code from within the CUDA Quantum v0.6.0 container (docker run --rm -it --gpus=all nvcr.io/nvidia/cuda-quantum:0.6.0)

import cudaq
from cudaq import spin
from pyscf import gto, scf, mp, mcscf, fci, cc
from pyscf import ao2mo
from pyscf.tools import molden
from functools import reduce
import numpy as np

from openfermion import generate_hamiltonian
from openfermion.transforms import jordan_wigner

from typing import List, Tuple

def init_param_CCSD(qubits_num,nele_cas,t1,t2):
    
    sz=np.empty(qubits_num)

    for i in range(qubits_num):
        if i%2 == 0:
            sz[i]=0.5
        else:
            sz[i]=-0.5

# thetas for single excitation
    thetas_1=[]
# theta for double excitation
    thetas_2=[]

    tot_params=0
    nmo_occ=nele_cas//2

    for p_occ in range(nele_cas):
        for r_vir in range(nele_cas,qubits_num):
            if (sz[r_vir]-sz[p_occ]==0):
                thetas_1.append(t1[p_occ//2,r_vir//2-nmo_occ])
                tot_params+=1


    for p_occ in range(nele_cas-1):
        for q_occ in range(p_occ+1,nele_cas):
            for r_vir in range(nele_cas,qubits_num-1):
                for s_vir in range(r_vir+1,qubits_num):
                    if (sz[r_vir]+sz[s_vir]-sz[p_occ]-sz[q_occ])==0:
                        thetas_2.append(t2[p_occ//2,q_occ//2,r_vir//2-nmo_occ,s_vir//2-nmo_occ])
                        tot_params+=1


    init_params=np.concatenate((thetas_2,thetas_1), axis=0)
    return init_params,tot_params


mol=gto.M(
    atom='Li 0.0 0.0 0.0; H 0.0 0.0 1.5',
    spin=0,
    charge=0,
    basis='6-31G',
    output='LiH'+'.out',
    verbose=4
)

## 1- Classical preprocessing

print('\n')
print('Beginning of classical preprocessing', '\n')
print ('Energies from classical simulations','\n')

##################################
# Mean field (HF)
##################################
myhf=scf.RHF(mol)
myhf.max_cycle=100
myhf.kernel()

nelec = mol.nelectron
print('Total number of electrons= ', nelec, '\n')
norb = myhf.mo_coeff.shape[1]
print('Total number of orbitals= ', norb, '\n')

print('RHF energy= ', myhf.e_tot, '\n')

mycc=cc.CCSD(myhf).run()
print('Total CCSD energy= ', mycc.e_tot, '\n')

myfci=fci.FCI(myhf)
result= myfci.kernel()
print('FCI energy= ', result[0], '\n')

# Compute the 1e integral in atomic orbital then convert to HF basis
h1e_ao = mol.intor("int1e_kin") + mol.intor("int1e_nuc")
## Ways to convert from ao to mo
h1e=reduce(np.dot, (myhf.mo_coeff.conj().T, h1e_ao, myhf.mo_coeff))

# Compute the 2e integrals then convert to HF basis
h2e_ao = mol.intor("int2e_sph", aosym='1')
h2e=ao2mo.incore.full(h2e_ao, myhf.mo_coeff)

# Reorder the chemist-notation (pq|rs) ERI h_pqrs to h_prqs,
# as expected by "generate_hamiltonian" in openfermion
h2e=h2e.transpose(0,2,3,1)

nuclear_repulsion = myhf.energy_nuc()

print('h1e_shape ', h1e.shape, '\n')
print('h2e_shape ', h2e.shape, '\n')

mol_ham=generate_hamiltonian(h1e,h2e,nuclear_repulsion)

ham_operator = jordan_wigner(mol_ham)

spin_ham=cudaq.SpinOperator(ham_operator)

# We will be optimizing over a custom objective function that takes a vector
# of parameters as input and returns either the cost as a single float,
# or in a tuple of (cost, gradient_vector) depending on the optimizer used.

# In this case, we will use the spin Hamiltonian and ansatz from `simple_vqe.py`
# and find the `thetas` that minimize the expectation value of the system.
hamiltonian = spin_ham
qubits_num=2*norb
cudaq.set_target("nvidia")


kernel, thetas = cudaq.make_kernel(list)
qubits = kernel.qalloc(qubits_num)

for i in range(nelec):
    kernel.x(qubits[i])
cudaq.kernels.uccsd(kernel, qubits, thetas, nelec, qubits_num)
parameter_count = cudaq.kernels.uccsd_num_parameters(nelec,qubits_num)

init_params,tot_params=init_param_CCSD(qubits_num,nelec,mycc.t1,mycc.t2)

# Define the optimizer that we'd like to use.
optimizer = cudaq.optimizers.Adam()
optimizer.max_iterations = 1
# optimizer = cudaq.optimizers.COBYLA()
optimizer.initial_parameters=init_params

# Since we'll be using a gradient-based optimizer, we can leverage
# CUDA Quantum's gradient helper class to automatically compute the gradient
# vector for us. The use of this class for gradient calculations is
# purely optional and can be replaced with your own custom gradient
# routine.
gradient = cudaq.gradients.CentralDifference()


def objective_function(parameter_vector: List[float],
                       hamiltonian=hamiltonian,
                       gradient_strategy=gradient,
                       kernel=kernel) -> Tuple[float, List[float]]:
    """
    Note: the objective function may also take extra arguments, provided they
    are passed into the function as default arguments in python.
    """

    # Call `cudaq.observe` on the spin operator and ansatz at the
    # optimizer provided parameters. This will allow us to easily
    # extract the expectation value of the entire system in the
    # z-basis.

    # We define the call to `cudaq.observe` here as a lambda to
    # allow it to be passed into the gradient strategy as a
    # function. If you were using a gradient-free optimizer,
    # you could purely define `cost = cudaq.observe().expectation()`.
    get_result = lambda parameter_vector: cudaq.observe(
        kernel, hamiltonian, parameter_vector, shots_count=100).expectation()
    # `cudaq.observe` returns a `cudaq.ObserveResult` that holds the
    # counts dictionary and the `expectation`.
    cost = get_result(parameter_vector)
    print(f"<H> = {cost}")
    # Compute the gradient vector using `cudaq.gradients.STRATEGY.compute()`.
    gradient_vector = gradient_strategy.compute(parameter_vector, get_result,
                                                cost)

    # Return the (cost, gradient_vector) tuple.
    return cost, gradient_vector


cudaq.set_random_seed(13)  # make repeatable
import time
start = time.time()
energy, parameter = optimizer.optimize(dimensions=1,
                                       function=objective_function)
tot_time = time.time()-start
print(f"time per iteration {tot_time}")

print(f"\nminimized <H> = {round(energy,16)}")
print(f"optimal theta = {round(parameter[0],16)}")

Expected behavior

The code should run without producing an error.

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

  • CUDA Quantum version: 0.6.0
  • Python version: 3.10.12
  • C++ compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Operating system: Host OS Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0-1015-nvidia-64k aarch64)

Suggestions

No response

@bmhowe23
Collaborator

Thank you very much for this bug report, @Kenny-Heitritter. We just released version 0.7.0; can you please tell us whether the problem is more or less likely to occur on 0.7.0? There are no direct fixes for this issue in 0.7.0, but the timing likely changed, so it would be good to know whether we should focus our debugging efforts on a specific version.

A few items to note:

  1. The Docker image isn't quite on our main channel yet, so please use nvcr.io/nvidia/nightly/cuda-quantum:0.7.0 (which includes nightly in the path). This will only be necessary for the next week or so and then you will see it on the main channel.

  2. The Python UCCSD API changed slightly in 0.7.0, so you'll need to apply this change to your test script.

--- test_0.6.0.py       2024-03-20 13:21:03.138949476 +0000
+++ test_0.7.0.py       2024-03-20 13:31:09.739183293 +0000
@@ -128,7 +128,7 @@

 for i in range(nelec):
     kernel.x(qubits[i])
-cudaq.kernels.uccsd(kernel, qubits, thetas, nelec, qubits_num)
+kernel.apply_call(cudaq.kernels.uccsd, qubits, thetas, nelec, qubits_num)
 parameter_count = cudaq.kernels.uccsd_num_parameters(nelec,qubits_num)

@Kenny-Heitritter
Author

Thanks @bmhowe23! Just tested the same test script, modulo the new UCCSD API shown above, and it does appear the issue is present to the same degree in 0.7.0. Please do let me know if there are any other tests I can run which would be helpful.

@bmhowe23 added the bug (Something isn't working) label on Mar 28, 2024
@bmhowe23
Collaborator

bmhowe23 commented Apr 1, 2024

@Kenny-Heitritter I am still trying to reproduce the issue on servers that I have access to (unsuccessfully thus far), but if you would like to try https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/196615788?tag=pr-1444-base [edit: see new link in comment below] on your system, please feel free. This is a "cuda-quantum-dev" image, so it will be slightly different from a "cuda-quantum" image, but I think you should be able to run any C++/Python examples that you place in the container, just like normal. One notable difference is that the binaries are installed in /usr/local/cudaq instead of /opt/nvidia/cudaq. Hopefully that doesn't matter to you.

@bmhowe23
Collaborator

bmhowe23 commented Apr 5, 2024

@Kenny-Heitritter I am still trying to reproduce the issue on servers that I have access to (unsuccessfully thus far), but if you would like to try https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/196615788?tag=pr-1444-base on your system, please feel free. This is a "cuda-quantum-dev" image, so it will be slightly different from a "cuda-quantum" image, but I think you should be able to run any C++/Python examples that you place in the container, just like normal. One notable difference is that the binaries are installed in /usr/local/cudaq instead of /opt/nvidia/cudaq. Hopefully that doesn't matter to you.

The old link expired, so here is a new one: https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/200241787?tag=pr-1444-base
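
For anyone else who wants to try the dev image, a minimal sketch of how it might be pulled and run (the ghcr.io path below is inferred from the package link above and may need adjusting; --gpus=all assumes the NVIDIA Container Toolkit is installed):

# Sketch only: registry path inferred from the package link above
docker pull ghcr.io/nvidia/cuda-quantum-dev:pr-1444-base
docker run --rm -it --gpus=all ghcr.io/nvidia/cuda-quantum-dev:pr-1444-base
# In this dev image the binaries live under /usr/local/cudaq rather than /opt/nvidia/cudaq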

@bmhowe23
Collaborator

@Kenny-Heitritter We've seen some positive results from this image and will likely include this change in our next release. Feel free to test it out if you'd like: https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/235623747?tag=pr-1444-base.

(Thanks @jfriel-oqc!)

@bebora
Contributor

bebora commented Oct 21, 2024

Hi @bmhowe23, I can confirm that the issue is resolved when upgrading to a 0.8.0 container. However, I still face this issue on a GH200 Grace Hopper when installing CUDA-Q through Conda.

Basic installation steps for a single node:

name="cq_arm"
conda create -y -n $name -c conda-forge python=3.10 pip
conda install -y -n $name -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n $name -c conda-forge mpi4py openmpi cxx-compiler
conda run -n $name pip install cuda-quantum
conda activate $name
conda env config vars set -n $name LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
conda deactivate
conda activate $name
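
As a quick sanity check that the wheel imports and picks up a simulator (a minimal sketch, reusing the cudaq.get_target() call from the example program below):

python -c "import cudaq; print(cudaq.get_target().simulator)"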

Env info:

$ pip freeze
astpretty==3.0.0
certifi==2024.8.30
charset-normalizer==3.4.0
cuda-quantum==0.8.0
cuquantum-cu11==24.8.0
custatevec-cu11==1.6.0.post1
cutensor-cu11==2.0.2
cutensornet-cu11==2.5.0
graphlib_backport==1.1.0
idna==3.10
mpi4py @ file:///home/conda/feedstock_root/build_artifacts/mpi4py_1728738708331/work
numpy==2.1.2
requests==2.32.3
urllib3==2.2.3

Example program source:

#!/usr/bin/env python3
import sys
import cudaq

print(f"Running on target {cudaq.get_target().simulator}")
qubit_count = int(sys.argv[1]) if 1 < len(sys.argv) else 2


@cudaq.kernel
def kernel():
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(0, qubit_count-1):
        x.ctrl(qubits[i], qubits[i+1])
    mz(qubits)


result = cudaq.sample(kernel)
if (not cudaq.mpi.is_initialized()) or (cudaq.mpi.rank() == 0):
    print(result) # Example: { 11:500 00:500 }

The crash probability is about 10%. Experimental data suggest no correlation between the number of simulated qubits and the chance of crashing.

@Kenny-Heitritter
Author

@bmhowe23 This problem seems to be making a comeback. As of testing with cudaq version 0.9.0, I am seeing this error come up with a relatively high frequency on our GH200.

Environment

  • CUDA Quantum version: 0.9.0
  • Python version: 3.11.7
  • C++ compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Operating system: Host OS Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0-1015-nvidia-64k aarch64)

@bmhowe23
Collaborator

bmhowe23 commented Dec 2, 2024

@Kenny-Heitritter - thanks for the information. Are you using the Docker image or the Python wheels?

@bebora
Contributor

bebora commented Dec 4, 2024

@bmhowe23 I will chime in since I am affected by this problem as well. It only happens when using Python wheels. The crash occurs very frequently, more often than I had previously reported.

I am using the same example program (ghz.py) that I reported before.

I am testing with the following script:

#!/usr/bin/env bash
declare -A exit_codes=()

for round in {1..100}
do
    python ghz.py 2 --target nvidia > /dev/null 2>&1
    exit_code=$?
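    # Exit code 134 = 128 + SIGABRT (6): the process aborted on the failed LLVM assertion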

    if [[ -v exit_codes[$exit_code] ]]; then
        ((exit_codes[$exit_code]++))
    else
        exit_codes[$exit_code]=1
    fi

done

echo -e "OK: ${exit_codes[0]}\nKO: ${exit_codes[134]}"

I ran my script using three approaches:

  1. Conda and PyPI wheel: conda create -n gh200bug python=3.10.12, conda activate gh200bug and then pip install cudaq:
    OK: 32
    KO: 68
  2. Basic CUDA image and PyPI wheel: pull nvcr.io/nvidia/cuda:12.6.3-runtime-ubuntu22.04, run it, apt install python3 python3.10-venv, python3 -m venv .venv, source .venv/bin/activate and pip install cudaq:
    OK: 35
    KO: 65
  3. CUDA-Q image: pull nvcr.io/nvidia/quantum/cuda-quantum:cu12-0.9.0, run it:
    OK: 100
    KO: (none)

bmhowe23 added a commit that referenced this issue Jan 13, 2025
This PR fixes an LLVM discrepancy between our Docker images and our Python wheels. This should fix #1421 and #1799 for our Python wheels. (The Docker images were already correct.)
@bmhowe23
Collaborator

@bebora, @Kenny-Heitritter - thanks again for bringing this to our attention. The issue should be resolved with #2504. We will likely go through a full release that includes this PR in a few weeks.
