LLVM aarch64 relocation overflow #1421

Closed
3 of 4 tasks
Kenny-Heitritter opened this issue Mar 20, 2024 · 10 comments · Fixed by #1444 or #2504
Labels: bug Something isn't working
@Kenny-Heitritter

Required prerequisites

  • Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

When running VQEs that require larger amounts of memory from within the CUDA Quantum Docker container (v0.6.0) on an NVIDIA GH200, the chance of hitting the following error increases:

python: /llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:514: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.

Steps to reproduce the bug

# The following has been adapted from Marwa Farag's code at https://github.com/marwafar/QChem-cudaq/blob/main/LiH-full-space/Full-space-cudaq.py
# To reproduce the error, run this code from within the CUDA Quantum v0.6.0 container (docker run --rm -it --gpus=all nvcr.io/nvidia/cuda-quantum:0.6.0)

import cudaq
from cudaq import spin
from pyscf import gto, scf, mp, mcscf, fci, cc
from pyscf import ao2mo
from pyscf.tools import molden
from functools import reduce
import numpy as np

from openfermion import generate_hamiltonian
from openfermion.transforms import jordan_wigner

from typing import List, Tuple

def init_param_CCSD(qubits_num,nele_cas,t1,t2):
    
    sz=np.empty(qubits_num)

    for i in range(qubits_num):
        if i%2 == 0:
            sz[i]=0.5
        else:
            sz[i]=-0.5

# thetas for single excitation
    thetas_1=[]
# theta for double excitation
    thetas_2=[]

    tot_params=0
    nmo_occ=nele_cas//2

    for p_occ in range(nele_cas):
        for r_vir in range(nele_cas,qubits_num):
            if (sz[r_vir]-sz[p_occ]==0):
                thetas_1.append(t1[p_occ//2,r_vir//2-nmo_occ])
                tot_params+=1


    for p_occ in range(nele_cas-1):
        for q_occ in range(p_occ+1,nele_cas):
            for r_vir in range(nele_cas,qubits_num-1):
                for s_vir in range(r_vir+1,qubits_num):
                    if (sz[r_vir]+sz[s_vir]-sz[p_occ]-sz[q_occ])==0:
                        thetas_2.append(t2[p_occ//2,q_occ//2,r_vir//2-nmo_occ,s_vir//2-nmo_occ])
                        tot_params+=1


    init_params=np.concatenate((thetas_2,thetas_1), axis=0)
    return init_params,tot_params


mol=gto.M(
    atom='Li 0.0 0.0 0.0; H 0.0 0.0 1.5',
    spin=0,
    charge=0,
    basis='6-31G',
    output='LiH'+'.out',
    verbose=4
)

## 1- Classical preprocessing

print('\n')
print('Beginning of classical preprocessing', '\n')
print ('Energies from classical simulations','\n')

##################################
# Mean field (HF)
##################################
myhf=scf.RHF(mol)
myhf.max_cycle=100
myhf.kernel()

nelec = mol.nelectron
print('Total number of electrons= ', nelec, '\n')
norb = myhf.mo_coeff.shape[1]
print('Total number of orbitals= ', norb, '\n')

print('RHF energy= ', myhf.e_tot, '\n')

mycc=cc.CCSD(myhf).run()
print('Total CCSD energy= ', mycc.e_tot, '\n')

myfci=fci.FCI(myhf)
result= myfci.kernel()
print('FCI energy= ', result[0], '\n')

# Compute the 1e integral in atomic orbital then convert to HF basis
h1e_ao = mol.intor("int1e_kin") + mol.intor("int1e_nuc")
## Ways to convert from ao to mo
h1e=reduce(np.dot, (myhf.mo_coeff.conj().T, h1e_ao, myhf.mo_coeff))

# Compute the 2e integrals then convert to HF basis
h2e_ao = mol.intor("int2e_sph", aosym='1')
h2e=ao2mo.incore.full(h2e_ao, myhf.mo_coeff)

# Reorder the chemist-notation (pq|rs) ERI h_pqrs to h_prqs,
# as expected by "generate_hamiltonian" in openfermion
h2e=h2e.transpose(0,2,3,1)

nuclear_repulsion = myhf.energy_nuc()

print('h1e_shape ', h1e.shape, '\n')
print('h2e_shape ', h2e.shape, '\n')

mol_ham=generate_hamiltonian(h1e,h2e,nuclear_repulsion)

ham_operator = jordan_wigner(mol_ham)

spin_ham=cudaq.SpinOperator(ham_operator)

# We will be optimizing over a custom objective function that takes a vector
# of parameters as input and returns either the cost as a single float,
# or in a tuple of (cost, gradient_vector) depending on the optimizer used.

# In this case, we will use the spin Hamiltonian and ansatz from `simple_vqe.py`
# and find the `thetas` that minimize the expectation value of the system.
hamiltonian = spin_ham
qubits_num=2*norb
cudaq.set_target("nvidia")


kernel, thetas = cudaq.make_kernel(list)
qubits = kernel.qalloc(qubits_num)

for i in range(nelec):
    kernel.x(qubits[i])
cudaq.kernels.uccsd(kernel, qubits, thetas, nelec, qubits_num)
parameter_count = cudaq.kernels.uccsd_num_parameters(nelec,qubits_num)

init_params,tot_params=init_param_CCSD(qubits_num,nelec,mycc.t1,mycc.t2)

# Define the optimizer that we'd like to use.
optimizer = cudaq.optimizers.Adam()
optimizer.max_iterations = 1
# optimizer = cudaq.optimizers.COBYLA()
optimizer.initial_parameters=init_params

# Since we'll be using a gradient-based optimizer, we can leverage
# CUDA Quantum's gradient helper class to automatically compute the gradient
# vector for us. The use of this class for gradient calculations is
# purely optional and can be replaced with your own custom gradient
# routine.
gradient = cudaq.gradients.CentralDifference()


def objective_function(parameter_vector: List[float],
                       hamiltonian=hamiltonian,
                       gradient_strategy=gradient,
                       kernel=kernel) -> Tuple[float, List[float]]:
    """
    Note: the objective function may also take extra arguments, provided they
    are passed into the function as default arguments in python.
    """

    # Call `cudaq.observe` on the spin operator and ansatz at the
    # optimizer provided parameters. This will allow us to easily
    # extract the expectation value of the entire system in the
    # z-basis.

    # We define the call to `cudaq.observe` here as a lambda to
    # allow it to be passed into the gradient strategy as a
    # function. If you were using a gradient-free optimizer,
    # you could purely define `cost = cudaq.observe().expectation()`.
    get_result = lambda parameter_vector: cudaq.observe(
        kernel, hamiltonian, parameter_vector, shots_count=100).expectation()
    # `cudaq.observe` returns a `cudaq.ObserveResult` that holds the
    # counts dictionary and the `expectation`.
    cost = get_result(parameter_vector)
    print(f"<H> = {cost}")
    # Compute the gradient vector using `cudaq.gradients.STRATEGY.compute()`.
    gradient_vector = gradient_strategy.compute(parameter_vector, get_result,
                                                cost)

    # Return the (cost, gradient_vector) tuple.
    return cost, gradient_vector


cudaq.set_random_seed(13)  # make repeatable
import time
start = time.time()
energy, parameter = optimizer.optimize(dimensions=1,
                                       function=objective_function)
tot_time = time.time()-start
print(f"time per iteration {tot_time}")

print(f"\nminimized <H> = {round(energy,16)}")
print(f"optimal theta = {round(parameter[0],16)}")

Expected behavior

The code should run without producing an error.

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

  • CUDA Quantum version: 0.6.0
  • Python version: 3.10.12
  • C++ compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Operating system: Host OS Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0-1015-nvidia-64k aarch64)

Suggestions

No response

@bmhowe23
Collaborator

Thank you very much for this bug report, @Kenny-Heitritter. We just released version 0.7.0; can you please tell us whether the problem is more or less likely to occur on 0.7.0? There are no direct fixes for this issue in 0.7.0, but the timing likely changed, so it would be good to know whether we should focus our debugging efforts on a specific version.

A few items to note:

  1. The Docker image isn't quite on our main channel yet, so please use nvcr.io/nvidia/nightly/cuda-quantum:0.7.0 (which includes nightly in the path). This will only be necessary for the next week or so and then you will see it on the main channel.

  2. The Python UCCSD API changed slightly in 0.7.0, so you'll need to apply this change to your test script.

--- test_0.6.0.py       2024-03-20 13:21:03.138949476 +0000
+++ test_0.7.0.py       2024-03-20 13:31:09.739183293 +0000
@@ -128,7 +128,7 @@

 for i in range(nelec):
     kernel.x(qubits[i])
-cudaq.kernels.uccsd(kernel, qubits, thetas, nelec, qubits_num)
+kernel.apply_call(cudaq.kernels.uccsd, qubits, thetas, nelec, qubits_num)
 parameter_count = cudaq.kernels.uccsd_num_parameters(nelec,qubits_num)

@Kenny-Heitritter
Author

Thanks @bmhowe23! Just tested the same test script, modulo the new UCCSD API shown above, and it does appear the issue is present to the same degree in 0.7.0. Please do let me know if there are any other tests I can run which would be helpful.

@bmhowe23 added the bug (Something isn't working) label on Mar 28, 2024
@bmhowe23
Collaborator

bmhowe23 commented Apr 1, 2024

@Kenny-Heitritter I am still trying to reproduce the issue on servers that I have access to (unsuccessfully thus far), but if you would like to try https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/196615788?tag=pr-1444-base [edit: see new link in comment below] on your system, please feel free. This is a "cuda-quantum-dev" image, so it will be slightly different from a "cuda-quantum" image, but I think you should be able to run any C++/Python examples that you place in the container, just like normal. One notable difference is that the binaries are installed in /usr/local/cudaq instead of /opt/nvidia/cudaq. Hopefully that doesn't matter to you.

@bmhowe23
Collaborator

bmhowe23 commented Apr 5, 2024

@Kenny-Heitritter I am still trying to reproduce the issue on servers that I have access to (unsuccessfully thus far), but if you would like to try https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/196615788?tag=pr-1444-base on your system, please feel free. This is a "cuda-quantum-dev" image, so it will be slightly different from a "cuda-quantum" image, but I think you should be able to run any C++/Python examples that you place in the container, just like normal. One notable difference is that the binaries are installed in /usr/local/cudaq instead of /opt/nvidia/cudaq. Hopefully that doesn't matter to you.

The old link expired, so here is a new one: https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/200241787?tag=pr-1444-base
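
For anyone else who wants to try the dev image, a minimal sketch of how it might be pulled and run (the ghcr.io path below is inferred from the package link above and may need adjusting; --gpus=all assumes the NVIDIA Container Toolkit is installed):

# Sketch only: registry path inferred from the package link above
docker pull ghcr.io/nvidia/cuda-quantum-dev:pr-1444-base
docker run --rm -it --gpus=all ghcr.io/nvidia/cuda-quantum-dev:pr-1444-base
# In this dev image the binaries live under /usr/local/cudaq rather than /opt/nvidia/cudaq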

@bmhowe23
Collaborator

@Kenny-Heitritter We've seen some positive results from this image and will likely include this change in our next release. Feel free to test it out if you'd like: https://github.com/NVIDIA/cuda-quantum/pkgs/container/cuda-quantum-dev/235623747?tag=pr-1444-base.

(Thanks @jfriel-oqc!)

@bebora
Contributor

bebora commented Oct 21, 2024

Hi @bmhowe23, I can confirm that the issue is resolved when upgrading to a 0.8.0 container. However, I still face this issue on a GH200 Grace Hopper when installing CUDA-Q through Conda.

Basic installation steps for a single node:

name="cq_arm"
conda create -y -n $name -c conda-forge python=3.10 pip
conda install -y -n $name -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n $name -c conda-forge mpi4py openmpi cxx-compiler
conda run -n $name pip install cuda-quantum
conda activate $name
conda env config vars set -n $name LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
conda deactivate
conda activate $name
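
As a quick sanity check that the wheel imports and picks up a simulator (a minimal sketch, reusing the cudaq.get_target() call from the example program below):

python -c "import cudaq; print(cudaq.get_target().simulator)"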

Env info:

$ pip freeze
astpretty==3.0.0
certifi==2024.8.30
charset-normalizer==3.4.0
cuda-quantum==0.8.0
cuquantum-cu11==24.8.0
custatevec-cu11==1.6.0.post1
cutensor-cu11==2.0.2
cutensornet-cu11==2.5.0
graphlib_backport==1.1.0
idna==3.10
mpi4py @ file:///home/conda/feedstock_root/build_artifacts/mpi4py_1728738708331/work
numpy==2.1.2
requests==2.32.3
urllib3==2.2.3

Example program source:

#!/usr/bin/env python3
import sys
import cudaq

print(f"Running on target {cudaq.get_target().simulator}")
qubit_count = int(sys.argv[1]) if 1 < len(sys.argv) else 2


@cudaq.kernel
def kernel():
    qubits = cudaq.qvector(qubit_count)
    h(qubits[0])
    for i in range(0, qubit_count-1):
        x.ctrl(qubits[i], qubits[i+1])
    mz(qubits)


result = cudaq.sample(kernel)
if (not cudaq.mpi.is_initialized()) or (cudaq.mpi.rank() == 0):
    print(result) # Example: { 11:500 00:500 }

The crash probability is about 10%. Experimental data suggest no correlation between the number of simulated qubits and the chance of crashing.

@Kenny-Heitritter
Author

@bmhowe23 This problem seems to be making a comeback. As of testing with cudaq version 0.9.0, I am seeing this error come up with a relatively high frequency on our GH200.

Environment

  • CUDA Quantum version: 0.9.0
  • Python version: 3.11.7
  • C++ compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Operating system: Host OS Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0-1015-nvidia-64k aarch64)

@bmhowe23
Collaborator

bmhowe23 commented Dec 2, 2024

@Kenny-Heitritter - thanks for the information. Are you using the Docker image or the Python wheels?

@bebora
Contributor

bebora commented Dec 4, 2024

@bmhowe23 I will chime in since I am affected by this problem as well. It only happens when using Python wheels. The crash occurs very frequently, more often than I had previously reported.

I am using the same example program (ghz.py) that I reported before.

I am testing with the following script:

#!/usr/bin/env bash
declare -A exit_codes=()

for round in {1..100}
do
    python ghz.py 2 --target nvidia > /dev/null 2>&1
    exit_code=$?
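    # Exit code 134 = 128 + SIGABRT (6): the process aborted on the failed LLVM assertion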

    if [[ -v exit_codes[$exit_code] ]]; then
        ((exit_codes[$exit_code]++))
    else
        exit_codes[$exit_code]=1
    fi

done

echo -e "OK: ${exit_codes[0]}\nKO: ${exit_codes[134]}"

I ran my script using three approaches:

  1. Conda and PyPI wheel: conda create -n gh200bug python=3.10.12, conda activate gh200bug and then pip install cudaq:
    OK: 32
    KO: 68
  2. Basic CUDA image and PyPI wheel: pull nvcr.io/nvidia/cuda:12.6.3-runtime-ubuntu22.04, run it, apt install python3 python3.10-venv, python3 -m venv .venv, source .venv/bin/activate and pip install cudaq:
    OK: 35
    KO: 65
  3. CUDA-Q image: pull nvcr.io/nvidia/quantum/cuda-quantum:cu12-0.9.0, run it:
    OK: 100
    KO: (none)

bmhowe23 added a commit that referenced this issue Jan 13, 2025
This PR fixes an LLVM discrepancy between our Docker images and our Python wheels. This should fix #1421 and #1799 for our Python wheels. (The Docker images were already correct.)
@bmhowe23
Collaborator

@bebora, @Kenny-Heitritter - thanks again for bringing this to our attention. The issue should be resolved with #2504. We will likely go through a full release that includes this PR in a few weeks.
