fatal error: cuda.h: No such file or directory #114

Open
zzy221127 opened this issue Nov 27, 2022 · 7 comments
Labels
installation Compile and install issues

Comments

@zzy221127
zzy221127 commented Nov 27, 2022

Dear author,

I am trying to test FastFold. I followed the "Installation Using Conda" instructions (I could not find a command to verify that the installation succeeded).

I ran inference.py with the following command:

#################################
conda activate fastfold
python /home/FastFold/inference.py used.fasta /database/alphafold2-data/pdb_mmcif/mmcif_files/ \
--output_dir /mydir/output \
--cpus 80 \
--gpus 3 \
--param_path /database/alphafold2-data/params/params_model_1.npz \
--uniref90_database_path /database/alphafold2-data/uniref90/uniref90.fasta \
--mgnify_database_path /database/alphafold2-data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /database/alphafold2-data/pdb70/pdb70 \
--uniclust30_database_path /database/alphafold2-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path /database/alphafold2-data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path /home/Software/miniconda3/envs/fastfold/bin/jackhmmer \
--hhblits_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhblits \
--hhsearch_binary_path /home/Software/miniconda3/envs/fastfold/bin/hhsearch \
--kalign_binary_path /home/Software/miniconda3/envs/fastfold/bin/kalign
#################################

The jackhmmer → hhsearch → jackhmmer → hhblits steps appear to run correctly.

Then I get the error shown below.

What does this error mean, and what should I do to run FastFold properly?

##########error message##################

/tmp/tmp4wm30exa/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
/tmp/tmp65558a3s/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
compilation terminated.
compilation terminated.
Traceback (most recent call last):
File "/home/FastFold/inference.py", line 513, in <module>
main(args)
File "/home/FastFold/inference.py", line 150, in main
inference_monomer_model(args)
File "/home/FastFold/inference.py", line 415, in inference_monomer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "<string>", line 21, in _layer_norm_fwd_fused
KeyError: ('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-bb0203f280ee2aaa28bc6e4eff4090f3-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (256,), (True, True, True, True, True, True, (True, False), (True, False), (False,)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/FastFold/inference.py", line 135, in inference_model
out = model(batch)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 507, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/home/FastFold/fastfold/model/hub/alphafold.py", line 232, in iteration
m_1_prev, z_prev = self.recycling_embedder(
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/ops.py", line 1097, in forward
m_update = self.layer_norm_m(m)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 52, in forward
return self.kernel_forward(input)
File "/home/FastFold/fastfold/model/fastnn/kernel/layer_norm.py", line 56, in kernel_forward
return LayerNormTritonFunc.apply(input, self.normalized_shape, self.weight, self.bias,
File "/home/FastFold/fastfold/model/fastnn/kernel/triton/layer_norm.py", line 164, in forward
_layer_norm_fwd_fused[(M,)](
File "/home/triton/python/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "<string>", line 41, in _layer_norm_fwd_fused
File "/home/triton/python/triton/compiler.py", line 1239, in compile
so = _build(fn.name, src_path, tmpdir)
File "/home/triton/python/triton/compiler.py", line 1169, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp65558a3s/main.c', '-O3', '-I/usr/local/cuda/include', '-I/home/Software/miniconda3/envs/fastfold/include/python3.8', '-I/tmp/tmp65558a3s', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp65558a3s/_layer_norm_fwd_fused.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.

@Shenggan
Contributor

Could you please check your CUDA environment? You need the nvcc compiler:

nvcc -V

If you do not have the CUDA compiler, your conda environment may contain only the CUDA runtime. In that case, you can either install the full CUDA toolkit from the NVIDIA website or install the CUDA development packages in conda.
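Since the failing gcc command in the traceback passes -I/usr/local/cuda/include, a quick way to check the same thing is to look for cuda.h directly. The following is only a minimal sketch: the helper names and the list of candidate directories (CUDA_HOME, CUDA_PATH, CONDA_PREFIX, /usr/local/cuda) are assumptions for illustration, not something FastFold or Triton is known to use.

```python
import os
from pathlib import Path


def find_cuda_header(candidate_roots):
    """Return the first '<root>/include/cuda.h' that exists, or None."""
    for root in candidate_roots:
        header = Path(root) / "include" / "cuda.h"
        if header.is_file():
            return header
    return None


def default_candidates():
    """Build a list of plausible CUDA roots (an assumed list, adjust as needed)."""
    roots = []
    for var in ("CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"):
        value = os.environ.get(var)
        if value:
            roots.append(value)
    # The include path that appears in the gcc command from the traceback.
    roots.append("/usr/local/cuda")
    return roots


if __name__ == "__main__":
    header = find_cuda_header(default_candidates())
    print(header if header else "cuda.h not found in any candidate directory")
```

If nothing is found, that would be consistent with an environment that has the CUDA runtime but not the development headers.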

@Shenggan Shenggan added the installation Compile and install issues label Nov 27, 2022
@zzy221127
Author

Thank you. Below is the output of 'nvcc -V'; it seems the CUDA compiler is already installed.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

@Shenggan
Contributor

OK, could you please provide your CUDA path (from 'which nvcc') and describe how you installed Triton?

The simplest workaround is to uninstall Triton; the code will then fall back to the CUDA kernel.
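The fallback described here is a common optional-dependency pattern. The sketch below shows the general idea only; it is not FastFold's actual code, and the function names and backend labels are hypothetical.

```python
import importlib.util


def triton_available():
    """True if a package named 'triton' can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


def select_layer_norm_backend(has_triton):
    """Pick a backend label; 'triton' and 'cuda' are illustrative names only."""
    return "triton" if has_triton else "cuda"


if __name__ == "__main__":
    backend = select_layer_norm_backend(triton_available())
    print("layer norm backend:", backend)
```

Running this inside the fastfold environment before and after 'pip uninstall triton' shows which branch such a check would take.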

@zzy221127
Author

zzy221127 commented Nov 28, 2022

Thank you so much! After your kind reminder, it turned out to be a problem with the Triton installation.

I first installed Triton with this command:

pip install triton==2.0.0.dev20221005

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting triton==2.0.0.dev20221005
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/11/f3/db2d366485b3160419f8415e0293aac6daaa018d7a02b9c0a40f89a137bf/triton-2.0.0.dev20221005-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.7 MB)
Requirement already satisfied: torch in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0.dev20221005) (1.13.0+cu117)
Requirement already satisfied: filelock in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0.dev20221005) (3.8.0)
Requirement already satisfied: cmake in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0.dev20221005) (3.24.3)
Requirement already satisfied: typing-extensions in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from torch->triton==2.0.0.dev20221005) (4.4.0)
Installing collected packages: triton
Successfully installed triton-2.0.0.dev20221005

It seems OK.

Then I used the following command to install Triton again:

git clone https://github.com/openai/triton.git ~/triton \
 && cd ~/triton/python \
 && pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple --default-timeout=10000000

and got the error message below. Do you have any suggestions?

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///home/triton/python
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cmake in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0) (3.24.3)
Requirement already satisfied: filelock in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0) (3.8.0)
Requirement already satisfied: torch in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from triton==2.0.0) (1.13.0+cu117)
Requirement already satisfied: typing-extensions in /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages (from torch->triton==2.0.0) (4.4.0)
Installing collected packages: triton
  Attempting uninstall: triton
    Found existing installation: triton 2.0.0
    Uninstalling triton-2.0.0:
      Successfully uninstalled triton-2.0.0
  Running setup.py develop for triton
    error: subprocess-exited-with-error
    
    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> [59 lines of output]
        running develop
        running egg_info
        writing triton.egg-info/PKG-INFO
        writing dependency_links to triton.egg-info/dependency_links.txt
        writing requirements to triton.egg-info/requires.txt
        writing top-level names to triton.egg-info/top_level.txt
        reading manifest file 'triton.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        writing manifest file 'triton.egg-info/SOURCES.txt'
        running build_ext
        /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "/home/triton/python/setup.py", line 152, in <module>
            setup(
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
            return distutils.core.setup(**attrs)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
            return run_commands(dist)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
            dist.run_commands()
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
            self.run_command(cmd)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
            super().run_command(command)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
            cmd_obj.run()
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
            self.install_for_development()
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
            self.run_command('build_ext')
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
            self.distribution.run_command(command)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
            super().run_command(command)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
            cmd_obj.run()
          File "/home/triton/python/setup.py", line 114, in run
            self.build_extension(ext)
          File "/home/triton/python/setup.py", line 118, in build_extension
            thirdparty_cmake_args = get_thirdparty_packages(triton_cache_path)
          File "/home/triton/python/setup.py", line 74, in get_thirdparty_packages
            file.extractall(path=package_root_dir)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2028, in extractall
            self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2069, in extract
            self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2141, in _extract_member
            self.makefile(tarinfo, targetpath)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2190, in makefile
            copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
          File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 249, in copyfileobj
            raise exception("unexpected end of data")
        tarfile.ReadError: unexpected end of data
        downloading and extracting https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.4/clang+llvm-15.0.4-powerpc64le-linux-ubuntu-18.04.5.tar.xz ...
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
  Rolling back uninstall of triton
  Moving to /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/triton.egg-link
   from /tmp/pip-uninstall-q6f21a3r/triton.egg-link
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [59 lines of output]
    running develop
    running egg_info
    writing triton.egg-info/PKG-INFO
    writing dependency_links to triton.egg-info/dependency_links.txt
    writing requirements to triton.egg-info/requires.txt
    writing top-level names to triton.egg-info/top_level.txt
    reading manifest file 'triton.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'triton.egg-info/SOURCES.txt'
    running build_ext
    /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/home/triton/python/setup.py", line 152, in <module>
        setup(
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
        self.run_command(cmd)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
        self.run_command('build_ext')
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
        self.distribution.run_command(command)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "/home/triton/python/setup.py", line 114, in run
        self.build_extension(ext)
      File "/home/triton/python/setup.py", line 118, in build_extension
        thirdparty_cmake_args = get_thirdparty_packages(triton_cache_path)
      File "/home/triton/python/setup.py", line 74, in get_thirdparty_packages
        file.extractall(path=package_root_dir)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2028, in extractall
        self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2069, in extract
        self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2141, in _extract_member
        self.makefile(tarinfo, targetpath)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 2190, in makefile
        copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
      File "/home/Software/miniconda3/envs/fastfold/lib/python3.8/tarfile.py", line 249, in copyfileobj
        raise exception("unexpected end of data")
    tarfile.ReadError: unexpected end of data
    downloading and extracting https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.4/clang+llvm-15.0.4-powerpc64le-linux-ubuntu-18.04.5.tar.xz ...
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

@Shenggan
Contributor

The log suggests a network problem: the LLVM archive could not be downloaded completely from GitHub. You should use pip install triton==2.0.0.dev20221005 to install that specific version of Triton, because the main branch of Triton is not stable. If you keep struggling with Triton, just uninstall it and run again.
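The "unexpected end of data" error in the log is what Python's tarfile module raises when an archive ends before its declared contents do. As a rough illustration, the sketch below reads every member of an archive and reports whether it is complete. The helper names are hypothetical, and the demo uses a small in-memory archive rather than the real LLVM download.

```python
import io
import tarfile


def archive_is_complete(fileobj):
    """Read every member's data; return False if the archive ends early."""
    try:
        with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
            for member in tar:
                if member.isfile():
                    stream = tar.extractfile(member)
                    # Read in 1 MiB chunks so large archives are not held in memory.
                    while stream.read(1 << 20):
                        pass
        return True
    except (tarfile.ReadError, EOFError):
        # ReadError: truncated tar data; EOFError: truncated compressed stream.
        return False


def make_demo_archive():
    """Build a small uncompressed tar archive in memory for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        payload = b"x" * 4096
        info = tarfile.TarInfo(name="payload.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    data = make_demo_archive()
    print("full archive complete:", archive_is_complete(io.BytesIO(data)))
    print("cut archive complete:", archive_is_complete(io.BytesIO(data[:1500])))
```

A check like this on a manually downloaded file can help tell an interrupted download apart from a build problem.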

@zzy221127
Author

Dear Shenggan,

After uninstalling Triton, I ran the inference.py script successfully with no errors printed.

The output is one relaxed.pdb, one unrelaxed.pdb, and one "alignments" folder, right?

It definitely feels much faster than running AlphaFold2,

but I am wondering: without Triton, am I still "leveraging the power of FastFold"?

@Shenggan
Contributor

Yes, that is the expected output.

You already get a large speedup from the CUDA kernel when Triton is not installed. The Triton kernel is currently experimental. It can give some additional acceleration on the NVIDIA Ampere platform (maybe 10%~20%).

You can try triton==2.0.0.dev20221005 again and figure out why it cannot find cuda.h. I suggest setting the CUDA_HOME environment variable to your CUDA path.
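One way to find a value for CUDA_HOME is to start from the location of nvcc, which normally lives in '<root>/bin'. This is a minimal sketch under that assumption; the helper name is hypothetical, and whether a given Triton build reads CUDA_HOME is not confirmed here.

```python
import shutil
from pathlib import Path


def cuda_home_from_nvcc(nvcc_path):
    """Map '<root>/bin/nvcc' to '<root>'; return None if no path is given."""
    if not nvcc_path:
        return None
    return Path(nvcc_path).parent.parent


if __name__ == "__main__":
    root = cuda_home_from_nvcc(shutil.which("nvcc"))
    if root is None:
        print("nvcc is not on PATH")
    else:
        has_header = (root / "include" / "cuda.h").is_file()
        # Print a line that can be pasted into the shell before running inference.py.
        print(f"export CUDA_HOME={root}  # cuda.h present: {has_header}")
```

If the printed root has no include/cuda.h, nvcc may be a wrapper or a symlink, and the real toolkit directory has to be located by hand.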
