RuntimeError: Error building extension 'slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0' #25

Open
God-YYB opened this issue Jun 19, 2024 · 18 comments


God-YYB commented Jun 19, 2024

This problem appeared after I solved "RuntimeError: Ninja is required to load C++ extensions" with `pip3 install ninja`.
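
For context, the extension is JIT-compiled the first time an sLSTM block with the CUDA backend is constructed, which is where this error surfaces. A minimal sketch that triggers the build; the config fields are assumptions based on the project README, not taken from this issue:

```python
# Hypothetical minimal repro: constructing an sLSTM block JIT-compiles the
# slstm_* CUDA extension, which is where the RuntimeError above is raised.
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
)

cfg = xLSTMBlockStackConfig(
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(backend="cuda", num_heads=1),
    ),
    context_length=32,
    num_blocks=1,
    embedding_dim=64,   # matches the HS64 in the failing extension name
    slstm_at=[0],
)
model = xLSTMBlockStack(cfg).to("cuda")   # the CUDA sources build here
x = torch.randn(8, 32, 64, device="cuda")
print(model(x).shape)                     # expect torch.Size([8, 32, 64])
```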

@miaozhixu

Would you please paste your output log?
I faced the same problem on Windows 11. It is caused by the CUDA libraries: Ninja assembles the nvcc command line incorrectly, so the program cannot find the CUDA libraries. I guess it is the space character in the path string.
But on Ubuntu 22.04, with NVIDIA driver 535, CUDA 12.1, cuDNN 9, PyTorch 2.3.1, and Ninja installed, there is no problem.
PS: the new version of xlstm, 1.0.4, has something wrong in the sLSTM layer source; you should try 1.0.3 on Ubuntu.
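
If you suspect the space-in-path problem, a quick way to see which toolkit path PyTorch will hand to Ninja (my own diagnostic sketch, not from this thread):

```python
# Diagnostic sketch: PyTorch's JIT builder locates nvcc via CUDA_HOME.
# On Windows this is typically "C:\Program Files\NVIDIA GPU Computing
# Toolkit\CUDA\v12.x", whose embedded space must survive Ninja's quoting.
from torch.utils.cpp_extension import CUDA_HOME

print("CUDA_HOME:", CUDA_HOME)
if CUDA_HOME is not None and " " in CUDA_HOME:
    print("warning: CUDA_HOME contains a space; quoting bugs can break nvcc calls")
```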

@Marco-Nguyen

@miaozhixu Hi, may I ask about your setup steps? I am facing the above issues and have not found a solution so far.

@miaozhixu

miaozhixu commented Jun 19, 2024

> @miaozhixu Hi, may I ask about your setup steps? I am facing the above issues and have not found a solution so far.

1. Bring up a fresh Ubuntu 22.04.4 installation. It comes with NVIDIA GPU driver 535.
2. Follow the documentation on nvidia.com to install CUDA 12.1 and cuDNN 9.
3. Install PyTorch 2.3.1 with CUDA support.
4. Use pip to install xlstm; I recommend xlstm v1.0.3. I tried 1.0.4 yesterday, and the return_last_state parameter led to an error.

But some people say this issue can be solved with `conda install cccl`. Check the link below.
#19 (comment)

@Marco-Nguyen

So you installed xlstm via pip, not by cloning the repo, right?

@miaozhixu

> So you installed xlstm via pip, not by cloning the repo, right?

Yep


yongyin-ma commented Jun 20, 2024

Check your log file; there should be a few "fatal error" lines or messages saying some file doesn't exist. I had the same error under Windows 10, then I tried Linux and still had the problem. I thought it might be a root or environment problem. Echo your $PATH and $LD_LIBRARY_PATH to see whether they contain the CUDA paths; if not, follow my steps below. First find your CUDA installation location, then use:

```sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Remember to change the paths to your own. This solved the problem for me. The other solution, "pip install cccl", didn't work in my situation.
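
To confirm the exports took effect in the same shell, a quick check of my own (not part of the original reply):

```python
# Verify that nvcc is now resolvable and the library path is set.
import os
import shutil

print("nvcc:", shutil.which("nvcc"))  # expect /usr/local/cuda/bin/nvcc
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<not set>"))
```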

Also, the README tells us to run `python experiments/main.py --config experiments/parity_xLSTM01.yaml`, which gave me a "no such file" error; there is no upper case in the name of the yaml file, so use the lower-case filename.

And you should use `pip install xlstm==1.0.3`; there are other bugs in 1.0.4.


God-YYB commented Jun 21, 2024

> Check your log file… Echo your $PATH and $LD_LIBRARY_PATH… export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH… This will solve the problem.

> And you should use pip install xlstm==1.0.3, there are other bugs in 1.0.4.

The yaml file has a case error that is easy to resolve. I will try the PATH solution you mentioned. Thank you


God-YYB commented Jun 21, 2024

> Would you please paste your output log? I faced the same problem on Windows 11; it's caused by the CUDA libraries… PS: the new version of xlstm, 1.0.4, has something wrong in the sLSTM layer source; you should try 1.0.3 on Ubuntu.

Thank you, I will try `pip install xlstm==1.0.3` later; the version details are very helpful!

@yanpeng0520

> Bring up a fresh Ubuntu 22.04.4 installation… Follow the documentation on nvidia.com to install CUDA 12.1 and cuDNN 9. Install PyTorch 2.3.1 with CUDA support…

Hi, I am trying to install the same versions as you, but when I install cudnn>9 it returns the error: torch 2.3.1+cu121 requires nvidia-cudnn-cu12==8.9.2.26. How did you do that?

@miaozhixu

miaozhixu commented Jun 21, 2024

> nvidia-cudnn-cu12==8.9.2.26

I followed the nvidia.com guide to install cuDNN and CUDA; after successfully installing these two libs, install PyTorch:

```sh
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1
# then add /usr/local/cuda/bin to PATH
sudo apt-get install cudnn9-cuda-12
# finally install pytorch:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```

I use Anaconda 3, but I think pip will work just fine.
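
A quick sanity check after these steps (my own sketch). As far as I understand, the `nvidia-cudnn-cu12==8.9.2.26` pin mentioned above belongs to PyTorch's own bundled cuDNN wheel and is independent of the apt-installed cuDNN 9 that nvcc uses, so the two can coexist:

```python
# Confirm PyTorch sees the CUDA 12.1 toolchain just installed.
import torch

print(torch.__version__)               # e.g. 2.3.1
print(torch.version.cuda)              # expect '12.1'
print(torch.backends.cudnn.version())  # the cuDNN build bundled with PyTorch
print(torch.cuda.is_available())       # should be True with driver 535
```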

@Atlantis-esh

> Would you please paste your output log? I faced the same problem on Windows 11; it's caused by the CUDA libraries… you should try 1.0.3 on Ubuntu.

> Thank you, I will try pip install xlstm==1.0.3 later; the version details are very helpful!

How do you all switch to this Ubuntu version? Isn't it normal to use a laboratory server? Isn't the Ubuntu system on a server fixed? Or are you using a virtual machine? :(


@miaozhixu

> Hello, may I ask how you switch the Ubuntu version to 22.04? Isn't it normal to use a laboratory server? Isn't the Ubuntu system on a server fixed? Or are you all using a virtual machine? :(

Ubuntu is installed on my laptop, alongside Windows 11. The laptop has an RTX 5000 GPU.

@leezhien

@miaozhixu I have been puzzled by this problem for a long time. May I ask whether you have successfully run it? If convenient, could you share contact information so we can communicate?

@miaozhixu

> @miaozhixu I have been puzzled by this problem for a long time. May I ask whether you have successfully run it?

Not a switch, but a fresh installation of Ubuntu.

@zhonglin-cdut

> @miaozhixu Hi, may I ask about your setup steps? I am facing the above issues and have not found a solution so far.

Has it been resolved now?


2022LJC commented Dec 6, 2024

@miaozhixu

```
Using C:\Users\L.J.Y\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file C:\Users\L.J.Y\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124\slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0\build.ninja...
D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/7] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output slstm_backward.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -ID:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\include -ID:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\include\torch\csrc\api\include -ID:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\include\TH -ID:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -ID:\SoftWare\anaconda\envs\xlstm\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 "-Xptxas -O3" --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=64 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=1 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=__nv_bfloat16 -DSLSTM_DTYPE_W=__nv_bfloat16 -DSLSTM_DTYPE_G=__nv_bfloat16 -DSLSTM_DTYPE_S=__nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -c E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\src\cuda\slstm_backward.cu -o slstm_backward.cuda.o
FAILED: slstm_backward.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[steps 2/7 through 6/7 are the identical nvcc invocation for the remaining source files; each fails the same way]
FAILED: slstm_backward_cut.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
FAILED: slstm_pointwise.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
FAILED: blas.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
FAILED: slstm_forward.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
FAILED: cuda_error.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 2104, in _run_ninja_build
    subprocess.run(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Project\pycharm\xlstm-main\main.py", line 158, in <module>
    main(cfg)
  File "E:\Project\pycharm\xlstm-main\main.py", line 54, in main
    model = xLSTMLMModel(from_dict(xLSTMLMModelConfig, OmegaConf.to_container(cfg.model))).to(
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_lm_model.py", line 29, in __init__
    self.xlstm_block_stack = xLSTMBlockStack(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_block_stack.py", line 84, in __init__
    self.blocks = self._create_blocks(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_block_stack.py", line 105, in _create_blocks
    blocks.append(sLSTMBlock(config=config))
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\block.py", line 33, in __init__
    super().__init__(
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\xlstm_block.py", line 63, in __init__
    self.xlstm = sLSTMLayer(config=self.config.slstm)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\layer.py", line 78, in __init__
    self.slstm_cell = sLSTMCell(self.config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 780, in __new__
    return sLSTMCell_cuda(config, skip_backend_init=skip_backend_init)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 690, in __init__
    self.func = sLSTMCellFuncGenerator(self.training, config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 536, in sLSTMCellFuncGenerator
    slstm_cuda = sLSTMCellCUDA.instance(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 515, in instance
    cls.mod[repr(config)] = load(
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\src\cuda_init.py", line 84, in load
    mod = _load(name + suffix, sources, **myargs)
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1314, in load
    return _jit_compile(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1721, in _jit_compile
    _write_ninja_file_and_build_library(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 2120, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0'
```
This is my problem. How can I solve it, please? I have been trying for a whole day.
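
The decisive line in this log is `nvcc fatal : Unknown option '-Xptxas -O3'`. Ninja hands nvcc the quoted string `"-Xptxas -O3"` as one argument, and nvcc rejects an option name that contains a space; this appears to bite on Windows specifically. A workaround some users report (unverified by me on every setup) is to edit the flag where the extension's nvcc options are assembled, in `xlstm/blocks/slstm/src/cuda_init.py` from the traceback above, replacing the quoted `"-Xptxas -O3"` with the single-token form `-Xptxas=-O3`, which nvcc accepts. Independently, the UserWarning at the top can be addressed before anything imports xlstm; a minimal sketch with illustrative values:

```python
# Pre-import environment setup (values are examples, adjust for your GPU).
import os

# Compile only for your card's compute capability instead of all visible
# archs; "8.6" is an example (RTX 30xx), use "7.5" for a Turing card, etc.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.6"

# Optional: cap ninja's parallel jobs if compilation exhausts memory.
os.environ["MAX_JOBS"] = "4"

import torch
from torch.utils.cpp_extension import CUDA_HOME

# Confirm which toolkit nvcc will come from before retrying the build.
print("torch", torch.__version__, "| cuda", torch.version.cuda)
print("CUDA_HOME:", CUDA_HOME)
```

Note also that a stale cache can mask a fix: delete the failed build directory under `C:\Users\<you>\AppData\Local\torch_extensions` before retrying.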
