RuntimeError: CUDA error: device-side assert triggered when running Llama on multiple gpus #22546
Comments
This repo is not in sync with the model and tokenizer as implemented in the Transformers library. Sadly, we do not have permission to distribute the weights, so there is no official checkpoint you can use. After you get the official weights from Meta and run the conversion command as documented, you shouldn't have any problem with the model. |
@sgugger I'm experiencing exactly the same error when using official llama weights converted using the huggingface conversion script from the master branch. It happens on the master branch when running inference with accelerate on multiple GPUs (I tried 2x4090 and 4x4090). To reproduce:
This used to work last week. I don't have the exact commit ID, but I could run git bisect if it'd help. I'm using pytorch==2.0.0, CUDA 11.7, and recent versions of accelerate and bitsandbytes (yes, it also shows the same error with load_in_8bit=True). |
@emvw7yf Could you print |
@sgugger I managed to use the official llama weights and I am still getting the same error for llama 7B, using the code from #22546 (comment), and printing {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.layers.28': 1, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.norm': 1, 'lm_head': 1} |
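A device map like the one printed above is easier to compare between machines when it is grouped by GPU. The following is a minimal sketch in plain Python: `summarize_device_map` is a hypothetical helper (not part of Accelerate or Transformers), and the dictionary is rebuilt from the values printed above rather than read from a model. On a model loaded with a `device_map`, the mapping should be available as `model.hf_device_map`.

```python
from collections import defaultdict

def summarize_device_map(device_map):
    """Group module names by the device they were placed on."""
    by_device = defaultdict(list)
    for module_name, device in device_map.items():
        by_device[device].append(module_name)
    return dict(by_device)

# Rebuild the 7B device map printed above: embeddings and layers 0-15 on GPU 0,
# layers 16-31, the final norm and the LM head on GPU 1.
device_map = {"model.embed_tokens": 0}
device_map.update({f"model.layers.{i}": 0 if i < 16 else 1 for i in range(32)})
device_map.update({"model.norm": 1, "lm_head": 1})

summary = summarize_device_map(device_map)
for device, modules in sorted(summary.items()):
    print(f"GPU {device}: {len(modules)} modules, first={modules[0]}, last={modules[-1]}")
```

Two setups that print the same summary are splitting the model at the same layer, which helps rule out placement differences when only one of them fails.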
So I skimmed through the existing repos to look for one that has the same weights/tokenizer as what I get after the conversion script is applied. Applying this code:
gives me the exact same device map as you @TerryCM and works without any issue. I am on Transformers main and Accelerate latest version. |
@sgugger I'm also on Transformers main and the latest accelerate version (I used pip install accelerate). Could this be a driver problem? I'm using the following drivers |
Are you using the same repository as me? I'm on CUDA 11.8 and 520 drivers. |
I can reliably reproduce it on both runpod.io and vast.ai. I'm using 2x4090 GPUs and the default docker image on each service (runpod/pytorch:3.10-2.0.0-117 and pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel). I'm running the following:
This results in the assertion error above. When I restrict it to a single GPU (using CUDA_VISIBLE_DEVICES), it works without errors. Versions (taken on vast.ai):
How could I help debug this? |
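One way to narrow this down is to run the same script once with a single visible GPU and once with two. The snippet below is a minimal sketch, assuming the variables are set before CUDA is initialized (simplest: before importing torch). `CUDA_VISIBLE_DEVICES` limits which GPUs the process can see, and `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so the Python stack trace points closer to the failing call. The helper name `restrict_to_gpus` is hypothetical.

```python
import os

def restrict_to_gpus(gpu_ids, blocking=True):
    """Set CUDA environment variables; call this before importing torch."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    if blocking:
        # Synchronous launches make a device-side assert surface at the call that caused it.
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_VISIBLE_DEVICES"]

# Single-GPU run (reported to work in this thread).
visible = restrict_to_gpus([0])
print(visible)
# For the failing configuration, use restrict_to_gpus([0, 1]) instead.
```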
I actually realized that the error I'm getting is slightly different (even though the assertion is the same), pasting it below:
|
Interestingly, I'm not getting this error on my home machine. I'm using the same GPUs and the same docker image, so the versions are exactly the same - except the nvidia driver is 525.89.02 instead of 525.78.01. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
I encountered the same error, with CUDA Version: 11.7 and Driver Version: 515.86.01 |
In "config.json" change "pad_token_id=-1" to "pad_token_id=2". This happens because during batch generation, the model sometimes generates pad_token_id=-1 |
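The config change described above can also be applied with a small script. This is a minimal sketch using only the standard library: `patch_pad_token_id` is a hypothetical helper, the config fragment is illustrative, and the replacement id 2 is taken from the comment above rather than verified against any particular checkpoint. It rewrites `pad_token_id` only when the current value falls outside `[0, vocab_size)`.

```python
import json

def patch_pad_token_id(config, replacement=2):
    """Return a copy of the config with an out-of-range pad_token_id replaced."""
    patched = dict(config)
    pad_id = patched.get("pad_token_id")
    vocab_size = patched["vocab_size"]
    if isinstance(pad_id, int) and not (0 <= pad_id < vocab_size):
        patched["pad_token_id"] = replacement
    return patched

# Illustrative config fragment; a real config.json has many more keys.
config_text = '{"vocab_size": 32000, "pad_token_id": -1, "eos_token_id": 2}'
patched = patch_pad_token_id(json.loads(config_text))
print(json.dumps(patched, indent=2))
```

A valid `pad_token_id` is left untouched, so the helper is safe to run on a config that has already been fixed.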
how to solve? |
Thanks! This solved my problem. |
same problem |
Same error when I load the model on multiple GPUs, e.g. 4, which are set by CUDA_VISIBLE_DEVICES=0,1,2,3. But when I load the model on only 1 GPU, it can generate results successfully. My code:
|
I'm experiencing the same issue with two gpus. When I replace |
same problem when running with multiple gpus |
same problem here |
Please stop commenting with "same problem" without providing a reproducer. We can't do anything about a bug we can't reproduce. |
@sgugger sorry, here's my environment: |
Why set pad_token_id to 2 instead of 0? Does setting pad_token_id to 2 have any impact on model performance? |
I could reproduce the error with the following environment: 2x A100 (80 GB each), Python 3.10.6, torch.version.cuda -> 11.8, model "meta-llama/Llama-2-70b-chat-hf"
It worked fine on 1x A100 with 8-bit mode enabled. I just wanted to run some tests in 16-bit mode. |
@thusithaC Does it only happen for the 70b model or does it also happen with the 7b model? cc @SunMarc
@shl518 This is not a reproducer, it lacks how the model is created or what prompts you pass it. |
Sorry, I have updated my issue. |
Hi @sgugger, I could only reproduce it for the 70B model. The trigger condition seemed to be the model getting split across multiple GPUs, which was difficult to achieve with the 13B/7B models. |
I had a similar error. reference |
tokenizer.pad_token = tokenizer.eos_token |
I'm getting a similar issue with 2x A100s for inference (or training too). [Although I don't get this issue with 4x A6000s]
Environment
Reproduction:
Error
|
Any permanent solution to this? Getting the same error fine-tuning meta-llama/Llama-2-70b-hf on 2x A100. |
This is an issue with the way cuda is installed, not transformers |
Just in case it's helpful: I encountered a similar error and found that I had forgotten to change the model's embedding size after adding special tokens (
Similar to #24698. |
Have you fixed it, or is there a CUDA/NVIDIA repo where it's best to post? Thanks |
I finally solved this by disabling ACS in the BIOS, ref https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#pci-access-control-services-acs. Changing the NVIDIA driver and CUDA version didn't help. This test was very helpful: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#gpu-to-gpu-communication |
My code worked when I had two GPUs. Yesterday I added one more GPU and updated my driver, and that broke my code. Error code: As for the ACS, it's disabled. |
Solved for me by resizing the token embeddings after adding a pad token. |
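The condition behind this fix is that every token id the tokenizer can emit must index a valid row of the embedding matrix. Below is a minimal sketch of that check in plain Python: `ids_outside_embedding` is a hypothetical helper, and the sizes and token ids are illustrative. In Transformers, the usual remedy after adding tokens is `model.resize_token_embeddings(len(tokenizer))`.

```python
def ids_outside_embedding(token_ids, num_embeddings):
    """Return the token ids that do not index a valid embedding row."""
    return [t for t in token_ids if not (0 <= t < num_embeddings)]

num_embeddings = 32000   # rows in the embedding matrix before resizing (illustrative)
pad_token_id = 32000     # a newly added pad token gets the next free id
batch = [1, 15043, 3186, pad_token_id, pad_token_id]

# Before resizing, the pad id points one row past the end of the matrix.
print(ids_outside_embedding(batch, num_embeddings))
# After growing the matrix by one row, every id is valid.
print(ids_outside_embedding(batch, num_embeddings + 1))
```

Running such a check on the CPU before calling `generate` gives a readable error instead of a device-side assert.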
@asherisaac |
My setup consists of 4x RTX 3090 GPUs and a TRX40 motherboard. I run my model in a Docker container and encountered a similar CUDA error. The issue was solved by updating the NVIDIA driver to the latest version and upgrading CUDA from 12.3 to 12.4 on the host machine, without changing the CUDA version within the container. I hope this solution is helpful for those experiencing similar issues in containerized environments. Note: My host OS is Arch Linux with kernel version 6.8.2-arch2-1, and both CUDA 11.7 and CUDA 12.4 work in the container. |
Hi, could you please explain a bit more why it is an issue with CUDA, and how do we solve it? |
System Info
transformers version: 4.28.0.dev0
Who can help?
@sgugger @MKhalusova @ArthurZucker @younesbelkada
I am experiencing an assertion error in ScatterGatherKernel.cu when using LlamaTokenizer and multi-GPU inference with any variant of Llama model. The error occurs during the model.generate() call.
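The assertion reported from ScatterGatherKernel.cu is a bounds check on an index used by a gather/scatter operation. As a rough illustration only (plain Python, not the CUDA kernel), the condition `idx_dim >= 0 && idx_dim < index_size` behaves like the check below. `checked_gather` is a hypothetical name, and the values are made up.

```python
def checked_gather(values, indices):
    """Pick values[i] for each index, rejecting indices outside [0, len(values))."""
    out = []
    for idx in indices:
        # Python would accept -1 as "last element"; the kernel's check does not.
        if not (0 <= idx < len(values)):
            raise IndexError(f"index out of bounds: {idx} (size {len(values)})")
        out.append(values[idx])
    return out

scores = [0.1, 0.7, 0.2]
print(checked_gather(scores, [1, 2]))

try:
    checked_gather(scores, [1, -1])  # e.g. an id of -1 reaching an indexing op
    failed = False
except IndexError as err:
    failed = True
    print(err)
```

Any negative id, or any id at or beyond the size of the indexed dimension, fails this check, which is why the thread focuses on token ids and embedding sizes.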
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
{'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 0, 'model.layers.17': 0, 'model.layers.18': 0, 'model.layers.19': 0, 'model.layers.20': 0, 'model.layers.21': 0, 'model.layers.22': 0, 'model.layers.23': 0, 'model.layers.24': 0, 'model.layers.25': 0, 'model.layers.26': 0, 'model.layers.27': 0, 'model.layers.28': 0, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.layers.32': 1, 'model.layers.33': 1, 'model.layers.34': 1, 'model.layers.35': 1, 'model.layers.36': 1, 'model.layers.37': 1, 'model.layers.38': 1, 'model.layers.39': 1, 'model.layers.40': 1, 'model.layers.41': 1, 'model.layers.42': 1, 'model.layers.43': 1, 'model.layers.44': 1, 'model.layers.45': 1, 'model.layers.46': 1, 'model.layers.47': 1, 'model.layers.48': 1, 'model.layers.49': 1, 'model.layers.50': 1, 'model.layers.51': 1, 'model.layers.52': 1, 'model.layers.53': 1, 'model.layers.54': 1, 'model.layers.55': 1, 'model.layers.56': 1, 'model.layers.57': 1, 'model.layers.58': 1, 'model.layers.59': 2, 'model.norm': 2, 'lm_head': 2}
Loading checkpoint shards: 100%|██████████████████████████| 61/61 [00:25<00:00, 2.43it/s]
/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [64,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[The same assertion is reported for threads [65,0,0] through [127,0,0] and [0,0,0] through [61,0,0] of block [0,0,0].]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [62,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [63,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [64,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [65,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [66,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [67,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [68,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [69,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [70,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [71,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [72,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [73,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [74,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [75,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [76,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [77,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [78,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [79,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [80,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [81,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [82,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [83,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [84,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [85,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [86,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [87,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [88,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [89,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [90,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [91,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [92,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [93,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [94,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [95,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [96,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [97,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [98,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [99,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [100,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [101,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [102,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [103,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [104,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [105,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [106,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [107,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [108,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [109,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [110,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [111,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [112,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [113,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [114,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [115,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [116,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [117,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [118,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [119,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [120,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [121,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [122,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [123,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [124,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [125,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [126,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [127,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [0,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [1,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [2,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [3,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [4,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [5,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [6,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [7,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [8,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [9,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [10,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [11,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [12,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [13,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [14,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [15,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [16,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [17,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [18,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [19,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [20,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [21,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [22,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [23,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [24,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [25,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [26,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [27,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [28,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [29,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [30,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [31,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [32,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [33,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [34,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [35,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [36,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [37,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [38,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [39,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [40,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [41,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [42,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [43,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [44,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [45,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [46,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [47,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [48,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [49,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [50,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [51,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [52,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [53,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [54,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [55,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [56,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [57,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [58,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [59,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [60,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [61,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [62,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [1,0,0], thread: [63,0,0] Assertion
idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
Traceback (most recent call last):
File "/home/u30/terrycruz/chatPaper.py", line 48, in <module>
generated_ids = model.generate(
^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/generation/utils.py", line 1457, in generate
return self.contrastive_search(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/generation/utils.py", line 1871, in contrastive_search
outputs = self(
^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/u30/terrycruz/anaconda3/envs/multiple_gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 241, in forward
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
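For readers of the log above: the repeated assertion `idx_dim >= 0 && idx_dim < index_size` comes from a gather/scatter kernel and fires when an index falls outside the indexed dimension. Below is a minimal CPU-side sketch of that same check in plain Python. The helper name `out_of_bounds_positions` and the example values are hypothetical and are not taken from the issue; this only illustrates what the kernel is asserting and does not diagnose the root cause here.

```python
def out_of_bounds_positions(indices, index_size):
    """Return (position, value) pairs that violate 0 <= value < index_size.

    Mirrors the condition in the device-side assertion:
    idx_dim >= 0 && idx_dim < index_size
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (0 <= idx < index_size)
    ]


# Hypothetical example: a dimension of size 32000 and four candidate indices.
bad = out_of_bounds_positions([0, 3, 32000, -1], index_size=32000)
print(bad)  # [(2, 32000), (3, -1)]
```

Running a check like this on the index values (copied to the CPU) before the failing call may help narrow down which input produces the invalid index.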
Reproduction
CUDA_LAUNCH_BLOCKING=1 python script.py
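If it is more convenient to enable this from inside the script than on the command line, the same environment variable can be set in Python. This is a sketch under the assumption that the variable has to be in place before CUDA is initialized, so it is set before `torch` is imported; the helper `blocking_enabled` is hypothetical and only confirms the setting.

```python
import os

# Set this before importing torch, so it is in place before any CUDA work starts.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def blocking_enabled():
    """Return True if synchronous CUDA kernel launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


# import torch  # import torch (and load the model) only after the line above
print(blocking_enabled())  # True
```

With synchronous launches, the Python traceback should point at the call that actually triggered the assertion rather than at a later, unrelated line.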
Expected behavior
The puma bla bla bla.