[HANDS-ON BUG] #560

MojtabaAbdi · 2024-09-08T13:31:59Z

### Bonus Unit 1 Notebook Error
Hello. I have a problem executing my code in Bonus Unit 1. It comes from this line, which, to be honest, I have not modified at all:

!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy2" --no-graphics

Below is a screenshot of an execution of the cell:

I copied the Bonus Unit 1 notebook to my Google Drive and ran it there.


RubSevian · 2024-09-08T19:38:46Z

I have the same problem.

RubSevian · 2024-09-08T21:06:18Z

I fixed this problem with a quick fix: changing line 56 to torch.float32 in the file /content/ml-agents/ml-agents/mlagents/torch_utils/torch.py.
P.S. This line has already been changed in the screenshot.

simoninithomas · 2024-09-09T07:30:06Z

Hi, I think the solution provided by @RubSevian is the best one for now (thanks 🤗). I'm going to check with the MLAgents team to see where this error comes from.

MojtabaAbdi · 2024-09-09T12:26:27Z

@RubSevian @simoninithomas Thank you a lot. It worked for me too.

iyaijuil · 2024-09-09T19:54:16Z

> Hi, I think the solution for now provided by @RubSevian is the best (thanks 🤗 ) I'm going to check with MLAgents team to see where this error comes from.

Hi, I'm hitting the same problem in Unit 5 (SnowballTarget). I tried the same solution from @RubSevian, but it doesn't fix it there (it did work when I used it for the Unit 1 problem).

Here is the screenshot of an execution of the cell after I applied @RubSevian's solution:

"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)"

RubSevian · 2024-09-13T13:23:56Z

@iyaijuil Based on your error, my guess is that the problem is in device selection. Perhaps you need to specify explicitly which device to use: the CPU or the GPU (CUDA).

MrPark97 · 2024-09-14T14:51:40Z

> > Hi, I think the solution for now provided by @RubSevian is the best (thanks 🤗 ) I'm going to check with MLAgents team to see where this error comes from.
>
> Hi, I also meet the same problem in unit5 SnowballTarget, I tried the same solution by @RubSevian but still can't fix it (it worked when I tried to fix Unit1 problem)
>
> Here is the screenshot of an execution of the cell after I applied @RubSevian solution:
> "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)"

I've encountered the same problem with Unit 5.

iyaijuil · 2024-09-15T02:22:21Z

> @iyaijuil Based on your mistake, I can make an assumption that the problem is in choosing a device, perhaps you need to specify what specifically to use the cpu or video card (CUDA)

Thanks for your reply. I used Google Colab to train the model. I followed the tutorial and selected the T4 GPU as my runtime type, and I'm working from a MacBook Pro M3. Could there be a conflict somewhere in this setup?

maartenx01 · 2024-09-17T18:07:40Z

I'm encountering the same issue on Unit 5 of the Deep RL Course: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)". No issues with Units 1-4.

Andimeo · 2024-09-18T08:11:09Z

Same for me. I don't know how to explicitly set the device.

I've even tried adding a .to(device) call to each forward function in mlagents/trainers/torch_entities/networks.py, but then another error shows up (something about an ambiguous bool).
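
On setting the device explicitly: as far as I know, `mlagents-learn` accepts a `--torch-device` option (for example `cpu`, `cuda` or `cuda:0`), but treat that flag as an assumption and confirm it with `mlagents-learn --help` on your installed version. Below is a minimal sketch that only builds the command line. `build_learn_command` is a hypothetical helper, not part of ml-agents, and the paths are the ones used elsewhere in this thread.

```python
import shlex


def build_learn_command(config, env, run_id, device="cpu"):
    """Return the mlagents-learn argument list with an explicit torch device.

    Hypothetical helper. Assumes the installed ml-agents version
    supports the --torch-device option.
    """
    return [
        "mlagents-learn",
        config,
        f"--env={env}",
        f"--run-id={run_id}",
        "--no-graphics",
        f"--torch-device={device}",
    ]


cmd = build_learn_command(
    "./config/ppo/SnowballTarget.yaml",
    "./training-envs-executables/linux/SnowballTarget/SnowballTarget",
    "SnowballTarget1",
)
# Print a shell-safe string that can be pasted after "!" in a notebook cell.
print(shlex.join(cmd))
```

If the flag is not available in your version, hiding the GPU from the process is another way to force CPU training.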

MojtabaAbdi · 2024-09-18T17:32:23Z

Actually, you don't need to train using a GPU. It took me 12 minutes to train the model on a CPU in Colab. That way, you won't run into these errors.

maartenx01 · 2024-09-18T18:40:19Z

> Actually, you don't need to train using a GPU. It took me 12 minutes to train the model with a cpu on colab. Thereby, you won't encounter errors.

Thank you so much! This worked!

grib0ed0v · 2024-10-30T15:26:38Z

It looks like the proposed fix (changing torch.cuda.FloatTensor to torch.float32) was merged upstream in ml-agents.

But it doesn't work for me either. I ran into the same error that @iyaijuil described.

I ended up just running the experiment on the CPU by setting an environment variable:

!CUDA_VISIBLE_DEVICES='' mlagents-learn ./config/ppo/SnowballTarget.yaml --env=./training-envs-executables/linux/SnowballTarget/SnowballTarget --run-id="SnowballTarget1" --no-graphics

For me, training for 200k steps took around 8 minutes on a Colab CPU, so I agree with @MojtabaAbdi: just run on the CPU and that's it.

[INFO] SnowballTarget. Step: 200000. Time Elapsed: 443.264 s. Mean Reward: 25.114. Std of Reward: 2.328. Training.
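
The same environment-variable trick can be applied from Python when the training command is launched with `subprocess` instead of a `!` cell. This is a minimal sketch: `cpu_only_env` and `run_cpu_only` are hypothetical helper names, and the demo launches the Python interpreter rather than `mlagents-learn` so that it runs anywhere.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Copy an environment mapping and hide all CUDA devices in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string: no GPU is visible
    return env


def run_cpu_only(cmd):
    """Run cmd (a list of arguments) with GPUs hidden and return its exit code."""
    return subprocess.run(cmd, env=cpu_only_env()).returncode


# Demo: the child process sees an empty CUDA_VISIBLE_DEVICES.
child_code = "import os; print(repr(os.environ['CUDA_VISIBLE_DEVICES']))"
exit_code = run_cpu_only([sys.executable, "-c", child_code])
```

To use it for training, pass the `mlagents-learn` argument list from the command above to `run_cpu_only` in place of the demo command.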

Nagi-ovo mentioned this issue Sep 25, 2024

Unit 1: Part : Train our agent #565

Open
