pytorch LTS support (1.8.2) or stable (1.11.1) #60

Open
ziegenbalg opened this issue May 31, 2022 · 5 comments

Comments

@ziegenbalg

ziegenbalg commented May 31, 2022

Hello!

I was wondering if someone can confirm that this package still runs under PyTorch LTS (1.8.2) or the current stable release (1.11.1)?

I'm getting a curious error. Note that this is for CPU training; maybe someone can confirm whether it is only broken under CPU training.

Thank you!

`03:44 $ python ./tasks/adding_task.py -lr 0.0001 -rnn_type lstm -memory_type sam -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 1000 -mem_size 32 -read_heads 1 -sparse_reads 4 -batch_size 20 -optim rmsprop -input_size 3 -sequence_max_length 100
Namespace(batch_size=20, check_freq=100, clip=50, cuda=-1, dropout=0.0, input_size=3, iterations=2000, lr=0.0001, mem_size=32, mem_slot=1000, memory_type='sam', nhid=100, nhlayer=1, nlayer=1, optim='rmsprop', read_heads=1, rnn_type='lstm', sequence_max_length=100, sparse_reads=4, summarize_freq=100, temporal_reads=2, visdom=False)
Using CPU.


SAM(3, 100, num_hidden_layers=1, nr_cells=1000, read_heads=1, cell_size=32)
SAM(
(lstm_layer_0): LSTM(35, 100, batch_first=True)
(rnn_layer_memory_shared): SparseMemory(
(interface_weights): Linear(in_features=100, out_features=70, bias=True)
)
(output): Linear(in_features=132, out_features=3, bias=True)
)

Iteration 0/2000
Falling back to FLANN (CPU).
For using faster, GPU based indexes, install FAISS: "conda install faiss-gpu -c pytorch"
Traceback (most recent call last):
  File "./tasks/adding_task.py", line 222, in <module>
    loss.backward()
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1000]], which is output 0 of AsStridedBackward, is at version 70; expected version 69 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

^C
(default) ✘-INT ~/pytorch-dnc [master|✚ 2]
03:45 $ `
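
The hint at the end of the RuntimeError can be tried on a toy case before applying it to `adding_task.py`. The following is a minimal sketch, not code from pytorch-dnc: the function name and the tensors are hypothetical, and the in-place edit is deliberately constructed to trigger the same class of error. With anomaly detection enabled around the forward pass, PyTorch additionally prints the traceback of the forward operation whose gradient could not be computed.

```python
import torch


def run_backward_with_inplace_edit():
    """Return the RuntimeError message caused by an in-place edit, or None."""
    w = torch.ones(1, 4, requires_grad=True)
    # Anomaly detection must be active during the forward pass so that
    # autograd can record where each operation was created.
    with torch.autograd.set_detect_anomaly(True):
        y = w.exp()   # exp() saves its output for the backward pass
        y.add_(1.0)   # in-place edit bumps the version counter of y
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


message = run_backward_with_inplace_edit()
print(message)
```

In the real script, the equivalent step would be calling `torch.autograd.set_detect_anomaly(True)` once before the training loop and reading the extra traceback it prints. Anomaly detection slows training, so it is best kept to debugging runs.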

@ixaxaar
Owner

ixaxaar commented Jun 1, 2022

AFAIK I never really used CPU training except during some testing.
Anyway, I need to update the PyTorch support; lemme look at it this weekend.

@ziegenbalg
Author

Will report back here once I can confirm GPU training still works. Setting up an env for LTS and 1.11.1 this week.

@ziegenbalg
Author

Still broken with GPU, I think?

(default) [eziegenbalg@localhost-live pytorch-dnc]$ python ./tasks/adding_task.py -cuda 0 -lr 0.0001 -rnn_type lstm -memory_type sam -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 1000 -mem_size 32 -read_heads 1 -sparse_reads 4 -batch_size 20 -optim rmsprop -input_size 3 -sequence_max_length 100
Namespace(batch_size=20, check_freq=100, clip=50, cuda=0, dropout=0.0, input_size=3, iterations=2000, lr=0.0001, mem_size=32, mem_slot=1000, memory_type='sam', nhid=100, nhlayer=1, nlayer=1, optim='rmsprop', read_heads=1, rnn_type='lstm', sequence_max_length=100, sparse_reads=4, summarize_freq=100, temporal_reads=2, visdom=False)
Using CUDA.


SAM(3, 100, num_hidden_layers=1, nr_cells=1000, read_heads=1, cell_size=32, gpu_id=0)
SAM(
(lstm_layer_0): LSTM(35, 100, batch_first=True)
(rnn_layer_memory_shared): SparseMemory(
(interface_weights): Linear(in_features=100, out_features=70, bias=True)
)
(output): Linear(in_features=132, out_features=3, bias=True)
)

Iteration 0/2000
Falling back to FLANN (CPU).
For using faster, GPU based indexes, install FAISS: conda install faiss-gpu -c pytorch
Traceback (most recent call last):
  File "./tasks/adding_task.py", line 222, in <module>
    loss.backward()
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/eziegenbalg/.conda/envs/default/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1000]], which is output 0 of ScatterBackward0, is at version 57; expected version 56 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
(default) [eziegenbalg@localhost-live pytorch-dnc]$ conda install faiss-gpu -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

(default) [eziegenbalg@localhost-live pytorch-dnc]$ nvidia-smi
Sun Jun 12 10:19:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:0C.0 Off |                  N/A |
| 32%   32C    P8    14W / 215W |      1MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(default) [eziegenbalg@localhost-live pytorch-dnc]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
(default) [eziegenbalg@localhost-live pytorch-dnc]$
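
The thread does not identify which operation in the SAM memory code performs the offending in-place write, so the following is only a sketch of the general pattern and its usual remedy, not a patch for pytorch-dnc. All names here (`run`, `usage`, `src`, `idx`) are hypothetical. A tensor built with `scatter` is saved by a later multiplication for its backward pass; modifying it in place afterwards invalidates the saved copy, while the out-of-place variant leaves it intact.

```python
import torch


def run(in_place: bool) -> torch.Tensor:
    """Backpropagate through a scattered tensor; return the gradient of src."""
    src = torch.ones(1, 3, requires_grad=True)
    idx = torch.tensor([[0, 2, 4]])
    # usage is produced by an out-of-place scatter into a zero tensor.
    usage = torch.zeros(1, 6).scatter(1, idx, src)
    out = usage * usage          # the multiplication saves usage for backward
    if in_place:
        usage.mul_(0.5)          # overwrites the tensor that was saved above
    else:
        usage = usage * 0.5      # creates a new tensor; the saved one is untouched
    (out.sum() + usage.sum()).backward()
    return src.grad


try:
    run(in_place=True)
    failed = False
except RuntimeError:
    failed = True

fixed_grad = run(in_place=False)
print(failed, fixed_grad)
```

If the failing operation in the repository turns out to have this shape, replacing the in-place call with its out-of-place counterpart (or cloning the tensor before modifying it) is the usual fix, at the cost of one extra allocation per step.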

@ziegenbalg
Author

@ixaxaar, have you had a chance to see if this works under the new PyTorch LTS version?

@Marchetz

Hi, I'm continuing this issue to ask the same thing.
Over the past few days I have been trying to use the SDNC and SAM architectures with the GPU setting, but I have had many problems with FAISS and related libraries and packages. The DNC model, by contrast, works perfectly.
I think I have installed all the necessary packages.
I would like to know whether these two architectures support the new PyTorch version.
If they do, it means I am doing something wrong during the installation process.

Thank you for the repository!!
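
When reporting installation problems like these, it helps to attach the versions that are actually importable in the active environment. The snippet below is a small sketch using only the standard library; the helper name `package_report` is hypothetical, and the module names `faiss` and `pyflann` are assumptions about how the FAISS and FLANN bindings are imported. A value of `None` means the module cannot be found, and `"unknown"` means it is importable but its distribution is registered under a different name (for example `faiss-gpu`).

```python
import importlib.util
from importlib import metadata


def package_report(names):
    """Map each module name to its installed version, "unknown", or None."""
    report = {}
    for name in names:
        # find_spec returns None when the top-level module is not importable.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but the distribution name differs from the module name.
            report[name] = "unknown"
    return report


report = package_report(["torch", "faiss", "pyflann"])
for name, version in report.items():
    print(f"{name}: {version}")
```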
