Problem of load pre-trained model. #1

happsky · 2018-06-21T20:45:14Z

The command I used is python baseline_resnet152charades.py.
Thank you in advance!

=> using pre-trained model 'resnet152'
loading pretrained-weights from /nfs.yoda/gsigurds/charades_pretrained/resnet_rgb.pth.tar
Traceback (most recent call last):
  File "baseline_resnet152charades.py", line 38, in <module>
    main()
  File "./main.py", line 60, in main
    model, criterion, optimizer = create_model(args)
  File "./models/__init__.py", line 10, in create_model
    model = load_architecture(args)
  File "./models/utils.py", line 78, in load_architecture
    model = generic_load(args.arch, args.pretrained, args.pretrained_weights, args)
  File "./models/utils.py", line 61, in generic_load
    model = model.__dict__[arch](args)
  File "./models/ActorObserverBase.py", line 55, in __init__
    model = load_sub_architecture(args)
  File "./models/utils.py", line 73, in load_sub_architecture
    model = generic_load(args.subarch, args.pretrained, args.pretrained_subweights, args)
  File "./models/utils.py", line 65, in generic_load
    chkpoint = torch.load(weights)
  File "/home/csdept/anaconda3/envs/py27/lib/python2.7/site-packages/torch/serialization.py", line 301, in load
    f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/nfs.yoda/gsigurds/charades_pretrained/resnet_rgb.pth.tar'


gsig · 2018-06-21T23:07:11Z

The resnet baseline refers to models from https://github.com/gsig/charades-algorithms. Download the following file and update the path in the script to point to it: https://www.dropbox.com/s/iy9fmk0r1a3edoz/resnet_rgb.pth.tar?dl=1

Hope that helps!
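A missing checkpoint only surfaces deep inside torch.load, as in the traceback above. The following is a minimal sketch, not part of the repository, that checks the weights path up front and fails with a clearer message. The helper name check_weights_path is hypothetical.

```python
import os


def check_weights_path(path):
    """Return the absolute checkpoint path, or raise IOError with a hint.

    Hypothetical helper (not in the actor-observer repo): call it on the
    value passed to --pretrained-weights or --pretrained-subweights
    before handing it to torch.load.
    """
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full):
        raise IOError(
            "Pretrained weights not found at '%s'. Download resnet_rgb.pth.tar "
            "and update the weights path in the experiment script." % full
        )
    return full
```

For example, check_weights_path('./charades_pretrained/resnet_rgb.pth.tar') returns the absolute path once the file has been downloaded to that location.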


happsky · 2018-06-22T13:51:35Z

Thanks for your quick reply. After downloading the pre-trained model, I ran into another problem, shown below.
The command I used is still python baseline_resnet152charades.py.
Thank you in advance!

cachefile ./caches/baseline_resnet152charades//CharadesEgo_train.pkl
Loading cached result from './caches/baseline_resnet152charades//CharadesEgo_train.pkl'
0 samples loaded
cachefile ./caches/baseline_resnet152charades//CharadesEgo_val.pkl
Loading cached result from './caches/baseline_resnet152charades//CharadesEgo_val.pkl'
0 samples loaded
cachefile ./caches/baseline_resnet152charades//CharadesEgo_val_video.pkl
Loading cached result from './caches/baseline_resnet152charades//CharadesEgo_val_video.pkl'
0 samples loaded
cachefile ./caches/baseline_resnet152charades//CharadesEgo_val.pkl
Loading cached result from './caches/baseline_resnet152charades//CharadesEgo_val.pkl'
0 samples loaded
Initializing FC7 extractor with AOB instance
Traceback (most recent call last):
  File "baseline_resnet152charades.py", line 38, in <module>
    main()
  File "./main.py", line 69, in main
    scores = validate(trainer, loaders, model, criterion, args)
  File "./main.py", line 48, in validate
    scores.update(trainer.validate(val_loader, model, criterion, epoch, args))
  File "./train.py", line 159, in validate
    metrics.update(triplet_allk(*zip(*alloutputs)))
TypeError: triplet_allk() takes exactly 3 arguments (0 given)

/home/csdept/projects/actor-observer/train.py(159)validate()
-> metrics.update(triplet_allk(*zip(*alloutputs)))
(Pdb)

gsig · 2018-06-27T19:19:39Z

It seems that the code is unable to load in the Charades frames. Have you double checked the directories for the rgb frames? I recommend setting the --cache-buster flag to tell the code to ignore the "empty" cache it has generated. Let me know if you have any questions!
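One way to double check the frame directory is sketched below. It assumes a layout of one subdirectory per video, each holding .jpg frames. That layout and the helper name are assumptions, not something confirmed in this thread.

```python
import os


def count_video_dirs_with_frames(data_root, ext=".jpg"):
    """Count immediate subdirectories of data_root that hold at least one frame.

    Assumes (hypothetically) one folder per video containing image files
    ending in `ext`. Returns 0 if data_root itself does not exist.
    """
    if not os.path.isdir(data_root):
        return 0
    count = 0
    for name in sorted(os.listdir(data_root)):
        video_dir = os.path.join(data_root, name)
        if not os.path.isdir(video_dir):
            continue
        # One matching file is enough to treat the folder as populated.
        if any(f.lower().endswith(ext) for f in os.listdir(video_dir)):
            count += 1
    return count
```

If this returns 0 for the directory passed as --data, the "0 samples loaded" lines are expected. After correcting the path, rerun with --cache-buster so the empty cache files are regenerated.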

happsky · 2018-07-02T03:01:18Z

Thanks for your response! Now baseline_resnet152charades.py, baseline_resnet152imagenet.py and third_to_first_person.py work for me. However, when I run the command python alignment_and_zeroshot.py, I get the following error:

cachefile ./caches/alignment_and_zeroshot//CharadesEgo_train.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgo_train.pkl'
425450 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgo_val.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgo_val.pkl'
111643 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgo_val_video.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgo_val_video.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_train.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_train.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val_video.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val_video.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesMeta_val_video.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesMeta_val_video.pkl'
2425 samples loaded
fc7 norms: 49.0615310669 48.3031234741 49.2480506897
pairwise dist means: 14.2884759903 13.5230617523
scales:0.499999999048 0.499999999048 0.499999999048
./models/layers/ActorObserverLossAllWithClassifier.py:21: UserWarning: volatile was removed (Variable.volatile is always False)
if not cls.volatile:
./models/layers/ActorObserverLossAllWithClassifier.py:24: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
inds1 = [i for i, t in enumerate(target) if t.data[0] > 0]
./models/layers/ActorObserverLossAllWithClassifier.py:25: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
inds2 = [i for i, t in enumerate(target) if not t.data[0] > 0]
#triplets: 4 #class: 0
./models/layers/VideoSoftmax.py:37: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self.storage[vid] = x.data[0]
./models/layers/ActorObserverLoss.py:38: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
x, w = x.data[0], w.data[0]
loss before 4.94595146179
loss after 4.94593954086
weight median: 1.0, var: 0.0
/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THCUNN/ClassNLLCriterion.cu:56: void ClassNLLCriterion_updateOutput_no_reduce_kernel(int, THCDeviceTensor<Dtype, 2, int, DefaultPtrTraits>, THCDeviceTensor<long, 1, int, DefaultPtrTraits>, THCDeviceTensor<Dtype, 1, int, DefaultPtrTraits>, Dtype *, int, int) [with Dtype = float]: block: [0,0,0], thread: [0,0,0] Assertion `cur_target >= 0 && cur_target < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "alignment_and_zeroshot.py", line 41, in <module>
    main()
  File "./main.py", line 77, in main
    scores.update(trainer.train(train_loader, model, criterion, optimizer, epoch, args))
  File "./train.py", line 99, in train
    output, loss, weights = forward(inputs, target, model, criterion, meta['id'])
  File "./train.py", line 76, in forward
    loss, weights = criterion(*(list(output) + [target_var, ids]))
  File "/home/csdept/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "./models/layers/ActorObserverLossAllWithClassifier.py", line 58, in forward
    f = self.clsweight * clsloss.sum()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generated/../THCReduceAll.cuh:339

/home/csdept/projects/actor-observer/models/layers/ActorObserverLossAllWithClassifier.py(58)forward()
-> f = self.clsweight * clsloss.sum()
(Pdb)
(Pdb)

I double-checked the paths, and I think they are all correct. Do you have any suggestions?

gsig · 2018-07-02T03:11:01Z

The problem is shown in the log lines below: the code is not finding the original Charades folder. Make sure you have downloaded the original Charades dataset as well and updated the path accordingly. Hope that helps. Let me know if that works.

cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_train.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_train.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val.pkl'
0 samples loaded
cachefile ./caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val_video.pkl
Loading cached result from './caches/alignment_and_zeroshot//CharadesEgoPlusRGB_val_video.pkl'
0 samples loaded
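The device-side assert in the log (cur_target >= 0 && cur_target < n_classes) fires when a class index handed to the NLL loss falls outside the valid range. As a rough debugging aid, and only as a sketch with a hypothetical helper name, the labels can be checked on the CPU before the loss is computed:

```python
def find_out_of_range_targets(targets, n_classes):
    """Return (position, value) pairs for labels outside [0, n_classes).

    `targets` is any iterable of integer class indices, for example a
    label tensor converted to a plain Python list first.
    """
    return [(i, t) for i, t in enumerate(targets) if not 0 <= t < n_classes]


# With the default --nclass of 157, valid labels are 0..156.
bad = find_out_of_range_targets([0, 156, 157, -1], 157)
print(bad)  # [(2, 157), (3, -1)]
```

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 may also help, since it makes the Python traceback point at the call that actually triggered the assert. In this thread the underlying cause was the empty CharadesEgoPlusRGB splits rather than the loss code itself.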


happsky · 2018-07-03T17:34:31Z

Hey, I am downloading the original Charades dataset. Which part of it should I use: Data (scaled to 480p, 13GB), Data (original size) (55GB), RGB frames at 24fps (76GB), or Optical Flow at 24fps (45GB)?
And how do I update the path? That is, where in the code should I set the location of the original Charades dataset?

Here is my third_to_first_person.py file,

#!/usr/bin/env python
import sys
import os
import subprocess
import traceback
import pdb
from bdb import BdbQuit
# Delete stale .pyc files; arguments are passed as a list so the glob reaches find unquoted.
subprocess.Popen(['find', './exp/..', '-iname', '*.pyc', '-delete'])
sys.path.insert(0, '.')
os.nice(19)
from main import main
name = __file__.split('/')[-1].split('.')[0]  # name is filename

args = [
    '--name', name,
    '--print-freq', '1',
    '--train-file', './datasets/labels/CharadesEgo_v1_train.csv',
    '--val-file', './datasets/labels/CharadesEgo_v1_test.csv',
    '--dataset', 'charadesego',
    '--data', '/home/csdept/projects/CharadesEgo/CharadesEgo_v1_rgb/CharadesEgo_v1_rgb/',
    '--arch', 'ActorObserverBaseNoShare',
    '--subarch', 'resnet152',
    '--pretrained-subweights', './charades_pretrained/resnet_rgb.pth.tar',
    '--loss', 'ActorObserverLossAll',
    '--subloss', 'DistRatio',
    '--decay', '0.95',
    '--lr', '3e-5',
    '--lr-decay-rate', '15',
    '--batch-size', '4',
    '--train-size', '0.2',
    '--val-size', '0.5',
    '--cache-dir', './caches/',
    '--epochs', '50',
    # '--evaluate',
    '--alignment',
    # '--usersalignment',
]
sys.argv.extend(args)
try:
    main()
except BdbQuit:
    sys.exit(1)
except Exception:
    traceback.print_exc()
    print('')
    pdb.post_mortem()
    sys.exit(1)

Here is my opts.py,

import argparse
import os

def parse():
    print('parsing arguments')
    parser = argparse.ArgumentParser(description='PyTorch Charades-Ego Training')
    parser.add_argument('--data', metavar='DIR', default='/home/csdept/projects/CharadesEgo/CharadesEgo_v1_rgb/CharadesEgo_v1_rgb/', help='path to dataset')
    parser.add_argument('--dataset', default='fake', help='name of dataset under datasets/')
    parser.add_argument('--egocentric-test-data', default='./datasets/labels/CharadesEgo_v0_egocentric_test.csv', help='path to labels for egocentric classification')
    parser.add_argument('--original-charades-train', default='./datasets/labels/Charades_v1_train.csv', help='Original Charades Train')
    parser.add_argument('--original-charades-test', default='./datasets/labels/Charades_v1_test.csv', help='Original Charades Test')
    parser.add_argument('--train-file', default='./datasets/labels/CharadesEgo_v1_train.csv', type=str)
    parser.add_argument('--val-file', default='./datasets/labels/CharadesEgo_v1_test.csv', type=str)
    parser.add_argument('--arch', '-a', metavar='ARCH', default='alexnet', help='model architecture: ')
    parser.add_argument('--subarch', default='alexnet')
    parser.add_argument('--subloss', default='MarginRank')
    parser.add_argument('--loss', default='CrossEntropyLoss')
    parser.add_argument('--workers', default=4, type=int, metavar='N', help='# data loading workers (default: 4)')
    parser.add_argument('--epochs', default=20, type=int, metavar='N', help='number of total epochs to run')
    parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number')
    parser.add_argument('--batch-size', default=2, type=int, metavar='N', help='mini-batch size (default: 256)')
    parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, metavar='LR', help='initial learning rate')
    parser.add_argument('--lr-decay-rate', default=6, type=int)
    parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum')
    parser.add_argument('--decay', default=0.9, type=float)
    parser.add_argument('--finaldecay', default=0.9, type=float)
    parser.add_argument('--margin', default=0.0, type=float)
    parser.add_argument('--clsweight', default=1.0, type=float)
    parser.add_argument('--metric', default='wtop1val', help='metric to find best model')
    parser.add_argument('--weight-decay', '--wd', default=1e-4, type=float, metavar='W', help='weight decay (1e-4)')
    parser.add_argument('--print-freq', '-p', default=10, type=int, metavar='N', help='print frequency (10)')
    parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (none)')
    parser.add_argument('--evaluate', dest='evaluate', action='store_true', help='evaluate on val sets')
    parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model')
    parser.add_argument('--no-logger', dest='no_logger', action='store_true')
    parser.add_argument('--cache-buster', dest='cache_buster', action='store_true')
    parser.add_argument('--valvideo', dest='valvideo', action='store_true')
    parser.add_argument('--valvideoego', dest='valvideoego', action='store_true')
    parser.add_argument('--alignment', dest='alignment', action='store_true')
    parser.add_argument('--usersalignment', dest='usersalignment', action='store_true')
    parser.add_argument('--nopdb', dest='nopdb', action='store_true')
    parser.add_argument('--pretrained-weights', default='', type=str)
    parser.add_argument('--pretrained-subweights', default='', type=str)
    parser.add_argument('--inputsize', default=224, type=int)
    parser.add_argument('--world-size', default=1, type=int, help='number of distributed processes')
    parser.add_argument('--manual-seed', default=0, type=int)
    parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url for distributed training')
    parser.add_argument('--dist-backend', default='gloo', type=str, help='distributed backend')
    parser.add_argument('--train-size', default=1.0, type=float)
    parser.add_argument('--val-size', default=1.0, type=float)
    parser.add_argument('--cache-dir', default='./caches/', type=str)
    parser.add_argument('--name', default='test', type=str)
    parser.add_argument('--nclass', default=157, type=int)
    parser.add_argument('--accum-grad', default=4, type=int)
    args = parser.parse_args()
    args.distributed = args.world_size > 1
    args.cache = args.cache_dir + args.name + '/'
    if not os.path.exists(args.cache):
        os.makedirs(args.cache)

    return args

Thank you in advance!

gsig · 2018-07-06T14:11:33Z

RGB frames at 24fps (76GB) is what the code is expecting.

Good catch! Looks like I forgot to update the hardcoded path in actor-observer/datasets/charadesegoplusrgb.py (line 39 in 99a81f5):

'data': args.original_charades_data})

You can edit this line in your copy of the code so that it points to your path. I pushed a commit adding an argument for this.

Let me know if that helps.

Best,
Gunnar
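As a rough illustration of the fix described above, the sketch below adds a separate argument for the original Charades frames next to --data. The flag name --original-charades-data is inferred from args.original_charades_data in the quoted line and should be treated as an assumption; check the repository's opts.py for the actual name and default. The paths are placeholders.

```python
import argparse


def build_parser():
    """Minimal parser with one path per dataset (sketch, not the repo's opts.py)."""
    parser = argparse.ArgumentParser(description='Sketch: dataset path arguments')
    parser.add_argument('--data', metavar='DIR', default='',
                        help='path to the Charades-Ego RGB frames')
    # Assumed flag name; argparse exposes it as args.original_charades_data.
    parser.add_argument('--original-charades-data', metavar='DIR', default='',
                        help='path to the original Charades RGB frames at 24fps')
    return parser


# Placeholder paths for illustration only.
args = build_parser().parse_args([
    '--data', '/path/to/CharadesEgo_v1_rgb/',
    '--original-charades-data', '/path/to/Charades_v1_rgb/',
])
print(args.original_charades_data)
```

In an experiment script such as alignment_and_zeroshot.py, the same pair of strings would be added to the args list alongside '--data'.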
