empty meta_tags.csv #169
Comments
should be a path
|
Oh, I see |
Hello, I tested it today following your reply!
Then the output is
Then I loaded this pretrained model
But I got this error.
And then I tried again, including the file name
But I got this error.
Would you give me some advice? |
You need to load your module, not the general LightningModule. Instead of pl.LightningModule.load_from_metrics(... Do this: CountingSystem.load_from_metrics( |
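For clarity, a small sketch of that change; CountingSystem and the paths are placeholders carried over from this thread, not verified:

# Load the checkpoint through your own LightningModule subclass,
# not through the pl.LightningModule base class.
pretrained_model = CountingSystem.load_from_metrics(
    weights_path='./base_ckpt/_ckpt_epoch_2.ckpt',   # your checkpoint file
    tags_csv='./default/version_0/meta_tags.csv',    # tags written by test-tube
    on_gpu=True,
    map_location=None
)
pretrained_model.eval()  # switch to inference mode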
I changed the code, but I got the same error message.
BTW, is this the correct way to map the GPU location? |
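On the map_location question, a hedged sketch following the torch.load convention; whether load_from_metrics forwards this argument unchanged depends on the Lightning version:

import torch

# Remap a checkpoint saved on GPU onto a CPU-only machine.
pretrained_model = CountingSystem.load_from_metrics(
    weights_path='./base_ckpt/_ckpt_epoch_2.ckpt',
    tags_csv='./default/version_0/meta_tags.csv',
    on_gpu=False,
    map_location=lambda storage, loc: storage   # keep tensors on CPU
)

# Equivalent plain-PyTorch form, if you only need the raw checkpoint:
checkpoint = torch.load('./base_ckpt/_ckpt_epoch_2.ckpt', map_location='cpu')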
Still the same error... And this is the meta_tags.csv file. In metric.csv, there are training logs. |
no. the file should have something. it might have to do with you using ipynb. i believe the logger doesn’t support that atm |
fit and load without ipynb |
we have plenty of tests that save and load models. so it’s likely the ipynb |
I see... so I can't save the model with a Jupyter notebook? |
enable auto_save (set it to true) or do exp.save(); otherwise, tags won't be saved. |
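A minimal sketch of both options, assuming test-tube's Experiment accepts an autosave flag (the parameter name is an assumption; check your test-tube version):

import os
from test_tube import Experiment

# Option 1: let the experiment write tags/metrics as it goes.
# 'autosave' is the assumed keyword; verify it against your test-tube version.
exp = Experiment(save_dir=os.getcwd(), version=0, autosave=True)

# Option 2: save explicitly, so meta_tags.csv is flushed to disk
# before you try to load the model from it.
exp = Experiment(save_dir=os.getcwd(), version=0)
# ... trainer.fit(model) ...
exp.save()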
maybe auto_save makes sense to default to true? |
I dug into the Experiment class in test-tube to figure out where the tags fail to be saved, but I couldn't find it. It'd be nice if the meta tags were saved automatically by default. |
ok. is this exactly the code you’re using? i’ll run it locally to see what’s happening. |
Yes this is the full code which I'm using. |
I see it now...
A few things are wrong with your code:
this works for me:

import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import pytorch_lightning as pl


class CoolSystem(pl.LightningModule):

    def __init__(self, hparams=None):
        super(CoolSystem, self).__init__()
        # not the best model...
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        # REQUIRED
        x, y = batch
        y_hat = self.forward(x)
        return {'loss': F.cross_entropy(y_hat, y)}

    def validation_step(self, batch, batch_nb):
        # OPTIONAL
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'avg_val_loss': avg_loss}

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        return torch.optim.Adam(self.parameters(), lr=0.02)

    @pl.data_loader
    def tng_dataloader(self):
        # REQUIRED
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def val_dataloader(self):
        # OPTIONAL
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def test_dataloader(self):
        # OPTIONAL
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)


from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
from test_tube import Experiment

model = CoolSystem()

checkpoint_callback = ModelCheckpoint(
    filepath='base_ckpt',
    save_best_only=True,
    verbose=True,
    monitor='avg_val_loss',
    mode='auto'
)

early_stopping = EarlyStopping(
    monitor='avg_val_loss',
    patience=5,
    verbose=True,
    mode='auto'
)

exp = Experiment(save_dir=os.getcwd(), version=0)

trainer = Trainer(experiment=exp, max_nb_epochs=1000, train_percent_check=0.1,
                  checkpoint_callback=checkpoint_callback, early_stop_callback=early_stopping)

# trainer.fit(model)
exp.save()

pretrained_model = CoolSystem.load_from_metrics(
    weights_path='./base_ckpt/_ckpt_epoch_2.ckpt',
    tags_csv='./default/version_0/meta_tags.csv',
    on_gpu=True,
    map_location=None
)
print('model loaded...') |
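One way to sanity-check the setup before calling load_from_metrics is to peek at the generated tags file; a small sketch using only the standard library (the ./default/version_0/ layout assumes the Experiment defaults used above):

import csv

# A healthy meta_tags.csv should contain key,value rows, not just the header.
with open('./default/version_0/meta_tags.csv') as f:
    rows = list(csv.reader(f))

print(rows)
if len(rows) <= 1:
    print('meta_tags.csv is empty -- call exp.save() or enable autosave before loading')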
@Soo95 but your user experience highlights a good thing we need to fix.
|
Hello! Good day.
I have already trained the model, but I wasn't able to run inference because there wasn't enough GPU memory. So I restarted my kernel and decided to load the model again for inference with the next few lines of code:
I'm getting the following error:
From what I have been testing, it seems that my meta_data.csv file only has the headers key and value, but it didn't save anything about the input parameters I chose at the beginning of training. So, the first question is: why didn't it save them? And the second one: is there a way I can load those parameters somehow, so I can use the model? Training this model took almost 18 hours, and I don't have time to train it again right now. Any help would be much appreciated. |
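Not an official recovery path, but if the tags file can't be regenerated, a hedged sketch of loading the weights straight from the checkpoint might rescue the long training run; it assumes the checkpoint stores the weights under a 'state_dict' key and that you can re-create your hparams by hand (MyModule, hparams, and the path are placeholders):

import torch

# Rebuild the module with the same hyperparameters used for training.
# 'MyModule' and 'hparams' are placeholders for your own class and arguments.
model = MyModule(hparams)

# Assumption: the Lightning checkpoint keeps the weights under 'state_dict'.
checkpoint = torch.load('path/to/_ckpt_epoch_N.ckpt', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
model.eval()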
Before asking:
If you still can't find what you need:
What is your question?
To load the model, I need to write down the path to 'meta_tags.csv'.
However, the 'meta_tags.csv' file is empty.
Code
Please paste a code snippet if your question requires it!
Same get_started code with checkpoint callbacks
What have you tried?
I tried it on my model and on this base tutorial code, but both of them failed.
I just started using this API today, but I've run into a lot of difficulties:
Sometimes I need to import exit from sys.
Sometimes the loss doesn't decrease; when I restart the Jupyter notebook several times, the loss suddenly starts decreasing.
Sometimes memory can't be allocated during training (after about 81 epochs).
And I can't use tags like 'Losses/train_loss' and 'Losses/valid_loss' for TensorBoard through test-tube; the error says 'loss' is referenced before assignment (a logging sketch follows below).
I can tolerate these difficulties because I can still make progress with my training.
But I don't have any solution for loading models from an empty file.
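On the TensorBoard tagging point, a minimal hedged sketch of logging slash-separated metric names through test-tube's Experiment.log; whether 'Losses/...' shows up as a grouped tag depends on the test-tube version, and the values are placeholders (this does not address the 'referenced before assignment' error, which points at the calling code):

import os
from test_tube import Experiment

exp = Experiment(save_dir=os.getcwd(), version=0)

# Placeholder values; in practice these come from your training loop.
exp.log({'Losses/train_loss': 0.42, 'Losses/valid_loss': 0.57})
exp.save()  # flush metrics (and tags) to disk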
What's your environment?
conda version: no conda (no venv)
PyTorch version: torch==1.2.0+cu92, torchvision==0.4.0
Lightning version: pytorch-lightning==0.4.7
Test-tube version: test-tube==0.6.9