
default logger is now tensorboard #609

Merged: 5 commits into master from tb, Jan 14, 2020
Conversation

@williamFalcon (Contributor) commented Dec 8, 2019

Sets Tensorboard logger as default instead of test-tube.

Also removes the test-tube dependency from the package, and with it the tensorflow dependency.
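
What this means for users, as a minimal sketch (the class name and import path follow the 0.6-era traceback later in this thread; treat them as assumptions, not the final API):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.logging.tensorboard import TensorboardLogger

# After this PR, omitting `logger` should give a TensorBoard-backed logger
# by default, with no test-tube (and hence no tensorflow) requirement.
trainer = Trainer(max_epochs=1)

# Passing a logger explicitly keeps working as before.
logger = TensorboardLogger(save_dir="lightning_logs", name="my_run")
trainer = Trainer(max_epochs=1, logger=logger)
```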

@neggert (Contributor) commented Dec 8, 2019

I was hoping to have a few people try it out before we make it the default...

@williamFalcon (Contributor, Author) commented Dec 8, 2019

sure. let's keep the PR open for a few days while we try it out.
This was the last main thing for this release.

@Borda (Member) left a review comment:

Replace the default logger with TensorBoard in the tests as well.

```diff
@@ -4,5 +4,4 @@ numpy>=1.16.4
 torch>=1.1
 torchvision>=0.4.0
 pandas>=0.24 # lower version do not support py3.7
-test-tube>=0.7.5
```

Review comment (Member): add test-tube to test/requirements.txt

@Borda (Member) commented Dec 8, 2019

The logger is broken:

```
tmpdir = local('/tmp/pytest-of-travis/pytest-0/test_tensorboard_logger0')
    def test_tensorboard_logger(tmpdir):
        """Verify that basic functionality of Tensorboard logger works."""
    
        hparams = tutils.get_hparams()
        model = LightningTestModel(hparams)
    
        logger = TensorboardLogger(save_dir=tmpdir, name="tensorboard_logger_test")
    
        trainer_options = dict(max_epochs=1, train_percent_check=0.01, logger=logger)
    
        trainer = Trainer(**trainer_options)
>       result = trainer.fit(model)
tests/test_logging.py:207: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytorch_lightning/trainer/trainer.py:403: in fit
    self.run_pretrain_routine(model)
pytorch_lightning/trainer/trainer.py:455: in run_pretrain_routine
    self.logger.save()
pytorch_lightning/logging/base.py:14: in wrapped_fn
    fn(self, *args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pytorch_lightning.logging.tensorboard.TensorboardLogger object at 0x7fd30a81e8d0>
    @rank_zero_only
    def save(self):
>       self.experiment.flush()
E       AttributeError: 'SummaryWriter' object has no attribute 'flush'
pytorch_lightning/logging/tensorboard.py:78: AttributeError
```

Let's fix the CI (Travis, AppVeyor and CircleCI) so we do not merge code which fails...
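
Not the fix that eventually landed, just a hedged sketch of one way `TensorboardLogger.save` could guard against older torch builds whose SummaryWriter lacks `flush()` (the method and decorator names follow the traceback above):

```python
from pytorch_lightning.logging.base import rank_zero_only

@rank_zero_only
def save(self):
    # Older torch.utils.tensorboard SummaryWriter builds have no flush();
    # call it only when present instead of assuming the newer API.
    flush = getattr(self.experiment, "flush", None)
    if callable(flush):
        flush()
```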

@Borda (Member) commented Dec 8, 2019

@williamFalcon rebase master, pls...

@snie2012 commented:

Any update on this?

@CarloLucibello commented Dec 21, 2019

I get the following error due to a call to SummaryWriter.add_hparams:

```
{'batch_size': 128, 'epochs': 10, 'lr': 0.1, 'weight_decay': 0.0005, 'save_model': False, 'load_model': '', 'droplr': 5, 'opt': 'nesterov', 'loss': 'nll', 'model': 'mlp_100_100', 'dataset': 'fashion', 'datapath': '~/data/', 'log_interval': 2, 'train_frac': 1.0, 'test_frac': 1.0, 'preprocess': False, 'gpus': None, 'use_16bit': False, 'seed': 872846804}
ERROR:LightDNN:Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File "scripts/light_dnn.py", line 220, in main
    trainer.fit(model)
  File "/home/carlo/Git/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 403, in fit
    self.run_pretrain_routine(model)
  File "/home/carlo/Git/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 453, in run_pretrain_routine
    self.logger.log_hyperparams(ref_model.hparams)
  File "/home/carlo/Git/pytorch-lightning/pytorch_lightning/logging/base.py", line 14, in wrapped_fn
    fn(self, *args, **kwargs)
  File "/home/carlo/Git/pytorch-lightning/pytorch_lightning/logging/tensorboard.py", line 76, in log_hyperparams
    self.experiment.add_hparams(hparam_dict=dict(params), metric_dict={})
  File "/home/carlo/miniconda3/envs/sacred37/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 292, in add_hparams
    exp, ssi, sei = hparams(hparam_dict, metric_dict)
  File "/home/carlo/miniconda3/envs/sacred37/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 156, in hparams
    raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')
ValueError: value should be one of int, float, str, bool, or torch.Tensor
```

This is because I have a NoneType value for the gpus param:

```python
hparams = {'gpus': None}
```

It can also be a problem for lists of GPU indexes, since lists are not supported either.
I suggest we cast the gpus field (or any other unsupported field type) to string.
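
A hedged sketch of that cast-to-string idea (the helper name is illustrative, not part of the library):

```python
import torch

def stringify_unsupported(params: dict) -> dict:
    """Cast values add_hparams rejects (None, lists, dicts, ...) to str."""
    supported = (int, float, str, bool, torch.Tensor)
    return {k: (v if isinstance(v, supported) else str(v)) for k, v in params.items()}
```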

@Borda (Member) commented Dec 21, 2019

> This is because I have a NoneType value for the gpus param:
>
> ```python
> hparams = {'gpus': None}
> ```
>
> It can also be a problem for lists of GPU indexes, since lists are not supported either.
> I suggest we cast the gpus field (or any other unsupported field type) to string.

Good point; for a list I would assume a string with some common separator, and for None I would drop that option from hparams...
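
Sketched out, that variant might look like this (the function name and separator are illustrative assumptions):

```python
def sanitize_hparams(params: dict, sep: str = ",") -> dict:
    clean = {}
    for key, value in params.items():
        if value is None:
            continue  # drop NoneType options from hparams entirely
        if isinstance(value, (list, tuple)):
            value = sep.join(str(v) for v in value)  # e.g. [0, 1] -> "0,1"
        clean[key] = value
    return clean
```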


@williamFalcon (Contributor, Author) commented:
@neggert @Borda is this ready to merge?

@Borda added this to the 0.6.0 milestone Jan 14, 2020
@neggert (Contributor) commented Jan 14, 2020

It looks like somehow the test-tube dependency got added back in with the merge commit.

@neggert (Contributor) commented Jan 14, 2020

Oh, wait. I was just doing weird things with the merge commit summary. LGTM.

@williamFalcon merged commit 88b750a into master Jan 14, 2020
@Borda mentioned this pull request Jan 15, 2020
@snie2012 commented:

Quick question, why is test-tube still in requirements.txt?

@Borda (Member) commented Jan 15, 2020

we are fixing it in #609 together with some other minor issues... :]

@snie2012 commented:

Do you mean #687 ? @Borda

@williamFalcon deleted the tb branch February 16, 2020
@shubhamagarwal92 (Contributor) commented Mar 12, 2020


This is because I have a NoneType value for the gpus params:

```python
hparams = { 'gpus': None}

It can also be a problem, for list of gpus indexes, since lists are not supported as well.
I suggest we cast the gpus field (or any other non-supported field type) to string.

@williamFalcon @Borda I am still facing this issue with pytorch-lightning==0.7.1.

Basically, I tried to pass a list of gpus in the NER example in the transformers repo here and hit the same ValueError('value should be one of int, float, str, bool, or torch.Tensor'). I can see a simple fix for this: TensorBoard rejects NoneType, list, and even dictionary values here.

Instead of passing all the hparams to TensorBoard via pl here, we put in a small check so that only int, float, str, bool, or torch.Tensor values are passed for logging:

```python
import torch
from torch.utils.tensorboard.summary import hparams

# Keep only the value types that add_hparams accepts.
tensorboard_params = {}
for k, v in params.items():
    if isinstance(v, (int, float, str, bool, torch.Tensor)):
        tensorboard_params[k] = v
exp, ssi, sei = hparams(tensorboard_params, {})
```
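
For example, applied to hparams like those in the traceback above, the check would simply skip the offending entries (illustrative values):

```python
params = {'batch_size': 128, 'lr': 0.1, 'gpus': None, 'opt': 'nesterov'}
# tensorboard_params becomes {'batch_size': 128, 'lr': 0.1, 'opt': 'nesterov'};
# the NoneType 'gpus' entry is dropped rather than raising a ValueError.
```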

IMO, there could be cases where the user wants to parse a list or maybe NoneType objects through argparse but doesn't want them logged via TensorBoard.

Please let me know if this makes sense and I should raise a PR!

@Borda (Member) commented Mar 12, 2020

@shubhamagarwal92 thanks for this elaboration, sure, a PR is welcome!

shubhamagarwal92 added a commit to shubhamagarwal92/pytorch-lightning that referenced this pull request Mar 12, 2020
@shubhamagarwal92 mentioned this pull request Mar 12, 2020