Failing test: test_running_test_pretrained_model_ddp #979

Closed
neggert opened this issue Feb 28, 2020 · 2 comments · Fixed by #1017
Labels: bug (Something isn't working), help wanted (Open to be worked on), priority: 0 (High priority task)
Milestone: 0.7.0

Comments

neggert (Contributor) commented Feb 28, 2020

I think this is another problem stemming from the fact that we don't have a way to pass data back from torch.multiprocessing.spawn. Needs more investigation.

def test_running_test_pretrained_model_ddp(tmpdir):
        """Verify `test()` on pretrained model."""
        ...
        # run test set
        new_trainer = Trainer(**trainer_options)
>       new_trainer.test(pretrained_model)
tests/test_restore_models.py:60:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytorch_lightning/trainer/trainer.py:1189: in test
    self.run_evaluation(test_mode=True)
pytorch_lightning/trainer/evaluation_loop.py:299: in run_evaluation
    if test_mode and not self.is_overriden('test_step'):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pytorch_lightning.trainer.trainer.Trainer object at 0x7f845ec23f90>, f_name = 'test_step', model = None
    def is_overriden(self, f_name, model=None):
        if model is None:
            model = self.get_model()
        super_object = LightningModule
        # when code pointers are different, it was overriden
>       is_overriden = getattr(model, f_name).__code__ is not getattr(super_object, f_name).__code__
E       AttributeError: 'NoneType' object has no attribute 'test_step'
pytorch_lightning/trainer/model_hooks.py:20: AttributeError
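For context, a minimal sketch of the underlying limitation (this is not Lightning's actual code path; the `worker` function and `state` dict are made-up names). Anything assigned inside a spawned worker stays in that worker's copy of the objects, so the parent never sees it:

import torch.multiprocessing as mp


def worker(rank, state):
    # Each spawned process receives a pickled copy of `state`;
    # mutating it here has no effect on the parent's copy.
    state["model"] = f"trained in rank {rank}"


if __name__ == "__main__":
    state = {"model": None}
    # spawn() joins the workers, but it does not hand back anything
    # they produced, so the parent's `state` is unchanged.
    mp.spawn(worker, args=(state,), nprocs=2)
    print(state["model"])  # still None in the parent process

That would explain why `get_model()` returns `None` in the parent Trainer after the DDP run, which is what triggers the `AttributeError` above.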
neggert added the bug and help wanted labels on Feb 28, 2020
Borda added the need fix label on Feb 28, 2020
williamFalcon (Contributor) commented Mar 1, 2020

https://pytorch.org/docs/stable/notes/multiprocessing.html#reuse-buffers-passed-through-a-queue

torch.multiprocessing is a drop-in replacement for Python's multiprocessing module. It supports the exact same operations, but extends it so that all tensors sent through a multiprocessing.Queue will have their data moved into shared memory and will only send a handle to another process.

Looks like we can use multiprocessing.Queue?
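A rough sketch of that idea, not a proposed patch (the `worker` function and the result dict are illustrative). Each DDP process puts its result on a queue created in the parent, and the parent drains the queue once `spawn()` has joined the workers:

import torch
import torch.multiprocessing as mp


def worker(rank, queue):
    # Stand-in for the test result computed inside the DDP process.
    result = {"rank": rank, "loss": torch.tensor(0.123)}
    # Tensors put on a queue from torch.multiprocessing are moved to
    # shared memory; only a handle crosses the process boundary.
    queue.put(result)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.SimpleQueue()
    # join=True by default, so all workers have exited before we read.
    mp.spawn(worker, args=(queue,), nprocs=2)
    while not queue.empty():
        print(queue.get())

Since the children are joined by the time `spawn()` returns, the parent can read the queue afterwards and recover whatever the test produced.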

Borda (Member) commented Mar 2, 2020

@williamFalcon Could we check these two commits on GPU: 20d15c8 and 5dd2afe?

Borda added the priority: 0 label on Mar 2, 2020
Borda added this to the 0.7.0 milestone on Mar 2, 2020