Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix local variables being collected into module_arguments dict #2048

Merged
merged 13 commits into from
Jun 4, 2020

Conversation

awaelchli
Copy link
Contributor

@awaelchli awaelchli commented Jun 2, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • not these changes specifically but it was discussed that we should fix any issues
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes issues 6 and 7 in #1937

  • local variables that are not also init args will not be collected
  • "self", "*args" or "**kwargs" can now be renamed to something else and collection still works
  • removed the redundant recursive child search because the "self" instance is ALWAYS the most specific type anyway.

The new tests that I added fail on master, proofing that the this PR actually fixes the issue.
I will follow up with another PR that will add even more tests for things like multiple inheritance etc.

@awaelchli awaelchli added the bug Something isn't working label Jun 2, 2020
@mergify mergify bot requested a review from a team June 2, 2020 02:07
@codecov
Copy link

codecov bot commented Jun 2, 2020

Codecov Report

Merging #2048 into master will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #2048   +/-   ##
======================================
  Coverage      86%     86%           
======================================
  Files          75      75           
  Lines        4705    4709    +4     
======================================
+ Hits         4064    4068    +4     
  Misses        641     641           

@awaelchli awaelchli requested a review from Borda June 2, 2020 04:30
@awaelchli
Copy link
Contributor Author

kindly asking @yukw777 to also have a look at this since you were involved in the hparams PR.

@awaelchli awaelchli marked this pull request as ready for review June 2, 2020 05:06
Copy link
Member

@justusschock justusschock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`some questions, but otherwise looks good to me :)

frame = inspect.currentframe()

frame_args = _collect_init_args(frame.f_back, [])
child = _get_latest_child(frame)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't there a reason with inheritance for this?

Copy link
Contributor Author

@awaelchli awaelchli Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and I found that child is always the specific type, i.e. "self", even if self.auto_collect() is called in a super class.
So in a concrete example:

class A(object):
    def __init__(self, *args, **kwargs):
        print(self)


class B(A):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        print(self)

B()  # prints twice <__main__.B object at 0x0000021540C94288>

so really it does not matter where we collect "self", it will always be the child (leaf node) in the inheritance tree.
The previous inheritance tests still pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The child = _get_latest_child(frame) does not break anything, it's just redudant as far as I can tell.

varargs_identifier = spec.varargs # by convention this is named "*args"
kwargs_identifier = spec.varkw # by convention this is named "**kwargs"
exclude_argnames = (
varargs_identifier, kwargs_identifier, self_identifier, '__class__', 'frame', 'frame_args'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why also ignore *args and **kwargs?

Shouldn't we collect them as well and use them for reinstantiation/loading? Otherwise the module might be instantiated with different params which could cause incompatibilities with module and loaded checkpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if we have this class

class B(object):

    def __init__(self, *args, **kwargs):
        super().__init__()
        something = kwargs.get('something')
        self.auto_collect_arguments()

and we call it like this
B(something=1, other=2)
then we want to module arguments to be
dict(something=1, other=2)
and not
dict(args=[], kwargs={something=1, other=2})

It was like this before, this has not changed. The code that I added there just makes it so that the name *args and **kwargs is not hardcoded, but can be named whatever the user wants.
The Pyhton inspection magic makes it possible to determine which is which.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the explanation why the ignore, and the second part of your question:
The args inside **kwargs are still collected, see this line
local_args.update(local_args.get('kwargs', {}))

@mergify mergify bot requested a review from a team June 2, 2020 05:22
pytorch_lightning/core/lightning.py Show resolved Hide resolved
@mergify mergify bot requested a review from a team June 2, 2020 05:51
@justusschock justusschock self-requested a review June 2, 2020 05:57
@mergify mergify bot requested a review from a team June 2, 2020 05:58
@Borda Borda added this to the 0.8.0 milestone Jun 2, 2020
@Borda Borda requested a review from jeremyjordan June 2, 2020 13:58
@Borda
Copy link
Member

Borda commented Jun 2, 2020

the same as here #2039 (comment)

@yukw777
Copy link
Contributor

yukw777 commented Jun 2, 2020

I don't have any huge concerns about this. I do worry _collect_init_args() may be too brittle given the heavy usage of reflections, but I recognize why we need it... good test coverage and documentation would go a long way.

@awaelchli
Copy link
Contributor Author

good test coverage and documentation would go a long way.

by a long way you mean it's not ready yet?
I would agree, yes. The todo list is still long and I have test cases in mind already to follow up. One thing we should take a close look at soon is multiple inheritance.

@mergify
Copy link
Contributor

mergify bot commented Jun 2, 2020

This pull request is now in conflict... :(

@williamFalcon
Copy link
Contributor

williamFalcon commented Jun 3, 2020

@awaelchli @Borda good effort on tackling this!

I propose the final API for init interactions is as follows:

case 1

User explicitly says what they want to save.

class LitModel(...):

    def __init__(self, conf):
        self.save_hyperparameters(conf)

Case 2:

User wants to save all the init stuff.
They can do it all manually or ask us to do it automatically

class LitModel(...):

    def __init__(self, arg1, arg2, arg3):
        # manually
        self.save_hyperparameters(arg_name=arg1, arg_name=arg2, arg_name=arg3)
        
        # equivalent automatic
        self.save_hyperparameters()

Case 3:

They want to save ONLY some of the init stuff

class LitModel(...):

    def __init__(self, arg1, arg2, arg3):
        # manually
        self.save_hyperparameters(arg_name=arg2)

Special cases:

  • namespace
    def __init__(self, hparams):
        # manually
        self.save_hyperparameters(hparams)
  • dict
    def __init__(self, some_dict):
        # manually
        self.save_hyperparameters(some_dict)
  • omniconf
    def __init__(self, conf):
        # manually
        self.save_hyperparameters(conf)
  • anything they want
    def __init__(self, some_random_alternative_to_config):
        # manually
        self.save_hyperparameters(some_random_alternative_to_config)

@PyTorchLightning/core-contributors

@yukw777
Copy link
Contributor

yukw777 commented Jun 3, 2020

@awaelchli nah i just meant that as a general observation, not specific to this PR, which looks good to me!

@Borda Borda force-pushed the bugfix/fix-hparams-super-call branch from 8dc4650 to 79a6023 Compare June 4, 2020 11:55
@Borda Borda requested a review from williamFalcon June 4, 2020 12:29
@williamFalcon williamFalcon merged commit 4234992 into master Jun 4, 2020
@Borda Borda deleted the bugfix/fix-hparams-super-call branch June 4, 2020 12:35
@Borda
Copy link
Member

Borda commented Jun 4, 2020

the API changes will be applied in #2047

justusschock pushed a commit that referenced this pull request Jun 29, 2020
* do not include local vars in auto collection

* add test

* add test for model with "self" renamed to "obj"

* skip decorator

* changelog

* changelog

* update docs

* remove obsolete child collection

* generalize **args, **kwargs names

* docs

* also update varargs passed in

* Revert "also update varargs passed in"

This reverts commit 3d7a30d.

* update test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants