Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a dedicated namespace for private objs on Parameterized #766

Merged
merged 34 commits into from
Jul 11, 2023

Conversation

maximlt
Copy link
Member

@maximlt maximlt commented Jun 20, 2023

At this stage this is more of a POC, that I wanted to share to see whether it's going in the right direction. The idea with this PR is to cleanup the private namespace of Parameterized classes and instances, mostly by defining a private namespace that we can defend and document, hopefully making it less likely for users to get name collisions. Some additional clean-up/renaming could also be done.

Take this example:

import param

class A(param.Parameterized):
    x = param.Number()

a = A(x=2)

Ignoring dunder methods and some public & private API meant to be removed for Param 2.0 (#767), these are the attributes (calling dir()) of A and a on the main branch currently:

A:

['_A__params',
 '_Parameterized__params',
 '_param',
 '_parameters_state',
 'name',
 'param',
 'x']

a:

['_A__params',
 '_Parameterized__params',
 '_dynamic_watchers',
 '_instance__params',
 '_name_param_value',
 '_param',
 '_param_watchers',
 '_parameters_state',
 '_x_param_value',
 'initialized',
 'name',
 'param',
 'x']

Among these:

  • _{{ClassName}}__params (class+instance) are Parameter dicts, mangled directly by Param. The double underscore after the class name is good IMO as it makes name clashes a lot less likely
    setattr(cls, '_%s__params' % cls.__name__, paramdict)
  • _{{AttributeName}}_param_value(instance) is the internal variable name that holds the current attribute value. It doesn't look too bad, the variable name including double underscore wouldn't hurt.
  • _param (class+instance) is the reference to the class Parameters object, that is returned when you call A.param (on a Parameterized instance like a calling a.param returns every time a new Parameters instance). _param seems very prone to name clashes.
  • _parameters_state (class+instance), _dynamic_watchers (instance), _param_watchers (instance), _instance__params (instance) and initialized (instance) are internal attributes. initialized is obviously the most embarrassing one, it's not private and is a pretty common variable name (@jlstevens had a problem with that just recently).

What I have done in this PR so far is to:

  • put the variables of the last item above under a private namespace (called _param__private for now) and adapted the code accordingly. I've renamed them under this namespace, removing their starting underscore.
  • rename _param to _param__parameters

These are the updated namespaces after these changes:

A:

['_A__params',
 '_Parameterized__params',
 '_param__parameters',
 '_param__private',
 'name',
 'param',
 'x']

a:

['_A__params',
 '_Parameterized__params',
 '_name_param_value',
 '_param__parameters',
 '_param__private',
 '_x_param_value',
 'name',
 'param',
 'x']

Additional notes:

  • The examples tests are currently failing due to Panel expecting somewhere the initialized attribute. I've also found 8 occurrences of _param_watchers being accessed in Panel and 1 in Lumen. Maybe there's a need for a public API to _param_watchers?
  • There aren't that many tests around pickling/unpickling (1?), if we go forward with the suggested approach reviewers will have to look closely at __setstate__ for instance.
  • I'm not entirely sure about the implementation yet, at this stage I'm more interested in feedback on the general approach of putting (part of) the private API under a single private namespace. Of course, pieces of advice on the implementation will be appreciated!

EDIT:

  • Update docs
  • Update docstrings
  • Test pickling more
  • Benchmark
  • Update Panel

@maximlt maximlt marked this pull request as draft June 20, 2023 08:45
@maximlt maximlt requested a review from jlstevens June 20, 2023 20:05
@maximlt
Copy link
Member Author

maximlt commented Jun 20, 2023

I had a look at a previous PR (#386) that turned .param into a property as its goal is similar to this PR, at least in the spirit. The comment on this PR is:

The reasoning behind this is that having accessors which hold a reference back to the Parameterized object create a circular reference which stops CPython's garbage collection from immediately collecting a Parameterized instance when it is no longer referenced by any external object instead having to wait for the next gc.collect() cycle.

So there was at the time a concern with circular references, which would indeed have been generated every time a Parameterized was created had .param not been a property (that creates a new Parameters instance every time it's called). However, I've noticed that circular references are generated anytime watchers are registered:

import param
import gc

gc.collect()

gc.set_debug(gc.DEBUG_SAVEALL)

class P(param.Parameterized):
    x = param.Number()

    @param.depends('x', watch=True)
    def debug(self): pass

p = P()
del p

gc.collect()
print(gc.garbage)

Output (would be an empty list without circular references):

[P(name='P00002', x=0.0), [], [], {'BATCH_WATCH': False, 'TRIGGER': False, 'events': [], 'watchers': []}, {'initialized': True, '_parameters_state': {'BATCH_WATCH': False, 'TRIGGER': False, 'events': [], 'watchers': []}, '_instance__params': {}, '_param_watchers': {'x': {'value': [Watcher(inst=P(name='P00002', x=0.0), cls=<class '__main__.P'>, fn=<function _m_caller.<locals>.caller at 0x102977700>, mode='args', onlychanged=True, parameter_names=('x',), what='value', queued=False, precedence=-1)]}}, '_dynamic_watchers': defaultdict(<class 'list'>, {}), '_name_param_value': 'P00002'}, defaultdict(<class 'list'>, {}), <cell at 0x1016e7580: NoneType object at 0x1012a4ac0>, <cell at 0x1016e7cd0: NoneType object at 0x1012a4ac0>, <cell at 0x1016e76a0: method object at 0x102a3fac0>, <cell at 0x1016e7100: str object at 0x101422cb0>, <bound method P.debug of P(name='P00002', x=0.0)>, (<cell at 0x1016e7580: NoneType object at 0x1012a4ac0>, <cell at 0x1016e7cd0: NoneType object at 0x1012a4ac0>, <cell at 0x1016e76a0: method object at 0x102a3fac0>, <cell at 0x1016e7100: str object at 0x101422cb0>), <function _m_caller.<locals>.caller at 0x102977700>, ('x',), Watcher(inst=P(name='P00002', x=0.0), cls=<class '__main__.P'>, fn=<function _m_caller.<locals>.caller at 0x102977700>, mode='args', onlychanged=True, parameter_names=('x',), what='value', queued=False, precedence=-1), {'x': {'value': [Watcher(inst=P(name='P00002', x=0.0), cls=<class '__main__.P'>, fn=<function _m_caller.<locals>.caller at 0x102977700>, mode='args', onlychanged=True, parameter_names=('x',), what='value', queued=False, precedence=-1)]}}, [Watcher(inst=P(name='P00002', x=0.0), cls=<class '__main__.P'>, fn=<function _m_caller.<locals>.caller at 0x102977700>, mode='args', onlychanged=True, parameter_names=('x',), what='value', queued=False, precedence=-1)], {'value': [Watcher(inst=P(name='P00002', x=0.0), cls=<class '__main__.P'>, fn=<function _m_caller.<locals>.caller at 0x102977700>, mode='args', onlychanged=True, parameter_names=('x',), what='value', queued=False, precedence=-1)]}]

I believe that circular reference is caused by the attribute p._param_watchers storing a watcher which itself holds a reference to the instance p.

As it stands I think that this PR doesn't make the situation worse wrt circular references.

@maximlt
Copy link
Member Author

maximlt commented Jun 20, 2023

An alternative to the private namespace implemented in this PR would be to rely on conventions, with e.g. every private attribute of Parameterized being prefixed with _param__ (e.g. _param__dynamic_watchers, _param__name_param_value, ...). I would personally also be fine with this approach, the real goals to me being to 1) make it less likely for users to override these private attributes and 2) somehow document what the private API of Parameterized is (to support 1.).

@hoxbro
Copy link
Member

hoxbro commented Jun 21, 2023

I'm in favor of separating private variables into their own namespace purely to clean up the code. I would maybe reduce the names from _param__private to __private, even though it has a higher chance of a name clash. It will make the code cleaner to look at.

If the dedicated namespace is used, I would also consider removing param for the private variable names, such as _InstancePrivate.param_watchers to _InstancePrivate.watchers.

Regarding the cyclic references for the dedicated namespace, is it possible to "hide" it in the weakref library, maybe with a property for easy access?

@maximlt
Copy link
Member Author

maximlt commented Jun 21, 2023

I would maybe reduce the names from _param__private to __private, even though it has a higher chance of a name clash.

Having a variable starting with two underscores exposes it to name mangling. This is not necessarily bad, but it'd need to be done and checked carefully as Param does some meta-stuff that maybe don't play well with it (there's code that manually mangles some names).

It will make the code cleaner to look at.

Just to make sure I understand, do you think that it would make Param's code cleaner or some other code? If it's Param's code, I'm not entirely sure as adding a private namespace adds a level of nesting which requires some more handling. It does make the overall Parameterized namespace cleaner.

Regarding the cyclic references for the dedicated namespace, is it possible to "hide" it in the weakref library, maybe with a property for easy access?

I also thought about that but then saw that #386 superseded #384 that was abandoned due to #384 (comment):

weakrefs won't pickle so a property definitely seems preferable.

@maximlt
Copy link
Member Author

maximlt commented Jun 22, 2023

I need to add that #740 added a new private attribute _ParameterizedMetaclass__renamed.

@maximlt
Copy link
Member Author

maximlt commented Jun 22, 2023

@philippjfr @jbednar I'm interested in your feedback on this PR and the various approaches suggested in the comments.

@jbednar
Copy link
Member

jbednar commented Jun 22, 2023

I am in favor of collecting the various attributes into a single private namespace; seems like a great way to clean things up! _param__private seems reasonable, or _p__private if that's too long. Is there something preventing the other private attributes listed above from also moving into that namespace?

I agree that pickling is likely to be affected and should be tested well. Unfortunately our most extensive usage of pickling is in the topographica code, which has not been updated for years and would take a lot of work to be compatible with param 2 so that we could use its tests.

I would also consider removing param for the private variable names

I agree.

@jbednar jbednar changed the title Add a dedicated namespace for private objs ob Parameterized Add a dedicated namespace for private objs on Parameterized Jun 23, 2023
@maximlt
Copy link
Member Author

maximlt commented Jun 28, 2023

Ok so compared to my original post we now have:

A:

['_param__parameters',
 '_param__private',
 '_repr_html_',  # new: HTML repr
 'name',
 'param',
 'x']

a:

['_param__parameters',
 '_param__private',
 '_repr_html_',  # new: HTML repr
 'name',
 'param',
 'x']

So Parameterized classes and instances have the same public and private attributes, all the previous private attributes plus initialized being now on the private namespace _param__private.


As now parameter values on a Parameterized instance are stored on the private name space (in a dict) and no longer in the Parameterized instance __dict__ (it was in _X_param_value), I have attempted to remove the _internal_name Parameter slot in b107092 and it passed the tests. However as that is just a potential side effect of this PR I decided to revert it, this can be done in a follow-up PR if reviewers think it's worth it.


In addition to updating the docs and docstrings, here's what I would like to do next on this PR:

  1. It's ready to be reviewed, so @philippjfr or @jbednar or @jlstevens please have a look at it :)
  2. There may be some performance implications, even if the private namespaces are classes with slots as it seems it's the fastest and cheapest in terms of memory data structure you can get. It'd be worth coming up with some benchmarks, which is anyway something I think would be worth doing before releasing Param 2 to compare it with the latest 1.x version
  3. Pickling/unpickling is a bit broken (there are a couple of open issues) and not much tested, I don't know much about pickling so it might well be that this PR made things worse. I'm planning to open a separate PR to add more pickling test and fix some easy bugs if I find any, and merge it before this one to be more confident about its impact.
  4. Panel (possibly Lumen too) is not compatible with the changes made in this PR, once it's been reviewed a first time I'd like to open a PR on Panel to add compatibility to this PR to see whether there aren't any fundamental issues, this may also show us that we need to elevate some API from private to public if they're reached often from Panel.

@jbednar
Copy link
Member

jbednar commented Jun 30, 2023

That all sounds great; thanks so much! @philippjfr or @jlstevens , please do review it carefully.

@maximlt maximlt marked this pull request as ready for review June 30, 2023 07:02
@philippjfr
Copy link
Member

philippjfr commented Jun 30, 2023

This indeed looks like great work, but it also makes me a little uneasy. One major issue here is that we never provided a public API to access watchers. This meant that Panel, Lumen and at least some user code likely started accessing the _param_watchers attribute and removing it seems overly disruptive in the near term. My general feeling here is that:

  • This PR looks great and we should merge it
  • We need to provide a public API for accessing watchers
  • We should not (yet) remove _param_watchers and give users (and ourselves) time to migrate to the public API

We did provide .param.watchers as a property but that only ever returned the class watchers (I think) and instance watchers were only made available on the _param_watchers attribute.

@maximlt
Copy link
Member Author

maximlt commented Jun 30, 2023

Thanks for your review! I opened #780 for us to have a discussion on what public API we'd like to add to access the watchers. Expecting you to chime in in there @philippjfr as the main user of the private _param_watchers ;)

I'll update this PR to keep _param_watchers.

@maximlt
Copy link
Member Author

maximlt commented Jul 11, 2023

I ran the newly added benchmark against this PR and saw no noticeable performance degradation, I'm going to merge it and soon follow up with an API to access the watchers (#780).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants