
Updating hyperparameter constraints during optimisation & hyperparameter dependence #814

Open
T-Flet opened this issue Feb 15, 2020 · 0 comments

Comments


T-Flet commented Feb 15, 2020

  • Is updating hyperparameter constraints during optimisation (in particular, constraints based on the data X and on other hyperparameters) an intended possibility within the GPy framework?

Example situation:

A kernel whose parameters include a location and a width, with constraints that both location and location + width should be contained within the data range;
in this case one would write something like the following in some kernel method which has X in scope (e.g. update_gradients_full):

if self.data_range is None:  # This need only run once
    self.data_range = (X.min(), X.max())
    self.location.constrain_bounded(*self.data_range)
max_width = self.data_range[1] - self.location
if max_width <= 0:  # Guard against location having drifted past the upper data bound
    max_width = self.data_range[1] - self.data_range[0]
self.width.constrain_bounded(0, max_width)  # While this needs to be adjusted every time

However, this approach fails: any call to constrain_bounded or related constrain methods immediately triggers a call to parameters_changed, which (since the constraining happens inside an update method) generates an infinite loop.

(NOTE: the max_width parts are necessary because, SPOILER, the location constraint does not seem to be very constraining; perhaps connected to this is the fact that the parameters sometimes simply become nan, and measures need to be taken to reset them to default values)
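One generic way to break this kind of re-entrancy (a sketch in plain Python, not GPy API; whether GPy's observer machinery tolerates it is an open question) is a guard flag that makes the callback a no-op while an update it started is still in flight:

```python
# Hypothetical sketch: a re-entrancy guard stopping a callback such as
# parameters_changed from recursing when the constraint update it performs
# triggers the same callback again. Class and attribute names are invented.
class Kernel:
    def __init__(self):
        self._updating = False  # guard flag
        self.calls = 0

    def constrain(self):
        # In GPy, constraining a parameter notifies observers, which would
        # call parameters_changed again; here we simulate that re-trigger.
        self.parameters_changed()

    def parameters_changed(self):
        if self._updating:      # already inside an update: bail out
            return
        self._updating = True
        try:
            self.calls += 1
            self.constrain()    # would otherwise recurse forever
        finally:
            self._updating = False

k = Kernel()
k.parameters_changed()
print(k.calls)  # the guarded body runs exactly once
```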

A very-bad-form work-around to the infinite loop is instead to re-declare the parameters with the constraint already in place (the parameters do seem to remain linked, as adding unlinking and relinking before and after each re-declaration does not result in different behaviour):

from GPy.core.parameterization import Param
from paramz.transformations import Logistic  # Logistic(lower, upper) bounding transform
# (import paths as of GPy 1.x; older versions kept transformations under GPy.core.parameterization)

if self.data_range is None:  # This need only run once
    self.data_range = (X.min(), X.max())
    self.location = Param('location', self.location, Logistic(*self.data_range))
max_width = self.data_range[1] - self.location
if max_width <= 0:  # Guard against location having drifted past the upper data bound
    max_width = self.data_range[1] - self.data_range[0]
self.width = Param('width', self.width, Logistic(0, max_width))  # While this needs to be adjusted every time

Whether because of this method's brutality or of details of GPy's optimisation process, although the resulting models do report the appropriate constraints, the fitted values are often unchanged from the starting ones or, even worse, ARE able to violate said constraints, e.g.:

KERNEL_NAME  |                    value  |             constraints  |  priors
location     |     -134.61606296857497  |              -40.0,40.0  |
width        |  4.5964025037460997e-07  |  0.0,174.61606296857497  |

  • Is there a way to properly implement these after-__init__ and evolving constraints?

Separately but relatedly

  • Is there a better way to optimise dependent hyperparameters such as width?

Optimising one and then the other in alternating rounds comes to mind.
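The alternating idea can be sketched without any GPy machinery (plain Python, hypothetical toy objective and grid search standing in for the real optimiser): each round tunes location with width fixed, then re-derives the width bound from the new location before tuning width, so the dependent constraint is rebuilt rather than mutated in place.

```python
def objective(location, width):
    # Toy loss with optimum at location = 2, width = 3 (invented numbers)
    return (location - 2.0) ** 2 + (width - 3.0) ** 2

data_lo, data_hi = 0.0, 6.0   # stand-in for (X.min(), X.max())
location, width = 0.5, 0.5    # starting values

grid = [data_lo + i * (data_hi - data_lo) / 100 for i in range(101)]
for _ in range(20):
    # 1) optimise location with width held fixed, keeping location + width in range
    feasible = [c for c in grid if c + width <= data_hi]
    location = min(feasible, key=lambda c: objective(c, width))
    # 2) re-derive the width bound from the new location, then optimise width
    max_width = data_hi - location
    width = min((i * max_width / 100 for i in range(101)),
                key=lambda w: objective(location, w))

print(round(location, 2), round(width, 2))
```

A real version would replace the grid searches with GPy's gradient-based optimiser while the other parameter is fixed, but the bookkeeping of the dependent bound is the same.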

Further related detail

The purpose of using width in the first place is to model a second 'end-location' with an in-built dynamic constraint of being greater than the first; the two-location version would then be the preferred implementation if the aforementioned dynamic constraints were an intended feature of GPy.

There is however one difference between these versions w.r.t. optimisation: the location/width version requires an additional fix in order to update width 'correctly'.
After the parameters have been updated, width should be shifted by the opposite of that round's change in location, so as to preserve the implied second location (or rather, to move the second location based solely on width's gradient), except of course if the first location moved past it outright, in which case one would just set width close to 0.

So, if an old_location parameter is saved in update_gradients_full then one would write something like the following in parameters_changed:

location_diff = self.location - self.old_location
# Shift width to keep the implied 2nd location fixed; collapse it if overtaken
self.width = 1e-10 if location_diff > self.width else self.width - location_diff

However, this generates a worse infinite loop than before, since both this assignment to width and re-declaring it via Param with a new value trigger a nested call to parameters_changed.

  • Is there a way to update a parameter WITHOUT triggering parameters_changed (at least when already within it)?
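Whatever GPy itself offers here, the generic pattern being asked for (a sketch in plain Python with invented names, not GPy API) is an observable value whose change notification can be temporarily suppressed:

```python
# Hypothetical sketch: a parameter that notifies an observer on assignment,
# plus a context manager that updates it silently (no nested callback).
import contextlib

class Parameter:
    def __init__(self, value, on_change):
        self._value = value
        self._on_change = on_change
        self._silent = False

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        self._value = v
        if not self._silent:
            self._on_change()   # would correspond to parameters_changed

    @contextlib.contextmanager
    def silenced(self):
        # Temporarily suppress change notifications
        self._silent = True
        try:
            yield self
        finally:
            self._silent = False

events = []
width = Parameter(1.0, on_change=lambda: events.append('changed'))
width.value = 2.0            # notifies the observer
with width.silenced():
    width.value = 1e-10      # updates silently, no nested callback
print(events)  # ['changed']
```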
T-Flet added a commit to T-Flet/GPy-ABCD that referenced this issue Mar 2, 2020
opened a related issue on GPy: SheffieldML/GPy#814;
on hold for the moment, as is introducing stand-alone base-sigmoidal kernels into the mix