This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

ValueError when saving model with ARD #125

Open
ghost opened this issue Nov 6, 2017 · 7 comments

@ghost

ghost commented Nov 6, 2017

Hi! When saving an ARD=True model using the 'models_file' argument, I get a ValueError that reads:
ValueError: Wrong number of items passed 20, placement implies 4

This doesn't happen when ARD=False. My script is:

bo = GPyOpt.methods.BayesianOptimization(
    bb.group_eval,   
    domain=bounds,
    initial_design_numdata=5,
    model_type='GP',
    acquisition_type='MPI',
    normalize_Y=False,
    exact_feval=False, 
    ARD=True)

bo.run_optimization(
    n_iterations, 
    max_time,
    models_file='test.txt')

Maybe the saving function is not taking into account the larger number of parameters when ARD=True.

@javiergonzalezh
Member

Yep, that seems to be the issue. Do you mind having a look at it and making a PR? It should just need a check on the dimensions when saving the results.
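
A dimension check of that kind could look roughly like the sketch below. It is plain Python and independent of GPyOpt: validate_header is a hypothetical helper, not a function from the library, and the row values are made up. The widths of 4 and 20 are taken from the error message in the report.

```python
def validate_header(rows, header):
    """Raise a descriptive error if any row's width differs from the header's.

    Hypothetical helper for illustration; `rows` holds one sequence per
    iteration and `header` is the list of column names.
    """
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(
                "row %d has %d values but the header names %d columns"
                % (i, len(row), len(header))
            )
    return True


# 'Iteration' plus three parameter names gives 4 columns, as in the report.
header = ["Iteration", "Mat52.variance", "Mat52.lengthscale",
          "Gaussian_noise.variance"]

# One saved iteration carrying 20 entries (placeholder values).
rows = [[1] + [0.5] * 19]

try:
    validate_header(rows, header)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```

A check like this only makes the failure easier to read; it does not by itself supply the missing column names.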

@ghost
Author

ghost commented Nov 7, 2017

Hi Javier! It may be trickier than that. In bo.py we have these two lines (373 and 374):

header  = ['Iteration'] + self.model.get_model_parameters_names()
df_results = pd.DataFrame(results, columns = header)

The issue is in the header, which does not contain the correct number of parameter names. self.model.get_model_parameters_names() calls a method of the GPy object, which returns:
['Mat52.variance', 'Mat52.lengthscale', 'Gaussian_noise.variance']
but in my case the ARD model has 19 parameters. I think the problem should be fixed at the level of the GPy library, rather than by finding a workaround in GPyOpt.
What do you think?
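
The counts in the error message can be reproduced with simple arithmetic. The sketch below assumes the 19 parameters split into one kernel variance, 17 lengthscales and one noise variance; the figure of 17 is inferred from 19 - 2 and is not stated anywhere in the report.

```python
# (name, size) pairs. The sizes are an assumption for illustration:
# 1 + 17 + 1 = 19 values, matching the 19 parameters mentioned above.
param_sizes = [
    ("Mat52.variance", 1),
    ("Mat52.lengthscale", 17),
    ("Gaussian_noise.variance", 1),
]

n_names = len(param_sizes)                        # names returned: 3
n_values = sum(size for _, size in param_sizes)   # scalar values stored: 19

# Both the header and each row start with the 'Iteration' column.
header_width = 1 + n_names    # 4, the "placement implies 4" in the error
row_width = 1 + n_values      # 20, the "items passed 20" in the error
```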

@apaleyes
Collaborator

apaleyes commented Nov 9, 2017

Can you please post the whole output of the error, with the stack trace? That would clearly indicate where the error is coming from.

@ghost
Author

ghost commented Nov 9, 2017

Here is the output, but as I said the error actually comes from self.model.get_model_parameters_names().

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4616                 blocks = [make_block(values=blocks[0],
-> 4617                                      placement=slice(0, len(axes[0])))]
   4618 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   2951 
-> 2952     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
   2953 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
    119                              'implies %d' % (len(self.values),
--> 120                                              len(self.mgr_locs)))
    121 

ValueError: Wrong number of items passed 20, placement implies 4

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-14-bd3ed2b27530> in <module>()
      8     eps=tolerance,
      9     context={},
---> 10     models_file='test_save')
     11 
     12 #save_models_parameters=True,

~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in run_optimization(self, max_iter, max_time, eps, context, verbosity, save_models_parameters, report_file, evaluations_file, models_file)
    157             self.save_evaluations(self.evaluations_file)
    158         if self.models_file is not None:
--> 159             self.save_models(self.models_file)
    160 
    161 

~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in save_models(self, models_file)
    372 
    373         header  = ['Iteration'] + self.model.get_model_parameters_names()
--> 374         df_results = pd.DataFrame(results,columns = header)
    375         df_results.to_csv(models_file,index=False, sep='\t')

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    359             else:
    360                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 361                                          copy=copy)
    362         elif isinstance(data, (list, types.GeneratorType)):
    363             if isinstance(data, types.GeneratorType):

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    531             values = maybe_infer_to_datetimelike(values)
    532 
--> 533         return create_block_manager_from_blocks([values], [columns, index])
    534 
    535     @property

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4624         blocks = [getattr(b, 'values', b) for b in blocks]
   4625         tot_items = sum(b.shape[0] for b in blocks)
-> 4626         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   4627 
   4628 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4601         raise ValueError("Empty data passed with indices specified.")
   4602     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4603         passed, implied))
   4604 
   4605 

ValueError: Shape of passed values is (20, 3), indices imply (4, 3)

@alansaul
Contributor

alansaul commented Nov 9, 2017

Here is a minimal example:

import GPy
import numpy as np
from GPyOpt.methods import BayesianOptimization
from GPyOpt.models import GPModel

# Toy objective over two inputs
def f(x):
    return (6*x[:,0]-2)**2*np.sin(12*x[:,1]-4)

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (0,1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (0,1)}]

# ARD=True gives the kernel one lengthscale per input dimension
k = GPy.kern.Matern52(input_dim=2, ARD=True)
m = GPModel(kernel=k)
myBopt = BayesianOptimization(f=f, domain=bounds, model=m)
myBopt.run_optimization(max_iter=15)
# save_models takes a file name; 'models.txt' is a placeholder
myBopt.save_models('models.txt')

I think @mozerfazer is correct in his diagnosis. GPyOpt collects and saves the parameters by reading the underlying GPy model's param_array, which is an array of all the model's parameters. For models with ARD, a subset of this array holds the lengthscales; unfortunately these are all collected under one parameter name, 'Mat52.lengthscale'.

This doesn't appear to be specific to ARD models; it would affect any model parameter that is stored as a vector.

print(m.model)

makes it clear what the issue is: you are currently unpacking 2 lengthscale values under one column name, 'Mat52.lengthscale'.

I'm not familiar with how the models are loaded in GPyOpt, but I believe simply replacing GPModel's function with the one below would help:

    def get_model_parameters_names(self):
        """
        Returns a list with the names of the parameters of the model
        """
        return self.model.parameter_names_flat()

I can make a PR if that is helpful.
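
To illustrate the idea of one column name per scalar value without depending on GPy, here is a rough standard-library sketch. flat_parameter_names is a hypothetical helper, the parameter values are made up, and the index suffix format is illustrative and may differ from what GPy actually produces.

```python
import csv
import io


def flat_parameter_names(param_sizes):
    """Expand (name, size) pairs into one name per scalar value.

    Hypothetical helper: scalar parameters keep their name, vector
    parameters get an index suffix (the suffix format is an assumption).
    """
    names = []
    for name, size in param_sizes:
        if size == 1:
            names.append(name)
        else:
            names.extend("%s[%d]" % (name, i) for i in range(size))
    return names


# Two lengthscales, as in the 2-D ARD example above.
param_sizes = [
    ("Mat52.variance", 1),
    ("Mat52.lengthscale", 2),
    ("Gaussian_noise.variance", 1),
]
header = ["Iteration"] + flat_parameter_names(param_sizes)

# One row per iteration: iteration number, then every scalar value
# (placeholder numbers, not real optimisation output).
rows = [[1, 1.0, 0.3, 0.7, 0.01], [2, 1.1, 0.25, 0.8, 0.01]]

# Write a tab-separated table to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
tsv_text = buffer.getvalue()
```

With flattened names the header has as many entries as each row, so the table can be written without a shape mismatch.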

@apaleyes
Collaborator

apaleyes commented Nov 9, 2017

@alansaul sounds about right. Please go for it!

@javiergonzalezh
Member

Thanks @alansaul for noticing this! As @apaleyes says, a PR on this would be great! :)

apaleyes pushed a commit that referenced this issue Dec 24, 2017
* Updated GPModel to save full parameter name list, added simple save tests, and initialised self.model_parameters_iterations in __init__
apaleyes added a commit that referenced this issue Dec 24, 2017
* Updated GPModel to save full parameter name list, added simple save tests, and initialised self.model_parameters_iterations in __init__