During the modeling of the database in sdv.Modeler, extensions are created for each row of the parent tables, containing the parameters needed to model the child tables.
At sampling time, these extensions are sampled too, and the parameters are later extracted and used to create the models that sample the child rows.
When creating new models from the sampled parameters, the models are sometimes created with inconsistent values. So far the following have been found:
1. The sampled covariance matrix may not be positive-semidefinite, which is a requirement for the copulas.multivariate.GaussianMultivariate copula. This raises the following warning:
sdv_mit/lib/python3.6/site-packages/copulas/multivariate/gaussian.py:199: RuntimeWarning: covariance is not positive-semidefinite.
samples = np.random.multivariate_normal(means, clean_cov, size=size)
2. If the sampled value for the std of the copulas.univariate.GaussianUnivariate distribution is negative or zero, the generated samples will be np.nan.
On point 1, instead of modelling and sampling the whole covariance matrix, we can do it with just the lower (or upper) triangle, including the diagonal. When creating the model from the sampled parameters, the other half is completed using symmetry across the diagonal, that is:
matrix[i][j] = matrix[j][i]
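A minimal sketch of that idea, assuming the covariance matrix is held as a list of lists as in the notation above. The helper names `flatten_lower_triangle` and `rebuild_symmetric` are hypothetical and not part of sdv or copulas. Note that this only guarantees symmetry; a symmetric matrix can still have negative eigenvalues, so positive-semidefiniteness may need a separate check.

```python
def flatten_lower_triangle(matrix):
    """Return the lower triangle of a square matrix (diagonal included), row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i + 1)]


def rebuild_symmetric(values, size):
    """Rebuild a size x size symmetric matrix from its flattened lower triangle."""
    expected = size * (size + 1) // 2
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))

    matrix = [[0.0] * size for _ in range(size)]
    position = 0
    for i in range(size):
        for j in range(i + 1):
            # Fill the lower-triangle entry and mirror it across the diagonal.
            matrix[i][j] = values[position]
            matrix[j][i] = values[position]
            position += 1
    return matrix


# Illustrative values only, not taken from a real model.
covariance = [
    [2.0, 0.3, 0.1],
    [0.3, 1.0, 0.4],
    [0.1, 0.4, 1.5],
]
parameters = flatten_lower_triangle(covariance)  # what would be modelled and sampled
restored = rebuild_symmetric(parameters, 3)      # what the new model would receive
```

Only `size * (size + 1) / 2` values are modelled per matrix instead of `size * size`, and the two halves can no longer disagree after sampling.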
For the second point, I will transform the standard deviation using the positive transformer mentioned here before modelling, and reverse-transform it when recreating the model.
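As an illustration of the transform-then-reverse idea, here is a sketch that assumes a plain log/exp pair. The positive transformer referred to above is not shown in this thread, so this is a stand-in and not necessarily the same transformation; `to_unbounded` and `to_positive` are hypothetical names.

```python
import math


def to_unbounded(std):
    """Map a strictly positive standard deviation onto the whole real line."""
    if std <= 0:
        raise ValueError("std must be strictly positive")
    return math.log(std)


def to_positive(value):
    """Reverse transform: map a sampled real value back to a positive std."""
    # exp() is positive for inputs of moderate magnitude; extremely negative
    # inputs underflow to 0.0 and very large ones raise OverflowError.
    return math.exp(value)


# Model the transformed value instead of the raw std.
transformed = to_unbounded(2.5)

# Even a negative sampled value maps back to a valid, positive std.
recovered_std = to_positive(-3.0)
```

Because the model only ever sees the transformed value, a sampled parameter that lands below zero is still reversed into a positive standard deviation.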