Environment Details
Please indicate the following details about the environment in which you found the bug:
RDT version: 1.1.0
Python version: 3.9
Operating System: Colab Notebook
Error Description
When noise is added, the observed composition identity of this transformer is False even though it is listed as True. This is causing some problems with conditional sampling in the SDV.
There seem to be 2 related issues:
The forward transform can noise the values outside of the allowable range for a category, and
Some of the reverse transformed values do not follow the intervals.
Steps to reproduce
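To replicate, download and use the student_placements dataset. Observe that the original data and the reverse transformed data do not have the same values for two of the rows.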
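A round-trip check of this kind can be sketched as follows. This is a minimal illustration rather than the RDT API: `find_round_trip_mismatches`, `forward`, `good_reverse` and `bad_reverse` are hypothetical names, and the toy column stands in for the student_placements data, which is not loaded here.

```python
import numpy as np


def find_round_trip_mismatches(original, forward, reverse):
    """Return the row positions where reverse(forward(x)) differs from x."""
    original = np.asarray(original, dtype=object)
    recovered = np.asarray(reverse(forward(original)), dtype=object)
    return np.flatnonzero(original != recovered)


# Toy stand-ins for a transformer's forward and reverse steps (assumptions).
to_number = {"M": 0.3, "F": 0.8}


def forward(column):
    return np.array([to_number[value] for value in column])


def good_reverse(column):
    # Splits at the boundary of M's interval quoted in this report.
    return np.where(column < 0.6465116279069767, "M", "F")


def bad_reverse(column):
    # Deliberately lossy: every row comes back as "M".
    return np.full(len(column), "M")


column = ["M", "F", "M", "F"]
ok_rows = find_round_trip_mismatches(column, forward, good_reverse)
broken_rows = find_round_trip_mismatches(column, forward, bad_reverse)
```

An empty result means the composition behaved as the identity on that column. Any returned positions are the rows to inspect.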
Forward Transform
Observe that M is supposed to be mapped to the interval (0, 0.6465116279069767). Sometimes the forward transform is mapping outside that range (e.g. 0.655336).
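One way to keep noised values inside their category's interval is to clip after sampling. The sketch below is only an illustration under stated assumptions: `noisy_forward` is a hypothetical helper, the noise is assumed to be normal around the interval midpoint with a standard deviation of one sixth of the width, and F's interval is assumed to run from M's upper bound to 1.0. A truncated normal would be an alternative to clipping.

```python
import numpy as np

# M's bounds come from this report; F's bounds are an assumption.
INTERVALS = {
    "M": (0.0, 0.6465116279069767),
    "F": (0.6465116279069767, 1.0),
}


def noisy_forward(values, intervals, rng):
    """Map each category to a noised float that stays in [start, end)."""
    out = np.empty(len(values), dtype=float)
    for i, category in enumerate(values):
        start, end = intervals[category]
        mean = (start + end) / 2   # assumed centre of the noise
        std = (end - start) / 6    # assumed spread of the noise
        noised = rng.normal(mean, std)
        # Largest float strictly below `end`, so the upper bound is excluded.
        upper = np.nextafter(end, start)
        out[i] = np.clip(noised, start, upper)
    return out


rng = np.random.default_rng(0)
column = ["M"] * 1000 + ["F"] * 1000
transformed = noisy_forward(column, INTERVALS, rng)
```

Excluding the upper bound keeps the boundary value unambiguous: a value equal to 0.6465116279069767 can only belong to F.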
Reverse Transform
Observe that M is supposed to be mapped to the interval (0, 0.6465116279069767). But some values inside it -- like 0.606273 -- are reverse transformed to F.
This is likely due to this line -- we take the difference between each value and the average for each category, then choose the category with the minimum distance. This doesn't make sense when the values are noised. We should instead check whether each value falls within the correct interval.
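The difference between the two lookups can be sketched as below. This is not the transformer's actual code: `reverse_by_nearest_mean` and `reverse_by_interval` are hypothetical helpers, F's interval is assumed to end at 1.0, and the per-category averages are assumed to be the interval midpoints.

```python
import numpy as np

CATEGORIES = np.array(["M", "F"])
# Right edge of each category's interval, ascending (F's edge is assumed).
UPPER_EDGES = np.array([0.6465116279069767, 1.0])
# Assumed per-category averages: the midpoints of the intervals above.
MEANS = np.array([0.6465116279069767 / 2, (0.6465116279069767 + 1.0) / 2])


def reverse_by_nearest_mean(values, categories, means):
    """Pick the category whose average is closest to each value."""
    values = np.asarray(values, dtype=float)
    distances = np.abs(values[:, None] - means[None, :])
    return categories[distances.argmin(axis=1)]


def reverse_by_interval(values, categories, upper_edges):
    """Pick the category whose [start, end) interval contains each value."""
    values = np.asarray(values, dtype=float)
    # side="right" sends a value equal to an edge to the next interval.
    idx = np.searchsorted(upper_edges, values, side="right")
    # Values at or beyond the last edge fall back to the last category.
    idx = np.clip(idx, 0, len(categories) - 1)
    return categories[idx]


samples = [0.606273, 0.655336]
by_mean = reverse_by_nearest_mean(samples, CATEGORIES, MEANS)
by_interval = reverse_by_interval(samples, CATEGORIES, UPPER_EDGES)
```

Under these assumptions the nearest-mean lookup sends 0.606273 to F, because the value is closer to F's midpoint than to M's, while the interval lookup returns M for it.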