Hi,
I'm interested in the K-marginal score. In the implementation (https://github.com/usnistgov/SDNist/blob/main/sdnist/metrics/kmarginal.py), I don't see the float values being binned. Are they binned somewhere else in the code, or not at all? If they are not binned anywhere, why is that?
My reasoning is that without binning, creating marginals via df.groupby yields a separate group for every distinct value in any marginal that includes a float column: e.g., records (0, 1, 0.05) and (0, 1, 0.05000001) would be considered distinct and count as errors in the synthetic data, decreasing the metric value. This seems undesirable to me, since the probability of two continuous float values being exactly equal is vanishingly small. Binning the float values would address this issue.
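To illustrate the concern, here is a minimal sketch in plain Python. It does not reproduce the SDNist code: the toy records, the `marginal` and `total_abs_diff` helpers, and the equal-width bin size of 0.1 are all assumptions made for illustration, and a `Counter` stands in for `df.groupby(...).size()`.

```python
from collections import Counter

# Hypothetical toy records: two integer-coded features and one float feature.
target = [(0, 1, 0.05), (0, 1, 0.34), (1, 0, 0.72)]
synthetic = [(0, 1, 0.05000001), (0, 1, 0.36), (1, 0, 0.78)]


def marginal(records, bin_width=None):
    """Normalized count per distinct cell, analogous to groupby(...).size() / len(df)."""
    def cell(rec):
        a, b, x = rec
        if bin_width is not None:
            # Assumed equal-width binning of the float feature.
            x = int(x // bin_width)
        return (a, b, x)

    counts = Counter(cell(r) for r in records)
    return {key: n / len(records) for key, n in counts.items()}


def total_abs_diff(p, q):
    """Sum of absolute density differences: 0 means identical, 2 means disjoint."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Without binning, no cell is shared, so the marginals look completely different.
raw_diff = total_abs_diff(marginal(target), marginal(synthetic))

# With binning, nearby float values fall into the same cell.
binned_diff = total_abs_diff(marginal(target, 0.1), marginal(synthetic, 0.1))

print(raw_diff, binned_diff)
```

In this toy case the unbinned marginals share no cells at all, while the binned marginals coincide, even though the two datasets are nearly identical.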
I would appreciate it if you could help me better understand this matter.