Bug with zero values in computing DENSITY bins #25

Open
yoid2000 opened this issue Feb 7, 2024 · 0 comments

yoid2000 commented Feb 7, 2024

There is a bug whereby sdnist crashes if there are zero values in the synthetic data for DENSITY.

The problem is that, although you bottom-code negative DENSITY values to zero here:

d.loc[d['DENSITY'] < 0, 'DENSITY'] = float(0)

the left-most bin boundary of 0.0 is not inclusive in the pd.cut() call, because pd.cut() builds left-open, right-closed intervals by default:

    if update:
        d['DENSITY'] = pd.cut(d['DENSITY'], bins=bins, labels=labels)
        return d
    else:
        d['binned_density'] = pd.cut(d['DENSITY'], bins=bins, labels=labels)

As a result, any 0.0 DENSITY values in the synthetic data are set to NaN. Later, in transform.py, the values are converted to integers:

data[c] = pd.to_numeric(data[c]).astype(int)

The NaN values cause a crash at this step, because NaN cannot be cast to int.
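A minimal reproduction sketch of the failure described above. The bin edges, labels, and sample values are made up for illustration and are not the ones sdnist uses; only the pd.cut() and pd.to_numeric(...).astype(int) calls mirror the snippets quoted in this issue.

```python
import pandas as pd

# Hypothetical bin edges and labels, for illustration only
# (not sdnist's actual DENSITY bins).
bins = [0.0, 100.0, 1000.0, 10000.0]
labels = [0, 1, 2]

d = pd.DataFrame({"DENSITY": [-5.0, 0.0, 50.0, 500.0]})

# Bottom-code negative values to zero, as in the quoted line.
d.loc[d["DENSITY"] < 0, "DENSITY"] = float(0)

# Default intervals are (0, 100], (100, 1000], ... so 0.0 falls in no bin.
binned = pd.cut(d["DENSITY"], bins=bins, labels=labels)
n_missing = int(binned.isna().sum())

# The integer conversion then fails on the NaN entries.
try:
    pd.to_numeric(binned).astype(int)
    crashed = False
except (ValueError, TypeError):
    crashed = True

print(n_missing, crashed)
```

Here both the bottom-coded value and the original 0.0 end up unbinned, which is what triggers the failure at the integer cast.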

The solution is to make the left-most bin boundary inclusive by passing include_lowest=True:

    if update:
        d['DENSITY'] = pd.cut(d['DENSITY'], bins=bins, labels=labels, include_lowest=True)
        return d
    else:
        d['binned_density'] = pd.cut(d['DENSITY'], bins=bins, labels=labels, include_lowest=True)
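As a quick check of the proposed fix, the sketch below bins a few values with include_lowest=True and then applies the same integer conversion. The bin edges, labels, and sample values are again hypothetical and chosen only to show that 0.0 now lands in the first bin.

```python
import pandas as pd

# Hypothetical bin edges and labels, for illustration only
# (not sdnist's actual DENSITY bins).
bins = [0.0, 100.0, 1000.0, 10000.0]
labels = [0, 1, 2]

density = pd.Series([0.0, 50.0, 500.0], name="DENSITY")

# include_lowest=True closes the first interval on the left, so 0.0 is binned.
binned = pd.cut(density, bins=bins, labels=labels, include_lowest=True)
n_missing = int(binned.isna().sum())

# With no NaN present, the integer conversion goes through.
as_int = pd.to_numeric(binned).astype(int)
print(as_int.tolist())
```

Values above the right-most edge would still become NaN; include_lowest=True only affects the left-most boundary.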
yoid2000 closed this as completed Feb 7, 2024
yoid2000 changed the title from "Bug with outlier values in computing DENSITY bins" to "Bug with zero values in computing DENSITY bins" Feb 7, 2024
yoid2000 reopened this Feb 7, 2024