This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Sketch.num_na() is broken for float #1759

Closed
guyinbar opened this issue Apr 22, 2019 · 2 comments · Fixed by #2579

Comments

@guyinbar

guyinbar commented Apr 22, 2019

According to the documentation (https://apple.github.io/turicreate/docs/api/generated/turicreate.Sketch.html#turicreate-sketch), Sketch.num_na() should return the exact number of missing values. However, when the dtype is float, it always returns 0, even when the column contains NaN values:

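```python
import math
import turicreate as tc

# Case 1: a float column containing 100 NaN values.
sf = tc.SFrame({'A': [1.0, 2.0] + [float('nan')] * 100})
print(sf['A'].summary().num_na())            # reported: 0
print(len(sf['A']) - len(sf['A'].dropna()))  # rows removed by dropna()

# Case 2: a randomly generated float column.
sf = tc.util.generate_random_sframe(10000, 'R', random_seed=11)
print(sf['X1-R'].dtype)
print(sf['X1-R'].summary().num_na())
print(len(sf) - len(sf.dropna()))
print(sum(sf['X1-R'].apply(lambda x: math.isnan(x))))
```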
@guyinbar guyinbar changed the title sketch does not provide accurate results for num_na() sketch.num_na() is broken for float Apr 22, 2019
@guyinbar guyinbar changed the title sketch.num_na() is broken for float Sketch.num_na() is broken for float Apr 22, 2019
@hoytak
Collaborator

hoytak commented Apr 24, 2019

The name of this should be changed. A float('nan') is not a missing value; a missing value would be a Python None. The real problem is that dropna() and num_na() are inconsistent: they handle NaN differently.
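A minimal plain-Python sketch of the distinction drawn in this comment, assuming "missing" means None and NaN is an ordinary float value that is counted separately. It does not use or reproduce Turi Create's internals, and the helper name `count_none_and_nan` is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def count_none_and_nan(values: Iterable[Optional[float]]) -> Tuple[int, int]:
    """Return (number of None entries, number of NaN floats).

    None is treated as a missing value; float('nan') is a real float
    that happens to be NaN, so it is tallied in a separate counter.
    """
    num_none = 0
    num_nan = 0
    for v in values:
        if v is None:
            num_none += 1
        elif math.isnan(v):  # NaN != NaN, so use math.isnan rather than ==
            num_nan += 1
    return num_none, num_nan


column = [1.0, 2.0] + [float('nan')] * 100 + [None] * 3
print(count_none_and_nan(column))  # (3, 100)
```

Under this interpretation, a "missing value" count for the reporter's first example would be 0 because it contains no None entries, while its NaN count would be 100. Keeping the two counts separate makes it explicit which convention a given function (such as num_na() or dropna()) follows.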

@hoytak
Collaborator

hoytak commented Apr 24, 2019

Let's rename num_na to num_missing.

@znation znation added the engine label Apr 25, 2019
@syoutsey syoutsey added the p2 label Aug 15, 2019
@syoutsey syoutsey added this to the 6.0 milestone Aug 15, 2019
@TobyRoseman TobyRoseman self-assigned this Nov 6, 2019

5 participants