Add number of columns to native data iterator. #5202
660f545 to b43adc3
LGTM.
Last time I looked at the NativeDataIter I had the feeling that it uses a lot of extra memory.
We do something similar on dask, but delegate the task to np/pd.
Please don't merge yet.
I will wait for the corresponding JVM PR and merge them together.
fa3282b to 9131210
* Change native data iter into an adapter.
9131210 to d0f0222
Looks good, it seems to fit the adapter pattern well and now we have more consistency in data construction.
This helps when passing a sparse dataset in from the JVM package, where some columns might be completely missing from a batch. @wbo4958 will submit a follow-up PR on the JVM side.
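To illustrate why an explicit column count matters for sparse input, here is a minimal sketch in plain Python. It is not XGBoost code: `infer_num_cols` and `declared_num_cols` are hypothetical names, and the CSR arrays are made up for the example. When trailing columns have no stored entries, the count inferred from the column indices alone is too small.

```python
def infer_num_cols(indices):
    """Infer the column count from CSR column indices alone (hypothetical helper)."""
    return max(indices) + 1 if indices else 0


# A made-up CSR batch with 2 rows, taken from a dataset that has 5 features.
# Columns 3 and 4 have no stored entries in this batch.
indptr = [0, 2, 3]         # row boundaries into indices/values
indices = [0, 2, 1]        # column index of each stored value
values = [1.0, 2.0, 3.0]   # stored values

# Column count supplied by the caller (an assumption for this sketch).
declared_num_cols = 5

inferred = infer_num_cols(indices)           # undercounts: sees only columns 0..2
num_cols = max(inferred, declared_num_cols)  # the explicit count wins

print(inferred, num_cols)
```

With only the indices to go on, the batch looks three columns wide; passing the count alongside the batch removes the ambiguity.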
For consistency and simplicity, `NativeDataIter` is turned into an adapter. As a new data field is added, we break the ABI.
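As a rough illustration of why adding a field to a C struct breaks the ABI, the sketch below uses Python's `ctypes` with two hypothetical layouts. `BatchV1`, `BatchV2`, and their field names are assumptions made for this example and are not the actual XGBoost struct definition.

```python
import ctypes


class BatchV1(ctypes.Structure):
    # Hypothetical layout before the change.
    _fields_ = [
        ("size", ctypes.c_size_t),
        ("row_ptr", ctypes.POINTER(ctypes.c_int64)),
    ]


class BatchV2(ctypes.Structure):
    # Hypothetical layout after adding a column-count field.
    _fields_ = [
        ("size", ctypes.c_size_t),
        ("columns", ctypes.c_size_t),
        ("row_ptr", ctypes.POINTER(ctypes.c_int64)),
    ]


# The new field shifts every later field and grows the struct.
old_offset = BatchV1.row_ptr.offset
new_offset = BatchV2.row_ptr.offset

print(old_offset, new_offset)
print(ctypes.sizeof(BatchV1), ctypes.sizeof(BatchV2))
```

A caller compiled against the old layout would read `row_ptr` from the wrong offset, which is why both sides of the interface need to be updated together.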