Add number of columns to native data iterator. #5202

trivialfis · 2020-01-14T08:56:38Z

This helps with imputing sparse datasets coming from the JVM package, where some columns might be completely missing. @wbo4958 will submit a follow-up PR on the JVM side.

For consistency and simplicity, NativeDataIter is turned into an adapter.

As a new data field is added, we break the ABI.
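To illustrate why an explicit column count matters, here is a minimal sketch, not the code in this PR. `SparseBatch`, `InferColumns`, `NumColumns` and `MakeExampleBatch` are hypothetical names made up for the example and are not XGBoost's actual types. When trailing columns have no stored entries, a count inferred from the largest column index comes out too small:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CSR-style batch; illustrative only, not XGBoost's real struct.
struct SparseBatch {
  std::vector<std::size_t> offset;  // row pointers, size = rows + 1
  std::vector<int> index;           // column index of each stored entry
  std::vector<float> value;         // stored values
  std::size_t columns{0};           // explicit column count, 0 means "unknown"
};

// Infer the column count from the largest column index present in the batch.
inline std::size_t InferColumns(SparseBatch const& batch) {
  if (batch.index.empty()) {
    return 0;
  }
  int max_index = *std::max_element(batch.index.cbegin(), batch.index.cend());
  return static_cast<std::size_t>(max_index) + 1;
}

// Use the explicit count when the producer supplied one that is at least as
// large as what the data itself implies.
inline std::size_t NumColumns(SparseBatch const& batch) {
  return std::max(batch.columns, InferColumns(batch));
}

// Two rows taken from a 5-column dataset whose last two columns are empty.
inline SparseBatch MakeExampleBatch() {
  SparseBatch batch;
  batch.offset = {0, 2, 3};
  batch.index = {0, 2, 1};
  batch.value = {1.0f, 2.0f, 3.0f};
  return batch;
}
```

With the example batch, inference alone only sees columns 0 to 2. Setting `columns` to 5 on the producer side is what lets the consumer recover the real width.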

trivialfis · 2020-01-14T08:57:16Z

@CodingCat

RAMitchell

LGTM.

Last time I looked at the NativeDataIter I had the feeling that it uses a lot of extra memory.

trivialfis · 2020-01-15T02:51:24Z

We do something similar on dask, but delegate the task to np/pd.

trivialfis · 2020-01-15T05:32:26Z

Pls don't merge.

trivialfis · 2020-01-16T17:36:52Z

Will wait for the corresponding jvm PR and merge them together.

* Change native data iter into an adapter.

RAMitchell

Looks good, it seems to fit the adapter pattern well and now we have more consistency in data construction.
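As a rough sketch of the adapter idea discussed here, assuming a callback-style source similar in spirit to a native data iterator: `Row`, `NextRowFn`, `CallbackAdapter`, `CountEntries` and `MakeVectorSource` are hypothetical names for illustration and do not reflect the interface in `src/data/adapter.h`.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One sparse row: parallel arrays of column indices and values.
struct Row {
  std::vector<int> index;
  std::vector<float> value;
};

// Hypothetical callback-style source: fills `out`, returns false when exhausted.
using NextRowFn = std::function<bool(Row* out)>;

// Wraps the callback behind a small uniform interface and carries the
// column count supplied by the producer.
class CallbackAdapter {
 public:
  CallbackAdapter(NextRowFn next, std::size_t columns)
      : next_{std::move(next)}, columns_{columns} {}

  bool Next() { return next_(&current_); }
  Row const& Value() const { return current_; }
  std::size_t NumColumns() const { return columns_; }

 private:
  NextRowFn next_;
  std::size_t columns_;
  Row current_;
};

// A consumer written once against the adapter interface; any type that
// offers Next() and Value() can be passed in.
template <typename Adapter>
std::size_t CountEntries(Adapter* adapter) {
  std::size_t n = 0;
  while (adapter->Next()) {
    n += adapter->Value().value.size();
  }
  return n;
}

// Helper for the example: turns an in-memory vector into a callback source.
inline NextRowFn MakeVectorSource(std::vector<Row> rows) {
  return [rows = std::move(rows), pos = std::size_t{0}](Row* out) mutable {
    if (pos == rows.size()) {
      return false;
    }
    *out = rows[pos++];
    return true;
  };
}
```

The point of the design is that construction code only depends on the adapter's interface, so a callback-backed source and any other input type can share the same consumer.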

src/data/adapter.h

trivialfis requested a review from RAMitchell January 14, 2020 08:57

trivialfis force-pushed the jvm-get-num-columns branch from 660f545 to b43adc3 Compare January 14, 2020 09:22

RAMitchell approved these changes Jan 15, 2020

trivialfis changed the title ~~Add number of columns to native data iterator.~~ [WIP] Add number of columns to native data iterator. Jan 15, 2020

trivialfis force-pushed the jvm-get-num-columns branch from fa3282b to 9131210 Compare February 11, 2020 07:59

wbo4958 mentioned this pull request Feb 12, 2020

[jvm-packages]add feature size for LabelPoint and DataBatch #5303

Merged

hcho3 mentioned this pull request Feb 21, 2020

[Roadmap] 1.1.0 Roadmap #5337

Closed

12 tasks

Add number of columns to native data iterator.

d0f0222

* Change native data iter into an adapter.

trivialfis force-pushed the jvm-get-num-columns branch from 9131210 to d0f0222 Compare February 23, 2020 19:50

trivialfis changed the title ~~[WIP] Add number of columns to native data iterator.~~ Add number of columns to native data iterator. Feb 23, 2020

RAMitchell approved these changes Feb 24, 2020

RAMitchell reviewed Feb 24, 2020

src/data/adapter.h (resolved)

trivialfis merged commit f2b8cd2 into dmlc:master Feb 25, 2020

trivialfis deleted the jvm-get-num-columns branch February 25, 2020 15:42

lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
