This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Dataprep integration #146

Closed
ganik opened this issue Jun 17, 2019 · 1 comment · Fixed by #181

ganik commented Jun 17, 2019

pip install dataprep
dataprep is a data preparation and cleansing package whose internals run on the .NET CLR.
We need to make changes so that the code below works:

from nimbusml import Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer
 
# dataflow(...) is a placeholder for a dataprep dataflow built from customer-provided data
train_data = dataflow(customer_provided_train_data)
test_data = dataflow(customer_provided_test_data)
 
pipeline = Pipeline([ # nimbusml pipeline
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])
 
# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)

ganik commented Jun 17, 2019

The NimbusML Pipeline should accept a dataprep dataflow() object the same way it accepts a FileDataStream.
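
One way to picture the request is a single input-normalization step that treats both kinds of object as valid sources. The sketch below is only an illustration under assumptions: `StubFileDataStream`, `StubDataflow`, `to_records`, and `normalize_input` are hypothetical stand-ins invented here, not the real nimbusml or dataprep classes, and the actual fix may look quite different.

```python
class StubFileDataStream:
    """Hypothetical stand-in for nimbusml's FileDataStream (not the real class)."""

    def __init__(self, path):
        self.path = path


class StubDataflow:
    """Hypothetical stand-in for a dataprep dataflow object (not the real class)."""

    def __init__(self, rows):
        self._rows = rows

    def to_records(self):
        # Hypothetical accessor: materialize the rows the dataflow describes.
        return list(self._rows)


def normalize_input(data):
    """Map each supported input type to a (kind, payload) pair.

    A pipeline could call this at the start of fit() and predict() so that
    both input types follow the same code path afterwards.
    """
    if isinstance(data, StubFileDataStream):
        return ("file", data.path)
    if isinstance(data, StubDataflow):
        return ("dataflow", data.to_records())
    raise TypeError(f"unsupported input type: {type(data).__name__}")


if __name__ == "__main__":
    print(normalize_input(StubFileDataStream("train.tsv")))
    print(normalize_input(StubDataflow([{"Text": "good", "Label": 1}])))
```

With a step like this, callers would not need to convert a dataflow by hand before calling fit or predict; unsupported types fail early with a clear error.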
