plugins for structured data and relative models #566

rickycao-qy · 2020-09-08T03:26:46Z

Currently, pipcook's official plugins support various types of tasks in the CV and NLP areas. The data in CV are normally images (png, jpg, ...), and the data in NLP are text.

However, there are still a lot of tasks closely related to the front end that use structured data. For example, the front end can collect a lot of information about users' behaviours and fetch information about users' attributes. These structured data can be used in recommendation tasks and regression tasks.

For structured-data processing, we have found a good library built on top of JS: Danfo.js. We could look for a chance to work closely with the Danfo.js team to handle relational or labeled data.

Accordingly, it would be good to have some machine learning models that are fast and accurate, like GBDT and SVM.

This issue proposes adding the following plugins:

data-collect / data-access
- CSV data collection
- text data collection
data-process
- missing-value processing
- normalization
- outlier processing
model
- GBDT
- SVM
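
To make the data-process items above concrete, here is a minimal TypeScript sketch of what the three steps could compute on a single numeric column. It is only an illustration under stated assumptions: the names `NumericColumn`, `fillMissingWithMean`, `clip` and `minMaxNormalize` are hypothetical and are not part of pipcook or Danfo.js, and mean imputation, clipping and min-max scaling are just one possible choice for each step.

```typescript
// Hypothetical helpers for illustration; none of these names come from pipcook or Danfo.js.
type NumericColumn = Array<number | null>;

// Missing-value processing: replace null entries with the mean of the present values.
function fillMissingWithMean(col: NumericColumn): number[] {
  const present = col.filter((v): v is number => v !== null);
  const mean =
    present.length > 0
      ? present.reduce((sum, v) => sum + v, 0) / present.length
      : 0;
  return col.map((v) => (v === null ? mean : v));
}

// Outlier processing: clip every value into the range [lower, upper].
function clip(col: number[], lower: number, upper: number): number[] {
  return col.map((v) => Math.min(upper, Math.max(lower, v)));
}

// Normalization: min-max scale to [0, 1]; a constant column maps to all zeros.
function minMaxNormalize(col: number[]): number[] {
  const min = Math.min(...col);
  const max = Math.max(...col);
  const range = max - min;
  return col.map((v) => (range === 0 ? 0 : (v - min) / range));
}

// Example column with one missing value and one implausible value.
const ages: NumericColumn = [20, null, 40, 300];
const cleaned = minMaxNormalize(clip(fillMissingWithMean(ages), 0, 100));
console.log(cleaned);
```

Note that the order of the steps matters: in this sketch the mean is computed before clipping, so the outlier still influences the imputed value. A real plugin might prefer to handle outliers first or to impute with the median.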

More opinions are welcome.

risenW · 2020-09-08T08:51:48Z

Looks great, just went through it. Danfo.js pretty much has the capability to handle 1 and 2, as we can easily port it. But for the modeling part, we'd have to design that from scratch.

Overall it's doable.

yorkie · 2020-09-08T10:26:27Z

@risenW Thank you for the review. Would you mind working on the modeling part together with us?

risenW · 2020-09-08T10:56:49Z

Yes, sure. We can work on it together. Should be fun implementing it in JS 😊

yorkie · 2020-09-08T11:06:28Z

That's great @risenW! And @rickycao-qy, could we have more implementation details about this issue?

rickycao-qy · 2020-09-09T02:47:45Z

@risenW Recommendation systems are heavily used by companies in the e-commerce industry. One of the traditional tasks is CTR (click-through rate) prediction. We can start with an open-source dataset, like this one. For the model, we can start with a GBDT+LR model for this task. I remember @WenheLI has already developed a JS version of GBDT. Maybe we can easily port it to our plugin and see if it meets our requirements. @WenheLI could you provide more details?
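
For readers unfamiliar with GBDT+LR, the usual idea is that each tree of a trained GBDT maps a sample to one leaf, the leaf indices are one-hot encoded, and a logistic regression is fitted on that encoding. The TypeScript sketch below shows only the prediction path. Everything in it is an assumption for illustration: the `DecisionNode` and `DecisionTree` types, the two hand-written trees and the all-zero LR weights are hypothetical placeholders, not trained values and not the API of any existing GBDT library.

```typescript
// A tree node is either a split on one feature or a leaf with its index inside the tree.
type DecisionNode =
  | { kind: "split"; feature: number; threshold: number; left: DecisionNode; right: DecisionNode }
  | { kind: "leaf"; index: number };

interface DecisionTree {
  root: DecisionNode;
  leafCount: number;
}

// Walk one tree and return the index of the leaf the sample falls into.
function leafIndex(node: DecisionNode, x: number[]): number {
  if (node.kind === "leaf") return node.index;
  const value = x[node.feature] ?? 0; // a missing feature is treated as 0 in this sketch
  return leafIndex(value <= node.threshold ? node.left : node.right, x);
}

// GBDT part: concatenate the one-hot leaf encoding of every tree.
function encodeLeaves(trees: DecisionTree[], x: number[]): number[] {
  const encoded: number[] = [];
  for (const tree of trees) {
    const offset = encoded.length;
    for (let i = 0; i < tree.leafCount; i++) encoded.push(0);
    encoded[offset + leafIndex(tree.root, x)] = 1;
  }
  return encoded;
}

// LR part: sigmoid of the weighted sum of the encoded features.
function predictCtr(weights: number[], bias: number, encoded: number[]): number {
  const z = encoded.reduce((sum, f, i) => sum + f * (weights[i] ?? 0), bias);
  return 1 / (1 + Math.exp(-z));
}

// Two hand-written placeholder trees (a real model would learn these).
const treeA: DecisionTree = {
  leafCount: 2,
  root: { kind: "split", feature: 0, threshold: 30,
    left: { kind: "leaf", index: 0 }, right: { kind: "leaf", index: 1 } },
};
const treeB: DecisionTree = {
  leafCount: 3,
  root: { kind: "split", feature: 1, threshold: 0.5,
    left: { kind: "leaf", index: 0 },
    right: { kind: "split", feature: 0, threshold: 50,
      left: { kind: "leaf", index: 1 }, right: { kind: "leaf", index: 2 } } },
};

const encodedSample = encodeLeaves([treeA, treeB], [25, 1]);
const untrainedWeights = [0, 0, 0, 0, 0]; // placeholder, not trained
const ctr = predictCtr(untrainedWeights, 0, encodedSample);
console.log(encodedSample, ctr);
```

In a real plugin the trees would come from the GBDT training step and the LR weights from fitting on the encoded training set; only the shape of the data flow is meant to carry over.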

WenheLI · 2020-09-11T02:44:21Z

@risenW Any ideas on implementing the models in JS?
I was thinking of exporting the wasm code directly from a C++/Rust implementation. That way, we can get good performance with less work.

risenW · 2020-09-11T06:06:50Z

I think that's a great idea as well. We would still write JS wrappers to call methods right?

WenheLI · 2020-09-11T09:04:33Z

> I think that's a great idea as well. We would still write JS wrappers to call methods right?

Yep, we still need to write the wrapper in JS, but that is trivial compared with writing the whole logic in JS. Do you happen to know any library/implementation of SVM or GBDT written in C/Rust/Go?
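
As a rough idea of how thin such a wrapper could be, here is a TypeScript sketch. It is an assumption-laden illustration, not the API of any real wasm build: the `RawSvm` interface is a hypothetical stand-in for whatever low-level functions a compiled module would export, and `stubRaw` is a fake implementation used only so the sketch runs without a wasm binary.

```typescript
// Hypothetical low-level interface; the exports of a real wasm build may look different.
interface RawSvm {
  train(flat: number[], rows: number, cols: number, labels: number[]): void;
  predict(sample: number[]): number;
}

// Thin wrapper: validates the input and flattens it before calling the raw exports.
class Svm {
  private raw: RawSvm;
  private trained = false;
  private cols = 0;

  constructor(raw: RawSvm) {
    this.raw = raw;
  }

  train(samples: number[][], labels: number[]): void {
    if (samples.length === 0 || samples.length !== labels.length) {
      throw new Error("samples and labels must be non-empty and of equal length");
    }
    const cols = (samples[0] ?? []).length;
    const flat: number[] = [];
    for (const row of samples) {
      if (row.length !== cols) throw new Error("all samples must have the same length");
      for (const v of row) flat.push(v);
    }
    this.raw.train(flat, samples.length, cols, labels);
    this.cols = cols;
    this.trained = true;
  }

  predict(sample: number[]): number {
    if (!this.trained) throw new Error("train() must be called before predict()");
    if (sample.length !== this.cols) throw new Error("unexpected feature count");
    return this.raw.predict(sample);
  }
}

// Stand-in for the compiled module: it records calls and predicts by the sign of feature 0.
const rawCalls: string[] = [];
const stubRaw: RawSvm = {
  train(flat, rows, cols) {
    rawCalls.push("train:" + rows + "x" + cols + ":" + flat.length);
  },
  predict(sample) {
    return (sample[0] ?? 0) >= 0 ? 1 : -1;
  },
};

const svm = new Svm(stubRaw);
svm.train([[1, 2], [-1, -2]], [1, -1]);
const predictedLabel = svm.predict([-3, 0]);
console.log(rawCalls, predictedLabel);
```

The point of the design is that all numerical work stays behind the `RawSvm` boundary, so the JS side only has to check shapes and convert between nested arrays and the flat layout that compiled code typically expects.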

WenheLI · 2020-09-13T03:58:53Z

Just made an MVP SVM-wasm export based on libsvm.
https://github.com/WenheLI/libsvm-wasm
@risenW Any suggestions?

risenW · 2020-09-13T06:15:50Z

> Just made an MVP SVM-wasm export based on libsvm.
> https://github.com/WenheLI/libsvm-wasm
> @risenW Any suggestions?

Cool, I'll check it out later today. Also, as regards GBDT, I think the popular XGBoost and LightGBM both have their core written in C++, with wrappers written on top of that. We could try exporting the core module.

risenW · 2020-09-14T06:48:46Z

> Just made an MVP SVM-wasm export based on libsvm.
> https://github.com/WenheLI/libsvm-wasm
> @risenW Any suggestions?

So I'm trying to test out this package, and I'm getting an error when compiling to JS. I'm also confused about this line:
import * as module from '../dist/libsvm'

because I can't find the module you're importing.

WenheLI · 2020-09-14T07:06:34Z

https://github.com/WenheLI/libsvm-wasm

You need to build it first.
Run the following command under the root folder.

make .

WenheLI · 2020-09-14T07:21:39Z

> > Just made an MVP SVM-wasm export based on libsvm.
> > https://github.com/WenheLI/libsvm-wasm
> > @risenW Any suggestions?
>
> So I'm trying to test out this package, I'm getting an error when compiling to Js. I'm also confused about this line:
> import * as module from '../dist/libsvm'
>
> because I can't find the module you're importing.

@risenW I will add detailed build documentation later.

risenW · 2020-09-14T07:27:35Z

> https://github.com/WenheLI/libsvm-wasm
>
> You need to build it first.
> Run the following command under the root folder.
> make .

Now I'm getting the output nothing to be done for

WenheLI · 2020-09-14T10:09:19Z

@risenW This is the full command:

git submodule update
make

Be sure to install Emscripten before running make.

risenW · 2020-09-26T08:12:45Z

Hi @WenheLI,

Have you seen this repo? It looks useful for what we intend to do: https://github.com/nok/sklearn-porter

yorkie · 2020-09-26T15:05:18Z

> Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

@risenW It seems to translate trained models into code that can be executed in C/Java/JavaScript, so developers still have to write the training scripts in Python. By contrast, @WenheLI's libsvm export exposes both training and inference to web developers.

By the way, we could also use boa to create JavaScript APIs on top of the scikit-learn package, and use sklearn-porter to convert the trained model into an executable for JavaScript runtimes, just like what @WenheLI has done in #582, which uses boa to call TensorFlow/PyTorch to train models and generates wasm-format executables via TVM.

yorkie · 2020-09-26T15:10:10Z

See https://github.com/nok/sklearn-porter/blob/stable/examples/estimator/classifier/SVC/js/basics.pct.ipynb. It seems to generate pure JavaScript, which should be compatible with #582. That sounds really good, but I'm also wondering about the performance :)

risenW · 2020-09-26T16:15:13Z

> > Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
>
> @risenW It might be translating the trained models to executable by C/Java/JavaScript, developers have to use Python to write training scripts. However @WenheLI's libsvm exports train/inference abilities to Web developers.
>
> By the way, we could also use boa to create JavaScript APIs by scikit-learn package, and use sklearn-porter to convert trained model to an executable for JavaScript runtimes, just like what @WenheLI have done at #582, which uses boa to call tensorflow/pytorch to train models out, and generating wasm format executables via TVM.

Oh I get it now. Thanks for the clarification. Will definitely take a look at it.

rickycao-qy self-assigned this Sep 8, 2020

rickycao-qy added the "model" label (Machine learning model related issues and discussions) Sep 8, 2020

FeelyChau mentioned this issue Sep 8, 2020

v1.3.0 Roadmap #557

Closed

16 tasks
