Support DataTable/DataView #141

Closed
aspcompiler opened this issue May 14, 2018 · 6 comments
Labels: enhancement (New feature or request)

Comments

@aspcompiler

Currently, we have to create a model class that maps to the data. That makes it difficult to use from a scripting language like F# Interactive or PowerShell. We would be able to use a scripting language if there were a general data container such as DataFrame in pandas. DataTable/DataView is the closest thing in .NET.

Machine learning requires a lot of experimentation. A scripting language would make that much easier. Otherwise we have to keep recompiling and rerunning the steps.

@isaacabraham

Linked to #91. I would say try to avoid data table / view. Accord supports them, but they're a pain to work with, really. Rather, just accept a sequence of any generic type, with either some mapper functions to pick out labels and features, or attributes, etc. Or arrays / lists of numbers.
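
The "sequence of any generic type plus mapper functions" idea can be sketched roughly as follows. This is an illustration in Python, not ML.NET or Accord code: the `Iris` record, the `to_training_data` helper, and its `features`/`label` parameters are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Iris:
    # Hypothetical record type; any user-defined type would work here.
    sepal_length: float
    sepal_width: float
    species: str


def to_training_data(
    rows: Iterable[T],
    features: Callable[[T], Sequence[float]],
    label: Callable[[T], str],
) -> Tuple[List[List[float]], List[str]]:
    """Apply the two mapper functions to every record.

    Returns the feature vectors and the labels as two parallel lists.
    """
    xs: List[List[float]] = []
    ys: List[str] = []
    for row in rows:
        xs.append(list(features(row)))
        ys.append(label(row))
    return xs, ys


rows = [
    Iris(5.1, 3.5, "setosa"),
    Iris(7.0, 3.2, "versicolor"),
]

# The mappers say which fields are features and which one is the label,
# so no DataTable-like container is needed.
xs, ys = to_training_data(
    rows,
    features=lambda r: (r.sepal_length, r.sepal_width),
    label=lambda r: r.species,
)
print(xs)  # [[5.1, 3.5], [7.0, 3.2]]
print(ys)  # ['setosa', 'versicolor']
```

The record type stays strongly typed, and the choice of features and label lives in ordinary functions rather than in column names.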

@veikkoeeva
Contributor

@isaac2004 #38 (comment) looks like what you're suggesting, too. There also seems to be a bit of a problem with locales, at least when reading and writing data; see also #109 (comment). I might have misunderstood something, but I would like to highlight this case from another perspective too.

@shauheen shauheen added the enhancement New feature or request label May 14, 2018
@aspcompiler
Author

When we work with data, we do a lot of feature engineering, i.e., encoding and transforming data. It is possible to use mapper functions to describe those steps. But without a generic data structure to hold the intermediate results, the encoding/transformation would have to be lazily re-evaluated every time.

Those intermediate data structures cannot be internal. In pandas, we use .head(), .tail(), .info(), and .describe() to look at the data all the time.
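
To make the point about materialized intermediate results concrete, here is a minimal sketch using only the Python standard library. The `Table` class and its `with_column` method are hypothetical and written for this example; they are not an existing ML.NET or data-frame API, and the sketch assumes non-empty numeric columns. The transform runs once, when the column is created, and every later inspection reads the stored values.

```python
import statistics
from typing import Callable, Dict, List


class Table:
    """Hypothetical column-oriented container that stores results eagerly."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    def with_column(
        self, name: str, source: str, fn: Callable[[float], float]
    ) -> "Table":
        """Return a new table with one extra column, computed once and stored."""
        new_columns = dict(self.columns)
        new_columns[name] = [fn(v) for v in self.columns[source]]
        return Table(new_columns)

    def head(self, n: int = 5) -> Dict[str, List[float]]:
        return {name: values[:n] for name, values in self.columns.items()}

    def tail(self, n: int = 5) -> Dict[str, List[float]]:
        return {
            name: values[max(len(values) - n, 0):]
            for name, values in self.columns.items()
        }

    def describe(self) -> Dict[str, Dict[str, float]]:
        # Assumes every column is non-empty and numeric.
        return {
            name: {
                "count": len(values),
                "mean": statistics.mean(values),
                "min": min(values),
                "max": max(values),
            }
            for name, values in self.columns.items()
        }


calls = 0


def scale(v: float) -> float:
    """Example transform that counts how often it is evaluated."""
    global calls
    calls += 1
    return v / 10.0


raw = Table({"age": [20.0, 30.0, 40.0, 50.0]})
scaled = raw.with_column("age_scaled", "age", scale)

# Inspect the intermediate result several times.
print(scaled.head(2))
print(scaled.tail(2))
print(scaled.describe())

# The transform ran once per row, not once per inspection.
print(calls)  # 4
```

With purely lazy mapper functions, each of the three inspections would have re-run `scale` over the data.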

@veikkoeeva
Contributor

@aspcompiler Sounds like LINQ, or F#'s pipe operator (or streams: https://nessos.github.io/Streams/ and an interesting take with GPUs: https://devblogs.nvidia.com/jet-gpu-powered-fulfillment/). It'd be nice to compose streams.

@TomFinley
Contributor

I see that I've been assigned to this! For that reason I would like to understand this issue a bit more, since there are one or two points that are not clear to me.

@aspcompiler if I were to rephrase your original concern about "it is difficult to use", do you mean the API in Microsoft.ML specifically, that is, that as currently designed it is not easy to use from F# or PowerShell? A worked example of the current problems you are facing would help me enormously in appreciating the problem and, hopefully, help me and perhaps others imagine a good solution to it. I guess some conveniences to casually inspect data more readily than opening cursors, etc., are primarily what's being asked for, or do I misunderstand?

@isaacabraham I wonder if the change under consideration in PR 106 helps, at least as a convenience to map some List<T> or IEnumerable<T> of records T into the pipeline.

@codemzs
Member

codemzs commented Jun 30, 2019

Please consider using the Python bindings for ML.NET: NimbusML.

@codemzs codemzs closed this as completed Jun 30, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
6 participants