[HOPSWORKS-2077] add support for reading csv files as tf data object and rename feeder to tf_data (#113)
Showing 13 changed files with 786 additions and 371 deletions.
@@ -0,0 +1,54 @@
# Training Dataset

The training dataset abstraction in the Hopsworks Feature Store allows users to group a set of features (potentially from
multiple feature groups) with labels for training a model on a particular prediction task. A training dataset is
versioned and managed, and is stored in HopsFS as `tfrecords`, `parquet`, `csv`, or `tsv` files.

## Versioning

Training datasets can be versioned. Data scientists should use the version to tie a training dataset to the model
trained on it, as well as to the schema and the feature engineering logic of its features.
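
The versioning idea can be illustrated with a small, self-contained sketch. It does not use the Hopsworks API: `TrainingDatasetMeta`, `Registry`, and the `churn` dataset are hypothetical names chosen for illustration, and the sketch assumes only that each version pins one schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingDatasetMeta:
    """Hypothetical metadata record, not the hsfs training dataset class."""
    name: str
    version: int
    features: tuple  # ordered feature names, i.e. the schema for this version
    label: str


class Registry:
    """Toy in-memory registry keyed by (name, version)."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        key = (meta.name, meta.version)
        if key in self._entries:
            # An existing version is never overwritten; a schema change needs a new version.
            raise ValueError(f"{meta.name} version {meta.version} already exists")
        self._entries[key] = meta

    def get(self, name, version=None):
        if version is None:
            # No version requested: resolve to the highest registered version.
            versions = [v for (n, v) in self._entries if n == name]
            if not versions:
                raise KeyError(name)
            version = max(versions)
        return self._entries[(name, version)]


registry = Registry()
registry.register(TrainingDatasetMeta("churn", 1, ("age", "plan"), "churned"))
registry.register(TrainingDatasetMeta("churn", 2, ("age", "plan", "tenure_days"), "churned"))

pinned = registry.get("churn", version=1)  # what a model trained on version 1 should read
latest = registry.get("churn")             # resolves to version 2
```

A model that records the version it was trained on can always look up the exact schema it expects, even after newer versions with additional features have been registered.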

## Creation

To create a training dataset, the user supplies a Pandas, NumPy, or Spark dataframe containing features and labels,
together with metadata. Once created, the training dataset is discoverable in the feature registry
and can be used to train models.

{{td_create}}

## Tagging Training Datasets
The feature store lets users attach tags to training datasets to make them discoverable across feature
stores. A tag is a simple {key: value} association that provides additional information about the data, for example
its geographic origin. Within an organization, tags make datasets easier for data scientists to discover and reduce
duplicated work, for example in data preparation. The tagging feature is only available in the enterprise version.
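
As a rough illustration of how {key: value} tags support discovery, the sketch below filters a small in-memory catalog by tag. It is not the feature store API: `catalog`, `find_by_tags`, the dataset names, and the tag values are all hypothetical.

```python
def find_by_tags(catalog, **wanted):
    """Return the names of datasets whose tags contain every wanted key: value pair."""
    return sorted(
        name
        for name, tags in catalog.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )


# Hypothetical catalog: dataset name -> tags attached to that dataset.
catalog = {
    "match_results_se": {"Country": "Sweden", "Sport": "Football"},
    "match_results_no": {"Country": "Norway", "Sport": "Football"},
    "ski_times_se": {"Country": "Sweden", "Sport": "Skiing"},
    "untagged_dataset": {},
}

football = find_by_tags(catalog, Sport="Football")
swedish_football = find_by_tags(catalog, Country="Sweden", Sport="Football")
```

Here `football` contains both football datasets, while adding the `Country` filter narrows the result to the Swedish one. Untagged datasets never match, which is why consistent tagging matters for discovery.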

#### Define tags that can be attached
The first step is to define the set of tags that can be attached, for example “Country” to tag data as coming from
a certain geographic location and “Sport” to associate a type of sport with the data.

![Define tags that can be attached](../../assets/images/creating_tags.gif)

#### Attach tags using the UI
Tags can then be attached using the feature store UI or programmatically using the API.
The example below shows tags being attached to a feature group.

![Attach tags using the UI](../../assets/images/attach_tags.gif)

## Retrieval

{{td_get}}

## Properties

{{td_properties}}

## Methods

{{td_methods}}

## TFData engine

{{tf_record_dataset}}

{{tf_csv_dataset}}
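
The commit adds support for reading CSV training data as a tf data object. The underlying idea, streaming rows from several CSV part files and grouping them into (features, labels) batches, can be sketched with the Python standard library alone. This is only a conceptual stand-in: it does not call TensorFlow or the Hopsworks library, and `csv_batches`, its parameters, the file names, and the column names are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def csv_batches(paths, label_name, batch_size):
    """Yield (features, labels) batches from CSV files that share one header."""
    features, labels = [], []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                # Split the label column from the remaining feature columns.
                labels.append(float(row.pop(label_name)))
                features.append({name: float(value) for name, value in row.items()})
                if len(labels) == batch_size:
                    yield features, labels
                    features, labels = [], []
    if labels:
        yield features, labels  # final, possibly smaller, batch


with tempfile.TemporaryDirectory() as tmp:
    # Write two small part files so the sketch is runnable on its own.
    rows = [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 0)]
    parts = []
    for index, chunk in enumerate([rows[:2], rows[2:]]):
        part = Path(tmp) / f"part-{index}.csv"
        with open(part, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["x1", "x2", "label"])
            writer.writerows(chunk)
        parts.append(part)

    batches = list(csv_batches(parts, label_name="label", batch_size=2))
```

The generator keeps only one batch in memory at a time, and a batch may span part files when a file's row count is not a multiple of `batch_size`.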