Hopsworks Feature Store API

HSFS is the new library to interact with the Hopsworks Feature Store. The library makes creating new features, feature groups and training datasets easier.

The library can be used in two modes:

Spark mode: For data engineering jobs that create and write features into the feature store or generate training datasets. It requires a Spark environment such as the one provided in the Hopsworks platform or Databricks. In Spark mode, HSFS provides bindings for both Python and JVM languages.
Python mode: For data science jobs that explore the features available in the feature store, generate training datasets and feed them into a training pipeline. Python mode requires just a Python interpreter and can be used in Hopsworks (from Python jobs or Jupyter kernels), in Amazon SageMaker, or in KubeFlow.

The library automatically configures itself based on the environment it is run in.
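The exact detection logic is internal to HSFS. As a rough, hypothetical sketch of the idea, a library could pick Spark mode when PySpark is importable and fall back to Python mode otherwise. The function `detect_mode` is illustrative only and is not part of the HSFS API.

```python
import importlib.util


def detect_mode() -> str:
    """Hypothetical sketch: choose a mode based on the environment.

    This is not the HSFS implementation; it only illustrates selecting
    Spark mode when PySpark is installed and Python mode otherwise.
    """
    # find_spec returns None when the module cannot be found
    if importlib.util.find_spec("pyspark") is not None:
        return "spark"
    return "python"


print(detect_mode())
```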

You can read more about the Hopsworks Feature Store and its concepts here

Getting Started

Instantiate a connection and get the project's feature store handle:

import hsfs

connection = hsfs.connection()
fs = connection.get_feature_store()

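Create a new feature group and save a DataFrame to it:

# dataframe is assumed to be a DataFrame containing the rain features
rain_fg = fs.create_feature_group("rain",
                                  version=1,
                                  description="Rain features",
                                  primary_key=['date', 'location_id'],
                                  online_enabled=True)

rain_fg.save(dataframe)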
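Because the feature group is declared with `primary_key=['date', 'location_id']`, each row is expected to be identified by that pair of columns. Below is a minimal stdlib sketch of a sanity check you might run before saving, assuming the rows are available as plain dictionaries; `find_duplicate_keys` and the sample rows are hypothetical and not part of HSFS.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_duplicate_keys(rows: Iterable[Dict], primary_key: List[str]) -> List[Tuple]:
    """Return primary-key tuples that occur more than once (hypothetical helper)."""
    counts = Counter(tuple(row[col] for col in primary_key) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample rows shaped like the "rain" feature group
rows = [
    {"date": "2020-08-01", "location_id": 1, "rain_mm": 3.2},
    {"date": "2020-08-01", "location_id": 2, "rain_mm": 0.0},
    {"date": "2020-08-01", "location_id": 1, "rain_mm": 3.5},  # duplicate key
]

print(find_duplicate_keys(rows, ["date", "location_id"]))
# → [('2020-08-01', 1)]
```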

Join features together:

# temperature_fg and location_fg are assumed to be existing feature groups
feature_join = (rain_fg.select_all()
                .join(temperature_fg.select_all(), ["date", "location_id"])
                .join(location_fg.select_all()))

feature_join.show(5)
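HSFS executes the join on its own engine; the snippet below is only a hypothetical, in-memory illustration of what an inner join on `["date", "location_id"]` produces, using plain Python lists of dictionaries instead of feature groups. The `inner_join` helper and the sample data are assumptions made for illustration.

```python
from typing import Dict, List


def inner_join(left: List[Dict], right: List[Dict], on: List[str]) -> List[Dict]:
    """Hash-based inner join of two lists of row dicts on the given key columns."""
    # Index the right-hand rows by their join key
    index: Dict[tuple, List[Dict]] = {}
    for row in right:
        index.setdefault(tuple(row[c] for c in on), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[c] for c in on), []):
            # Key columns appear once; on a name clash the right-hand value wins
            joined.append({**row, **match})
    return joined


rain = [
    {"date": "2020-08-01", "location_id": 1, "rain_mm": 3.2},
    {"date": "2020-08-01", "location_id": 2, "rain_mm": 0.0},
]
temperature = [
    {"date": "2020-08-01", "location_id": 1, "temp_c": 21.5},
]

for row in inner_join(rain, temperature, ["date", "location_id"]):
    print(row)
# → {'date': '2020-08-01', 'location_id': 1, 'rain_mm': 3.2, 'temp_c': 21.5}
```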

Use the query object to create a training dataset:

td = fs.create_training_dataset("training_dataset",
                                version=1,
                                data_format="tfrecords",
                                description="A test training dataset saved in TfRecords format",
                                splits={'train': 0.7, 'test': 0.2, 'validate': 0.1})

td.save(feature_join)
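The `splits` argument assigns fractions of the joined data to named splits (here 70/20/10). HSFS performs the split itself when the training dataset is saved; the sketch below only illustrates the idea of a seeded random split into named partitions in plain Python. `random_split` is a hypothetical helper, not the HSFS implementation. Distributed implementations such as Spark's `randomSplit` typically produce approximately, rather than exactly, proportional splits.

```python
import random
from typing import Dict, List, Sequence


def random_split(rows: Sequence, splits: Dict[str, float], seed: int = 42) -> Dict[str, List]:
    """Shuffle rows with a fixed seed and cut them into named fractions."""
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    result: Dict[str, List] = {}
    start = 0
    names = list(splits)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split takes the remainder so no row is lost to rounding
            end = len(shuffled)
        else:
            end = start + round(splits[name] * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


parts = random_split(range(100), {"train": 0.7, "test": 0.2, "validate": 0.1})
print({name: len(rows) for name, rows in parts.items()})
# → {'train': 70, 'test': 20, 'validate': 10}
```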

Feed the training dataset to a TensorFlow model:

train_input_feeder = td.feed(target_name='label', split='train', is_training=True)
train_input = train_input_feeder.tf_record_dataset()
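Conceptually, `target_name='label'` tells the feeder which column is the prediction target. The stdlib sketch below, which uses neither TensorFlow nor HSFS, illustrates the kind of `(features, label)` separation this implies; `to_examples` and the sample records are hypothetical, and the exact structure returned by `tf_record_dataset()` is not reproduced here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def to_examples(records: Iterable[Dict], target_name: str) -> Iterator[Tuple[Dict, object]]:
    """Yield (features, label) pairs, removing the target column from the features."""
    for record in records:
        features = {k: v for k, v in record.items() if k != target_name}
        yield features, record[target_name]


# Hypothetical joined records with a "label" column
records = [
    {"rain_mm": 3.2, "temp_c": 21.5, "label": 1},
    {"rain_mm": 0.0, "temp_c": 25.0, "label": 0},
]

for features, label in to_examples(records, target_name="label"):
    print(features, label)
# → {'rain_mm': 3.2, 'temp_c': 21.5} 1
# → {'rain_mm': 0.0, 'temp_c': 25.0} 0
```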

You can find more examples on how to use the library in our hops-examples repository.

Issues

Please report any issues using GitHub issue tracking.
