Add a Sandbox for Feathr (feathr-ai#966)
* Update registry-access-control.md

* Update README.md

* add logo

* Update README.md

* Add docs for how to create bacpac file

* update dockerfile

* update

* Update local_quickstart_nyc_taxi_demo.ipynb

* Update FeathrSandbox.Dockerfile

* add SQLIte connection

* Update local_quickstart_nyc_taxi_demo.ipynb

* update local registry

* update registry

* update

* add dockerfile

* Change to ORM

* Update db_registry.py

* update registry

* delete unused files

* don't change the existing registry code

* update

* Update main.py

* update configs

* make jupyter runnable

* add readme

* Update start.sh

* Revert "Add docs for how to create bacpac file"

This reverts commit 2837926.

* delete unused files

* Update local_quickstart_nyc_taxi_demo.ipynb

* Update local_quickstart_nyc_taxi_demo.ipynb

* Fix redis issues

* Update client.py

* Update _env_config_reader.py

* add docs

* Update quickstart_local_sandbox.md

* Update quickstart_local_sandbox.md

* Update quickstart_local_sandbox.md

* Update quickstart_local_sandbox.md

* merge ORM based sql registry to sql registry

* fix typo

* improve usability

* Update FeathrSandbox.Dockerfile

* Update FeathrSandbox.Dockerfile

* Update start_local.sh

* Update FeathrSandbox.Dockerfile

* update instructions

* Add code server

* Remove unused dockerfile

* disable code server

* update samples

* Update feathr_init_script.py

* update notebook

* Update FeathrSandbox.Dockerfile

* Update local_quickstart_notebook.ipynb

* Update _feathr_registry_client.py

* Update setup.py

* remove numpy

* Update quickstart_local_sandbox.md

* Update quickstart_local_sandbox.md

* Add search function in sandbox

* Update db_registry_orm.py

* Update db_registry_orm.py

* Update db_registry_orm.py

* fix search issue

* udpate

* Update FeathrSandbox.Dockerfile

* update

* Update feathr_init_script.py

* merge ORM based registry

* Merge

* Update main.py

* Delete db_registry_orm.py

* update dependencies

* Update .prettierrc

* update docs

* Update database.py

* Update database.py

* Update database.py

* Add CI docker push

* Optimize image size

* Update local_quickstart_notebook.ipynb

* Update start_local.sh

* update based on comments
xiaoyongzhu authored Jan 17, 2023
1 parent ae752c5 commit 290ceb3
Showing 25 changed files with 2,004 additions and 205 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/docker-publish.yml
@@ -44,6 +44,34 @@ jobs:
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}


  build_and_push_feathr_sandbox_image:
    name: Push Feathr Sandbox image to Docker Hub
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: feathrfeaturestore/feathr-sandbox

      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: FeathrSandbox.Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
  # Trigger Azure Web App webhooks to pull the latest nightly image
  deploy:
    runs-on: ubuntu-latest
87 changes: 87 additions & 0 deletions FeathrSandbox.Dockerfile
@@ -0,0 +1,87 @@
# TODO: persist the SQLite file in the volumes

# Stage 1: build frontend ui
FROM node:16-alpine as ui-build
WORKDIR /usr/src/ui
COPY ./ui .

## Use api endpoint from same host and build production static bundle
RUN echo 'REACT_APP_API_ENDPOINT=http://localhost:8000' >> .env.production
RUN npm install && npm run build


FROM jupyter/pyspark-notebook

USER root

## Install dependencies
RUN apt-get update -y && apt-get install -y nginx freetds-dev sqlite3 libsqlite3-dev lsb-release redis gnupg redis-server lsof

# UI Section
## Remove default nginx index page and copy ui static bundle files
RUN rm -rf /usr/share/nginx/html/*
COPY --from=ui-build /usr/src/ui/build /usr/share/nginx/html
COPY ./deploy/nginx.conf /etc/nginx/nginx.conf


# Feathr Package Installation Section
# always install feathr from main
WORKDIR /home/jovyan/work
COPY --chown=1000:100 ./feathr_project ./feathr_project
RUN python -m pip install -e ./feathr_project


# Registry Section
# install registry
COPY ./registry /usr/src/registry
WORKDIR /usr/src/registry/sql-registry
RUN pip install -r requirements.txt



## Start service and then start nginx
WORKDIR /usr/src/registry
COPY ./feathr-sandbox/start_local.sh /usr/src/registry/

# install code server
# RUN curl -fsSL https://code-server.dev/install.sh | sh

# default dir by the jupyter image
WORKDIR /home/jovyan/work
USER jovyan
# copy as the jovyan user
# UID is like this: uid=1000(jovyan) gid=100(users) groups=100(users)
COPY --chown=1000:100 ./docs/samples/local_quickstart_notebook.ipynb .
COPY --chown=1000:100 ./feathr-sandbox/feathr_init_script.py .

# Run the script so that maven cache can be added for better experience. Otherwise users might have to wait for some time for the maven cache to be ready.
RUN python feathr_init_script.py
RUN python -m pip install interpret

USER root
WORKDIR /usr/src/registry
RUN ["chmod", "+x", "/usr/src/registry/start_local.sh"]

# remove ^M chars in Linux to make sure the script can run
RUN sed -i "s/\r//g" /usr/src/registry/start_local.sh


# install a Kafka single node instance
# Reference: https://www.looklinux.com/how-to-install-apache-kafka-single-node-on-ubuntu/
RUN wget https://downloads.apache.org/kafka/3.3.1/kafka_2.12-3.3.1.tgz && tar xzf kafka_2.12-3.3.1.tgz && mv kafka_2.12-3.3.1 /usr/local/kafka && rm kafka_2.12-3.3.1.tgz

# /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
# /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties

WORKDIR /home/jovyan/work


# 80: Feathr UI
# 8000: Feathr REST API
# 8888: Jupyter
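# 8080: VS Code (code server; its installation is currently commented out above)
# 7080: Interpret
# 2181: ZooKeeper (default client port, used by the Kafka single-node instance)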
EXPOSE 80 8000 8080 8888 7080 2181
# run the service so we can initialize
# RUN ["/bin/bash", "/usr/src/registry/start.sh"]
CMD ["/bin/bash", "/usr/src/registry/start_local.sh"]
35 changes: 0 additions & 35 deletions docker/Dockerfile

This file was deleted.

39 changes: 0 additions & 39 deletions docker/supervisord.conf

This file was deleted.

8 changes: 7 additions & 1 deletion docs/how-to-guides/feathr-configuration-and-env.md
@@ -30,7 +30,7 @@ Feathr will get the configurations in the following order:
2. If it's not set in the environment, then a value is retrieved from the feathr_config.yaml file with the same config key.
3. If it's not available in the feathr_config.yaml file, Feathr will try to retrieve the value from a key vault service. Currently only Azure Key Vault is supported.
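
The lookup order above can be sketched in a few lines of Python. This is only an illustration, not Feathr's actual implementation: `resolve_config`, `yaml_config`, and `secret_lookup` are hypothetical names, and the mapping of a key such as `A__B__C` to the nested YAML path `a -> b -> c` is an assumption made for this sketch.

```python
import os
from typing import Any, Callable, Mapping, Optional


def resolve_config(
    key: str,
    yaml_config: Mapping[str, Any],
    secret_lookup: Optional[Callable[[str], Optional[str]]] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[Any]:
    """Return the value for `key`, trying each source in priority order."""
    # 1. An environment variable with the same name takes precedence.
    if key in environ:
        return environ[key]

    # 2. Otherwise look in the parsed YAML config.
    #    Assumption: "A__B__C" corresponds to yaml_config["a"]["b"]["c"].
    node: Any = yaml_config
    for part in key.lower().split("__"):
        if not isinstance(node, Mapping) or part not in node:
            node = None
            break
        node = node[part]
    if node is not None:
        return node

    # 3. Finally fall back to a secret store (for example a key vault client),
    #    represented here by a plain callable.
    if secret_lookup is not None:
        return secret_lookup(key)
    return None
```

Passing `environ` and `secret_lookup` as parameters keeps the sketch easy to test without touching the real environment or a real key vault.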

# A list of environment variables that Feathr uses
# A list of environment variables that Feathr uses when running Spark job

| Environment Variable | Description | Required? |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
@@ -85,6 +85,12 @@ Feathr will get the configurations in the following order:
| FEATURE_REGISTRY__PURVIEW__TYPE_SYSTEM_INITIALIZATION (Deprecated Soon) | Controls whether the type system (think this as the "schema" for the registry) will be initialized or not. Usually this is only required to be set to `True` to initialize schema, and then you can set it to `False` to shorten the initialization time. | Required if using Purview directly without registry service. Deprecate soon, see [here](#deprecation) for more details. |
| MAVEN_ARTIFACT_VERSION | Version number like `0.9.0`. Used to define maven package version when main jar is not defined. | Optional |

# A list of environment variables that Feathr uses when running registry and service
| Environment Variable | Description | Required? |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| FEATHR_SANDBOX | If it is set to any value, the registry server runs in sandbox mode and connects to a local SQLite database. | Optional |
| FEATHR_SANDBOX_REGISTRY_URL | If it is set, Feathr uses the registry file that the user points to. This is useful when users want to persist the SQLite file to a volume, so that it isn't lost when the Docker container is restarted. | Optional |
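
As a rough illustration of how these two variables could interact, the sketch below picks a SQLite file based on the environment. It is not the registry's actual code: `sandbox_db_path`, `open_registry`, and `DEFAULT_SANDBOX_DB` are hypothetical names, and treating `FEATHR_SANDBOX_REGISTRY_URL` as a plain file path (rather than a full connection URL) is an assumption.

```python
import os
import sqlite3
from typing import Mapping, Optional

# Hypothetical default; the real default location is not stated in this document.
DEFAULT_SANDBOX_DB = "feathr_sandbox_registry.db"


def sandbox_db_path(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the SQLite file to use, or None when sandbox mode is off."""
    # Per the table above, setting FEATHR_SANDBOX to any value enables sandbox mode.
    if "FEATHR_SANDBOX" not in environ:
        return None
    # Assumption: the variable holds a file path, e.g. one on a mounted volume.
    return environ.get("FEATHR_SANDBOX_REGISTRY_URL") or DEFAULT_SANDBOX_DB


def open_registry(
    environ: Mapping[str, str] = os.environ,
) -> Optional[sqlite3.Connection]:
    """Open the sandbox registry database, or return None outside sandbox mode."""
    path = sandbox_db_path(environ)
    return sqlite3.connect(path) if path is not None else None
```

Pointing the path at a directory mounted with `docker run -v ...` is what would let the file survive container restarts.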

# Explanation for selected configurations

## MAVEN_ARTIFACT_VERSION
Binary file added docs/images/feathr-sandbox-dev-experience.png
Binary file added docs/images/feathr-sandbox-lineage.png
Binary file added docs/images/feathr-sandbox-ui.png
Binary file added docs/images/feathr-sandbox.png
85 changes: 85 additions & 0 deletions docs/quickstart_local_sandbox.md
@@ -0,0 +1,85 @@
---
layout: default
title: Quick Start Guide with Local Sandbox
---

# Feathr Quick Start Guide with Local Sandbox

We provide a local sandbox so that users can try Feathr easily. The goals of the Feathr Sandbox are to:

- make it easier for users to get started,
- make it easy to validate feature definitions and new ideas,
- make it easier for Feathr developers to set up an environment and develop new things, and
- provide an interactive experience: running a job usually takes less than 1 minute.

As an end user, you can try out Feathr and become productive in less than 5 minutes.

The Sandbox is ideal for:

- Feathr users who want to get started quickly
- Feathr developers who want to test new features, since this Docker image should contain everything they need. It comes with the Python package installed in editable mode so developers can iterate easily.

## Getting Started

To get started, simply run the command below. Note that the image is around 5 GB, so it might take a while to pull it from Docker Hub.

```bash
# 80: Feathr UI 8000: Feathr API 8888: Jupyter 8080: VsCode 7080: Interpret
docker run -it --rm -p 8888:8888 -p 8000:8000 -p 80:80 -p 8080:8080 -p 7080:7080 --env CONNECTION_STR="Server=" --env API_BASE="api/v1" --env FEATHR_SANDBOX=True -e GRANT_SUDO=yes feathrfeaturestore/feathr-sandbox
```
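
Once the container is running, a quick way to confirm that the published ports respond is a small TCP probe. The sketch below is an optional helper, not part of Feathr: `SANDBOX_PORTS`, `port_is_open`, and `sandbox_status` are hypothetical names, and the port-to-service mapping is copied from the comment in the command above.

```python
import socket
from typing import Dict

# Port-to-service mapping taken from the comment in the docker run command.
SANDBOX_PORTS: Dict[int, str] = {
    80: "Feathr UI",
    8000: "Feathr API",
    8888: "Jupyter",
    8080: "VsCode",
    7080: "Interpret",
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def sandbox_status(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each sandbox service name to whether its port accepts connections."""
    return {name: port_is_open(port, host) for port, name in SANDBOX_PORTS.items()}


if __name__ == "__main__":
    for name, up in sandbox_status().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

A port that accepts connections only shows that something is listening; it does not prove the service behind it has finished starting.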

A Jupyter link should appear at `http://127.0.0.1:8888/`. Double-click the notebook file to open it in Jupyter, and you should see the Feathr sample notebook. Click the triangle button in the Jupyter notebook and the whole notebook will run locally.

The default Jupyter notebook is here:
```bash
http://localhost:8888/lab/workspaces/auto-w/tree/local_quickstart_notebook.ipynb
```

![Feathr Notebook](./images/feathr-sandbox.png)


After you run the notebook, all the features will be registered in the UI, and you can visit the Feathr UI at:

```bash
http://localhost:80
```


After the notebook has run, you should see a project called `local_spark` in the Feathr UI. You can also view lineage in the Feathr UI and explore all the details.

![Feathr UI](./images/feathr-sandbox-ui.png)

![Feathr UI](./images/feathr-sandbox-lineage.png)

## Components

The Feathr sandbox comes with:
- Built-in Jupyter Notebook
- Pre-installed data science packages such as `interpret` so that data science development becomes easy
- Pre-installed Feathr package
- A local spark environment for dev/test purpose
- Feathr samples that can run locally
- A local Feathr registry backed by SQLite
- Feathr UI
- Feathr Registry API
- Local Redis server


## Build Docker Container

If you want to build the Feathr sandbox, run the below command in the Feathr root directory:

```bash
docker build -f FeathrSandbox.Dockerfile -t feathrfeaturestore/feathr-sandbox .
```


## For Feathr Developers
The Feathr package is copied to the user folder and installed with the `pip install -e` option, which means you can do interactive development in the Python package. For example, if you want to validate changes, instead of setting up the environment, you can simply go to the


Note that if you are using a Jupyter notebook to run the code, make sure you restart Jupyter Notebook so the kernel can reload the Feathr package.
You should be able to see the

![Feathr Dev Experience](./images/feathr-sandbox-dev-experience.png)
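
One way to confirm that a kernel is picking up the editable copy of a package is to check where Python would import it from. The following is a minimal sketch using only the standard library; `package_location` and `is_under` are hypothetical helper names, not part of Feathr.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory (or file) that `name` would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    origin = Path(spec.origin)
    # For a regular package, origin is .../<name>/__init__.py; report its directory.
    return origin.parent if origin.name == "__init__.py" else origin


def is_under(name: str, source_root: str) -> bool:
    """Return True if `name` resolves to a location inside `source_root`."""
    location = package_location(name)
    if location is None:
        return False
    return Path(source_root).resolve() in location.resolve().parents
```

Inside the sandbox, `is_under("feathr", "/home/jovyan/work/feathr_project")` would be expected to return `True` if the editable install is in effect, given that the Dockerfile copies `feathr_project` into `/home/jovyan/work`.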

In the future, a VS Code Server might be installed so that you can do interactive development in the Docker container.