# Getting Started

1. Download the datasets from the Microsoft Azure Predictive Maintenance Kaggle project, either directly or by using the Kaggle CLI. Then upload the data to a GCS bucket (in the correct region/project); an example upload is sketched below the download command.

   ```bash
   kaggle datasets download -d arnabbiswas1/microsoft-azure-predictive-maintenance
   ```
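   A minimal upload sketch, assuming the Kaggle archive has been extracted into `data/` and that `gs://your-bucket-name` is a bucket you have already created (the bucket name and object path are placeholders):

   ```bash
   # Unzip the Kaggle archive locally (file/path names are illustrative)
   unzip microsoft-azure-predictive-maintenance.zip -d data/

   # Copy the CSV files into your GCS bucket in parallel
   gsutil -m cp data/*.csv gs://your-bucket-name/predictive-maintenance/
   ```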
2. Create a virtual environment (see the sketch below) and pip install requirements.txt locally to ensure you have the necessary versions of the Google Cloud and KFP libraries installed.

   ```bash
   pip install --trusted-host pypi.python.org -r requirements.txt
   ```
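   Creating the environment (run this before the `pip install` above), a minimal sketch assuming Python 3 and a venv directory named `.venv` (the name is arbitrary):

   ```bash
   # Create and activate an isolated environment for this project
   python3 -m venv .venv
   source .venv/bin/activate
   ```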
3. Create a service account key and copy the resulting JSON file into the ./data subdirectory of this project (a hedged example follows).
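   One way to do this with gcloud, assuming a service account named `vertex-pipeline-sa` in project `your-project-id` (both are placeholders for your own values, as is the key file name):

   ```bash
   # Generate a JSON key for the service account and write it into ./data
   gcloud iam service-accounts keys create ./data/service-account-key.json \
     --iam-account=vertex-pipeline-sa@your-project-id.iam.gserviceaccount.com
   ```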

4. Fill out the environment variables in env.sh and source the file.

   ```bash
   source env.sh
   ```
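   A hypothetical illustration of the kind of values env.sh exports; the actual variable names are defined in the env.sh shipped with this repo, so treat the names below as placeholders:

   ```bash
   # Illustrative only -- use the variable names that env.sh actually defines
   export PROJECT_ID="your-project-id"
   export REGION="us-central1"
   export BUCKET_NAME="your-bucket-name"
   export SERVICE_ACCOUNT_KEY="./data/service-account-key.json"
   ```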
5. Make sure that you have set up your gcloud CLI, including authorizing your service account and/or switching to an active account that has the correct privileges/permissions (a sketch follows).
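   A minimal sketch of the gcloud setup, assuming the service account key from step 3 and a placeholder project ID:

   ```bash
   # Authenticate gcloud as the service account using the key from step 3
   gcloud auth activate-service-account \
     --key-file=./data/service-account-key.json

   # Point gcloud at the project you want to use
   gcloud config set project your-project-id

   # Verify which account is currently active
   gcloud auth list
   ```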

6. Run the build_image.sh script to build the component container image and push it to Artifact Registry (make sure your Docker daemon is running locally).

   ```bash
   . ./build_image.sh
   ```
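   If you want to see what the script is doing, it is roughly equivalent to the following sketch (the region, repository, and image names here are placeholders, not necessarily the ones the script uses):

   ```bash
   # Allow Docker to push to Artifact Registry in your region
   gcloud auth configure-docker us-central1-docker.pkg.dev

   # Build and push the component image (names are illustrative)
   docker build -t us-central1-docker.pkg.dev/your-project-id/your-repo/pipeline-components:latest .
   docker push us-central1-docker.pkg.dev/your-project-id/your-repo/pipeline-components:latest
   ```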
7. Run the run_pipeline.py script to trigger the prepackaged Vertex AI pipeline, which ingests the data from step 1 into BigQuery, runs dbt, creates and pushes features into the Vertex AI Feature Store, trains the model using Vertex AutoML, evaluates the model, and finally deploys it to a Vertex AI endpoint if the evaluation threshold is met.

   ```bash
   python run_pipeline.py
   ```

   NOTE: The pipeline takes approximately 4 hours to complete.
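   Once the pipeline finishes, you can spot-check the resources it created from the CLI, for example (project ID and region are placeholders):

   ```bash
   # List BigQuery datasets in the project (requires the bq CLI from the Cloud SDK)
   bq ls --project_id=your-project-id

   # Confirm that the AutoML model and the endpoint were created
   gcloud ai models list --region=us-central1
   gcloud ai endpoints list --region=us-central1
   ```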

8. (Optional) Run the cleanup.py script once you're done, if you no longer need the underlying BQ dataset, feature store, model, and/or other Vertex AI resources.

   ```bash
   python src/cleanup.py
   ```

   (Note: you may need to undeploy the model from the endpoint before you can delete it; a hedged example follows.)
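   If deletion fails because the model is still deployed, one way to undeploy it manually is via gcloud (the endpoint ID, deployed model ID, and region below are placeholders; look them up first with the list/describe commands):

   ```bash
   # Find the endpoint ID and the deployed model ID
   gcloud ai endpoints list --region=us-central1
   gcloud ai endpoints describe ENDPOINT_ID --region=us-central1

   # Undeploy the model from the endpoint, after which deletion can proceed
   gcloud ai endpoints undeploy-model ENDPOINT_ID \
     --deployed-model-id=DEPLOYED_MODEL_ID \
     --region=us-central1
   ```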