TrackML at CERN

Author: Michelle Casbon

An example of working with LHC data on Kubeflow, based on the Kaggle TrackML Particle Tracking Challenge.

Create a cluster and install Kubeflow
Run a notebook
Run a pipeline
Run hyperparameter tuning

Create a cluster and install Kubeflow

Create a cluster with Click-to-deploy using the default settings. Follow the provided instructions to set up OAuth credentials.

After the cluster is available and you are able to access the Kubeflow central dashboard, enable auto-provisioning with the following command:

gcloud beta container clusters update kubeflow \
  --zone europe-west1-b \
  --enable-autoprovisioning \
  --max-cpu 128 \
  --max-memory 1120 \
  --max-accelerator type=nvidia-tesla-k80,count=4 \
  --verbosity error
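To script this step or vary the limits, the same command can be assembled from parameters. This is a minimal sketch: the helper name `autoprovisioning_args` is hypothetical, and it only builds the argument list from the flags shown above without calling `gcloud` itself.

```python
def autoprovisioning_args(cluster, zone, max_cpu, max_memory, gpu_type, gpu_count):
    """Build the argument list for the gcloud update command shown above.

    Hypothetical helper: it mirrors the flags used in this README and does
    not validate them against the gcloud CLI.
    """
    return [
        "gcloud", "beta", "container", "clusters", "update", cluster,
        "--zone", zone,
        "--enable-autoprovisioning",
        "--max-cpu", str(max_cpu),        # upper bound on total CPU cores
        "--max-memory", str(max_memory),  # upper bound on total memory
        "--max-accelerator", f"type={gpu_type},count={gpu_count}",
        "--verbosity", "error",
    ]


# Same values as the command in this README.
args = autoprovisioning_args(
    "kubeflow", "europe-west1-b", 128, 1120, "nvidia-tesla-k80", 4
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where `gcloud` is installed and authenticated.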

Run a notebook

From the Kubeflow central dashboard, click on Notebooks and spawn a new instance. Use all defaults except for the following parameters:

CPU: 2

Memory: 12.0Gi

When the notebook instance is ready, click Connect and open a new Terminal. Run these commands to install the necessary libraries:

git clone https://github.com/LAL/trackml-library.git src/trackml-library
pip install src/trackml-library
pip install pandas
pip install matplotlib
pip install seaborn
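A quick way to confirm that the installs succeeded is to check that each package can be found by the interpreter. The sketch below assumes the trackml-library package is importable as `trackml`; the function name `missing_modules` is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: trackml-library installs a module named `trackml`.
print(missing_modules(["trackml", "pandas", "matplotlib", "seaborn"]))
```

An empty list means all four packages are visible to the notebook's Python environment.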

Download the sample data with these commands:

mkdir input
gsutil cp gs://chasm-data/kaggle/trackml-particle-identification/train_sample.zip input
cd input
unzip train_sample.zip
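After unzipping, it can help to list which events are available before opening the notebook. The sketch below assumes the TrackML file naming convention, where each event is stored as files such as `event000001000-hits.csv` that share a common prefix; the helper name `event_prefixes` is hypothetical, and the exact directory layout inside the archive is not assumed.

```python
from pathlib import Path


def event_prefixes(directory):
    """Return sorted event prefixes found anywhere under `directory`.

    A prefix is the file path with the '-hits.csv' suffix removed,
    for example '.../event000001000'.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    suffix = "-hits.csv"
    return sorted(str(path)[: -len(suffix)] for path in root.rglob("*" + suffix))


# Prints one line per event found under the `input` directory created above.
for prefix in event_prefixes("input"):
    print(prefix)
```

Each prefix can then be passed to `trackml.dataset.load_event(prefix)` from trackml-library, which returns the hits, cells, particles, and truth tables for that event as pandas DataFrames.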

Upload the file notebooks/trackml-problem-explanation-and-data-exploration.ipynb, which was adapted from Wesam Elshamy's Kaggle Kernel for use on Kubeflow v0.5.0, and open the notebook.

Run a pipeline

Build docker images

Each step in a pipeline references a container image. Build the necessary Docker images with these commands:

docker/build.sh kfp_kubectl
docker/build.sh trackml

Compile the pipeline

In a local Terminal or Cloud Shell, install the Kubeflow Pipelines Python SDK by running this command:

pip install -U kfp

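Download the pipeline definition and compile it by running it with Python (a file fetched with curl is not executable, so ./trackml.py would fail without a chmod):

curl -O https://raw.githubusercontent.com/texasmichelle/kubeflow-cern/master/pipelines/trackml.py
python trackml.py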
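Before uploading, the compiled package can be inspected to confirm that compilation produced something sensible. This is a minimal sketch using only the standard library; it assumes the compile step writes a gzip-compressed tar archive (the upload step refers to it as `trackml.py.tar.gz`), and the helper name `package_members` is hypothetical.

```python
import tarfile


def package_members(path):
    """Return the file names stored in a gzip-compressed pipeline package."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()
```

For example, `package_members("trackml.py.tar.gz")` lists the archive contents. Packages produced by the Kubeflow Pipelines compiler of that era typically hold a single workflow YAML file, though that is an assumption about the SDK version in use.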

Upload and run the pipeline

From the Kubeflow central dashboard, click on Pipeline Dashboard, then Upload pipeline. Select the file you just created (trackml.py.tar.gz) and then click Upload.

Run the pipeline by first creating an experiment, then a run.

Run the pipeline from a notebook

From the Kubeflow central dashboard, click on Notebooks, then Upload the file notebooks/trackml-pipeline.ipynb.

Run the notebook and click on the resulting links to view the pipeline executing.

Run it once more, this time requesting GPU resources, and watch auto-provisioning add a GPU node to the cluster before training starts.
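In Kubernetes, a GPU request is expressed as a resource limit on the container, commonly under the key `nvidia.com/gpu`. A pod that cannot be scheduled because no node offers that resource is what prompts auto-provisioning to add a GPU node. The sketch below illustrates the shape of such a limit on a plain dictionary; how the notebook itself requests the GPU is not shown here, and the step name and helper are hypothetical.

```python
import copy


def with_gpu_limit(container, count=1, resource="nvidia.com/gpu"):
    """Return a copy of a container spec with a GPU resource limit added."""
    updated = copy.deepcopy(container)
    limits = updated.setdefault("resources", {}).setdefault("limits", {})
    limits[resource] = count
    return updated


# Hypothetical training step; the real image name depends on your registry.
train = {"name": "train", "image": "trackml"}
print(with_gpu_limit(train))
```

The original dictionary is left unchanged, so the same step definition can be reused for CPU-only and GPU runs.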

Run hyperparameter tuning

Run the Katib gpu-example on the cluster with this command:

kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha1/gpu-example.yaml

Observe auto-provisioning spin up 2 extra GPU nodes (5 total: 2 CPU, 3 GPU).
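The node counts can be checked from the output of `kubectl get nodes -o json`. The sketch below is an illustration under the assumption that GPU nodes advertise a non-zero `nvidia.com/gpu` entry in `status.capacity`; the sample data is hand-written to mirror the totals above and is not real cluster output.

```python
def count_nodes(node_list):
    """Count CPU-only and GPU nodes in a parsed `kubectl get nodes -o json` result."""
    cpu_only = 0
    gpu = 0
    for node in node_list.get("items", []):
        capacity = node.get("status", {}).get("capacity", {})
        # Kubernetes reports capacities as strings, e.g. "1".
        if int(capacity.get("nvidia.com/gpu", 0)) > 0:
            gpu += 1
        else:
            cpu_only += 1
    return {"cpu": cpu_only, "gpu": gpu}


# Hand-written sample: 2 CPU-only nodes and 3 GPU nodes.
sample = {
    "items": (
        [{"status": {"capacity": {"cpu": "8"}}}] * 2
        + [{"status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "1"}}}] * 3
    )
}
print(count_nodes(sample))
```

On a live cluster, the input can be obtained with `json.loads(subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"]))`.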
