
Databricks CI/CD


This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a simple file structure) to validate and deploy content to Databricks. Currently, it can handle the following content:

  • Workspace - a collection of notebooks written in Scala, Python, R, or SQL
  • Jobs - a list of Databricks jobs
  • Clusters
  • Instance Pools
  • DBFS - an arbitrary collection of files that can be deployed to DBFS (the Databricks File System)

Installation

pip install databricks-cicd

Requirements

To use this tool, you need a source directory structure (preferably a private Git repository) with the following layout:

any_local_folder_or_git_repo/
├── workspace/
│   ├── some_notebooks_subdir
│   │   └── Notebook 1.py
│   ├── Notebook 2.sql
│   ├── Notebook 3.r
│   └── Notebook 4.scala
├── jobs/
│   ├── My first job.json
│   └── Side gig.json
├── clusters/
│   ├── orion.json
│   └── Another cluster.json
├── instance_pools/
│   ├── Pool 1.json
│   └── Pool 2.json
└── dbfs/
    ├── strawberry_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg

Note: All folder names shown are defaults and can be changed in the configuration. This is just a sample layout.

Usage

For the latest options and commands, run:

cicd -h

A sample command could be:

cicd deploy \
   -w sample_12432.7.azuredatabricks.net \
   -u john.smith@domain.com \
   -t dapi_sample_token_0d5-2 \
   -lp '~/git/my-private-repo' \
   -tp /blabla \
   -c DEV.ini \
   --verbose

Note: On Windows, paths need to be enclosed in double quotes
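
For example, with the remaining options as in the sample above, a Windows local path would be quoted like this (the path itself is hypothetical):

cicd deploy -lp "C:\git\my-private-repo" -c DEV.ini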

The default configuration is defined in default.ini and can be overridden with a custom .ini file using the -c option, usually one config file per target environment (a sample is available in the repository).
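
As a rough sketch of what such an override might look like (the section and key names below are hypothetical, for illustration only; consult default.ini for the real ones), a DEV.ini could remap a source folder name:

; DEV.ini - hypothetical keys for illustration; see default.ini for actual names
[paths]
workspace = notebooks   ; use "notebooks/" instead of the default "workspace/"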

Create content

Notebooks:

  1. Add a notebook to source
    1. In the Databricks UI, go to your notebook.
    2. Click File -> Export -> Source file.
    3. Add that file to the workspace folder of this repo without changing the file name. (A CLI-based alternative is sketched below.)
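
If you prefer to script the export, the Databricks CLI's workspace export command produces the same source file. This is a sketch assuming the legacy Databricks CLI; the notebook path is hypothetical:

databricks workspace export --format SOURCE "/Users/john.smith@domain.com/Notebook 1" "Notebook 1.py"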

Jobs:

  1. Add a job to source
    1. Get the source of the job and write it to a file. You need the Databricks CLI and jq installed. On Windows, it is easiest to rename jq-win64.exe to jq.exe and place it in the C:\Windows\System32 folder. Then, on Windows/Linux/macOS:

      databricks jobs get --job-id 74 | jq .settings > Job_Name.json
      

      This downloads the job's source JSON from the Databricks server, extracts only the settings object, and writes it to a file.

      Note: The file name should match the job name inside the JSON file. Please avoid spaces in names.

    2. Add that file to the jobs folder. (A sketch of the resulting file follows this list.)
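
For reference, the file ends up holding only the job's settings object. A minimal sketch of Job_Name.json with hypothetical values (the exact fields depend on your job definition):

   {
     "name": "Job_Name",
     "new_cluster": {
       "spark_version": "7.3.x-scala2.12",
       "node_type_id": "Standard_DS3_v2",
       "num_workers": 2
     },
     "notebook_task": {
       "notebook_path": "/some_notebooks_subdir/Notebook 1"
     }
   }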

Clusters:

  1. Add a cluster to source
    1. Get the source of the cluster and write it to a file:

      databricks clusters get --cluster-name orion > orion.json

      Note: The file name should match the cluster name inside the JSON file. Please avoid spaces in names.

    2. Add that file to the clusters folder

Instance pools:

  1. Add an instance pool to source
    1. Similar to clusters, but use the instance-pools command instead of clusters (see the sketch below).
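
A sketch of the equivalent command, assuming the legacy Databricks CLI; replace <pool-id> with your pool's ID:

databricks instance-pools get --instance-pool-id <pool-id> > Pool_1.json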

DBFS:

  1. Add a file to dbfs
    1. Just add a file to the dbfs folder.
