Make the default databricks bundle init template more self-explanatory (#796)

This makes the default-python template more self-explanatory and adds a
few other tweaks for a better out-of-the-box experience.
lennartkats-db authored Sep 26, 2023
1 parent 757d5ef commit 0c1516c
Showing 8 changed files with 72 additions and 11 deletions.
3 changes: 2 additions & 1 deletion libs/template/renderer_test.go
@@ -41,6 +41,7 @@ func assertBuiltinTemplateValid(t *testing.T, settings map[string]any, target string) {
 
 	templatePath, err := prepareBuiltinTemplates("default-python", tempDir)
 	require.NoError(t, err)
+	libraryPath := filepath.Join(templatePath, "library")
 
 	w := &databricks.WorkspaceClient{
 		Config: &workspaceConfig.Config{Host: "https://myhost.com"},
@@ -52,7 +53,7 @@ func assertBuiltinTemplateValid(t *testing.T, settings map[string]any, target string) {
 	ctx = root.SetWorkspaceClient(ctx, w)
 	helpers := loadHelpers(ctx)
 
-	renderer, err := newRenderer(ctx, settings, helpers, templatePath, "./testdata/template-in-path/library", tempDir)
+	renderer, err := newRenderer(ctx, settings, helpers, templatePath, libraryPath, tempDir)
 	require.NoError(t, err)
 
 	// Evaluate template
7 changes: 7 additions & 0 deletions libs/template/templates/default-python/library/versions.tmpl
@@ -0,0 +1,7 @@
+{{define "latest_lts_dbr_version" -}}
+13.3.x-scala2.12
+{{- end}}
+
+{{define "latest_lts_db_connect_version_spec" -}}
+>=13.3,<13.4
+{{- end}}
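These named blocks are consumed elsewhere in the template tree via {{template "..."}} — for example, the job definition below uses {{template "latest_lts_dbr_version"}} for its spark_version. A minimal sketch of how Go's text/template resolves such shared definitions; this is illustrative only (the CLI's renderer wires templates up differently, and the names `library` and `job` here are placeholders):

```go
package main

import (
	"os"
	"text/template"
)

// Sketch, not the CLI's actual renderer: a "library" template defines
// named blocks, and a second template references one of them.
const library = `{{define "latest_lts_dbr_version" -}}
13.3.x-scala2.12
{{- end}}`

const job = `spark_version: {{template "latest_lts_dbr_version"}}
`

func main() {
	// Parsing the library registers its named blocks; t.New associates
	// the job template with the same template set, so the lookup succeeds.
	t := template.Must(template.New("library").Parse(library))
	t = template.Must(t.New("job").Parse(job))

	// Prints: spark_version: 13.3.x-scala2.12
	if err := t.ExecuteTemplate(os.Stdout, "job", nil); err != nil {
		panic(err)
	}
}
```

Because {{define}} only registers a block, versions.tmpl renders nothing by itself; centralizing the version strings there gives the template a single place to bump when a new LTS runtime ships.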
5 changes: 4 additions & 1 deletion libs/template/templates/default-python/template/{{.project_name}}/.vscode/settings.json
@@ -8,7 +8,10 @@
     ],
     "python.testing.unittestEnabled": false,
     "python.testing.pytestEnabled": true,
+    "python.analysis.extraPaths": ["src"],
     "files.exclude": {
-        "**/*.egg-info": true
+        "**/*.egg-info": true,
+        "**/__pycache__": true,
+        ".pytest_cache": true,
     },
 }
2 changes: 1 addition & 1 deletion libs/template/templates/default-python/template/{{.project_name}}/README.md.tmpl
@@ -30,7 +30,7 @@ The '{{.project_name}}' project was generated by using the default-python template.
 
 5. To run a job or pipeline, use the "run" command:
    ```
-   $ databricks bundle run {{.project_name}}_job
+   $ databricks bundle run
    ```
 
 6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
22 changes: 22 additions & 0 deletions libs/template/templates/default-python/template/{{.project_name}}/requirements-dev.txt.tmpl
@@ -0,0 +1,22 @@
+## requirements-dev.txt: dependencies for local development.
+##
+## For defining dependencies used by jobs in Databricks Workflows, see
+## https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
+
+## pytest is the default package used for testing
+pytest
+
+## databricks-connect can be used to run parts of this project locally.
+## See https://docs.databricks.com/dev-tools/databricks-connect.html.
+##
+## databricks-connect is automatically installed if you're using the Databricks
+## extension for Visual Studio Code
+## (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
+##
+## To manually install databricks-connect, either follow the instructions
+## at https://docs.databricks.com/dev-tools/databricks-connect.html
+## to install the package system-wide, or uncomment the line below to install a
+## version of db-connect that corresponds to the Databricks Runtime version used
+## for this project.
+#
+# databricks-connect{{template "latest_lts_db_connect_version_spec"}}
6 changes: 4 additions & 2 deletions libs/template/templates/default-python/template/{{.project_name}}/resources/{{.project_name}}_job.yml.tmpl
@@ -49,15 +49,17 @@ resources:
             package_name: {{.project_name}}
             entry_point: main
           libraries:
+            # By default we just include the .whl file generated for the {{.project_name}} package.
+            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
+            # for more information on how to add other libraries.
             - whl: ../dist/*.whl
 
 {{else}}
 {{end -}}
       job_clusters:
         - job_cluster_key: job_cluster
          new_cluster:
-            {{- /* we should always use an LTS version in our templates */}}
-            spark_version: 13.3.x-scala2.12
+            spark_version: {{template "latest_lts_dbr_version"}}
             node_type_id: {{smallest_node_type}}
             autoscale:
               min_workers: 1
22 changes: 16 additions & 6 deletions libs/template/templates/default-python/template/{{.project_name}}/setup.py.tmpl
@@ -1,8 +1,9 @@
 """
-Setup script for {{.project_name}}.
+setup.py configuration script describing how to build and package this project.
 
-This script packages and distributes the associated wheel file(s).
-Source code is in ./src/. Run 'python setup.py sdist bdist_wheel' to build.
+This file is primarily used by the setuptools library and typically should not
+be executed directly. See README.md for how to deploy, test, and run
+the {{.project_name}} project.
 """
 from setuptools import setup, find_packages
 
@@ -16,9 +17,18 @@ setup(
     version={{.project_name}}.__version__,
     url="https://databricks.com",
     author="{{user_name}}",
-    description="my test wheel",
+    description="wheel file based on {{.project_name}}/src",
     packages=find_packages(where='./src'),
     package_dir={'': 'src'},
-    entry_points={"entry_points": "main={{.project_name}}.main:main"},
-    install_requires=["setuptools"],
+    entry_points={
+        "packages": [
+            "main={{.project_name}}.main:main"
+        ]
+    },
+    install_requires=[
+        # Dependencies in case the output wheel file is used as a library dependency.
+        # For defining dependencies when this package is used in Databricks, see:
+        # https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
+        "setuptools"
+    ],
 )
16 changes: 16 additions & 0 deletions libs/template/templates/default-python/template/{{.project_name}}/tests/main_test.py.tmpl
@@ -1,5 +1,21 @@
+from databricks.connect import DatabricksSession
+from pyspark.sql import SparkSession
 from {{.project_name}} import main
 
+# Create a new Databricks Connect session. If this fails,
+# check that you have configured Databricks Connect correctly.
+# See https://docs.databricks.com/dev-tools/databricks-connect.html.
+{{/*
+The code below works around a problematic error message from Databricks Connect.
+The standard SparkSession is supported in all configurations (workspace, IDE,
+all runtime versions, CLI). But on the CLI it currently gives a confusing
+error message if SPARK_REMOTE is not set. We can't directly use
+DatabricksSession.builder in main.py, so we're re-assigning it here so
+everything works out of the box, even for CLI users who don't set SPARK_REMOTE.
+*/}}
+SparkSession.builder = DatabricksSession.builder
+SparkSession.builder.getOrCreate()
+
 def test_main():
     taxis = main.get_taxis()
     assert taxis.count() > 5
