Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation mutator for volume artifact_path #2050

Merged
merged 19 commits into from
Jan 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions bundle/config/validate/fast_validate.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
package validate

import (
"context"

"github.com/databricks/cli/bundle"
"github.com/databricks/cli/libs/diag"
)

// FastValidate runs a subset of fast validation checks. This is a subset of the full
// suite of validation mutators that satisfy ANY ONE of the following criteria:
//
// 1. No file i/o or network requests are made in the mutator.
// 2. The validation is blocking for bundle deployments.
//
// The full suite of validation mutators is available in the [Validate] mutator.
type fastValidateReadonly struct{}

func FastValidateReadonly() bundle.ReadOnlyMutator {
return &fastValidateReadonly{}
}

func (f *fastValidateReadonly) Name() string {
return "fast_validate(readonly)"
}

func (f *fastValidateReadonly) Apply(ctx context.Context, rb bundle.ReadOnlyBundle) diag.Diagnostics {
return bundle.ApplyReadOnly(ctx, rb, bundle.Parallel(
// Fast mutators with only in-memory checks
JobClusterKeyDefined(),
JobTaskClusterSpec(),
SingleNodeCluster(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do all of these return errors?

Warnings currently only show up at the end of deploy, so they're not actionable (yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, SingleNodeCluster() returns a warning that is useful even with a deployment. The others, though indeed, are no longer useful.

I can follow up with a PR that serializes the warnings (and below like INFO) on a rolling basis in the bundle.Seq mutator. That should make the warnings useful before deployment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others, though indeed, are no longer useful.

How so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to revisit how we output these diags anyway, so no need to address here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so?

You'd end up with runtime errors on JobClusterKeyDefined. The validation warning has a runtime error equivalent:

.venv➜  bundle-playground git:(master) ✗ databricks bundle validate
Warning: job_cluster_key c1 is not defined
  at resources.jobs.foo.tasks[0].job_cluster_key
  in databricks.yml:17:28
Error: terraform apply: exit status 1

Error: cannot create job: Job cluster 'c1' is not defined in field 'job_clusters'.

  with databricks_job.foo,
  on bundle.tf.json line 35, in resource.databricks_job.foo:
  35:       }

JobTaskClusterSpec returns errors though so it's useful for returning an error early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably promote this warning to an error though.


// Blocking mutators. Deployments will fail if these checks fail.
ValidateArtifactPath(),
))
}

type fastValidate struct{}

func FastValidate() bundle.Mutator {
return &fastValidate{}
}

func (f *fastValidate) Name() string {
return "fast_validate"
}

func (f *fastValidate) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
return bundle.ApplyReadOnly(ctx, bundle.ReadOnly(b), FastValidateReadonly())
}
9 changes: 5 additions & 4 deletions bundle/config/validate/validate.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,12 +30,13 @@ func (l location) Path() dyn.Path {
// Apply implements bundle.Mutator.
func (v *validate) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
return bundle.ApplyReadOnly(ctx, bundle.ReadOnly(b), bundle.Parallel(
JobClusterKeyDefined(),
FastValidateReadonly(),

// Slow mutators that require network or file i/o. These are only
// run in the `bundle validate` command.
FilesToSync(),
ValidateSyncPatterns(),
JobTaskClusterSpec(),
ValidateFolderPermissions(),
SingleNodeCluster(),
ValidateSyncPatterns(),
))
}

Expand Down
129 changes: 129 additions & 0 deletions bundle/config/validate/validate_artifact_path.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
package validate

import (
"context"
"errors"
"fmt"
"slices"
"strings"

"github.com/databricks/cli/bundle"
"github.com/databricks/cli/bundle/config"
"github.com/databricks/cli/bundle/libraries"
"github.com/databricks/cli/libs/diag"
"github.com/databricks/cli/libs/dyn"
"github.com/databricks/cli/libs/dyn/dynvar"
"github.com/databricks/databricks-sdk-go/apierr"
)

type validateArtifactPath struct{}

func ValidateArtifactPath() bundle.ReadOnlyMutator {
return &validateArtifactPath{}
}

func (v *validateArtifactPath) Name() string {
return "validate:artifact_paths"
}

func extractVolumeFromPath(artifactPath string) (string, string, string, error) {
Copy link
Contributor Author

@shreyas-goenka shreyas-goenka Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from another file

if !libraries.IsVolumesPath(artifactPath) {
return "", "", "", fmt.Errorf("expected artifact_path to start with /Volumes/, got %s", artifactPath)
}

parts := strings.Split(artifactPath, "/")
volumeFormatErr := fmt.Errorf("expected UC volume path to be in the format /Volumes/<catalog>/<schema>/<volume>/..., got %s", artifactPath)

// Incorrect format.
if len(parts) < 5 {
return "", "", "", volumeFormatErr
}

catalogName := parts[2]
schemaName := parts[3]
volumeName := parts[4]

// Incorrect format.
if catalogName == "" || schemaName == "" || volumeName == "" {
return "", "", "", volumeFormatErr
}

return catalogName, schemaName, volumeName, nil
}

func findVolumeInBundle(r config.Root, catalogName, schemaName, volumeName string) (dyn.Path, []dyn.Location, bool) {
Copy link
Contributor Author

@shreyas-goenka shreyas-goenka Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from another file

volumes := r.Resources.Volumes
for k, v := range volumes {
if v.CatalogName != catalogName || v.Name != volumeName {
continue
}
// UC schemas can be defined in the bundle itself, and thus might be interpolated
// at runtime via the ${resources.schemas.<name>} syntax. Thus we match the volume
// definition if the schema name is the same as the one in the bundle, or if the
// schema name is interpolated.
// We only have to check for ${resources.schemas...} references because any
// other valid reference (like ${var.foo}) would have been interpolated by this point.
p, ok := dynvar.PureReferenceToPath(v.SchemaName)
isSchemaDefinedInBundle := ok && p.HasPrefix(dyn.Path{dyn.Key("resources"), dyn.Key("schemas")})
if v.SchemaName != schemaName && !isSchemaDefinedInBundle {
continue
}
pathString := fmt.Sprintf("resources.volumes.%s", k)
return dyn.MustPathFromString(pathString), r.GetLocations(pathString), true
}
return nil, nil, false
}

func (v *validateArtifactPath) Apply(ctx context.Context, rb bundle.ReadOnlyBundle) diag.Diagnostics {
// We only validate UC Volumes paths right now.
if !libraries.IsVolumesPath(rb.Config().Workspace.ArtifactPath) {
return nil
}

wrapErrorMsg := func(s string) diag.Diagnostics {
return diag.Diagnostics{
{
Summary: s,
Severity: diag.Error,
Locations: rb.Config().GetLocations("workspace.artifact_path"),
Paths: []dyn.Path{dyn.MustPathFromString("workspace.artifact_path")},
},
}
}

catalogName, schemaName, volumeName, err := extractVolumeFromPath(rb.Config().Workspace.ArtifactPath)
if err != nil {
return wrapErrorMsg(err.Error())
}
volumeFullName := fmt.Sprintf("%s.%s.%s", catalogName, schemaName, volumeName)
w := rb.WorkspaceClient()
_, err = w.Volumes.ReadByName(ctx, volumeFullName)

if errors.Is(err, apierr.ErrPermissionDenied) {
return wrapErrorMsg(fmt.Sprintf("cannot access volume %s: %s", volumeFullName, err))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API itself returns well formatted error messages in this case so we do not need to add more context here. Example:

➜  cli git:(validate-artifact-path) ✗ databricks grants get-effective volume capital.whatever.my_volume -p aws-prod-ucws-alice
Error: User does not have USE SCHEMA on Schema 'capital.whatever'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to use the GET Volume API. That API still provides nice error messages and more context:

➜  ~ databricks volumes read capital.whatever.my_volume -p aws-prod-ucws-alice
Error: User does not have READ VOLUME on Volume 'capital.whatever.my_volume'.

}
if errors.Is(err, apierr.ErrNotFound) {
path, locations, ok := findVolumeInBundle(rb.Config(), catalogName, schemaName, volumeName)
if !ok {
return wrapErrorMsg(fmt.Sprintf("volume %s does not exist", volumeFullName))
}

// If the volume is defined in the bundle, provide a more helpful error diagnostic,
// with more details and location information.
return diag.Diagnostics{{
Summary: fmt.Sprintf("volume %s does not exist", volumeFullName),
Severity: diag.Error,
Detail: `You are using a volume in your artifact_path that is managed by
this bundle but which has not been deployed yet. Please first deploy
the volume using 'bundle deploy' and then switch over to using it in
the artifact_path.`,
Locations: slices.Concat(rb.Config().GetLocations("workspace.artifact_path"), locations),
Paths: append([]dyn.Path{dyn.MustPathFromString("workspace.artifact_path")}, path),
}}

}
if err != nil {
return wrapErrorMsg(fmt.Sprintf("cannot read volume %s: %s", volumeFullName, err))
}
return nil
}
Loading
Loading