Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning when using workspace.file_path variable in base_parameters #2300

Open
kenmyers-8451 opened this issue Feb 6, 2025 · 3 comments
Open
Labels
Bug Something isn't working DABs DABs related issues

Comments

@kenmyers-8451
Copy link

kenmyers-8451 commented Feb 6, 2025

Describe the issue

Sorry if I tagged this wrong. This is an issue with DABs but is primarily about a warning issue I have when deploying from the CLI.

My team has a bunch of jobs where we are running a notebook that runs sql scripts organized across a number of directories. The task looks something like this

        - task_key: some_task
          existing_cluster_id: ${var.some_cluster}
          notebook_task:
            notebook_path: ${workspace.file_path}/path/to/notebook
            base_parameters:
              schema_name: some_schema
              table_name: some_table
              sql_directory: ./sql
              env: ${bundle.environment}
            source: WORKSPACE

In the above example sql/ is located in the same location as the notebook, so the path is like ${workspace.file_path}/path/to/sql/.

However if the sql is located somewhere else then we have to define the path like this:

        - task_key: some_task
          existing_cluster_id: ${var.some_cluster}
          notebook_task:
            notebook_path: ${workspace.file_path}/path/to/notebook
            base_parameters:
              schema_name: some_schema
              table_name: some_table
              sql_directory: /Workspace/${workspace.file_path}/other/path/to/sql
              env: ${bundle.environment}
            source: WORKSPACE

I recently upgraded from cli version 0.222.0 to 0.240.0 and we started getting a lot of warnings like this:

Warning: substring "/Workspace/${workspace.file_path}" found in "/Workspace/${workspace.file_path}/path/to/sql". Please update this to "${workspace.file_path}/path/to/sql".
  at resources.jobs.some_job.tasks[1].existing_cluster_id.base_parameters.sql_directory
  in bundle/resources/some_job.yml:37:30

So I went to upgrade all of these and remove that /Workspace/ at the start. This deployed fine without warning, but when I went to run the job I got an error. We do an assertion check on that param, assert os.path.exists(sql_directory) and this failed with sql_directory=/Users/[my user id]/.bundle/[project name]/dev/files/other/path/to/sql.

But I have had no issue running this on 0.240.0 when it looks like sql_directory: /Workspace/${workspace.file_path}/other/path/to/sql, we pass the path check and we can find the sql files, we just get bombarded by the warnings. So I think there is an issue with this erroneous warning when using /Workspace/${workspace.file_path}/ in the base_parameters. I've since tried deploying and running again with the /Workspace/ at the front and it seems I was mistaken and the deployment changes the parameter default to remove /Workspace/. This is bad and I think we'd have to make some hacky workaround so this isn't detected (like a separate variable that stores these locations, not even sure if this would bypass the detection) or write some custom code in the notebook that prepends /Workspace/ when the path isn't a relative path. We don't want to do this so for now my team is going to have to stick with the lower CLI version.

Steps to reproduce the behavior

I think I have outlined enough above but please let me know if I can offer additional guidance.

Expected Behavior

Either base_parameters should prepend /Workspace/ to ${workspace.file_path} values or it should not remove it when it is needed in the path nor give a warning to remove the /Workspace/ from the path.

Actual Behavior

CLI spits out a bad warning for each instance of this and fixing the warnings causes failures on job run.

OS and CLI version

MacOS 14.5 (23F79), cli 0.240.0

Is this a regression?

Previously my team was on 0.222.0, sorry we have not upgraded in a while. We were not getting the warnings then and otherwise everything seems to run the same way.

Debug Logs

I can provide these if needed.

@kenmyers-8451 kenmyers-8451 added the CLI CLI related issues label Feb 6, 2025
@kenmyers-8451
Copy link
Author

kenmyers-8451 commented Feb 6, 2025

this seems related to https://github.com/databricks/cli/pull/1724/files but afaict it does not seem like the prepend is happening to these base_parameters values

@kenmyers-8451
Copy link
Author

kenmyers-8451 commented Feb 6, 2025

Okay I think I have sorta figured out what is causing this. In our bundle file we have

workspace:
  file_path: ${var.WORKING_DIRECTORY}

and as you can guess WORKING_DIRECTORY is a variable we define elsewhere to basically be /Users/${workspace.current_user.userName}/.bundle/${bundle.name}... but when I changed the bundle file to look like

workspace:
  file_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}...

the sql_directory variable has /Workspace/ prepended to it as expected (essentially, no issue).

We use this WORKING_DIRECTORY variable because when we deploy through github actions we change the WORKING_DIRECTORY variable to be a databricks /Repos/ location.

Afaik most of our organization does this and I don't know if there is a better way for them to handle this file_path swap for local and production deployments. ccing @jacobford-8451 to see if you have any thoughts.

This still seems like a bug though. Like something is out of order. The warning is raised so the system is aware that we are defining the variable the old way but it fails to prepend because the file_path is abstracted to a variable? That doesn't seem right

@andrewnester andrewnester added Bug Something isn't working DABs DABs related issues and removed CLI CLI related issues labels Feb 6, 2025
@andrewnester
Copy link
Contributor

Thanks for diving deep into this! You indeed figured out how DABs do this workspace prepend.

To give a bit more details, what DABs do for this is:

  1. Looks at the content of workspace.root_path, workspace.file_path and etc. variables and see if they don't start with /Workspace. If they don't -> prepend with /Workspace
  2. It has to do it before variable interpolations because, like in your example ${workspace.root_path} might be used in some other places and has to be replaced with correct (already prepended) value.
  3. Because it does before variable interpolations, if workspace.root_path start with a variable we don't do any prepends because we don't know the value of it (didn't do the interpolation)

We have a related issue here touching on this topic as well #2181

For now, I can think of a couple of options:

  1. Replace file_path with file_path: /Workspace${var.WORKING_DIRECTORY} and use different bundle targets to replace the value of it instead of variable
  2. Fix the value of WORKING_DIRECTORY. The reason why we prepend this is /Users/... is technically incorrect path and now Databricks platform enforces use of /Workspace/Users... paths instead

On a more generic note, we're also looking into order of variable interpolations and this issue is on our radar as well to be looked / addressed as part of this story. Unfortunately no ETA yet.

@pietern @denik any other comments from your side?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working DABs DABs related issues
Projects
None yet
Development

No branches or pull requests

2 participants