Refactor LoadMethod.LOCAL to use symlinks as opposed to copying the dbt source dir #614

Closed
tatiana opened this issue Oct 19, 2023 · 1 comment · Fixed by #660
Labels
area:performance Related to performance, like memory usage, CPU usage, speed, etc execution:local Related to Local execution environment good first issue Good for newcomers

tatiana commented Oct 19, 2023

In Cosmos 1.0 through 1.2, every operator that runs dbt copies the entire dbt project.

As reported in the #airflow-dbt Slack channel, this can lead to performance issues:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1697718075031609?thread_ts=1697708239.918849&cid=C059CC42E9W

We successfully adopted a strategy of using symbolic links from a temporary folder to run dbt ls:

tmpdir_path = Path(tmpdir)
ignore_paths = (DBT_LOG_DIR_NAME, DBT_TARGET_DIR_NAME, "dbt_packages", "profiles.yml")
for child_name in os.listdir(self.project.dir):
    if child_name not in ignore_paths:
        os.symlink(self.project.dir / child_name, tmpdir_path / child_name)

The goal of this ticket is to adopt the same strategy when running tasks using LoadMethod.LOCAL.
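
To make the strategy concrete, here is a minimal, self-contained sketch of linking a project into a temporary folder. It is an illustration, not the Cosmos code: `link_project` and `demo` are hypothetical names, and the values given to `DBT_LOG_DIR_NAME` and `DBT_TARGET_DIR_NAME` are assumptions (Cosmos defines its own constants).

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

# Assumed values for illustration only; Cosmos defines its own constants.
DBT_LOG_DIR_NAME = "logs"
DBT_TARGET_DIR_NAME = "target"
IGNORE_PATHS = (DBT_LOG_DIR_NAME, DBT_TARGET_DIR_NAME, "dbt_packages", "profiles.yml")


def link_project(project_dir: Path, tmpdir_path: Path) -> list[str]:
    """Symlink each top-level project entry into tmpdir_path, skipping IGNORE_PATHS."""
    linked = []
    for child_name in sorted(os.listdir(project_dir)):
        if child_name not in IGNORE_PATHS:
            os.symlink(project_dir / child_name, tmpdir_path / child_name)
            linked.append(child_name)
    return linked


def demo() -> list[str]:
    """Build a tiny fake dbt project and link it into a temporary folder."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as tmpdir:
        project_dir = Path(src)
        (project_dir / "models").mkdir()
        (project_dir / "models" / "my_model.sql").write_text("select 1")
        (project_dir / "dbt_project.yml").write_text("name: demo")
        (project_dir / "target").mkdir()  # ignored: dbt writes artifacts here
        (project_dir / "profiles.yml").write_text("")  # ignored

        linked = link_project(project_dir, Path(tmpdir))

        # The entries are links, so nothing under models/ was copied,
        # yet the files are still readable through the temporary folder.
        assert (Path(tmpdir) / "models").is_symlink()
        assert (Path(tmpdir) / "models" / "my_model.sql").read_text() == "select 1"
        return linked
```

Only one link per top-level entry is created, so the cost no longer grows with the number of files in the project. Note that creating symlinks on Windows may require extra privileges.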

@tatiana tatiana added enhancement New feature or request good first issue Good for newcomers labels Oct 19, 2023
@tatiana tatiana added this to the 1.3.0 milestone Oct 19, 2023
@tatiana tatiana added area:performance Related to performance, like memory usage, CPU usage, speed, etc and removed enhancement New feature or request labels Oct 19, 2023

jbandoro commented Nov 8, 2023

I can work on this one. In #629, to reduce complexity, I created the function:

def create_symlinks(project_path: Path, tmp_dir: Path) -> None:
    """Helper function to create symlinks to the dbt project files."""
    ignore_paths = (DBT_LOG_DIR_NAME, DBT_TARGET_DIR_NAME, "dbt_packages", "profiles.yml")
    for child_name in os.listdir(project_path):
        if child_name not in ignore_paths:
            os.symlink(project_path / child_name, tmp_dir / child_name)

which could be used here in DbtLocalBaseOperator.run_command.
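
As a rough sketch of how that could look, the snippet below runs a command from a temporary directory populated with symlinks instead of a copy. This is not the actual `DbtLocalBaseOperator.run_command` implementation: `run_in_symlinked_dir` is a hypothetical helper, the ignore list uses assumed values for the Cosmos constants, and the command is whatever the caller passes in.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed values standing in for DBT_LOG_DIR_NAME and DBT_TARGET_DIR_NAME.
IGNORE_PATHS = ("logs", "target", "dbt_packages", "profiles.yml")


def create_symlinks(project_path: Path, tmp_dir: Path) -> None:
    """Create symlinks to the dbt project files, skipping IGNORE_PATHS."""
    for child_name in os.listdir(project_path):
        if child_name not in IGNORE_PATHS:
            os.symlink(project_path / child_name, tmp_dir / child_name)


def run_in_symlinked_dir(project_path: Path, command: list[str]) -> str:
    """Hypothetical helper: run `command` with a directory of symlinks as cwd."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        create_symlinks(project_path, tmp_dir)
        # Anything the command writes at the top level of tmp_dir (for example
        # a fresh target/ folder) is discarded when the directory is cleaned up.
        result = subprocess.run(
            command, cwd=tmp_dir, capture_output=True, text=True, check=True
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a dbt invocation: list what the command sees in its cwd.
    with tempfile.TemporaryDirectory() as src:
        project = Path(src)
        (project / "dbt_project.yml").write_text("name: demo")
        (project / "target").mkdir()
        listing = [sys.executable, "-c", "import os; print(sorted(os.listdir('.')))"]
        print(run_in_symlinked_dir(project, listing))
```

Cleaning up the temporary directory removes only the links, not the files they point to, so the original project is left untouched.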

@tatiana tatiana added the execution:local Related to Local execution environment label Nov 8, 2023
tatiana pushed a commit that referenced this issue Nov 14, 2023
#660)

This PR refactors the `create_symlinks` function that was previously
used in load via dbt ls so that it can be used in
`DbtLocalBaseOperator.run_command` instead of copying the entire
directory.

Closes: #614
tatiana pushed a commit that referenced this issue Nov 15, 2023
#660)

This PR refactors the `create_symlinks` function that was previously
used in load via dbt ls so that it can be used in
`DbtLocalBaseOperator.run_command` instead of copying the entire
directory.

Closes: #614
(cherry picked from commit 5d23758)
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this issue Jul 14, 2024
astronomer#660)

This PR refactors the `create_symlinks` function that was previously
used in load via dbt ls so that it can be used in
`DbtLocalBaseOperator.run_command` instead of copying the entire
directory.

Closes: astronomer#614