-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor LoadMethod.LOCAL to use symlinks instead of copying directory #660
Refactor LoadMethod.LOCAL to use symlinks instead of copying directory #660
Conversation
This allows you to fully refresh a model from the console. Full-refresh/backfill is a common task. Using Airflow parameters makes this easy. Without this, you'd have to trigger an entire deployment. In our setup, company analysts manage their models without modifying the DAG code. This empowers such users. Example of usage: ```python with DAG( dag_id="jaffle", params={"full_refresh": Param(default=False, type="boolean")}, render_template_as_native_obj=True ): task = DbtTaskGroup( operator_args={"full_refresh": "{{ params.get('full_refresh') }}", "install_deps": True}, ) ``` Closes: astronomer#151
👷 Deploy Preview for amazing-pothos-a3bca0 processing.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #660 +/- ##
=======================================
Coverage 92.75% 92.76%
=======================================
Files 54 55 +1
Lines 2235 2238 +3
=======================================
+ Hits 2073 2076 +3
Misses 162 162 ☔ View full report in Codecov by Sentry. |
It looks like symlinking this file: https://github.com/astronomer/astronomer-cosmos/blob/main/dev/dags/dbt/simple/data/imdb.db is locking the database causing the Sqlite integration test to fail.
INFO cosmos.hooks.subprocess:subprocess.py:94 02:05:05 Database Error in model movies_ratings_simplified (models/movies_ratings_simplified.sql)
INFO cosmos.hooks.subprocess:subprocess.py:94 02:05:05 no such table: imdb.movies_ratings
INFO cosmos.hooks.subprocess:subprocess.py:94 02:05:05 compiled Code at target/run/simple/models/movies_ratings_simplified.sql |
@jbandoro Perhaps the issue is with our test/code structure. I believe it is unlike someone who would have the database based into the dbt project folder in a real use case, what do you think? We could move it to a folder outside of the DBT project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbandoro this is looking very good, one comment regarding the issue and only one comment regarding the code. Great job!
Thanks @tatiana! I've been struggling to get the sqlite integration test to pass with this refactor. I tried moving the I tried to move the sqlite db file on the main branch to see if it was related still to the symbolic linking and it results in the same error for me locally 😕.
|
@tatiana I was able to fix the sqlite integration test after some troubleshooting. The issue was that the current copying of directories was actually fixing a problem with the ❯ dbt run --models movies_ratings_simplified --profiles-dir /Users/justin.bandoro/astronomer-cosmos/dev/dags/dbt/simple --profile simple --target dev
20:16:24 Running with dbt=1.4.0
20:16:24 Unable to do partial parsing because profile has changed
20:16:24 Unable to do partial parsing because env vars used in profiles.yml have changed
20:16:24 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 290 macros, 0 operations, 0 seed files, 1 source, 1 exposure, 0 metrics
20:16:24
20:16:24 Concurrency: 1 threads (target='dev')
20:16:24
20:16:24 1 of 1 START sql table model main.movies_ratings_simplified .................... [RUN]
20:16:24 1 of 1 ERROR creating sql table model main.movies_ratings_simplified ........... [ERROR in 0.02s]
20:16:24
20:16:24 Finished running 1 table model in 0 hours 0 minutes and 0.06 seconds (0.06s).
20:16:24
20:16:24 Completed with 1 error and 0 warnings:
20:16:24
20:16:24 Database Error in model movies_ratings_simplified (models/movies_ratings_simplified.sql)
20:16:24 no such table: imdb.movies_ratings
20:16:24 compiled Code at target/run/simple/models/movies_ratings_simplified.sql
20:16:24
20:16:24 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 This is because in the This works currently because cosmos is copying the entire dbt directory, what this did was have 2 different directories: schema: 'main'
schemas_and_paths:
main: "{{ env_var('DBT_SQLITE_PATH', '.') }}/data/imdb.db"
schema_directory: 'data' The main schema which points to the original The update here is to change the sources to reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant work, @jbandoro !
Thank you so much for digging into the SQLite issue and solving it. we'll be releasing this as part of 1.3. As soon as the tests pass, I'll merge this change and we'll have a new 1.3 alpha release.
Bug fixes * Store `compiled_sql` even when task fails by @agreenburg in #671 * Refactor `LoadMethod.LOCAL` to use symlinks instead of copying directory by @jbandoro in #660 * Fix 'Unable to find the dbt executable: dbt' error by @tatiana in #666 * Fix installing deps when using `profile_mapping` & `ExecutionMode.LOCAL` by @joppevos in #659 Others * Docs fix: add execution config to MWAA code example by @ugmuka in #674
Bug fixes * Store `compiled_sql` even when task fails by @agreenburg in #671 * Refactor `LoadMethod.LOCAL` to use symlinks instead of copying directory by @jbandoro in #660 * Fix 'Unable to find the dbt executable: dbt' error by @tatiana in #666 * Fix installing deps when using `profile_mapping` & `ExecutionMode.LOCAL` by @joppevos in #659 Others * Docs fix: add execution config to MWAA code example by @ugmuka in #674 (cherry picked from commit aa9b7bb)
Bug fixes * Store `compiled_sql` even when task fails by @agreenburg in #671 * Refactor `LoadMethod.LOCAL` to use symlinks instead of copying directory by @jbandoro in #660 * Fix 'Unable to find the dbt executable: dbt' error by @tatiana in #666 * Fix installing deps when using `profile_mapping` & `ExecutionMode.LOCAL` by @joppevos in #659 Others * Docs fix: add execution config to MWAA code example by @ugmuka in #674
Thanks for the contribution, @jbandoro , we released it as part of Cosmos 1.2.4 yesterday: |
**Bug fixes** * Store `compiled_sql` even when task fails by @agreenburg in #671 * Refactor `LoadMethod.LOCAL` to use symlinks instead of copying directory by @jbandoro in #660 * Fix 'Unable to find the dbt executable: dbt' error by @tatiana in #666 * Fix installing deps when using `profile_mapping` & `ExecutionMode.LOCAL` by @joppevos in #659 **Others** * Docs fix: add execution config to MWAA code example by @ugmuka in #674 (cherry picked from commit aa9b7bb)
[Justin Bandoro](https://www.linkedin.com/in/justin-bandoro-592b14a7/) (@jbandoro) is a Data Engineer at Kevala Inc. He's based in San Francisco (USA) and has been an early adopter of Cosmos, using it regularly at his company. Not only has he been using Cosmos since the early stages, but he has consistently improved Cosmos since January 2023: ![Screenshot 2023-12-04 at 16 28 29](https://github.com/astronomer/astronomer-cosmos/assets/272048/43197938-d1ab-431f-b101-b6026e5cd3ab) Some of his contributions include new features, code quality, documentation and overall improvements. Some examples: * Speed up integration tests in 67% #732 * Prevent override of dbt profile fields #702 * Add support for env vars in `RenderConfig` in #690 * Use symbolic links to run local tasks, avoiding to copy potentially huge dbt project folders in #660 * Improve documentation in #638 * Automated and improved the code complexity checks in #629 * Added `DbtDocsGCSOperator` in #616 * Added support for Python 3.7 in #88 and #214 Additionally, he has been interacting with users in the #airflow-dbt Slack channel in a very collaborative and supportive way. We want to promote him as a Cosmos committer and maintainer for all these, recognising his constant efforts and achievements towards our community. Thank you very much, @jbandoro !
astronomer#660) This PR refactors the `create_symlinks` function that was previously used in load via dbt ls so that it can be used in `DbtLocalBaseOperator.run_command` instead of copying the entire directory. Closes: astronomer#614
**Bug fixes** * Store `compiled_sql` even when task fails by @agreenburg in astronomer#671 * Refactor `LoadMethod.LOCAL` to use symlinks instead of copying directory by @jbandoro in astronomer#660 * Fix 'Unable to find the dbt executable: dbt' error by @tatiana in astronomer#666 * Fix installing deps when using `profile_mapping` & `ExecutionMode.LOCAL` by @joppevos in astronomer#659 **Others** * Docs fix: add execution config to MWAA code example by @ugmuka in astronomer#674 (cherry picked from commit aa9b7bb)
[Justin Bandoro](https://www.linkedin.com/in/justin-bandoro-592b14a7/) (@jbandoro) is a Data Engineer at Kevala Inc. He's based in San Francisco (USA) and has been an early adopter of Cosmos, using it regularly at his company. Not only has he been using Cosmos since the early stages, but he has consistently improved Cosmos since January 2023: ![Screenshot 2023-12-04 at 16 28 29](https://github.com/astronomer/astronomer-cosmos/assets/272048/43197938-d1ab-431f-b101-b6026e5cd3ab) Some of his contributions include new features, code quality, documentation and overall improvements. Some examples: * Speed up integration tests in 67% astronomer#732 * Prevent override of dbt profile fields astronomer#702 * Add support for env vars in `RenderConfig` in astronomer#690 * Use symbolic links to run local tasks, avoiding to copy potentially huge dbt project folders in astronomer#660 * Improve documentation in astronomer#638 * Automated and improved the code complexity checks in astronomer#629 * Added `DbtDocsGCSOperator` in astronomer#616 * Added support for Python 3.7 in astronomer#88 and astronomer#214 Additionally, he has been interacting with users in the #airflow-dbt Slack channel in a very collaborative and supportive way. We want to promote him as a Cosmos committer and maintainer for all these, recognising his constant efforts and achievements towards our community. Thank you very much, @jbandoro !
Description
This PR refactors the
create_symlinks
function that was previously used in load via dbt ls so that it can be used inDbtLocalBaseOperator.run_command
instead of copying the entire directory.Related Issue(s)
closes #614
Breaking Change?
None
Checklist