
Add default source nodes rendering #661

Closed
wants to merge 223 commits

Conversation

arojasb3
Contributor

@arojasb3 arojasb3 commented Nov 9, 2023

Description

I'm aiming to give a default behaviour to source nodes for checking their source freshness and their tests.
One of the main limitations I found while using the custom_callback functions on source nodes to check freshness is that those nodes were being created for 100% of sources, but not all of them required freshness checks, which made workers waste compute time.

I'm adding a new variable to the DbtNode class called has_freshness, which will be True for sources with freshness checks and False for sources without them and for any other resource type.

All sources with has_freshness == False will be rendered as EmptyOperators, to keep dbt's behavior of showing sources, as suggested in issue #630.
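The idea can be sketched as follows. This is a simplified stand-in, not the actual Cosmos class (the real `DbtNode` lives in `cosmos/dbt/graph.py` and differs); it only illustrates deriving `has_freshness` from a source's freshness configuration:

```python
from dataclasses import dataclass, field


@dataclass
class DbtNode:
    """Simplified stand-in for Cosmos's DbtNode (illustrative only)."""

    unique_id: str
    resource_type: str  # "source", "model", "seed", ...
    config: dict = field(default_factory=dict)

    @property
    def has_freshness(self) -> bool:
        # True only for sources that declare a freshness block; False for
        # sources without one and for every other resource type.
        if self.resource_type != "source":
            return False
        return bool(self.config.get("freshness"))


fresh = DbtNode(
    "source.jaffle_shop.raw_orders",
    "source",
    {"freshness": {"warn_after": {"count": 12, "period": "hour"}}},
)
stale = DbtNode("source.jaffle_shop.raw_stores", "source")
model = DbtNode("model.jaffle_shop.orders", "model")
```

With this flag, the renderer can branch: sources where `has_freshness` is True get a real freshness-check task, everything else gets an EmptyOperator.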

Related Issue(s)

#630

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works


netlify bot commented Nov 9, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 99d4be7
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/654c55a5e83e2700089f7f86

```python
task_full_name = node.unique_id[len("source.") :]
task_id = f"{task_full_name}_source"
args["select"] = f"source:{node.unique_id[len('source.'):]}"
args["models"] = None
```
Contributor Author

@arojasb3 arojasb3 Nov 9, 2023


@tatiana I have a feeling that these lines could be written somewhere else.
My dbt testing showed that `dbt source freshness --models [any]` fails, while using `--select` works. I ended up manually setting the `models` argument to `None`.
Any thoughts?

```python
if node.has_freshness is False:
    return TaskMetadata(
        id=task_id,
        # arguments=args,
```
Contributor Author

@arojasb3 arojasb3 Nov 9, 2023


@tatiana Returning an EmptyOperator just seemed "grayish" in the DAG UI. Looking at all the cool cosmos-related operators ("DbtLocal...", "Dbt...", etc.), having an EmptyOperator didn't seem to fit the "cosmos pattern". Should we create a dummy source operator that inherits from EmptyOperator, just for the sake of having all operators follow the same naming pattern?
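Something like the following hypothetical sketch. The subclass name is an assumption (no such operator exists in Cosmos at this point), and `EmptyOperator` here is a small stand-in for `airflow.operators.empty.EmptyOperator` so the snippet runs without an Airflow installation:

```python
class EmptyOperator:
    """Stand-in for airflow.operators.empty.EmptyOperator: a task that executes nothing."""

    def __init__(self, task_id: str, **kwargs) -> None:
        self.task_id = task_id


class DbtSourceEmptyOperator(EmptyOperator):
    """Hypothetical: renders a dbt source without freshness checks or tests.

    Behaves exactly like EmptyOperator; it exists only so the task's class
    name follows the Dbt* naming pattern in the Airflow UI.
    """


task = DbtSourceEmptyOperator(task_id="raw_orders_source")
```

Since it adds no behavior, the only trade-off is one extra class in exchange for consistent operator names in the UI.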

```diff
@@ -742,3 +742,15 @@ def __init__(self, **kwargs: str) -> None:
     raise DeprecationWarning(
         "The DbtDepsOperator has been deprecated. " "Please use the `install_deps` flag in dbt_args instead."
     )


+class DbtSourceLocalOperator(DbtLocalBaseOperator):
```
Contributor Author


@tatiana

Since #655 also creates a pretty similar Operator, I can wait until that is merged to follow the same naming pattern.

For now, I used the DbtSourceLocalOperator name to get this working.

@tatiana tatiana added the area:rendering Related to rendering, like Jinja, Airflow tasks, etc label Nov 9, 2023
@tatiana tatiana added the status:awaiting-reviewer The issue/PR is awaiting for a reviewer input label Dec 14, 2023
@tatiana tatiana added this to the 1.6.0 milestone Apr 30, 2024
@pankajastro
Contributor

Hi @arojasb3, it looks like you've made great progress on this feature! Would you be able to rebase it so it can be reviewed and merged? Thanks!

pre-commit-ci bot and others added 16 commits July 13, 2024 21:02
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.1.3 →
v0.1.4](astral-sh/ruff-pre-commit@v0.1.3...v0.1.4)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bug fixes

* Support ProjectConfig.dbt_project_path = None & different paths for Rendering and Execution by @MrBones757 in astronomer#634
* Fix adding test nodes to DAGs built using LoadMethod.DBT_MANIFEST and LoadMethod.CUSTOM by @edgga in astronomer#615

Others

* Add pre-commit hook for McCabe max complexity check and fix errors by @jbandoro in astronomer#629
* Update contributing docs for running integration tests by @jbandoro in astronomer#638
* Fix CI issue running integration tests by @tatiana in astronomer#640 and astronomer#644
* pre-commit updates in astronomer#637

(cherry picked from commit fa0620a)
…ronomer#649)

Adds a snowflake mapping for encrypted private key using an environment variable

Closes: astronomer#632

Breaking Change?
This does rename the previous SnowflakeEncryptedPrivateKeyPemProfileMapping to SnowflakeEncryptedPrivateKeyFilePemProfileMapping, but this makes it clearer now that a new SnowflakeEncryptedPrivateKeyPemProfileMapping is added which supports the env variable. It was also only released as a pre-release change.
This allows you to fully refresh a model from the console.
Full-refresh/backfill is a common task. Using Airflow parameters makes
this easy. Without this, you'd have to trigger an entire deployment. In our
setup, company analysts manage their models without modifying
the DAG code. This empowers such users.

Example of usage:
```python
with DAG(
        dag_id="jaffle",
        params={"full_refresh": Param(default=False, type="boolean")},
        render_template_as_native_obj=True
):
    task = DbtTaskGroup(
        operator_args={"full_refresh": "{{ params.get('full_refresh') }}", "install_deps": True},
    )

```

Closes: astronomer#151
…opagation if desired (astronomer#648)

Add Airflow config check for cosmos/propagate_logs to allow override of
default propagation behavior.
Expose entry-point so that Airflow can theoretically detect
configuration default.

Closes astronomer#639 

## Breaking Change?

This is backward-compatible as it falls back to default behavior if the
`cosmos` section or `propagate_logs` option don't exist.

## Checklist

- [X] I have made corresponding changes to the documentation (if
required)
- [X] I have added tests that prove my fix is effective or that my
feature works

---------

Co-authored-by: Andrew Greenburg <agreenburg@vergeventures.net>
If execution_config was reused, Cosmos 1.2.2 would raise:

```
astronomer-cosmos/dags/basic_cosmos_task_group.py
Traceback (most recent call last):
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venv-38/lib/python3.8/site-packages/airflow/models/dagbag.py", line 343, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/basic_cosmos_task_group.py", line 74, in <module>
    basic_cosmos_task_group()
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/venv-38/lib/python3.8/site-packages/airflow/models/dag.py", line 3817, in factory
    f(**f_kwargs)
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/basic_cosmos_task_group.py", line 54, in basic_cosmos_task_group
    orders = DbtTaskGroup(
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/airflow/task_group.py", line 26, in __init__
    DbtToAirflowConverter.__init__(self, *args, **specific_kwargs(**kwargs))
  File "/Users/tati/Code/cosmos-clean/astronomer-cosmos/cosmos/converter.py", line 113, in __init__
    raise CosmosValueError(
cosmos.exceptions.CosmosValueError: ProjectConfig.dbt_project_path is mutually exclusive with RenderConfig.dbt_project_path and ExecutionConfig.dbt_project_path.If using RenderConfig.dbt_project_path or ExecutionConfig.dbt_project_path, ProjectConfig.dbt_project_path should be None
```

This has been raised by an Astro customer and our field engineer, who
tried to run: https://github.com/astronomer/cosmos-demo
Adds the `aws_session_token` argument to Athena, which was added to
dbt-athena 1.6.4 in dbt-labs/dbt-athena#459

Closes: astronomer#609

Also addresses this comment:
astronomer#578 (comment)
…AL` (astronomer#659)

Extends the local operator to run `dbt deps` with the provided profile flags.

This makes the logic consistent between DAG parsing and task running as
referenced below

https://github.com/astronomer/astronomer-cosmos/blob/8e2d5908ce89aa98813af6dfd112239e124bd69a/cosmos/dbt/graph.py#L247-L266

Closes: astronomer#658
Since Cosmos 1.2.2, users who used `ExecutionMode.DBT_LS` (directly or
via `ExecutionMode.AUTOMATIC`) and set
`ExecutionConfig.dbt_executable_path` (most, if not all, Astro CLI
users), like:

```
execution_config = ExecutionConfig(
    dbt_executable_path = f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
)
```

started facing the issue:
```
Broken DAG: [/usr/local/airflow/dags/my_example.py] Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/cosmos/dbt/graph.py", line 178, in load
    self.load_via_dbt_ls()
  File "/usr/local/lib/python3.11/site-packages/cosmos/dbt/graph.py", line 233, in load_via_dbt_ls
    raise CosmosLoadDbtException(f"Unable to find the dbt executable: {self.dbt_cmd}")
cosmos.dbt.graph.CosmosLoadDbtException: Unable to find the dbt executable: dbt
```

This issue was initially reported in the Airflow #airflow-astronomer
Slack channel:
https://apache-airflow.slack.com/archives/C03T0AVNA6A/p1699584315506629

The workaround to avoid this error in Cosmos 1.2.2 and 1.2.3 is to set
the `dbt_executable_path` in the `RenderConfig`:
```
render_config=RenderConfig(dbt_executable_path = f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",),
```

This PR solves the bug from Cosmos 1.2.4 onwards.
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.1.4 →
v0.1.5](astral-sh/ruff-pre-commit@v0.1.4...v0.1.5)
- [github.com/psf/black: 23.10.1 →
23.11.0](psf/black@23.10.1...23.11.0)
- [github.com/pre-commit/mirrors-mypy: v1.6.1 →
v1.7.0](pre-commit/mirrors-mypy@v1.6.1...v1.7.0)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
astronomer#660)

This PR refactors the `create_symlinks` function that was previously
used in load via dbt ls so that it can be used in
`DbtLocalBaseOperator.run_command` instead of copying the entire
directory.

Closes: astronomer#614
…stronomer#671)

Update `DbtLocalBaseOperator` code to store `compiled_sql` prior to
exception handling so that when a task fails, the `compiled_sql` can still
be reviewed.

In the process found and fixed a related bug where `compiled_sql` was
being dropped on some operations due to the way that the `full_refresh`
field was being added to the `template_fields`.

Closes astronomer#369

Fixes bug introduced in
astronomer#623 where
compiled_sql was being lost in `DbtSeedLocalOperator` and
`DbtRunLocalOperator`

Co-authored-by: Andrew Greenburg <agreenburg@vergeventures.net>
dwreeves and others added 20 commits July 13, 2024 21:05
…stronomer#1054)

Using the Airflow metadata database connection as an example connection
is misleading.

The mismatch in the environment variable value used in the Cosmos
integration tests, particularly with sqlite as the Airflow metadata
database, is an issue that can hide other underlying problems.

This PR decouples the test connection used by Cosmos example DAGs from
the Airflow metadata Database connection.

Since this change affects the Github action configuration, it will only
work for the branch-triggered GH action runs, such as:
https://github.com/astronomer/astronomer-cosmos/actions/runs/9596066209

Because this is a breaking change to the CI script itself, all the
`pull_request_target` tests are expected to fail during the PR and will
pass once this is merged to `main`.

This improvement was originally part of astronomer#1014

---------

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
As part of the CI build, we create a Python virtual environment with the
dependencies necessary to run the tests.

Currently, we recreate this environment every time a Github Action job
is run.
This PR caches the hatch folder, which stores the Python virtualenv.

It seems to have helped reduce times a bit, although the jobs are still
very slow:
- Unit tests execution from
~[2:40](https://github.com/astronomer/astronomer-cosmos/actions/runs/9550554350/job/26322778438)
to
[~2:25](https://github.com/astronomer/astronomer-cosmos/actions/runs/9598977261/job/26471650029)
- Integration tests execution from
[~11:07](https://github.com/astronomer/astronomer-cosmos/actions/runs/9550554350/job/26322894839)
to
[~10:27](https://github.com/astronomer/astronomer-cosmos/actions/runs/9598977261/job/26471677561)
## Description
~shutil.copy includes permission copying via chmod.
If the user lacks permission to run chmod, a PermissionError occurs.
To avoid this, we split the operation into two steps:
first, copy the file contents; then, copy metadata if feasible without
raising exceptions.
Step 1: Copy file contents (no metadata)
Step 2: Copy file metadata (permission bits and other metadata) without
raising exception~

use shutil.copyfile(...) instead of shutil.copy(...) to avoid running
chmod
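The difference the fix relies on: `shutil.copy()` copies file contents *and* permission bits (it runs `os.chmod` on the destination), while `shutil.copyfile()` copies contents only, so it cannot fail on `chmod` in a restricted environment. A minimal illustration:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "profiles.yml")
dst = os.path.join(tmp, "profiles_copy.yml")

with open(src, "w") as f:
    f.write("jaffle_shop:\n  target: dev\n")
os.chmod(src, 0o640)

# copyfile transfers the bytes only; unlike shutil.copy, it never calls
# os.chmod on dst, so it works even where chmod is not permitted.
shutil.copyfile(src, dst)

with open(dst) as f:
    copied = f.read()
```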

## Related Issue(s)

closes: astronomer#1008

## Breaking Change?

No

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [ ] I have added tests that prove my fix is effective or that my
feature works
Add the node's attributes (config, tags, etc.) to the TaskInstance
context for retrieval by callback functions in Airflow, using
`pre_execute` to store these attributes in the task's context.

As [this
PR](https://github.com/astronomer/astronomer-cosmos/pull/700/files)
seems to be closed, and I have a use case for this feature, I attempt to
recreate the needed feature.

We leverage the `context_merge` utility function from Airflow to merge
the extra context into the `Context` object of a `TaskInstance`.

Closes astronomer#698
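The mechanism can be sketched roughly as below. This is an assumed shape, not the actual Cosmos code: a plain dict stands in for Airflow's `Context`, and the local `context_merge` mimics `airflow.utils.context.context_merge` so the snippet runs without Airflow:

```python
def context_merge(context: dict, **kwargs) -> None:
    """Mimics Airflow's context_merge: merge extra keys into the Context."""
    context.update(kwargs)


class DbtNodeContextMixin:
    """Stores the dbt node's attributes in the task context before execution."""

    extra_context = {
        "dbt_node_config": {"tags": ["finance"], "materialized": "table"},
    }

    def pre_execute(self, context: dict) -> None:
        # Runs before execute(); whatever is merged here is visible to
        # on_success/on_failure callbacks via the TaskInstance context.
        context_merge(context, **self.extra_context)


context = {"ds": "2024-07-14"}
DbtNodeContextMixin().pre_execute(context)
```

After `pre_execute` runs, a callback can read `context["dbt_node_config"]` alongside the usual Airflow context keys.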
…#1063)

Add ability to specify `host`/`port` for Snowflake connection.

At LocalStack, we have recently started building a Snowflake emulator that allows running SF queries entirely on the local machine:
https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake/

As part of a sample application we're building, we have an Apache
Airflow DAG that uses Cosmos (and DBT) to connect to the local Snowflake
emulator running on `localhost`. Here is a link to the sample app:
localstack-samples/localstack-snowflake-samples#12

Currently, we're hardcoding this integration in the user DAG file
itself, [see
here](https://github.com/localstack-samples/localstack-snowflake-samples/pull/12/files#diff-559d4f883ad589522b8a9d33f87fe95b0da72ac29b775e98b273a8eb3ede9924R10-R19):
```
...
from cosmos.profiles.snowflake.user_pass import SnowflakeUserPasswordProfileMapping
...
SnowflakeUserPasswordProfileMapping.airflow_param_mapping["host"] = "extra.host"
SnowflakeUserPasswordProfileMapping.airflow_param_mapping["port"] = "extra.port"
...
```
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.4.9 →
v0.4.10](astral-sh/ruff-pre-commit@v0.4.9...v0.4.10)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…le (astronomer#1014)

Significantly improve the `LoadMode.DBT_LS` performance. The example
DAGs tested reduced the task queueing time significantly (from ~30s to
~0.5s) and the total DAG run time for Jaffle Shop from 1 min 25s to 40s
(by more than 50%). Some users[ reported improvements of
84%](astronomer#1014 (comment))
in the DAG run time when trying out these changes. This difference can
be even more significant on larger dbt projects.

The improvement was accomplished by caching the dbt ls output as an
Airflow Variable. This is an alternative to astronomer#992, when we cached the
pickled DAG/TaskGroup into a local file in the Airflow node. Unlike
astronomer#992, this approach works well for distributed deployments of Airflow.

As with any caching solution, this strategy does not guarantee optimal
performance on every run—whenever the cache is regenerated, the
scheduler or DAG processor will experience a delay. It was also observed
that the key value could change across platforms (e.g., `Darwin` and
`Linux`). Therefore, if using a deployment with heterogeneous OS, the
key may be regenerated often.

Closes: astronomer#990
Closes: astronomer#1061

**Enabling/disabling this feature**

This feature is enabled by default.
Users can disable it by setting the environment variable
`AIRFLOW__COSMOS__ENABLE_CACHE_DBT_LS=0`.

**How the cache is refreshed**

Users can purge or delete the cache via Airflow UI by identifying and
deleting the cache key.

The cache will be automatically refreshed in case any files of the dbt
project change. Changes are calculated using the SHA256 of all the files
in the directory. Initially, this feature was implemented using the
files' modified timestamp, but this did not work well for some Airflow
deployments (e.g., `astro --dags` since the timestamp was changed during
deployments).
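The content-based change detection described above can be sketched like this (the exact Cosmos implementation differs; names are illustrative). Every file's bytes are folded, in a deterministic order, into a single SHA-256 digest:

```python
import hashlib
import tempfile
from pathlib import Path


def dbt_project_hash(project_dir: str) -> str:
    digest = hashlib.sha256()
    for path in sorted(Path(project_dir).rglob("*")):
        if path.is_file():
            # Include the relative path so renames also change the digest.
            digest.update(str(path.relative_to(project_dir)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


tmp = tempfile.mkdtemp()
Path(tmp, "dbt_project.yml").write_text("name: jaffle_shop\n")
before = dbt_project_hash(tmp)
unchanged = dbt_project_hash(tmp)  # no edits -> same digest
Path(tmp, "dbt_project.yml").write_text("name: jaffle_shop\nversion: 2\n")
after = dbt_project_hash(tmp)  # content changed -> new digest
```

Unlike modified timestamps, this digest is stable across deployments that rewrite file metadata, which is exactly the `astro --dags` case mentioned above.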

Additionally, if any of the following DAG configurations are changed,
we'll automatically purge the cache of the DAGs that use that specific
configuration:
* `ProjectConfig.dbt_vars`
* `ProjectConfig.env_vars`
* `ProjectConfig.partial_parse`
* `RenderConfig.env_vars`
* `RenderConfig.exclude`
* `RenderConfig.select`
* `RenderConfig.selector`

The following argument was introduced in case users would like to define
Airflow variables that could be used to refresh the cache (it expects a
list with Airflow variable names):
* `RenderConfig.airflow_vars_to_purge_cache`

Example:
```
RenderConfig(
    airflow_vars_to_purge_cache=["refresh_cache"]
)
```

**Cache key**

The Airflow variables that represent the dbt ls cache are prefixed by
`cosmos_cache`. When using `DbtDag`, the keys use the DAG name. When
using `DbtTaskGroup`, they consider the TaskGroup and parent task groups
and DAG.

Examples:
1. The `DbtDag` "cosmos_dag" will have the cache represented by
`"cosmos_cache__basic_cosmos_dag"`.
2. The `DbtTaskGroup` "customers" declared inside the DAG
"basic_cosmos_task_group" will have the cache key
`"cosmos_cache__basic_cosmos_task_group__customers"`.

**Cache value**

The cache values contain a few properties:
- `last_modified` timestamp, represented using the ISO 8601 format.
- `version` is a hash that represents the version of the dbt project and
arguments used to run dbt ls by the time the cache was created
- `dbt_ls_compressed` represents the dbt ls output compressed using zlib
and encoded to base64 to be recorded as a string to the Airflow metadata
database.

Steps used to compress:
```
compressed_data = zlib.compress(dbt_ls_output.encode("utf-8"))
encoded_data = base64.b64encode(compressed_data)
dbt_ls_compressed = encoded_data.decode("utf-8")
```
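The read path simply reverses those steps (base64 decode, then zlib decompress). A self-contained round trip, with a made-up `dbt_ls_output` standing in for the real dbt ls JSON:

```python
import base64
import zlib

# Stand-in for a real dbt ls output (one JSON line per node).
dbt_ls_output = '{"unique_id": "model.jaffle_shop.customers"}\n' * 100

# Write path: compress, then encode as a string for the Airflow Variable.
compressed_data = zlib.compress(dbt_ls_output.encode("utf-8"))
dbt_ls_compressed = base64.b64encode(compressed_data).decode("utf-8")

# Read path: decode the Variable back, then decompress.
restored = zlib.decompress(base64.b64decode(dbt_ls_compressed)).decode("utf-8")
```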

We are compressing this value because it will be significant for larger
dbt projects, depending on the selectors used, and we wanted this
approach to be safe and not clutter the Airflow metadata database.

Some numbers on the compression
* A dbt project with 100 models can lead to a dbt ls output of 257k
characters when using JSON. Zlib could compress it by 20x.
* Another [real-life dbt
project](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt?ref_type=heads)
with 9,285 models led to a dbt ls output of 8.4 MB, uncompressed. It
reduces to 489 KB after being compressed using `zlib` and encoded using
`base64` - to 6% of the original size.
* Maximum cell size in Postgres: 20MB

The latency used to compress is in the order of milliseconds, not
interfering in the performance of this solution.

**Future work**

* How this will affect the Airflow db in the long term
* How does this performance compare to `ObjectStorage`?

**Example of results before and after this change**

Task queue times in Astro before the change:
<img width="1488" alt="Screenshot 2024-06-03 at 11 15 26"
src="https://github.com/astronomer/astronomer-cosmos/assets/272048/20f6ae8f-02e0-4974-b445-740925ab1b3c">

Task queue times in Astro after the change on the second run of the DAG:
<img width="1624" alt="Screenshot 2024-06-03 at 11 15 44"
src="https://github.com/astronomer/astronomer-cosmos/assets/272048/c7b8a821-8751-4d2c-8feb-1d0c9bbba97e">

This feature will be available in `astronomer-cosmos==1.5.0a8`.
## Description

It appears there was an accident resolving conflicts in the changelog,
which resulted in 1.4.2 and 1.4.1 (with the content for 1.4.3) being
listed twice.

## Related Issue(s)

N/A

## Breaking Change?

No

## Checklist

- [ ] I have made corresponding changes to the documentation (if
required)
- [ ] I have added tests that prove my fix is effective or that my
feature works
It looks like rendering for the conf `enable_cache_dbt_ls` is broken in the docs.

**Before change**
<img width="834" alt="Screenshot 2024-06-27 at 1 36 27 AM"
src="https://github.com/astronomer/astronomer-cosmos/assets/98807258/38565e3c-0b23-4764-936a-be40c53c0a00">

**After change**

<img width="815" alt="Screenshot 2024-06-27 at 1 37 09 AM"
src="https://github.com/astronomer/astronomer-cosmos/assets/98807258/1c301d6a-c233-440d-801f-f9475435fc69">
Add dbt profile caching mechanism.

1. Introduced env `enable_cache_profile` to enable or disable profile
caching. This will be enabled only if global `enable_cache` is enabled.
2. Users can set the env `profile_cache_dir_name`. This will be the name
of a sub-dir inside `cache_dir` where cached profiles will be stored.
This is optional, and the default name is `profile`.
3. Example Path for versioned profile:
`{cache_dir}/{profile_cache_dir}/592906f650558ce1dadb75fcce84a2ec09e444441e6af6069f19204d59fe428b/profiles.yml`
4. Implemented profile mapping hashing: first, the profile is serialized
using pickle. Then, the profile_name and target_name are appended before
hashing the data using the SHA-256 algorithm
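Step 4 can be sketched as follows (assumed shape; the function name is illustrative, not the actual Cosmos API): pickle the profile mapping, append the profile and target names, and SHA-256 the result to get the versioned directory name:

```python
import hashlib
import pickle


def profile_version(profile: dict, profile_name: str, target_name: str) -> str:
    # Serialize the profile, append the identifying names, hash with SHA-256.
    data = pickle.dumps(profile) + profile_name.encode() + target_name.encode()
    return hashlib.sha256(data).hexdigest()


version = profile_version(
    {"type": "postgres", "host": "0.0.0.0", "port": 5432},
    "jaffle_shop",
    "dev",
)
# The cached profile would then live at a path like:
#   {cache_dir}/profile/<version>/profiles.yml
```

Any change to the profile contents, profile name, or target name produces a different hash, so a new `profiles.yml` is generated instead of reusing a stale cache entry.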

**Perf test result:**
In local dev env with command
```
AIRFLOW_HOME=`pwd` AIRFLOW_CONN_EXAMPLE_CONN="postgres://postgres:postgres@0.0.0.0:5432/postgres"  AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 hatch run tests.py3.10-2.8:test-performance
```

NUM_MODELS=100
- TIME=167.45248413085938 (with profile cache enabled)
- TIME=173.94845390319824 (with profile cache disabled)

NUM_MODELS=200
- TIME=376.2585120201111 (with profile cache enabled)
- TIME=418.14210200309753 (with profile cache disabled)

Closes: astronomer#925
Closes: astronomer#647
Partial parsing support was introduced in astronomer#800 and improved in astronomer#904
(caching). However, when the caching layer was introduced, we removed
support for using partial parsing if the cache was disabled.

This PR solves the issue.

Fix: astronomer#1041
New Features

* Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow
Variable by @tatiana in astronomer#1014
* Support to cache profiles created via ``ProfileMapping`` by
@pankajastro in astronomer#1046
* Support for running dbt tasks in AWS EKS in astronomer#944 by @VolkerSchiewe
* Add Clickhouse profile mapping by @roadan and @pankajastro in astronomer#353 and
astronomer#1016
* Add node config to TaskInstance Context by @linchun3 in astronomer#1044

Bug fixes

* Support partial parsing when cache is disabled by @tatiana in astronomer#1070
* Fix disk permission error in restricted env by @pankajastro in astronomer#1051
* Add CSP header to iframe contents by @dwreeves in astronomer#1055
* Stop attaching log adaptors to root logger to reduce logging costs by
@glebkrapivin in astronomer#1047

Enhancements

* Support ``static_index.html`` docs by @dwreeves in astronomer#999
* Support deep linking dbt docs via Airflow UI by @dwreeves in astronomer#1038
* Add ability to specify host/port for Snowflake connection by @whummer
in astronomer#1063

Docs

* Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in astronomer#1069

Others

* Update documentation for DbtDocs generator by @arjunanan6 in astronomer#1043
* Use uv in CI by @dwreeves in astronomer#1013
* Cache hatch folder in the CI by @tatiana in astronomer#1056
* Change example DAGs to use ``example_conn`` as opposed to
``airflow_db`` by @tatiana in astronomer#1054
* Mark plugin integration tests as integration by @tatiana in astronomer#1057
* Ensure compliance with linting rule D300 by using triple quotes for
docstrings by @pankajastro in astronomer#1049
* Pre-commit hook updates in astronomer#1039, astronomer#1050, astronomer#1064
* Remove duplicates in changelog by @jedcunningham in astronomer#1068
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.4.10 →
v0.5.0](astral-sh/ruff-pre-commit@v0.4.10...v0.5.0)
- [github.com/asottile/blacken-docs: 1.16.0 →
1.18.0](adamchainz/blacken-docs@1.16.0...1.18.0)
- [github.com/pre-commit/mirrors-mypy: v1.10.0 →
v1.10.1](pre-commit/mirrors-mypy@v1.10.0...v1.10.1)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Teradata has
[Provider](https://airflow.apache.org/docs/apache-airflow-providers-teradata/stable/index.html)
in airflow and [adapter](https://github.com/Teradata/dbt-teradata) in
dbt. Cosmos didn't have a Teradata profile configuration with mapping
support. This PR addresses that issue.

Closes: astronomer#1053
When Airflow gets temporary AWS credentials by assuming a role with
`role_arn` as the only `Connection` parameter, the task fails due to
missing credentials. This is due to the latest changes related to
profile caching: `env_vars` is accessed before `profile`, which, in
this case, means the required values are not populated yet.
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.5.0 →
v0.5.1](astral-sh/ruff-pre-commit@v0.5.0...v0.5.1)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ofile", target "dev" invalid: Runtime Error Must specify `schema` in Teradata profile (astronomer#1088)

`TeradataUserPassword` profile mapping throws below error for mock
profile

```
Credentials in profile "generated_profile", target "dev" invalid:
Runtime Error Must specify the schema in Teradata profile
```

Closes astronomer#1087

netlify bot commented Jul 14, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit c0e280e
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/6693328cee35e50008d391be

@arojasb3
Contributor Author

Hi @arojasb3, it looks like you've made great progress on this feature! Would you be able to rebase it so it can be reviewed and merged? Thanks!

I messed up the rebase, I'll just open a new PR

@arojasb3 arojasb3 closed this Jul 14, 2024
@pankajastro
Contributor

I messed up the rebase, I'll just open a new PR

No worries! Please go ahead and open a new PR; I’d be happy to review and test it.

@arojasb3 arojasb3 mentioned this pull request Jul 19, 2024
2 tasks
tatiana pushed a commit that referenced this pull request Aug 14, 2024
Re-Opening of PR #661 

This PR features a new way of rendering source nodes:
- Check freshness for sources with freshness checks
- Source tests
- Empty operators for nodes without tests or freshness.

One of the main limitations I found while using the `custom_callback`
functions on source nodes to check freshness is that nodes were being
created for 100% of sources, but not all of them required freshness
checks, which made workers waste compute time.

I'm adding a new variable into the DbtNode class called has_freshness
which would be True for sources with freshness checks and False for any
other resource type.

If this feature is enabled with the option `ALL`, all sources with
has_freshness == False will be rendered as Empty Operators, to keep
dbt's behavior of showing sources, as suggested in issue #630.

A new rendered template field is included too: `freshness`, which is the
sources.json generated by dbt when running `dbt source freshness`.
This adds a new node type (source), which changes the behavior of some tests.
This PR also updates the dev dbt project jaffle_shop to include source
nodes when enabled.

![image](https://github.com/user-attachments/assets/e972ac58-8741-4c13-9905-e78775f9cc80)

As seen in the image, source nodes with freshness checks are rendered
with a blue color, while the ones rendered as EmptyOperator show a
white/light green color

## Related Issue(s)
Closes: #630
Closes: #572 
Closes: #875

## Breaking Change?
This won't be a breaking change since the default behavior will still be
ignoring this new feature. That can be changed with the new RenderConfig
variable called `source_rendering_behavior`.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>