Add profile_config arg to DbtKubernetesBaseOperator #505

Contributor

@david-mag david-mag commented Sep 2, 2023

Description

When creating DbtTaskGroups and DbtDags, a ProfileConfig needs to be provided, since it's a required positional argument of the DbtAirflowConverter class. However, when providing a profile_config in execution_mode="kubernetes", the following Exception is thrown:

AirflowException: Invalid arguments were passed to DbtRunKubernetesOperator (task_id: my_first_dbt_model_run). Invalid arguments were:
**kwargs: {'profile_config': ProfileConfig(profile_name='jaffle_shop', target_name='dev', profiles_yml_filepath='jaffle_shop/profiles.yml', profile_mapping=None)}

My proposed solution makes profile_config an optional argument of the DbtKubernetesBaseOperator class and parses it inside the build_kube_args method, similarly (but not identically) to how the run_command method of the DbtLocalBaseOperator class handles it. I made it optional mainly to satisfy the tests; since it's a required argument earlier in the "pipeline", making it required here would also make sense, imho.

This fixes the exception and has the added benefit of letting users who run dbt inside Kubernetes specify --profile and --target, which is useful when scheduling the same model in Airflow against both dev and prod targets.
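
To illustrate the idea, here is a minimal sketch of how a profile config could be turned into dbt CLI flags. It is not the actual diff: `ProfileConfigSketch` and `build_dbt_args` are hypothetical names, and only the `--profile` and `--target` flags are taken from dbt itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfileConfigSketch:
    """Hypothetical stand-in for ProfileConfig, reduced to the two fields used here."""

    profile_name: str
    target_name: str


def build_dbt_args(
    base_cmd: List[str],
    profile_config: Optional[ProfileConfigSketch] = None,
) -> List[str]:
    """Return a copy of base_cmd, adding --profile/--target when a config is given."""
    args = list(base_cmd)
    if profile_config is not None:
        args += ["--profile", profile_config.profile_name]
        args += ["--target", profile_config.target_name]
    return args


# The same model can be scheduled against two targets by swapping the config.
dev_args = build_dbt_args(["dbt", "run"], ProfileConfigSketch("jaffle_shop", "dev"))
prod_args = build_dbt_args(["dbt", "run"], ProfileConfigSketch("jaffle_shop", "prod"))
```

Keeping the argument optional means a command built without a profile config is left unchanged, which matches the "optional to satisfy the tests" choice described above.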

I've tested this solution in my GCC environment.

Related Issue(s)

This PR resolves issue #493.

Breaking Change?

This should fix something that is broken atm, as far as I can see.

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@david-mag david-mag requested a review from a team as a code owner September 2, 2023 20:53
@david-mag david-mag requested a review from a team September 2, 2023 20:53

netlify bot commented Sep 2, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 776ec59
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/64f6f84973c8540008cac3d4

@david-mag david-mag mentioned this pull request Sep 2, 2023
@david-mag david-mag temporarily deployed to external September 5, 2023 09:43 — with GitHub Actions Inactive

codecov bot commented Sep 5, 2023

Codecov Report

Patch coverage: 71.42% and project coverage change: -0.18% ⚠️

Comparison is base (033a4b3) 91.48% compared to head (776ec59) 91.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #505      +/-   ##
==========================================
- Coverage   91.48%   91.30%   -0.18%     
==========================================
  Files          49       49              
  Lines        1914     1921       +7     
==========================================
+ Hits         1751     1754       +3     
- Misses        163      167       +4     
Files Changed Coverage Δ
cosmos/operators/kubernetes.py 70.00% <71.42%> (-2.05%) ⬇️


Collaborator

@jlaneve jlaneve left a comment

nice, this looks great! we should add the same support in the Docker base operator. I've filed #512 as a follow up!

@jlaneve jlaneve merged commit 7f996a0 into astronomer:main Sep 5, 2023
@tatiana tatiana added this to the 1.1.0 milestone Sep 6, 2023
@tatiana tatiana mentioned this pull request Sep 6, 2023
tatiana added a commit that referenced this pull request Sep 6, 2023
**Features**

* Support dbt global flags (via dbt_cmd_global_flags in operator_args)
by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**

* Hide sensitive field when using BigQuery keyfile_dict profile mapping
by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage
package <https://pypi.org/project/openlineage-integration-common/>`_ by
@tatiana in #485. `Read more
<https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with
symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in
#481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to
recording to XCom by @tatiana in #513

**Bug fixes**

* Fix bug on select node add exclude selector subset ids logic by
@jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only
file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in
#505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by
@nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init
by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from
connection by @jlneve in #511

**Others**

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota
usage by @tatiana in #510

agneku commented Sep 8, 2023

Hi @david-mag, @jlaneve, I'm now trying to create a DbtDag with execution_mode='kubernetes'. Even though I don't receive the profile_config error anymore (since the 1.1.0 release), I got a new error:
airflow.exceptions.AirflowException: Invalid arguments were passed to DbtRunKubernetesOperator (task_id: final_model2_run). Invalid arguments were: **kwargs: {'emit_datasets': True}

The emit_datasets arg is not something I set on my own. Here's the DbtDag for reference:

transform_data = DbtDag(
    profile_config=profile_config,
    project_config=ProjectConfig(DBT_PROJECT_PATH),
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.KUBERNETES,
    ),
    operator_args={
        "queue": "kubernetes",
        "image": DBT_IMAGE,
        "image_pull_policy": "Always",
        "get_logs": True,
        "is_delete_operator_pod": False,
    },
    dag_id="dbt_test",
    start_date=datetime(2023, 8, 1),
    schedule=None,
    catchup=False,
)

Is this a related issue?

@david-mag
Contributor Author

As far as I can see, with the release of this fix, more "unknown" arguments have been added, which broke the kubernetes DAGs and TaskGroups again right away.
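
A rough, self-contained sketch of this failure mode, under the assumption that the base class rejects any keyword argument it does not recognise. All class names here are hypothetical stand-ins, not the real Airflow or Cosmos classes, and the second variant shows only one possible way to tolerate a new argument, not necessarily the fix Cosmos adopted.

```python
class StrictBaseOperator:
    """Hypothetical base class that refuses unknown keyword arguments."""

    def __init__(self, task_id: str, **kwargs):
        if kwargs:
            raise TypeError(
                f"Invalid arguments were passed to {type(self).__name__} "
                f"(task_id: {task_id}). Invalid arguments were: **kwargs: {kwargs}"
            )
        self.task_id = task_id


class KubernetesOperatorBefore(StrictBaseOperator):
    """Forwards every extra kwarg upward, so a newly added one breaks it."""

    def __init__(self, image: str, **kwargs):
        self.image = image
        super().__init__(**kwargs)


class KubernetesOperatorAfter(StrictBaseOperator):
    """Declares the new argument explicitly, so it never reaches the base class."""

    def __init__(self, image: str, emit_datasets: bool = True, **kwargs):
        self.image = image
        self.emit_datasets = emit_datasets
        super().__init__(**kwargs)


# Arguments a converter might pass to every operator it creates (assumed values).
converter_kwargs = {"task_id": "final_model2_run", "emit_datasets": True}

try:
    KubernetesOperatorBefore(image="dbt-image", **converter_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

working_operator = KubernetesOperatorAfter(image="dbt-image", **converter_kwargs)
```

The point of the sketch is that every argument the converter starts passing has to be accepted by each execution mode's operator, otherwise the operator fails at construction time.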

@qimumu9406

@david-mag I also encountered the same problem. When using Cosmos on Kubernetes, this seems to be a fatal problem.
