AIP-65: Add DAG versioning support #42913

Merged
ephraimbuddy merged 49 commits into apache:main from versioned-dag2
Nov 5, 2024

@ephraimbuddy (Contributor) commented Oct 10, 2024

This commit introduces versioning for DAGs.

Changes:

  • Introduced DagVersion model to handle versioning of DAGs.
  • Added a version_name field to DAG so that users can track the DAG version.
  • Modified DAG execution logic to reference dag_version_id instead of the dag_hash to ensure DAG runs are linked to specific versions.
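
To illustrate the last bullet, here is a minimal sketch of a run that points at a version row by id rather than carrying a hash. Only `dag_version_id` and `version_name` come from this PR; the dataclasses, the other field names, and `version_for_run` are hypothetical stand-ins rather than the actual Airflow models, and `uuid4` is used purely for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DagVersionRow:
    # Hypothetical stand-in for a row in the dag_version table.
    dag_id: str
    version_number: int
    version_name: Optional[str] = None  # optional, user-facing label
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class DagRunRow:
    # Hypothetical stand-in for a DAG run: it stores the version id, not a hash.
    run_id: str
    dag_version_id: uuid.UUID


def version_for_run(run, versions):
    """Resolve the exact version row a run was created with."""
    by_id = {version.id: version for version in versions}
    return by_id[run.dag_version_id]


v1 = DagVersionRow("example", 1)
v2 = DagVersionRow("example", 2, version_name="release-2")
run = DagRunRow("manual_1", dag_version_id=v1.id)
print(version_for_run(run, [v1, v2]).version_number)
```

Because the run holds a reference to one specific row, it keeps pointing at the version it started with even after newer versions of the same DAG are registered.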

The table relations:
[Screenshot of the table relations, 2024-10-25; image not included]

Versioning is based on changes to the serialized dict. If a DAG's serialized dict changes, a new serialized DAG is registered based on the hash difference and, consequently, so are a new DAG version and DAG code. dag_version is linked to TI because of TaskInstance clearing: the link lets us retain the previous DAG version the task ran with.
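
The hash-based rule described above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the code in this PR: `serialized_hash`, `VersionRecord`, and `VersionStore` are hypothetical names, and SHA-256 over key-sorted JSON is an assumed hashing scheme.

```python
import hashlib
import json
from dataclasses import dataclass, field


def serialized_hash(serialized: dict) -> str:
    """Hash a serialized DAG dict; sorting keys makes the hash order-independent."""
    payload = json.dumps(serialized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class VersionRecord:
    dag_id: str
    version_number: int
    dag_hash: str


@dataclass
class VersionStore:
    # Hypothetical in-memory stand-in for the version and serialized DAG tables.
    versions: dict = field(default_factory=dict)

    def write_dag(self, dag_id: str, serialized: dict) -> VersionRecord:
        new_hash = serialized_hash(serialized)
        history = self.versions.setdefault(dag_id, [])
        # Unchanged hash: reuse the latest version instead of registering a new one.
        if history and history[-1].dag_hash == new_hash:
            return history[-1]
        # Changed (or first) hash: register the next version number.
        record = VersionRecord(dag_id, len(history) + 1, new_hash)
        history.append(record)
        return record


store = VersionStore()
first = store.write_dag("example", {"tasks": ["a"]})
same = store.write_dag("example", {"tasks": ["a"]})
second = store.write_dag("example", {"tasks": ["a", "b"]})
print(first.version_number, same.version_number, second.version_number)
```

Re-parsing an unchanged DAG is a no-op in this sketch, so version numbers only advance when the serialized content actually differs.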

Closes: #42333, #42334, #42336

@boring-cyborg bot added labels on Oct 10, 2024: area:API, area:CLI, area:db-migrations, area:dev-tools, area:Scheduler, area:serialization, area:UI, area:webserver, kind:documentation
@ephraimbuddy ephraimbuddy force-pushed the versioned-dag2 branch 5 times, most recently from ad4f57c to dd272b0 Compare October 16, 2024 12:27
@ephraimbuddy ephraimbuddy added the legacy api label Oct 16, 2024
@ephraimbuddy ephraimbuddy force-pushed the versioned-dag2 branch 3 times, most recently from eb13cdd to 1b81f2f Compare October 16, 2024 13:27
@ephraimbuddy ephraimbuddy marked this pull request as ready for review October 16, 2024 13:27
@uranusjr (Member) left a comment

Good enough

@ephraimbuddy ephraimbuddy merged commit 1116f28 into apache:main Nov 5, 2024
109 checks passed
@ephraimbuddy ephraimbuddy deleted the versioned-dag2 branch November 5, 2024 14:19
potiuk added a commit that referenced this pull request Nov 6, 2024
* Revert "Delete the Serialized Dag and DagCode before DagVersion migration (#43700)"

This reverts commit 438f71d.

* Revert "AIP-65: Add DAG versioning support (#42913)"

This reverts commit 1116f28.
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
* AIP-65: Add DAG versioning support

This commit introduces versioning for DAGs

Changes:
- Introduced DagVersion model to handle versioning of DAGs.
- Added version_name field to DAG for use in tracking the dagversion by users
- Added support for version retrieval in the get_dag_source API endpoint
- Modified DAG execution logic to reference dag_version_id instead of the
dag_hash to ensure DAG runs are linked to specific versions.

Fix tests

revert RESTAPI changes

* fixup! AIP-65: Add DAG versioning support

* fixup! fixup! AIP-65: Add DAG versioning support

* fix migration

* fix test

* more test fixes

* update query count

* fix static checks

* Fix query and add created_at to dag_version table

* improve code

* Change to using UUID for primary keys

* DagCode.bulk_write_code is no longer used

* fixup! Change to using UUID for primary keys

* fix tests

* fixup! fix tests

* use uuid for version_name

* fixup! use uuid for version_name

* use row lock when writing dag version

* use row lock when writing dag version

* fixup! use row lock when writing dag version

* deactivating dag should not remove serialized dags

* save version_name as string not uuid

* Make dag_version_id unique

* fixup! Make dag_version_id unique

* Fix tests

* Use uuid7

* fix test

* fixup! fix test

* use binary=False for uuid field to fix sqlite issue

* apply suggestions from code review

* Remove unnecessary version_name on dagmodel

* Fix sqlalchemy 2 warning

* Fix conflicts

* Apply suggestions from code review

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>

* fixup! Apply suggestions from code review

* fixup! fixup! Apply suggestions from code review

* add test for dagversion model and make version_name, number and dag_id unique

* Remove commented test as serdag can no longer disappear

* Add SQLAlchemy-utils to requirements

* mark test_dag_version.py as db_test

* make version_name nullable

* Apply suggestions from code review

* fixup! Apply suggestions from code review

* remove file_updater

* Use dag_version for creating dagruns instead of dag_version_id

* fix conflicts

* use if TYPE_CHECKING

* Add docstrings to methods

* Move getting latest serdags to SerializedDagModel
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
* Revert "Delete the Serialized Dag and DagCode before DagVersion migration (apache#43700)"

This reverts commit 438f71d.

* Revert "AIP-65: Add DAG versioning support (apache#42913)"

This reverts commit 1116f28.
Labels
AIP-65: DAG history in UI, area:API, area:CLI, area:db-migrations, area:dev-tools, area:Scheduler, area:serialization, area:UI, area:webserver, kind:documentation, legacy api
Development

Successfully merging this pull request may close these issues.

Calculate and track DAG version
7 participants