Fix MySQL UUID generation in task_instance migration
#54814
Merged: kaxil merged 1 commit into apache:main from astronomer:fix/mysql-uuid-migration-malformed-ids on Aug 22, 2025
The MySQL UUID v7 generation function in migration 0042 was creating malformed UUIDs
that fail Pydantic validation when the scheduler attempts to enqueue task instances.
## Problem
The original MySQL function had two critical issues:
1. Generated only 16 random hex characters instead of the required 20
2. Used SUBSTRING(rand_hex, 9) without length limit, producing 8-character final
segments instead of the required 12 characters
This resulted in malformed UUIDs like:
- Bad: 0198cf6d-fb98-4555-7301-e29b8403 (32 chars, last segment: 8 chars)
- Good: 0198cf6d-fb98-4555-7301-e29b8403abcd (36 chars, last segment: 12 chars)
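The length arithmetic behind the second issue can be sketched in Python. The `sql_substring` helper is a hypothetical stand-in for MySQL's 1-based SUBSTRING, and the sample hex strings are made up for illustration; this is not the migration's code.

```python
from typing import Optional


def sql_substring(s: str, pos: int, length: Optional[int] = None) -> str:
    """Emulate MySQL SUBSTRING(s, pos[, length]) for a positive, 1-based pos."""
    start = pos - 1
    return s[start:] if length is None else s[start:start + length]


old_rand_hex = "0123456789abcdef"      # 16 hex chars, as in the original function
new_rand_hex = "0123456789abcdef0123"  # 20 hex chars, as in the fixed function

old_tail = sql_substring(old_rand_hex, 9)      # no length limit: takes what is left
new_tail = sql_substring(new_rand_hex, 9, 12)  # explicit 12-character limit

print(len(old_tail), len(new_tail))  # 8 12
```

With only 16 characters available, starting at position 9 leaves 8 characters, which is exactly the short final segment seen in the bad UUID.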
## Simple Reproduction
The issue can be demonstrated with pure Python and the malformed UUIDs:
```python
from uuid import UUID

from pydantic import BaseModel, ValidationError


class TaskInstanceDemo(BaseModel):
    id: UUID


# This fails with the exact error from the issue
bad_uuid = "0198cf6d-fb98-4555-7301-e29b8403"  # 32 chars
try:
    TaskInstanceDemo(id=bad_uuid)
except ValidationError as err:
    # Input should be a valid UUID, invalid group length in group 4: expected 12, found 8
    print(err)

# This works fine
good_uuid = "0198cf6d-fb98-4555-7301-e29b8403abcd"  # 36 chars
TaskInstanceDemo(id=good_uuid)  # ✓ Success
```
## When This Issue Occurs
The validation error happens when:
1. Task instances exist in 'scheduled' state before migrating from 2.10 to 3.0.x
2. These tasks receive malformed UUIDs during migration
3. Scheduler tries to enqueue these tasks via ExecuteTask.make()
4. Pydantic validation fails: 'invalid group length in group 4: expected 12, found 8'
Users with no scheduled tasks during migration or who create new DAG runs typically
don't encounter this issue since new task instances get proper UUIDs from the
Python uuid7() function.
## Solution
Updated the MySQL uuid_generate_v7 function to:
- Use RANDOM_BYTES(10) for cryptographically secure 20-character hex data
- Apply explicit SUBSTRING(rand_hex, 9, 12) to ensure 12-character final segment
- Mark function as NOT DETERMINISTIC (correct for random functions)
- Use CHAR(20) declaration matching actual usage
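As a rough Python analogue of the corrected function, the sketch below assembles a UUIDv7-shaped string from a timestamp and 20 random hex characters. The name `uuid_generate_v7_sketch` is hypothetical, `secrets.token_hex(10)` only stands in for HEX(RANDOM_BYTES(10)), and the layout of the two middle segments is an assumption, since the description above only specifies how the final segment is cut.

```python
import secrets
import time
from uuid import UUID


def uuid_generate_v7_sketch(unix_time_ms: int) -> str:
    """Assemble a UUIDv7-shaped string from a millisecond timestamp."""
    time_hex = format(unix_time_ms & 0xFFFFFFFFFFFF, "012x")  # 48-bit timestamp
    rand_hex = secrets.token_hex(10)  # 10 random bytes -> 20 hex chars
    return "-".join(
        [
            time_hex[:8],
            time_hex[8:12],
            "7" + rand_hex[0:3],  # version nibble plus random bits (assumed layout)
            "8" + rand_hex[3:6],  # variant nibble plus random bits (assumed layout)
            rand_hex[8:20],       # same span as SUBSTRING(rand_hex, 9, 12)
        ]
    )


candidate = uuid_generate_v7_sketch(int(time.time() * 1000))
parsed = UUID(candidate)  # raises ValueError if the string is malformed
print(candidate, len(candidate), len(candidate.split("-")[-1]))
```

The point of the sketch is the last slice: with 20 characters available, taking 12 characters from position 9 always fills the final segment.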
## Why No Data Migration
We decided against creating a separate migration to fix existing malformed UUIDs because:
1. **Limited scope** - Only affects task instances in 'scheduled' state during migration
2. **Self-healing** - System recovers as old tasks complete and new ones are created
3. **Risk mitigation** - Avoid complex primary key modifications in production
4. **Alternative available** - Manual fix script provided below for affected users
5. **Prevention focus** - Fixing root cause prevents future occurrences
## Manual Fix for Affected Users
If you encounter the UUID validation error, you can fix existing malformed UUIDs:
```sql
-- Fix malformed UUIDs by extending the final segment to its proper length.
-- A malformed id is 32 chars long with an 8-char final segment, so appending
-- 4 random hex chars yields a 36-char id with a 12-char final segment.
UPDATE task_instance
SET id = CONCAT(
    id,                                                   -- Keep the existing 32 chars
    LOWER(LPAD(HEX(FLOOR(RAND() * POW(2, 16))), 4, '0'))  -- Add 4 random hex chars
)
WHERE LENGTH(SUBSTRING_INDEX(id, '-', -1)) = 8;  -- Find 8-char final segments

-- Verify the fix: this query should return no rows
SELECT id, LENGTH(id) AS uuid_length,
       LENGTH(SUBSTRING_INDEX(id, '-', -1)) AS last_segment_length
FROM task_instance
WHERE LENGTH(SUBSTRING_INDEX(id, '-', -1)) != 12
LIMIT 5;
```
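The repair idea can also be sketched in Python for a single value. `repair_uuid` is a hypothetical helper, not part of Airflow; it assumes a malformed id is exactly 32 characters with an 8-character final segment and pads that segment with random hex.

```python
import secrets


def repair_uuid(value: str) -> str:
    """Pad an 8-char final segment to 12 chars; leave other values untouched."""
    head, _, tail = value.rpartition("-")
    if len(value) == 32 and len(tail) == 8:
        return f"{head}-{tail}{secrets.token_hex(2)}"  # 2 bytes -> 4 hex chars
    return value


bad = "0198cf6d-fb98-4555-7301-e29b8403"
good = "0198cf6d-fb98-4555-7301-e29b8403abcd"

fixed = repair_uuid(bad)
print(len(fixed), fixed.startswith(bad))  # 36 True
print(repair_uuid(good) == good)          # True
```

Keeping the original 32 characters as a prefix means the timestamp portion of the id is preserved and only the missing random tail is filled in.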
## Testing
Verified the fix generates valid UUIDs:
- Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (36 chars total)
- Final segment: 12 characters (not 8)
- Passes standard UUID validation patterns
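A check along these lines could be scripted as follows. This is a minimal sketch, not the test that was actually run: the pattern and the `is_well_formed` helper are assumptions chosen to match the format listed above.

```python
import re
from uuid import UUID

# 8-4-4-4-12 hex groups, matching the format listed above
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_well_formed(value: str) -> bool:
    """Return True if value has the 36-char layout and parses as a UUID."""
    if UUID_PATTERN.fullmatch(value) is None:
        return False
    try:
        UUID(value)
    except ValueError:
        return False
    return True


print(is_well_formed("0198cf6d-fb98-4555-7301-e29b8403"))      # False
print(is_well_formed("0198cf6d-fb98-4555-7301-e29b8403abcd"))  # True
```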
Fixes apache#54554
debe60a to
c48b1ef
Compare
vatsrahul1001 approved these changes Aug 22, 2025
Backport failed to create: v3-0-test.
You can attempt to backport this manually by running:
cherry_picker 600716f v3-0-test
This should apply the commit to the v3-0-test branch and leave the commit in conflict state. After you have resolved the conflicts, you can continue the backport process by running:
cherry_picker --continue
kaxil added a commit that referenced this pull request Aug 22, 2025
(cherry picked from commit 600716f)
mangal-vairalkar pushed a commit to mangal-vairalkar/airflow that referenced this pull request Aug 30, 2025
Labels
area:db-migrations
PRs with DB migration
backport-to-v3-1-test
Mark PR with this label to backport to v3-1-test branch
kind:documentation