-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Migrate retry handler in task SDK API client to use tenacity instead of retryhttp #56762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate retry handler in task SDK API client to use tenacity instead of retryhttp #56762
Conversation
kaxil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a bug that retryhttp brings in requests!
They do have the codebase structured and documentation that you can either use httpx or requests. Let me create a PR on their repo.
|
Created a PR on their repo austind/retryhttp#26 to fix that cc @austind |
That can't be true though. Something must be wrong in how you are testing it. Stamina uses tenacity under the hood. |
|
@kaxil yes you are right, something was wrong with my setup. Testing using simple CLI now and tenacity stands better: cd /Users/amoghdesai/Documents/OSS/repos/airflow
source .venv/bin/activate
cat > /tmp/test_retry_tenacity.py << 'EOF'
from tenacity import retry
print("Imported retry from tenacity")
EOF
cat > /tmp/test_retry_stamina.py << 'EOF'
from stamina import retry
print("Imported retry from stamina")
EOF
memray run -o retry-tenacity.bin /tmp/test_retry_tenacity.py
memray run -o retry-stamina.bin /tmp/test_retry_stamina.py
echo "=== TENACITY ==="
memray stats retry-tenacity.bin | head -20
echo ""
echo "=== STAMINA ==="
memray stats retry-stamina.bin | head -20Results: TENACITY: from tenacity import retry STAMINA: from stamina import retry |
|
@amoghrajesh It might be worth adding |
kaxil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few comments but overall lgtm.
cc @jscheffl you might be interested since you added it
|
Yeah looking at it! Also have a gap in retryhttp as I would like to switch to aiohttp in edge and thought about how to continue. Might be I follow your path... |
Good idea. Ran a test for that, posting results below. |
Test Setup
"""Test baseline: pydantic + httpx only"""
import pydantic
import httpx
from pydantic import BaseModel
class TestModel(BaseModel):
name: str
model = TestModel(name="test")
client = httpx.Client()
client.close()
print("Baseline: pydantic + httpx loaded")
"""Test tenacity with equivalent retry configuration"""
import pydantic
import httpx
import tenacity
from pydantic import BaseModel
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential
class TestModel(BaseModel):
name: str
model = TestModel(name="test")
client = httpx.Client()
client.close()
# Equivalent retry configuration
@retry(
retry=retry_if_exception(lambda e: isinstance(e, Exception)),
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=0.1, max=1.0),
reraise=True,
)
def dummy():
pass
print("Tenacity with equivalent config loaded")
"""Test stamina with equivalent retry configuration"""
import pydantic
import httpx
import stamina
from pydantic import BaseModel
class TestModel(BaseModel):
name: str
model = TestModel(name="test")
client = httpx.Client()
client.close()
# Equivalent retry configuration
@stamina.retry(
on=Exception,
attempts=3,
wait_initial=0.1,
wait_max=1.0,
wait_jitter=1.0,
)
def dummy():
pass
print("Stamina with equivalent config loaded")
"""Test retryhttp with equivalent retry configuration"""
import pydantic
import httpx
import retryhttp
from pydantic import BaseModel
from retryhttp import retry
class TestModel(BaseModel):
name: str
model = TestModel(name="test")
client = httpx.Client()
client.close()
# Equivalent retry configuration (retryhttp uses different API)
@retry(
max_attempt_number=3,
# retryhttp uses different wait strategies - using default exponential
)
def dummy():
pass
print("RetryHTTP with equivalent config loaded")Run profilingmemray run --output baseline.bin test_baseline.py
memray run --output tenacity.bin test_tenacity.py
memray run --output stamina.bin test_stamina.py
memray run --output retryhttp.bin test_retryhttp.py
memray summary baseline.bin
memray summary tenacity.bin
memray summary stamina.bin
memray summary retryhttp.binBenchmark Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall — clear improvement and simpler deps.
I did not understand completely why we removed 429 retry logic. Earlier (per the PR description and early commits), the Task SDK’s retry logic used to respect Retry-After headers from 429 responses - pausing before retrying.
In the final version, the _should_retry_api_request predicate only retries on:
-
httpx.RequestError(network / timeout) -
httpx.HTTPStatusErrorwith status ≥ 500
It no longer retries on 429, nor does it read Retry-After. Why?
Thanks @sunank200, the reason for that is simple -- API server doesn't have feature support for rate limiting! So it doesnt make sense to have the client support it when server doesn't: #56762 (comment) |
Yes, I'd also apply this to edge but I am still hoping for the PR of Kaxil merged and then followed by austind/retryhttp#28 as I'd like to change edge to use asyncio, then I'd need aiohttp support... and post the fix from Kaxil no dependency to requests anymore so can still keep retryhttp... |



Motivation
The task SDK uses httpx for HTTP operations but depends on
retryhttpfor retry logic. This brings in the entirerequestslibrary as a transitive dependency, even though it's never actually used.Memray reports also show some stats which can be improved.
For
import retryhttp:This is because retryhttp unconditionally imports both httpx and requests, even though we only use httpx.
The retry handler works fine, but it's unnecessary bloat. We're already using tenacity in sdk client, and we only use httpx, not requests, so might as well use tenacity for better memory results and reduced footprint.
Alternatives Considered
1. Switch to stamina
2. Use pure tenacity
Went ahead and selected tenacity because it is a battle tested library already present in task sdk and is used often and there is no need for a new library when the benefits of memory footprint (Net savings: ~444KB vs ~544KB with pure tenacity) aren't significantly high.
Changes of note
Migrated task SDK to use tenacity instead of retryhttp while maintaining total parity with what retryhttp offered. A lot of code thats written is inspired by code of retryhttp!
Tenacity is a generic retry library - it doesn't know anything about HTTP. It just retries when you tell it to, so to maintain parity, some wrappers and helpers had to be written.
_should_retry_api_request(exception)This function determines which errors should trigger a retry. It replicates retryhttp's behavior of retrying on:
But NOT retrying on client errors like 404 or 401, which would be pointless.
NOTE: Behaviour for 429 status code (rate limit) has been removed because the API server doesn't support rate limiting as of now, and we can add that support to the client as we need it or as ready.
How parity was maintained
The original retryhttp decorator looked like this:
Each of those parameters mapped to specific HTTP behaviors. Our new decorator with tenacity:
Removed from dependencies:
retryhttprequeststypes-requestsHow this was tested
Unit tests
All existing retry tests pass without modification:
Integration testing
Ran the task SDK against a live API server and killed the server mid request. The client correctly retried with exponential backoff and recovered when the server came back up, confirming network error handling works as expected.
Benefits
Memray results
Importing
import tenacityand showing difference:Flamegraph (now):

Flamegraph (earlier):
Memory Graph (now):
Memory Graph (before):
Stats (now):
Stats (earlier):
Package size
INSTALLED DEPENDENCIES SIZE:
Removed packages:
• retryhttp ..................... 40K
• requests ..................... 228K
• urllib3 (dep of requests) .... 488K
• charset_normalizer (dep) ..... 808K
• idna (dep) ................... 352K
• certifi (dep) ................ 296K
─────────────────────────────────────
TOTAL DISK SAVINGS: ........... ~2.2 MB
Workers do not need to have these dependencies anymore.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.