Fix database commit timeout - sync hangs on write phase #593
Merged
danf-newton pushed a commit to Newton-Research-Inc/salesagent that referenced this pull request Nov 24, 2025
* Add 120s timeout to database commit operations to prevent indefinite hangs

  The sync was hanging on database commit (not GAM API calls). This adds a 120-second timeout to the _flush_batch commit operation to prevent indefinite hangs caused by lock contention or large transactions.

  Root cause: AccuWeather syncs were hanging on the 'Writing Targeting Keys to DB' phase because db.commit() can block indefinitely if there's lock contention.

  Fixes the issue where GAM API discovery completed successfully but database writes hung forever.

* Add connection keep-alive and better error handling for long-running syncs

  Root cause: Database connections time out after ~15min of inactivity, but the sync holds the same session for 30+ min. When the connection is lost, queries hang waiting for a response that never comes.

  Fixes:
  1. Add 'SELECT 1' connection keep-alive before critical queries
  2. Add expire_all() to clear stale session state
  3. Catch OperationalError/DBAPIError for lost connections
  4. Better error messages indicating connection loss vs lock contention

  This is the correct 'out of the box' solution: we commit after each batch (short transactions) but use connection keep-alive for long-lived sessions. There is no need to create a new session per batch, which would be wasteful. The session can be long-lived as long as we verify connection health periodically.
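The keep-alive idea from the second commit can be sketched roughly as follows. This is not the PR's code: it uses the standard-library sqlite3 module as a stand-in for the real session, and the names ensure_connection_alive, flush_batch, ConnectionLostError, and the targeting_keys table are hypothetical. The expire_all() step is specific to the session object used in the PR and has no sqlite3 equivalent, so it is left out.

```python
import sqlite3


class ConnectionLostError(Exception):
    """Hypothetical error type that separates a dead connection from lock contention."""


def ensure_connection_alive(conn):
    """Run a cheap 'SELECT 1' so a dead connection fails here, before the real work."""
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Stand-in for catching OperationalError/DBAPIError in the PR.
        raise ConnectionLostError(
            "database connection lost; reconnect before retrying"
        ) from exc


def flush_batch(conn, rows):
    """Check connection health, then write one batch in its own short transaction."""
    ensure_connection_alive(conn)
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany("INSERT INTO targeting_keys (name) VALUES (?)", rows)
```

Checking health once per batch keeps the cost to a single trivial query while still letting one connection serve the whole sync.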
Problem
AccuWeather syncs were hanging on the database write phase, not GAM API calls. The sync would complete discovery successfully but then hang indefinitely on "Writing Targeting Keys to DB".
Root Cause
The _flush_batch() method calls self.db.commit(), which can block indefinitely if there's lock contention or a large transaction.
Our previous timeout fixes only covered GAM API discovery operations, not database operations.
Solution
Added a 120-second timeout to the database commit operation in _flush_batch().
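How the timeout is implemented is not shown on this page. One possible approach, sketched here with only the standard library, is to run the commit in a worker thread and stop waiting after the deadline. The names commit_with_timeout and CommitTimeoutError are hypothetical, and the commit argument stands in for something like self.db.commit.

```python
import concurrent.futures


class CommitTimeoutError(Exception):
    """Hypothetical error raised when a commit does not finish in time."""


def commit_with_timeout(commit, timeout_seconds=120.0):
    """Call commit() in a worker thread and give up waiting after timeout_seconds.

    Assumption: commit is any zero-argument callable, e.g. a session's commit method.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(commit)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError as exc:
        raise CommitTimeoutError(
            f"database commit exceeded {timeout_seconds}s"
        ) from exc
    finally:
        # Do not wait for the worker: a stuck commit would block shutdown as well.
        executor.shutdown(wait=False)
```

Note that this only stops the caller from waiting. The worker thread is not cancelled and may still be blocked inside the commit, so the session should be treated as unusable after a timeout.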
Impact
Testing
Observed in production:
With this fix, database commits will time out gracefully instead of hanging indefinitely.
Related: #587 (GAM API timeouts)