feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) #5793
Signed-off-by: Vincent Huang <vhuang@squareup.com>
Pull Request Overview
This PR re-enables SQLite WAL (Write-Ahead Logging) mode with explicit transaction management to fix a race condition where get_session could read stale data immediately after create_session. The solution adds explicit tx.commit() calls to ensure WAL changes are visible to concurrent readers.
Key changes:
- Enabled WAL journal mode in SQLite connection options
- Added explicit transaction management with commit calls to write operations
- Ensures each write is committed at an explicit transaction boundary, so readers that start afterwards see it
    tx.commit().await?;

    if let Some(conversation) = &session.conversation {
        self.replace_conversation(&session.id, conversation).await?;
    }
The transaction commits before calling replace_conversation, breaking atomicity of the import. If replace_conversation fails, the session will be imported without messages. Consider including the conversation import in the same transaction, or document why this split is intentional.
    - tx.commit().await?;
    - if let Some(conversation) = &session.conversation {
    -     self.replace_conversation(&session.id, conversation).await?;
    - }
    + if let Some(conversation) = &session.conversation {
    +     self.replace_conversation_tx(&session.id, conversation, &mut tx).await?;
    + }
    + tx.commit().await?;
I was on the fence about changing this: replace_conversation already does its own commit, independent of import_session. The easiest solution would be to inline the body of replace_conversation into import_session.
If we move towards foreign keys, messages can no longer be orphaned from their session by default, and this can be addressed there.
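To make the atomicity concern in this thread concrete, here is a minimal sketch using a toy in-memory store rather than the real database layer. Every name in it (Store, Tx, import_split, import_atomic) is hypothetical and only stands in for the real code; the "empty message is invalid" rule exists purely to force a failure. It compares committing the session before writing the conversation against doing both in one transaction.

```rust
use std::collections::HashMap;

// Toy in-memory store. All names are hypothetical stand-ins, not the project's API.
#[derive(Default)]
struct Store {
    sessions: Vec<String>,
    messages: HashMap<String, Vec<String>>,
}

/// Buffers writes and applies them to the store only on `commit`.
/// Dropping a `Tx` without committing discards the buffered writes.
struct Tx<'a> {
    store: &'a mut Store,
    sessions: Vec<String>,
    messages: Vec<(String, Vec<String>)>,
}

impl<'a> Tx<'a> {
    fn begin(store: &'a mut Store) -> Self {
        Tx { store, sessions: Vec::new(), messages: Vec::new() }
    }

    fn insert_session(&mut self, id: &str) {
        self.sessions.push(id.to_string());
    }

    /// Fails on an empty message (an arbitrary rule, used only to simulate an error).
    fn replace_conversation(&mut self, id: &str, msgs: &[&str]) -> Result<(), String> {
        if msgs.iter().any(|m| m.is_empty()) {
            return Err("invalid message".to_string());
        }
        let owned = msgs.iter().map(|m| m.to_string()).collect();
        self.messages.push((id.to_string(), owned));
        Ok(())
    }

    fn commit(self) {
        let Tx { store, sessions, messages } = self;
        store.sessions.extend(sessions);
        for (id, msgs) in messages {
            store.messages.insert(id, msgs);
        }
    }
}

/// Split variant: the session is committed before the conversation is written.
fn import_split(store: &mut Store, id: &str, msgs: &[&str]) -> Result<(), String> {
    let mut tx = Tx::begin(store);
    tx.insert_session(id);
    tx.commit();

    let mut tx = Tx::begin(store);
    tx.replace_conversation(id, msgs)?;
    tx.commit();
    Ok(())
}

/// Single-transaction variant: session and conversation commit together or not at all.
fn import_atomic(store: &mut Store, id: &str, msgs: &[&str]) -> Result<(), String> {
    let mut tx = Tx::begin(store);
    tx.insert_session(id);
    tx.replace_conversation(id, msgs)?;
    tx.commit();
    Ok(())
}

fn main() {
    // The conversation write fails after the session was already committed.
    let mut split = Store::default();
    assert!(import_split(&mut split, "s1", &["hi", ""]).is_err());
    assert_eq!(split.sessions, vec!["s1"]); // session exists without messages
    assert!(split.messages.is_empty());

    // The same failure inside one transaction leaves nothing behind.
    let mut atomic = Store::default();
    assert!(import_atomic(&mut atomic, "s1", &["hi", ""]).is_err());
    assert!(atomic.sessions.is_empty());
}
```

In this model the split variant leaves a session with no messages after a failed import, which is the state the review comment warns about. The single-transaction variant avoids it, at the cost of threading the transaction through the conversation write.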
Summary
There seems to be a WAL race condition in the builder.rs file: a session is created and get_session is called immediately afterwards, which in some instances can fail because the get_session call is looking at an old version of the sessions db + WAL from one thread before the create_session changes finish propagating.

I believe the culprit to be the lack of an explicit commit of the transaction with WAL enabled. The concurrency section of the SQLite documentation says: "When a read operation begins on a WAL-mode database, it first remembers the location of the last valid commit record in the WAL." So even though we were relying on concurrency through await?;, create_session never explicitly called commit, which possibly resulted in get_session "misses" on an old version of the database.

This could also explain why the PRAGMA wal_checkpoint approach didn't work, as the checkpoint didn't have a completed commit whose WAL changes it could apply to the database.
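The quoted rule can be illustrated with a small toy model. This is not SQLite's implementation and does not use the project's database layer. The names Wal, Frame, end_mark and read are hypothetical, and the model ignores details such as a writer seeing its own uncommitted changes. It only shows that a reader starting now sees frames up to the last commit record and nothing after it.

```rust
// Toy model of the WAL "last valid commit record" rule.
// All names are hypothetical; this is an illustration, not SQLite's code.

#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Write { key: String, value: String },
    Commit,
}

#[derive(Default)]
struct Wal {
    frames: Vec<Frame>,
}

impl Wal {
    /// Appends a write frame; it stays invisible to readers until a commit follows.
    fn write(&mut self, key: &str, value: &str) {
        self.frames.push(Frame::Write {
            key: key.to_string(),
            value: value.to_string(),
        });
    }

    /// Appends a commit record, making all earlier frames visible to new readers.
    fn commit(&mut self) {
        self.frames.push(Frame::Commit);
    }

    /// Index just past the last commit record; readers ignore frames beyond it.
    fn end_mark(&self) -> usize {
        self.frames
            .iter()
            .rposition(|f| *f == Frame::Commit)
            .map_or(0, |i| i + 1)
    }

    /// The value a reader starting now would see for `key`, if any.
    fn read(&self, key: &str) -> Option<String> {
        self.frames[..self.end_mark()]
            .iter()
            .rev()
            .find_map(|f| match f {
                Frame::Write { key: k, value } if k.as_str() == key => Some(value.clone()),
                _ => None,
            })
    }
}

fn main() {
    let mut wal = Wal::default();

    // create_session writes, but no commit record exists yet.
    wal.write("session-1", "created");
    assert_eq!(wal.read("session-1"), None); // a get_session "miss"

    // After the commit record is appended, a new reader sees the session.
    wal.commit();
    assert_eq!(wal.read("session-1"), Some("created".to_string()));
}
```

Whether this is what actually happens in the reported race is the hypothesis of this PR, not something the sketch proves.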
Type of Change
AI Assistance
Testing
I was unable to reproduce on multiple Linux Docker images, so I went ahead and reproduced the "bug" (the create_session and get_session race condition) by writing concurrent create_session -> get_session race condition tests and running them a few thousand times.

Ran a timing check on an existing concurrency test.

    time cargo test test_concurrent_session_creation --release -- --test-threads=1;

Eyeballing the results, the test takes about 0.763s with WAL and 0.848s without, about a 10% improvement.
Related Issues
Relates to #5197
Discussion: