Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch race condition in PostHogFileBackedQueue.deleteFiles #218

Merged
merged 3 commits into from
Oct 22, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions PostHog/PostHogFileBackedQueue.swift
Original file line number Diff line number Diff line change
Expand Up @@ -100,9 +100,14 @@ class PostHogFileBackedQueue {

private func deleteFiles(_ count: Int) {
for _ in 0 ..< count {
if items.isEmpty { return }
let removed = items.remove(at: 0) // We always remove from the top of the queue

guard let removed: String = _items.mutate({ items in
if items.isEmpty {
return nil
}
return items.remove(at: 0) // We always remove from the top of the queue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you reproduce this issue since the whole block is locked with the isFlushingLock and isFlushing flag? Can you pass the specific steps to reproduce the issue so we can check if the very same issue happens in more places as well?
I don't see how the remove(at: 0) would fail if the internal items list isn't modified anywhere else outside of this lock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this call here is outside the isFlushingLock

func add(_ event: PostHogEvent) {
        if fileQueue.depth >= config.maxQueueSize {
            hedgeLog("Queue is full, dropping oldest event")
            // first is always oldest
            fileQueue.delete(index: 0)
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm in this case we might have a bug deleting the wrong file as well.
I think that part isn't locked to not block the calling thread on add, but we should do something about it then.
Can be a different issue/PR though, not blocking this one as its avoiding a crash already

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we definitely shouldn't wait on a lock here. Thing is that we can't assume that this is aways called on main thread either.

Maybe it's best to try and recreate this crash with unit tests with background queue and DispatchQueue.concurrentPerform(iterations:)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the thread is easy (Thread.isMainThread), but independently of the thread, the add method should be run within the dispatchQueue.async block and then it's ok to lock with isFlushingLock and await its availability.
Again, another PR, fixing the crash here is more important now, but a follow-up with the improvement above would be cool.

fileQueue.delete(index: 0)
is an edge case so I bet rarely happens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I only got one crash report with this call stack.

}) else {
continue
}
deleteSafely(queue.appendingPathComponent(removed))
jverkoey marked this conversation as resolved.
Show resolved Hide resolved
}
}
Expand Down
5 changes: 3 additions & 2 deletions PostHog/Utils/ReadWriteLock.swift
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,11 @@ public final class ReadWriteLock<Value> {
/// The lock will be acquired once for writing before invoking the closure.
///
/// - Parameter closure: The closure with the mutable value.
public func mutate(_ closure: (inout Value) -> Void) {
public func mutate<T>(_ closure: (inout Value) -> T) -> T {
jverkoey marked this conversation as resolved.
Show resolved Hide resolved
pthread_rwlock_wrlock(&rwlock)
closure(&value)
let result = closure(&value)
pthread_rwlock_unlock(&rwlock)
return result
jverkoey marked this conversation as resolved.
Show resolved Hide resolved
}
}

Expand Down