
RxAtomic causes Thread Sanitiser race conditions #1853

Closed
andyj-at-aspin opened this issue Jan 15, 2019 · 24 comments

@andyj-at-aspin

Short description of the issue:

The newly added AtomicInt.swift causes data race reports from the Xcode Thread Sanitiser when running in DEBUG on iOS Simulator

Expected outcome:

No reports of data races.

What actually happens:

Any subroutine wrapped in TRACE_RESOURCES that increments or decrements counters can trigger Thread Sanitiser data race reports.

Self-contained code example that reproduces the issue:

// `atomicTester` is a property on the enclosing type
var atomicTester = AtomicInt(0)

for i in 0...100 {
    DispatchQueue.global(qos: .background).async {
        if i % 2 == 0 {
            self.atomicTester.increment()
        } else {
            self.atomicTester.decrement()
        }
    }
}

RxSwift/RxCocoa/RxBlocking/RxTest version/commit

4.4.0

Platform/Environment

  • [x] iOS
  • [x] macOS
  • [x] tvOS
  • [x] watchOS
  • [x] playgrounds

How easy is it to reproduce? (chances of successfully reproducing after running the self-contained code)

  • [x] easy, 100% repro
  • [ ] sometimes, 10%-100%
  • [ ] hard, 2% - 10%
  • [ ] extremely hard, 0% - 2%

Xcode version:

10.1 (10B61)

⚠️ Fields below are optional for general issues or in case those questions aren't related to your issue, but filling them out will increase the chances of getting your issue resolved. ⚠️

Installation method:

  • [ ] CocoaPods
  • [ ] Carthage
  • [x] Git submodules

I have multiple versions of Xcode installed:
(so we can know if this is a potential cause of your issue)

  • [ ] yes (which ones)
  • [x] no

Level of RxSwift knowledge:
(this is so we can understand your level of knowledge
and formulate the response in an appropriate manner)

  • [ ] just starting
  • [ ] I have a small code base
  • [x] I have a significant code base
@andyj-at-aspin
Author

For what it's worth, I am in no way suggesting that AtomicInt is not itself thread safe; I can see it wraps stdatomic's functions. However, the fact that the Swift functions AtomicInt.increment() and AtomicInt.decrement() are declared as mutating is tripping the Thread Sanitiser up.

Because events in my source project are triggered from multiple sources and are often observed on other threads (i.e. the main one), the new resource tracing checks constantly trip the sanitiser. I have had to resort to disabling all sanitiser checks for the whole RxSwift framework in order to keep checking my own project.
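
To make the claim above concrete, here is a minimal sketch (illustrative only, not the actual RxAtomic source) of why a value type with mutating operations trips the Thread Sanitiser at the call site:

// Illustrative sketch, not the actual RxAtomic source: a value type whose
// operations are declared `mutating`. Each call to a mutating method is a
// write access to the enclosing stored property, so concurrent calls look
// like conflicting writes to the sanitiser regardless of what the body does.
struct MutatingCounter {
    private var value: Int32 = 0

    mutating func increment() {
        // imagine this line is an atomic fetch-add implemented in C
        value += 1
    }
}

Calling increment() on a shared MutatingCounter from several threads, as in the repro above, is reported because each call counts as a write to the property itself.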

@freak4pc
Member

Is this basically a false positive of Thread Sanitizer?

@andyj-at-aspin
Author

andyj-at-aspin commented Jan 15, 2019

As I said above, it is, but to shut it up I have to disable all sanitiser checks for the whole RxSwift framework. Unfortunately it's a 'feature' of Swift, unless there's a way to have increment() and decrement() call sites that aren't 'mutating'.

@kzaher
Member

kzaher commented Jan 18, 2019

Ok, should we ditch RxAtomic, introduce NSRecursiveLock and just be done with it? Is there any other viable alternative?

We will probably be slightly slower, but I don't think we should be more than a few percent slower.
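
For reference, a minimal sketch of what an NSRecursiveLock-backed counter could look like (illustrative only; the name LockedInt and the exact shape are assumptions, not a committed implementation):

import Foundation

// Illustrative sketch of an NSRecursiveLock-backed counter.
final class LockedInt {
    private let lock = NSRecursiveLock()
    private var value: Int32

    init(_ initialValue: Int32) {
        self.value = initialValue
    }

    @discardableResult
    func increment() -> Int32 {
        lock.lock(); defer { lock.unlock() }
        let oldValue = value
        value += 1
        return oldValue
    }

    @discardableResult
    func decrement() -> Int32 {
        lock.lock(); defer { lock.unlock() }
        let oldValue = value
        value -= 1
        return oldValue
    }
}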

@freak4pc
Member

Did we ever benchmark how bad it would be? I imagine it shouldn't be too noticeable on >95% of systems.

@kzaher
Member

kzaher commented Jan 18, 2019

I guess the difference depends on the concrete chain being tested, so it's hard to quantify the exact impact. The smaller the Swift overhead, the bigger the impact is.

We can rerun the couple of performance tests we have and assess the difference there.

@LucianoPAlmeida
Contributor

Hey @kzaher
Maybe pthread APIs like pthread_mutex_t/pthread_rwlock_t or os_unfair_lock could be viable options here? Would NSRecursiveLock work on Linux? Not sure.

@freak4pc
Member

There are some interesting thoughts here: http://www.cocoawithlove.com/blog/2016/06/02/threads-and-mutexes.html

@LucianoPAlmeida
Contributor

Interesting article @freak4pc :))
Personally, I'm always biased toward pthread_mutex because it is fast and can behave the same as an NSLock or NSRecursiveLock if we use the PTHREAD_MUTEX_RECURSIVE attr.
Also, there are some more benchmarks here: http://www.vadimbulavin.com/benchmarking-locking-apis/
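
For context, configuring a pthread mutex as recursive looks roughly like this (a minimal sketch assuming the Darwin pthread APIs; the RecursiveMutex name is illustrative):

import Foundation

// Minimal sketch of a recursive pthread mutex, configured with
// PTHREAD_MUTEX_RECURSIVE so the owning thread can re-lock it,
// similar in behaviour to NSRecursiveLock.
final class RecursiveMutex {
    private var mutex = pthread_mutex_t()

    init() {
        var attr = pthread_mutexattr_t()
        pthread_mutexattr_init(&attr)
        pthread_mutexattr_settype(&attr, Int32(PTHREAD_MUTEX_RECURSIVE))
        pthread_mutex_init(&mutex, &attr)
        pthread_mutexattr_destroy(&attr)
    }

    deinit {
        pthread_mutex_destroy(&mutex)
    }

    func lock() { pthread_mutex_lock(&mutex) }
    func unlock() { pthread_mutex_unlock(&mutex) }
}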

@kzaher
Member

kzaher commented Jan 21, 2019

I've tried running performance tests with the following implementation of AtomicInt.

I got 10-17% worse performance on Benchmarks. (Mostly around 10%)

https://github.com/ReactiveX/RxSwift/tree/mutex_atomic

Feel free to try to improve this.

Anyway, if there is no other alternative, I'll merge this, but it seems a bit wasteful :(

final class AtomicInt {
    private var mutex: pthread_mutex_t
    private var value: Int32

    init(_ initialValue: Int32) {
        self.value = initialValue

        var attr = pthread_mutexattr_t()
        pthread_mutexattr_init(&attr); defer { pthread_mutexattr_destroy(&attr) }

        self.mutex = pthread_mutex_t()
        guard pthread_mutex_init(&self.mutex, &attr) == 0 else { fatalError() }
    }

    @discardableResult
    func increment() -> Int32 {
        guard pthread_mutex_lock(&self.mutex) == 0 else { fatalError() }
        let oldValue = self.value
        self.value += 1
        guard pthread_mutex_unlock(&self.mutex) == 0 else { fatalError() }
        return oldValue
    }

    @discardableResult
    func decrement() -> Int32 {
        guard pthread_mutex_lock(&self.mutex) == 0 else { fatalError() }
        let oldValue = self.value
        self.value -= 1
        guard pthread_mutex_unlock(&self.mutex) == 0 else { fatalError() }
        return oldValue
    }

    func isFlagSet(_ mask: Int32) -> Bool {
        guard pthread_mutex_lock(&self.mutex) == 0 else { fatalError() }
        let oldValue = self.value
        guard pthread_mutex_unlock(&self.mutex) == 0 else { fatalError() }
        return oldValue & mask != 0
    }

    @discardableResult
    func fetchOr(_ mask: Int32) -> Int32 {
        guard pthread_mutex_lock(&self.mutex) == 0 else { fatalError() }
        let oldValue = self.value
        self.value |= mask
        guard pthread_mutex_unlock(&self.mutex) == 0 else { fatalError() }
        return oldValue
    }

    func load() -> Int32 {
        guard pthread_mutex_lock(&self.mutex) == 0 else { fatalError() }
        let oldValue = self.value
        guard pthread_mutex_unlock(&self.mutex) == 0 else { fatalError() }
        return oldValue
    }
}
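
A quick usage sketch against the class above (illustrative): because this AtomicInt is a reference type, the repro from the issue no longer involves mutating call sites.

import Dispatch

// Illustrative usage of the mutex-backed AtomicInt above: 100 balanced
// increments/decrements from concurrent threads should end at zero.
let counter = AtomicInt(0)
DispatchQueue.concurrentPerform(iterations: 100) { i in
    if i % 2 == 0 {
        counter.increment()
    } else {
        counter.decrement()
    }
}
assert(counter.load() == 0)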

@freak4pc
Member

The performance hit is quite large - I didn't expect it to be this bad.
Should we really bother replacing it if it's just a false positive? (the thread sanitizer warning, that is).

Also, did you make an attempt on NSRecursiveLock? Was it even slower?

@kzaher
Member

kzaher commented Jan 21, 2019

@freak4pc Feel free to play with it. But I'm not sure I know how to do better than this. AtomicInt was really fast :)

I'm not sure there is a way to silence the thread sanitizer? Somehow I doubt it.

@freak4pc
Member

Yeah I understand the silencing part - but is it really worth a 10% performance downgrade just for a warning? I guess I mainly wonder if it would be at all noticeable. I'll see if I can actually play with it tomorrow.

What did you use for the benchmark?

@kzaher
Member

kzaher commented Jan 21, 2019

There is a benchmark target inside the Rx project. I think it's called Benchmark or something really similar to that. It's not the Microoptimizations target.

@LucianoPAlmeida
Contributor

Hey @kzaher, @freak4pc
I was playing with this and added two things that I thought would give a performance improvement, but in the end there was none :(( (both are sketched below)

  1. Tried using the overflow-unchecked variants (&+= and &-=) of += and -=. Thought it would be an improvement, but it really wasn't :(( or I just did the benchmark wrong 🤣
  2. Added the @inline attribute to the AtomicInt functions, although since it is an internal final class, I think the optimizer would have done it anyway in -O. But just to give it a shot.

Also, I noticed that the Benchmarks target was using the Debug config for the Test action, which I think compiles with -Onone and doesn't give us a fair benchmark, so I changed it to Release-Test to compile with -O.

The modifications are here https://github.com/LucianoPAlmeida/RxSwift/tree/mutex_atomic

Question: I'm not familiar with the usage of AtomicInt in the project, so maybe it is not required, but wouldn't it be good to add the PTHREAD_MUTEX_RECURSIVE attr on the mutex to avoid deadlock, just for safety?
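
For reference, the two tweaks could look roughly like this (a sketch applied to a mutex-backed counter; the TweakedAtomicInt name is illustrative and this is not necessarily the exact contents of the linked branch):

import Foundation

// Sketch of the two tweaks above: wrap-around arithmetic (&+=) to skip the
// overflow check, and an explicit inlining hint on the hot method.
final class TweakedAtomicInt {
    private var mutex = pthread_mutex_t()
    private var value: Int32

    init(_ initialValue: Int32) {
        self.value = initialValue
        pthread_mutex_init(&mutex, nil)
    }

    deinit { pthread_mutex_destroy(&mutex) }

    @discardableResult
    @inline(__always)
    func increment() -> Int32 {
        pthread_mutex_lock(&mutex)
        let oldValue = value
        value &+= 1 // &+= wraps on overflow instead of trapping
        pthread_mutex_unlock(&mutex)
        return oldValue
    }
}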

@LucianoPAlmeida
Contributor

Also, did you make an attempt on NSRecursiveLock? Was it even slower?

@freak4pc All the benchmarks I've seen on Darwin with Apple's Foundation show that it is slower. And on Linux with the Corelibs Foundation, NSRecursiveLock is implemented as a wrapper on top of pthread (see here), so there's no difference, I think :))

@kzaher
Member

kzaher commented Jan 24, 2019

I somehow doubt that we'll come close to the speed of the current version of AtomicInt :)

@freak4pc
Member

The real question is whether we should replace AtomicInt just because of a thread sanitizer false positive bug 🤔

@kzaher
Member

kzaher commented Jan 24, 2019

Does anybody know how the Thread Sanitizer actually works? Is there some way to plug into it and mark the memory as properly synchronized?

@LucianoPAlmeida
Contributor

@kzaher
I think __attribute__((no_sanitize("thread"))) does exactly that :))
You can see more info about it here

@LucianoPAlmeida
Contributor

LucianoPAlmeida commented Jan 24, 2019

Actually, reading more about it, maybe SanitizerSpecialCaseList would be the one to use here :)
Edit: Maybe not, because we would have to pass that blacklist file as a compilation option. So probably we can just annotate the functions with no_sanitize("thread") and we will be fine.

@LucianoPAlmeida
Contributor

I had some free time, so I did #1860; maybe it can be useful :))

@kzaher
Member

kzaher commented Jan 31, 2019

@andyj-at-aspin I've merged fixes for TSan; please check the master branch. I would like to close this issue.

Thnx a lot @LucianoPAlmeida .

@andyj-at-aspin
Author

The issue no longer occurs with pulls from the master branch and I have verified that RxSwift is very definitely making use of the restructured AtomicInt objects. Thanks so much, guys. I can't wait for this to be included in a formal release.
