Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix a potential race in iterating counters #28112

Merged
merged 1 commit into from
Nov 16, 2020

Conversation

sywhang
Copy link

@sywhang sywhang commented Oct 21, 2020

Summary

There was a mistake that was made in backporting dotnet/runtime#40259 to #28089 where _counters was used to iterate instead of the snapshotted value of counters. _counters is a list that needs to be lock-protected, but this access was happening outside of the lock, which is what the snapshot is for. This caused a race-condition on the read/write on this list, causing a crash. The problem does not exist in the 5.0 fix, only for the 3.1 backport.

Customer Impact

Medium/High. When hit with this issue, a customer may experience a crash. A customer may run into this issue when they turn on the runtime counters (System.Runtime) or any other set of counter providers, and then modify the list of counters while a counter callback is happening by either creating or disposing instances of EventCounters. For customers that don't consume/create their own EventCounters, this shouldn't be a problem. For those who do, it's pretty easy to hit with the likelihood increasing the shorter they set the counter polling interval.

Regression

Yes. From 3.1.8 -> 3.1.9 servicing release.

Testing

We do not have a local repro of this failure, but we tested the fix by providing a custom build of the runtime with this fix to the internal partner test that initially reported this problem. They ran it in production that hit this issue and reported that this addresses the issue.

Risk

The root cause/fix is well-understood and the fix was verified through partner testing - I would say it is relatively low.

@sywhang sywhang requested a review from noahfalk October 21, 2020 07:40
@sywhang sywhang added Servicing-consider Issue for next servicing release review area-Tracing and removed area-Tracing Servicing-consider Issue for next servicing release review labels Oct 21, 2020
@noahfalk
Copy link
Member

Customer Impact

@sywhang - its best if the text here describes what the user needs to do to hit the issue and some estimate of how likely the situation is to occur. The resulting behavior when the bug is triggered is important as well (crashing as you mentioned). The total impact is then a combination of both: likelihood_to_hit * effect_when_hit.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@noahfalk
Copy link
Member

cc @davmason for another pair of eyes (despite being simple I was also the one who missed catching it during the port so I'd appreciate getting a fresh eyes on it)

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Can you elaborate on the customer impact and how often it will occur? We will consider this for 3.1.

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Oct 21, 2020
@jeffschwMSFT jeffschwMSFT added this to the 3.1.x milestone Oct 21, 2020
@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 22, 2020
@leecow leecow modified the milestones: 3.1.x, 3.1.11 Oct 22, 2020
@seif
Copy link

seif commented Nov 5, 2020

I am affected by this, trying to add Counters to the EventSource dynamically, the process crashes because of this

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.Collections.Generic.List1.Enumerator.MoveNextRare() at System.Collections.Generic.List1.Enumerator.MoveNext()
at System.Diagnostics.Tracing.CounterGroup.OnTimer()
at System.Diagnostics.Tracing.CounterGroup.<>c__DisplayClass21_0.b__0()
at System.Diagnostics.Tracing.CounterGroup.PollForValues()
at System.Threading.ThreadHelper.ThreadStart()

Any workaround you suggest until 3.1.11 is out?

@wtgodbe wtgodbe merged commit 9fb9458 into release/3.1 Nov 16, 2020
@stephentoub stephentoub deleted the dev/suwhang/fix-counter-race-condition branch January 5, 2021 02:43
sdmaclea pushed a commit to sdmaclea/coreclr that referenced this pull request Jan 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Tracing Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants