Skip to content

Conversation

@ywkaras
Copy link
Contributor

@ywkaras ywkaras commented May 29, 2024

These changes should prevent (low-frequency) crashes that are being seen in Yahoo Prod.

@ywkaras ywkaras self-assigned this May 29, 2024
@ywkaras ywkaras requested a review from bryancall May 29, 2024 17:28
@ywkaras ywkaras added the Crash label May 29, 2024
@ywkaras ywkaras added this to the 10.1.0 milestone May 29, 2024
@ywkaras ywkaras closed this May 29, 2024
@ywkaras ywkaras reopened this May 29, 2024
@ywkaras ywkaras force-pushed the regex_crash branch 4 times, most recently from c6713d5 to 976c730 Compare May 29, 2024 20:32
These changes should prevent (low-frequency) crashes that are being seen in Yahoo Prod.
@cmcfarlen
Copy link
Contributor

Should this be added to 10.0.x project as well for cherry-pick?

@ywkaras
Copy link
Contributor Author

ywkaras commented May 30, 2024

Should this be added to 10.0.x project as well for cherry-pick?

Debatable. Any crashes would happen at shutdown, and seem infrequent (with limited testing in Yahoo Prod).

@ywkaras
Copy link
Contributor Author

ywkaras commented May 30, 2024

This is where the segment violation is occurring in Yahoo Prod:

return _match_context;

@bryancall bryancall requested a review from cmcfarlen June 3, 2024 22:30
Copy link
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could RegexContext be automatic within the thread? i.e.

RegexContext* get_instance() {
  thread_local RegexContext ctx;
  return &ctx;
}

Why is the global vector of contexts necessary at all?

IIRC this was originally done because some Debug statements in destructors could come after the shutdown of the thread's regex context. But if we use Dbg in those locations, then this isn't a problem anymore?

@cmcfarlen
Copy link
Contributor

Could also look at how Metrics storage lifetime outlasts the threads and see if this would work for _Data lifetime:

https://github.com/apache/trafficserver/blob/master/src/tsutil/Metrics.cc#L34-L42

@ywkaras
Copy link
Contributor Author

ywkaras commented Jun 4, 2024

Yes, it would not be a logic error in creating automatic, function scope instances of RegexContext when one is needed. The issue is whether the performance hit is tolerable.

On the other hand, this change involves multiple threads/cores hardware locking a variable in cache, which is also a potentially significant performance hit.

@ywkaras
Copy link
Contributor Author

ywkaras commented Jun 4, 2024

Could also look at how Metrics storage lifetime outlasts the threads and see if this would work for _Data lifetime:

https://github.com/apache/trafficserver/blob/master/src/tsutil/Metrics.cc#L34-L42

Because of

// There is a mutex in the C/C++ runtime that both dlopen() and _cxa_thread_atexit() lock while running.
, RegexContext::get_instance() cannot cause the construction of a thread_local variable with a non-trivial destructor. The Metrics code is not limited by that restriction.

I guess I should look into changing DbgCtl:::_new_reference() so it doesn't call tag_activated() with its mutex locked.

@ywkaras
Copy link
Contributor Author

ywkaras commented Jun 4, 2024

Closing this because I think #11416 is better.

@ywkaras ywkaras closed this Jun 4, 2024
@zwoop zwoop modified the milestones: 10.1.0, 10.0.0 Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants