Skip to content

Conversation

@rob05c
Copy link
Member

@rob05c rob05c commented Aug 18, 2022

This is an alternative to #8664. That PR is fine in-itself and fixes a bug around the the atomic swap not actually reloading. But fixing that bug so that reloading hosting.config works exposes another deeper bug: the reloaded object has no synchronization around it.

Not only is the pointer itself never read with synchronization, even if it were, the object itself is never synchronized. So even if the pointer were atomic, the object reads could get different variables from different objects, when callers that need multiple variables really need them to come from a single object. Worse, if a request takes longer than the arbitrary delete timer, it could access freed memory.

This fixes the object to be synchronized, and all access behind a mutex. The reload swap and bad drive removal acquire a write lock, and all other accesses (none of which modify the object) acquire a read lock.

I really don't like adding a mutex to the request path. But the only other options I see are an incredibly complex and bug-prone lock-free solution, or not making it possible to reload hosting.config. As much as I dislike it, the mutex should never have contention except on reload or drive failure. So it's probably actually faster and fewer atomic instructions than even a lock-free solution.

This also adds some helper classes, to make it impossible to accidentally forget to acquire or release the mutex, or modify the object with a readlock, via RAII idioms.

Recommend backporting to 9.2.

Fixes #7220

@rob05c rob05c added the Bug label Aug 18, 2022
@rob05c rob05c added this to the 10.0.0 milestone Aug 18, 2022
@rob05c rob05c self-assigned this Aug 18, 2022
@rob05c rob05c requested a review from SolidWallOfCode August 18, 2022 18:47
@rob05c rob05c assigned rob05c and unassigned rob05c Aug 18, 2022
@rob05c rob05c requested a review from ezelkow1 August 18, 2022 18:48
@rob05c rob05c force-pushed the fix-hosting-config-reload-2 branch 5 times, most recently from ccf7dc0 to eadf861 Compare August 23, 2022 17:18
@traeak
Copy link
Contributor

traeak commented Aug 24, 2022

[approve ci docs]

@bryancall
Copy link
Contributor

[approve ci autest]

Copy link
Member

@SolidWallOfCode SolidWallOfCode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about line 284, though - there doesn't seem to be a provision for a failure on a reload.

@rob05c
Copy link
Member Author

rob05c commented Oct 18, 2022

[approve ci autest]

@rob05c rob05c merged commit 026d906 into apache:master Oct 20, 2022
zwoop pushed a commit that referenced this pull request Nov 1, 2022
@zwoop
Copy link
Contributor

zwoop commented Nov 1, 2022

Cherry-picked to v9.2.x

@zwoop zwoop modified the milestones: 10.0.0, 9.2.0 Nov 1, 2022
SolidWallOfCode pushed a commit to SolidWallOfCode/trafficserver that referenced this pull request Nov 15, 2022
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Feb 21, 2023
* asf/9.2.x:
  Updated ChangeLog
  Add docs for strategies.yaml hash_string (apache#9026)
  Fix hosting.config reloading (apache#9046)
  Remove unnecessary, dangerous casts from SET_HANDLER and SET_CONTINUATION invocations. (apache#9129)
  Remove deprecated ld option (--add-needed) (apache#9141)
masaori335 added a commit to masaori335/trafficserver that referenced this pull request Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hosting.config requires restart, but docs say reloadable

6 participants