-
Notifications
You must be signed in to change notification settings - Fork 845
Fix an error on SSL config reload (plus some cleanup). #8790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bf22920 to
4c40fd3
Compare
|
I'll undraft this after letting it soak in Yahoo Prod for a while. |
4c40fd3 to
c25d943
Compare
c25d943 to
286a0bc
Compare
286a0bc to
559d15b
Compare
|
Sorry, I accidentally made the PR branch in the Apache repo and not my fork. I'll delete the branch once this PR is closed. |
559d15b to
d7c7d66
Compare
|
I disagree this is incompatible, since, as Masakazu found, it's only incompatible with commits which have not been released yet. |
32947d0 to
c643ffd
Compare
|
This has been running in Yahoo Prod for more than 2 weeks, with no issues/problems. These changes fixed errors that were happening when doing a config reload. |
cmcfarlen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain more about the blocking_invoke being in this PR? Is that necessary and do we want this so-called hack?
iocore/net/SSLSecret.cc
Outdated
| if (stat(name.c_str(), &statdata) < 0) { | ||
| return false; | ||
| return std::string{}; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we could skip calling stat here since we don't do anything with the result and the load call will fail below if stat does.
| const std::string * | ||
| SSLSecret::getSecretItem(const std::string &name) const | ||
| std::string | ||
| SSLSecret::getSecret(const std::string &name) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes the caller can no longer differentiate between not found and empty secrets. Perhaps this could return std::optional to restore that info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that the calling code treats, or should treat, an empty file versus a non-existent file differently. Are you saying that it should? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying a caller cannot tell anymore if its empty vs non-existent, as well as giving a suggestion as to how that information could be restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're saying you think this function needs to be future safe in case someday something that calls it needs to know that the secret isn't available because the file is empty rather than not present? We could spend a lot of time and bloat consistently having that level of future safety.
| // the calling thread. It should be replaced with a version with a continuation parameter, providing a continuation | ||
| // to be called when the hook handling is complete. | ||
| // | ||
| int blocking_invoke(int event, void *edata) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this hack related to cleanup of SSL secrets? Is this needed for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The APIHook::invoke() (what was originally used instead of this) will, if the mutex of the continuation is already locked, do an ink_release_assert(). So that effectively means you can't have a continuation mutex. I needed a mutex on a continuation in an internal Yahoo plugin. APIHook::blocking_invoke() will just block on the mutex. If there is time someday, all hooks should be made to work like the global and transaction hooks. Try the mutex, and if you can't lock it, immediate reschedule the continuation. That's the sense in which this is a hack.
src/traffic_server/InkAPI.cc
Outdated
| // Support code for TSHeapBuf. | ||
| namespace | ||
| { | ||
| class HeapBuf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HeapBuf seems like its just std::unique_ptr<std::string>, would you consider using that to back TSHeapBuf instead of this custom class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like it would require three memory allocations instead of one. This is meant to be a general purpose type, so pessimizing it could have a significant performance impact, depending how it's used in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should only have to heap allocate the string as you do with _S here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TSHeapBuf would have to be of type std::unique_ptr<std::string> *. The std::uniue_ptr<std::string> instance would be dynamically allocated (allocation 1). It would point to a std::string instance in the heap (allocation 2). The std::string instance would allocate a buffer for the data (allocation 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that effectively, HeapBuf could be written as using HeapBuf = std::unique_ptr<std::string> and TSHeapBuf would be (opaquely) *std::string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see what you mean now. But that will still mean doubling the number of memory allocations. Also mice that HeapBuf encapsulates the reinterpret_casts.
c643ffd to
452478a
Compare
maskit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.
This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry
for a secret name in the map.
I added a new TS API handle, TSHeapBuf, in order to be able to cleanly return a copy of the secret data.
I would not call this general cleanup. The commit log would look like a trivial change but it is not. Who thinks a "cleanup" fixes a runtime issue and introduces new TS APIs?
And I don't see the necessity of introducing TSHeapBuf on this PR. I might be missing something but it seems like we just need to return a copy of secret data to resolve the issue. That is the key, right?
I'm not going to judge whether we want TSHeapBuf in general here, because that should be discussed on dev@ as API review, but I think we should not introduce TSHeapBuf on this PR. If we failed to fix the issue before 9.2.0 release, we would have to introduce the problematic API and need an incompatible change on 10.0.0. I'd suggest fixing the issue without introducing TSHeapBuf for now (if possible) to avoid the unfortunate releases. Since feature freeze for 9.2.0 has not been called yet, you can still propose the interface that uses TSHeapBuf, after we fix the issue. Although that could be postponed to 10.0.0, I think that's much better than introducing the problematic API as it is.
At minimum, the title, which will be a commit message, need to be updated. And since there are new TS APIs proposed, this PR should not be merged until API review completes even if we add them on this PR.
|
"I would not call this general cleanup." What title do you prefer then? |
I don't know what happened exactly, so I can only say "Fix an error on SSL config reload". |
452478a to
c8e1c93
Compare
|
As mentioned already, at a minimum this should be split into two PRs, and a discussion should be opened on the mailing list about adding the new set of APIs. |
c8e1c93 to
fcd9137
Compare
|
We're seeing a deadlock with our plugin that uses SSL_SECRET_HOOK, so making this a draft for now. |
|
Brian Neradt has a fix for the deadlock problem. We're going to let it ripen internally for a bit, then I'll pull it into this PR. |
This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData, in order to be able to cleanly return a copy of the secret data.
1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret.
fcd9137 to
5d25fa2
Compare
|
I still think we should fix the API without introducing additional data structure to avoid the unfortunate release that has an unusable API. Even if we are going to have |
|
We have this in Yahoo internal TS. If anyone else sees errors like: we should reopen this PR. |
|
Weren't you going to update this PR with the simplest interface? I think this PR was closed because the branch was on the upstream repo. |
|
I was planning to, but it was delayed by production issues. Leif didn't comment on why he was closing it. |
This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.
The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.
To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot. I added a new TS API handle, TSHeapBuf, in order
to be able to cleanly return a copy of the secret data.