Fix an error on SSL config reload (plus some cleanup). #8790

ywkaras · 2022-04-12T15:40:15Z

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

The SSLSecret public functions no longer return pointers into the unorded_map of
secrets, they return a copy of the secret data. This seemed thread unsafe. A
periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

To avoid exporting pointers, I had to change the prototype of TSSslSecretGet().
Hopefully there are no existing plugins that are already using this TS API function,
so breaking this rule will be moot. I added a new TS API handle, TSHeapBuf, in order
to be able to cleanly return a copy of the secret data.

ywkaras · 2022-04-12T16:51:31Z

I'll undraft this after letting it soak in Yahoo Prod for a while.

ywkaras · 2022-05-25T20:28:11Z

Sorry, I accidentally made the PR branch in the Apache repo and not my fork. I'll delete the branch once this PR is closed.

ywkaras · 2022-06-07T20:22:12Z

I disagree this is incompatible, since, as Masakazu found, it's only incompatible with commits which have not been released yet.

ywkaras · 2022-07-27T19:02:04Z

This has been running in Yahoo Prod for more than 2 weeks, with no issues/problems. These changes fixed errors that were happening when doing a config reload.

cmcfarlen

Could you explain more about the blocking_invoke being in this PR? Is that necessary and do we want this so-called hack?

cmcfarlen · 2022-08-30T14:27:40Z

iocore/net/SSLSecret.cc

  if (stat(name.c_str(), &statdata) < 0) {
-    return false;
+    return std::string{};
  }


Seems like we could skip calling stat here since we don't do anything with the result and the load call will fail below if stat does.

cmcfarlen · 2022-08-30T14:31:03Z

iocore/net/SSLSecret.cc

-const std::string *
-SSLSecret::getSecretItem(const std::string &name) const
+std::string
+SSLSecret::getSecret(const std::string &name) const


With these changes the caller can no longer differentiate between not found and empty secrets. Perhaps this could return std::optional to restore that info.

I don't see that the calling code treats, or should treat, an empty file versus a non-existent file differently. Are you saying that it should? Why?

I'm saying a caller cannot tell anymore if its empty vs non-existent, as well as giving a suggestion as to how that information could be restored.

So you're saying you think this function needs to be future safe in case someday something that calls it needs to know that the secret isn't available because the file is empty rather than not present? We could spend a lot of time and bloat consistently having that level of future safety.

cmcfarlen · 2022-08-30T14:32:11Z

proxy/InkAPIInternal.h

+  // the calling thread.  It should be replaced with a version with a continuation parameter, providing a continuation
+  // to be called when the hook handling is complete.
+  //
+  int blocking_invoke(int event, void *edata) const;


How is this hack related to cleanup of SSL secrets? Is this needed for this PR?

The APIHook::invoke() (what was originally used instead of this) will, if the mutex of the continuation is already locked, do an ink_release_assert(). So that effectively means you can't have a continuation mutex. I needed a mutex on a continuation in an internal Yahoo plugin. APIHook::blocking_invoke() will just block on the mutex. If there is time someday, all hooks should be made to work like the global and transaction hooks. Try the mutex, and if you can't lock it, immediate reschedule the continuation. That's the sense in which this is a hack.

cmcfarlen · 2022-08-30T14:47:13Z

src/traffic_server/InkAPI.cc

+// Support code for TSHeapBuf.
+namespace
+{
+class HeapBuf


HeapBuf seems like its just std::unique_ptr<std::string>, would you consider using that to back TSHeapBuf instead of this custom class?

That seems like it would require three memory allocations instead of one. This is meant to be a general purpose type, so pessimizing it could have a significant performance impact, depending how it's used in the future.

Should only have to heap allocate the string as you do with _S here.

TSHeapBuf would have to be of type std::unique_ptr<std::string> *. The std::uniue_ptr<std::string> instance would be dynamically allocated (allocation 1). It would point to a std::string instance in the heap (allocation 2). The std::string instance would allocate a buffer for the data (allocation 3).

I meant that effectively, HeapBuf could be written as using HeapBuf = std::unique_ptr<std::string> and TSHeapBuf would be (opaquely) *std::string.

OK I see what you mean now. But that will still mean doubling the number of memory allocations. Also mice that HeapBuf encapsulates the reinterpret_casts.

maskit

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when
doing config reloads.

This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry
for a secret name in the map.

I added a new TS API handle, TSHeapBuf, in order to be able to cleanly return a copy of the secret data.

I would not call this general cleanup. The commit log would look like a trivial change but it is not. Who thinks a "cleanup" fixes a runtime issue and introduces new TS APIs?

And I don't see the necessity of introducing TSHeapBuf on this PR. I might be missing something but it seems like we just need to return a copy of secret data to resolve the issue. That is the key, right?

I'm not going to judge whether we want TSHeapBuf in general here, because that should be discussed on dev@ as API review, but I think we should not introduce TSHeapBuf on this PR. If we failed to fix the issue before 9.2.0 release, we would have to introduce the problematic API and need an incompatible change on 10.0.0. I'd suggest fixing the issue without introducing TSHeapBuf for now (if possible) to avoid the unfortunate releases. Since feature freeze for 9.2.0 has not been called yet, you can still propose the interface that uses TSHeapBuf, after we fix the issue. Although that could be postponed to 10.0.0, I think that's much better than introducing the problematic API as it is.

At minimum, the title, which will be a commit message, need to be updated. And since there are new TS APIs proposed, this PR should not be merged until API review completes even if we add them on this PR.

ywkaras · 2022-08-30T19:00:29Z

"I would not call this general cleanup."

What title do you prefer then?

maskit · 2022-08-30T19:32:45Z

What title do you prefer then?

I don't know what happened exactly, so I can only say "Fix an error on SSL config reload".

zwoop · 2022-09-01T23:08:36Z

As mentioned already, at a minimum this should be split into two PRs, and a discussion should be opened on the mailing list about adding the new set of APIs.

ywkaras · 2022-09-30T21:01:50Z

We're seeing a deadlock with our plugin that uses SSL_SECRET_HOOK, so making this a draft for now.

ywkaras · 2022-10-03T18:00:42Z

Brian Neradt has a fix for the deadlock problem. We're going to let it ripen internally for a bit, then I'll pull it into this PR.

This seems to have eliminated some ERROR diags we were seeing in Yahoo Prod when doing config reloads. The SSLSecret public functions no longer return pointers into the unorded_map of secrets, they return a copy of the secret data. This seemed thread unsafe. A periodic poll running in the background can update the secret data for an entry for a secret name in the map. To avoid exporting pointers, I had to change the prototype of TSSslSecretGet(). Hopefully there are no existing plugins that are already using this TS API function, so breaking this rule will be moot. I added a new TS API type, TSAllocatedVarLenData, in order to be able to cleanly return a copy of the secret data.

1. getOrLoadSecret grabbed the secret_map_mutex and called loadSecret. 2. loadSecret dispatched to Continations that registered for the TS_EVENT_SSL_SECRET event. This would try to grab the Continuation's lock. 3. In the meantime, those Continuations could call setSecret which would try to grab the secret_map_mutex. If this Continuation did this while holding the lock that step 2 is waiting upon, then there will be a deadlock between the Continuation lock and the secret_map_mutex between the two threads. This patch avoids the deadlock by releasing the secret_map_mutex lock before calling loadSecret. It also updates the secret_map when loading secrets from a file in loadSecret.

maskit · 2023-01-03T18:38:42Z

I still think we should fix the API without introducing additional data structure to avoid the unfortunate release that has an unusable API.

Even if we are going to have TSAllocatedVarLenData, using it just for TSSslSecretGet causes inconsistency. TSUrlStringGet could use it too. We can continue the discussion about the data structure on the list and have it later. Let's focus on the issue on this PR.

ywkaras · 2023-01-20T20:34:59Z

We have this in Yahoo internal TS. If anyone else sees errors like:

[Jan 20 00:01:47.669] [ET_TASK 1] ERROR: SSL::139889882617600:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
[Jan 20 00:01:47.669] [ET_TASK 1] ERROR: SSL::139889882617600:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: ANY PRIVATE KEY
[Jan 20 00:01:47.669] [ET_TASK 1] ERROR: failed to load server private key from /home/y/conf/trafficserver/ssl/ycpi.tls.yahooinc.com.ec.key

we should reopen this PR.

maskit · 2023-01-20T20:56:08Z

Weren't you going to update this PR with the simplest interface? I think this PR was closed because the branch was on the upstream repo.

ywkaras · 2023-01-20T21:01:10Z

I was planning to, but it was delayed by production issues. Leif didn't comment on why he was closing it.

ywkaras self-assigned this Apr 12, 2022

ywkaras added Cleanup TS API SSL labels Apr 12, 2022

ywkaras added this to the 10.0.0 milestone Apr 12, 2022

ywkaras force-pushed the os_pkey_cnf_reload branch from bf22920 to 4c40fd3 Compare April 12, 2022 16:41

ywkaras force-pushed the os_pkey_cnf_reload branch from 4c40fd3 to c25d943 Compare April 12, 2022 19:44

ywkaras force-pushed the os_pkey_cnf_reload branch from c25d943 to 286a0bc Compare April 22, 2022 19:13

bryancall added the Incompatible label May 20, 2022

maskit mentioned this pull request May 25, 2022

Deprecate TSSslSecretGet (not thread safe). #8862

Closed

ywkaras force-pushed the os_pkey_cnf_reload branch from 286a0bc to 559d15b Compare May 25, 2022 20:24

ywkaras force-pushed the os_pkey_cnf_reload branch from 559d15b to d7c7d66 Compare May 25, 2022 23:55

ywkaras force-pushed the os_pkey_cnf_reload branch from 32947d0 to c643ffd Compare June 13, 2022 19:31

ywkaras marked this pull request as ready for review July 27, 2022 19:02

bryancall requested a review from shinrich August 1, 2022 23:42

ywkaras added the Bug label Aug 22, 2022

maskit self-requested a review August 30, 2022 15:14

cmcfarlen reviewed Aug 30, 2022

View reviewed changes

ywkaras force-pushed the os_pkey_cnf_reload branch from c643ffd to 452478a Compare August 30, 2022 17:52

maskit requested changes Aug 30, 2022

View reviewed changes

ywkaras changed the title ~~General cleanup of the SSLSecret code.~~ Fix an error on SSL config reload (plus some cleanup). Aug 30, 2022

ywkaras force-pushed the os_pkey_cnf_reload branch from 452478a to c8e1c93 Compare September 1, 2022 15:46

ywkaras marked this pull request as draft September 6, 2022 16:08

ywkaras mentioned this pull request Sep 6, 2022

Move Cleanup.h from plugins/xdebug to include/tscpp/api . #9080

Merged

ywkaras marked this pull request as ready for review September 20, 2022 00:45

ywkaras closed this Sep 20, 2022

ywkaras reopened this Sep 20, 2022

ywkaras force-pushed the os_pkey_cnf_reload branch from c8e1c93 to fcd9137 Compare September 20, 2022 13:28

ywkaras marked this pull request as draft September 30, 2022 21:02

ywkaras and others added 2 commits October 31, 2022 22:55

ywkaras force-pushed the os_pkey_cnf_reload branch from fcd9137 to 5d25fa2 Compare October 31, 2022 23:00

Fix clang-format CI job defect.

6a93677

ywkaras marked this pull request as ready for review November 1, 2022 16:48

zwoop closed this Jan 20, 2023

zwoop deleted the os_pkey_cnf_reload branch January 20, 2023 18:05

ywkaras mentioned this pull request Jan 26, 2023

Fix an error on SSL config reload (plus some cleanup) #9334

Merged

zwoop removed this from the 10.0.0 milestone Feb 29, 2024

Fix an error on SSL config reload (plus some cleanup). #8790

Fix an error on SSL config reload (plus some cleanup). #8790

Uh oh!

Conversation

ywkaras commented Apr 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ywkaras commented Apr 12, 2022

Uh oh!

ywkaras commented May 25, 2022

Uh oh!

ywkaras commented Jun 7, 2022

Uh oh!

ywkaras commented Jul 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmcfarlen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maskit left a comment

Choose a reason for hiding this comment

Uh oh!

ywkaras commented Aug 30, 2022

Uh oh!

maskit commented Aug 30, 2022

Uh oh!

zwoop commented Sep 1, 2022

Uh oh!

ywkaras commented Sep 30, 2022

Uh oh!

ywkaras commented Oct 3, 2022

Uh oh!

maskit commented Jan 3, 2023

Uh oh!

ywkaras commented Jan 20, 2023

Uh oh!

maskit commented Jan 20, 2023

Uh oh!

ywkaras commented Jan 20, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

ywkaras commented Apr 12, 2022 •

edited

Loading

ywkaras commented Jul 27, 2022 •

edited

Loading