Unify ca_path and ca_file configuration parameters #4087

Merged
merged 1 commit into from
Jul 28, 2023

Conversation

davisp
Contributor

@davisp davisp commented May 15, 2023

This change creates unified configuration settings for ca_file and ca_path while also allowing backwards compatibility with the old vfs.s3.ca_file, vfs.s3.ca_path, and rest.ignore_ssl_validation settings.

This adds three new configuration settings:

  • ssl.ca_file - Path to a PEM formatted certificate list
  • ssl.ca_path - Path to a directory of certificates
  • ssl.verify - A boolean indicating whether SSL peer verification should be enabled or not

Not all VFS backends support all three options. Notably, the underlying cURL option that ssl.ca_path is used for (CURLOPT_CAPATH) is documented as not working on Windows. Options supported by each backend are:

Azure

  • ssl.ca_file
  • ssl.verify

GCS

  • ssl.ca_file

S3

  • ssl.ca_file
  • ssl.ca_path
  • ssl.verify

TileDB Cloud

  • ssl.ca_file
  • ssl.ca_path
  • ssl.verify
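
The per-backend lists above can be read as a small lookup table. The following is a minimal sketch, not code from this PR: the backend keys (`"azure"`, `"gcs"`, `"s3"`, `"tiledb_cloud"`) and the helper `ignored_options` are hypothetical names chosen for illustration. It shows how one might report which of the options a user set will have no effect on a given backend.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Which unified ssl.* options each backend honors, copied from the lists
// above. The backend keys are hypothetical labels for this sketch.
const std::map<std::string, std::set<std::string>>& supported_ssl_options() {
  static const std::map<std::string, std::set<std::string>> table = {
      {"azure", {"ssl.ca_file", "ssl.verify"}},
      {"gcs", {"ssl.ca_file"}},
      {"s3", {"ssl.ca_file", "ssl.ca_path", "ssl.verify"}},
      {"tiledb_cloud", {"ssl.ca_file", "ssl.ca_path", "ssl.verify"}},
  };
  return table;
}

// Returns the options the user set that the given backend would ignore.
// Hypothetical helper: a caller could log one warning per returned name.
std::vector<std::string> ignored_options(
    const std::string& backend, const std::set<std::string>& user_set) {
  std::vector<std::string> ignored;
  const auto it = supported_ssl_options().find(backend);
  if (it == supported_ssl_options().end()) {
    return ignored;  // Unknown backend: nothing to report in this sketch.
  }
  for (const auto& opt : user_set) {
    if (it->second.count(opt) == 0) {
      ignored.push_back(opt);
    }
  }
  return ignored;
}
```

For example, calling `ignored_options("gcs", {"ssl.ca_file", "ssl.ca_path"})` yields a single entry, `ssl.ca_path`, since GCS only honors `ssl.ca_file`.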

I'm still working on trying to figure out how best to write test assertions that these configuration options have been set and are correctly used by the corresponding backends.


TYPE: IMPROVEMENT
DESC: Unify ca_file and ca_path configuration parameters

@davisp davisp requested review from ihnorton and shaunrd0 May 15, 2023 22:00
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch 8 times, most recently from eeb846c to fce6483 Compare May 17, 2023 19:28
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch 6 times, most recently from 905704f to 3971b6c Compare May 31, 2023 20:59
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch 9 times, most recently from 3171b14 to 3390e47 Compare June 5, 2023 21:24
Member

@teo-tsirpanis teo-tsirpanis left a comment

Can we not consider this option on Windows? I don't think Windows suffers from the issues that necessitate application code customizing the trusted CA path; certificate management on Windows is managed by the system and the trusted root list is automatically updated via Windows update.

@davisp
Contributor Author

davisp commented Jun 6, 2023

@teo-tsirpanis I'm not entirely sure I understand your comment. I think you're asking if we could disable the use of these configuration options on Windows because of the global certificate management that already exists. While I agree with the general sentiment that most users of most operating systems will never have a need for setting these options, there are plenty of valid use cases that do require them.

@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch from abc7ddc to 11ba60a Compare June 6, 2023 16:44
@teo-tsirpanis
Member

Yes that's what I meant. And looks like curl will ignore these parameters on Windows (and macOS) either way (also saw that you mentioned it in the issue's description):

> If libcurl was built with Schannel (Microsoft's native TLS engine) or Secure Transport (Apple's native TLS engine) support, then libcurl will still perform peer certificate verification, but instead of using a CA cert bundle, it will use the certificates that are built into the OS.
> (https://curl.se/docs/sslcerts.html)

But we could support ssl.verify on non-Linux platforms.

@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch from 3a0ebda to 9064f4a Compare June 6, 2023 20:24
@davisp
Contributor Author

davisp commented Jun 6, 2023

So, first off, I think the way to think of this PR is as an abstraction over a set of common library configuration parameters. There are three and each library supports some subset of those three. This PR happens to be related to TLS parameters, but it could just as easily be creating a unified TileDB configuration for HTTP retry semantics or some other common set of configuration parameters shared by each of our object store backends.

Thinking of things in that light makes this PR extremely easy to reason about. We create three new TileDB configuration parameters and then plumb those values to the appropriate spots in each object store client library as appropriate where supported.

With that framing, the only complexity is in testing whether these parameters are set and ensuring we're not getting false positives in the test suite because something is configured weirdly. Everything else is super straightforward.

Given that these are TLS parameters though, there are a number of things we could concern ourselves with, but shouldn't. A quick non-exhaustive list of things that come to mind:

  1. Does the client library use libcurl at all? S3 doesn't on Windows; Azure supports libcurl and WinHTTP (or whatever it's called).
  2. How was libcurl configured and built? libcurl supports multiple TLS backends.
  3. If libcurl has multiple TLS backends, which is the default?
  4. If libcurl is used, which configuration options are actually exposed by the client library's API
  5. Even for non-libcurl TLS backends, most TLS backends support some subset of the provided options
  6. Given that the options are exposed, are they all applied the same way?
  7. A whole bunch of other system dependent variables like, has the user done something funky to their certificate store?

For 1, there's an interesting case in the Azure library because it exposes both a libcurl and Windows backend transport on Windows. It took me a while to realize this was fine since Azure always includes the libcurl based transport. However, S3 does not use libcurl on Windows at all, it does however still expose the three configuration options in its API.

For 2 and 3 which are closely related, we would have to specifically audit the port overlays, build systems, and feature interactions to ensure the same exact libcurl is used by all client libraries. And even then, we'd be relying on the client libraries not doing a runtime configuration to pick a non-default TLS backend.

Number 4, not all libraries expose the same set of configuration options. In particular, GCS only exposes the ca_file parameter, presumably because it's the only common parameter amongst its supported TLS backends. The lack of an ssl.verify option is also a project decision (one I find rather annoying: it makes my GCS tests less robust because I can't assert that tests pass with SSL verification off, which is what would show that turning verification on really is verifying things).

  5. Client libraries using non-libcurl backends means we have to trust that the options are set somewhat sanely. We could theoretically audit each client library to verify and submit upstream patches to fix what we perceive as inconsistent/wrong decisions. But then we're in a game of convincing the upstream authors that we're more correct than whatever they decided to do (e.g., asking GCS to allow users to disable SSL verification).

  6. Not all parameters in fact are the same. For instance, some of our client libraries treat SSL verification as both hostname verification and peer validation, while some only use it to control peer validation but not hostname verification. I haven't discovered a unifying theory here other than libcurl exposes this as two knobs and each client library made their own decision on which to control with one boolean (none of our client libraries expose two booleans to control these separately from what I could find).

  7. Is obviously not within our ability to control, but could be something we could at least theoretically detect. For instance, is a system clock way out of whack with reality, which causes certificates to appear expired when they shouldn't be (or valid when they should be expired), etc. But it's something to keep in mind at least if we're thinking of ways TLS could break.
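
To make the point above about one boolean covering two libcurl knobs concrete, here is a minimal sketch. It does not call libcurl; it only models the values that would be passed to the real options `CURLOPT_SSL_VERIFYPEER` (0 or 1) and `CURLOPT_SSL_VERIFYHOST` (0 or 2). The names `CurlVerifySettings`, `VerifyPolicy`, and `map_verify` are hypothetical and do not come from any of the client libraries.

```cpp
// Models the two libcurl verification knobs. In real code these values would
// be passed to curl_easy_setopt for CURLOPT_SSL_VERIFYPEER (0 or 1) and
// CURLOPT_SSL_VERIFYHOST (0 or 2). This struct is an illustration only.
struct CurlVerifySettings {
  long verify_peer;
  long verify_host;
};

// Hypothetical label for the two choices a client library can make when it
// exposes a single "verify" boolean.
enum class VerifyPolicy {
  PeerAndHost,  // The boolean toggles both peer and hostname checks.
  PeerOnly      // The boolean toggles peer validation; hostname check stays on.
};

CurlVerifySettings map_verify(bool ssl_verify, VerifyPolicy policy) {
  if (ssl_verify) {
    return {1L, 2L};  // libcurl's defaults: both checks enabled.
  }
  if (policy == VerifyPolicy::PeerAndHost) {
    return {0L, 0L};  // Both checks disabled.
  }
  return {0L, 2L};  // Only peer validation disabled.
}
```

Under this model, the same `ssl.verify = false` setting produces different effective behavior depending on which policy the library picked, which is why the setting cannot be assumed to mean the same thing across backends.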

The bottom line here is that we can worry about these things (keeping them in mind when debugging client issues with TLS for instance), but there's no sense worrying about them for this PR because we really can't do anything about them in reality. Sure, if a customer finds a bug we can submit patches upstream, but it's also not like we can monkeypatch our client libraries to ensure every platform is using a FIPS 140-3 compliant TLS implementation.

So all of that is why I say it's much easier to just think of this as "We take these TileDB configuration parameters and apply them to each client library; beyond that, it's up to the library to do the right thing."

@teo-tsirpanis
Member

Makes sense, thanks for the explanation!

@ihnorton ihnorton requested a review from teo-tsirpanis June 7, 2023 12:54
@ihnorton ihnorton requested review from teo-tsirpanis and removed request for teo-tsirpanis July 13, 2023 19:58
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch from 9064f4a to 1d99bc8 Compare July 13, 2023 21:09
@robertbindar robertbindar self-requested a review July 14, 2023 14:19
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch 2 times, most recently from d0f7bd1 to d43f7cb Compare July 14, 2023 18:12
Member

@ihnorton ihnorton left a comment

LGTM, but let's get an ACK from @Shelnutt2 (and @teo-tsirpanis for azure) before merging.

Contributor

@robertbindar robertbindar left a comment

Added a few comments, mostly user experience concerns. Very nice work and the housekeeping bits here and there are very appreciated, thank you!

assert(state_ == State::UNINITIALIZED);

thread_pool_ = thread_pool;

bool found;
endpoint_ = config.get("vfs.gcs.endpoint", &found);
assert(found);
if (endpoint_.empty() && getenv("TILEDB_TEST_GCS_ENDPOINT")) {
Contributor

Can this getenv result ever be "", or do we not care?

Contributor Author

I don't think we care, given it's a TileDB-specific test feature. There's a separate environment variable used by the GCS library for changing endpoints as well, so if folks do need to change it there's a standard way of doing it.
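
For reference, if the empty-string case ever did matter, one way to handle it is to treat an empty value the same as an unset variable. This is a minimal sketch under that assumption, not what the PR does; `non_empty` and `env_or_nullopt` are hypothetical helper names.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Treats a null pointer or an empty string as "not set".
std::optional<std::string> non_empty(const char* value) {
  if (value == nullptr || value[0] == '\0') {
    return std::nullopt;
  }
  return std::string(value);
}

// Hypothetical wrapper: reads an environment variable and drops empty values.
std::optional<std::string> env_or_nullopt(const char* name) {
  return non_empty(std::getenv(name));
}
```

With this helper, a check such as `endpoint_.empty() && env_or_nullopt("TILEDB_TEST_GCS_ENDPOINT")` would skip the override when the variable is exported but empty.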

channel_options.set_ssl_root_path(cert_file);
}
if (!ssl_cfg_.ca_path().empty()) {
LOG_WARN(
Contributor

warning is nice here, thanks!

ss << "]";
}

ss << " : " << err.GetMessage();
Contributor

Thanks, this will be much appreciated.

// configured by the user.

// Only set ca_file_ if vfs.s3.ca_file is a non-empty string
auto ca_file = cfg.get<std::string>("vfs.s3.ca_file");
Contributor

Do we have some sort of deprecation mechanism for these config options? Thinking from the user's perspective, when I'd search for options documentation/forums/slack/etc, I'd definitely get a few hits with the old options and I'd get pretty confused on which one of the ssl options would help me. I'm wondering whether we should signal somehow that those options are not to be used anymore (warning/docs comments).

Contributor Author

I see your point, and I have absolutely no idea on the docs side of things. I'd agree that at least adding notes to the s3 specific config variable docs would be useful though. As for how long until we drop the s3 variables, I'd assume that's a 3.x breaking change type of event.

bool ignore_ssl_validation = false;
bool found;
RETURN_NOT_OK(config_->get<bool>(
"rest.ignore_ssl_validation", &ignore_ssl_validation, &found));
Contributor

Same as in the comment above, I understand we now support both for backwards compatibility, but do we want to keep this one around indefinitely in the future? If not, should we discourage users? Can we clarify somehow that this was enhanced and replaced by another option, so it isn't confusing which option gets the job done?

Contributor Author

I don't think we should discourage the use. Though it occurs to me reading this second comment that maybe a LOG_WARN about the new config variables might be useful? I.e., something like "Configuration $old_var can now be achieved using $new_var. Support for $old_var will be removed in the future."
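
A minimal sketch of what such a warning could look like alongside the backwards-compatible lookup. This is not the PR's implementation: the `Config` alias, the `Resolved` struct, and the `resolve` function are hypothetical, and the precedence shown (a non-empty old variable wins over the new one) is an assumption made for illustration. Warnings are collected as strings rather than sent to `LOG_WARN` so the sketch stays self-contained.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a key/value configuration object.
using Config = std::map<std::string, std::string>;

struct Resolved {
  std::optional<std::string> value;   // Effective setting, if any.
  std::vector<std::string> warnings;  // Messages a caller could log.
};

// Assumption for this sketch: a non-empty old variable takes precedence so
// existing configurations keep working, and its use triggers a warning.
Resolved resolve(
    const Config& cfg, const std::string& old_var, const std::string& new_var) {
  Resolved out;
  const auto old_it = cfg.find(old_var);
  if (old_it != cfg.end() && !old_it->second.empty()) {
    out.value = old_it->second;
    out.warnings.push_back(
        "Configuration " + old_var + " can now be achieved using " + new_var +
        ". Support for " + old_var + " will be removed in the future.");
    return out;
  }
  const auto new_it = cfg.find(new_var);
  if (new_it != cfg.end() && !new_it->second.empty()) {
    out.value = new_it->second;
  }
  return out;
}
```

Calling `resolve(cfg, "vfs.s3.ca_file", "ssl.ca_file")` would then return the S3-specific value plus one warning when the old variable is set, and the unified value with no warning otherwise.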

@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch from d43f7cb to 49910ea Compare July 17, 2023 16:50
}

auto transport =
make_shared<::Azure::Core::Http::CurlTransport>(HERE(), transport_opts);
Member

@teo-tsirpanis teo-tsirpanis Jul 19, 2023

This will cause the Curl transport to be used on Windows, which uses WinHTTP by default. Not sure if we care though; in default builds Curl will be used with serialization either way.

Contributor Author

It looks like the WinHTTPTransportOptions has two settings IgnoreUnknownCertificateAuthority and IgnoreInvalidCertificateCommonName which we could disable with ssl.verify = false. However, there's no option for specifying a certificate chain so it'd end up like GCS in that we only have the one switch on Windows.

There's also TransportOptions::DisableTlsCertificateValidation that is the more generic interface that we could use instead so that we don't care at all what Azure uses behind the curtain which seems slightly safer.

Contributor Author

I take that back. If we don't use the curl transport here we'll not be able to specify a ca_file option on Linux, which defeats the entire point of the PR (at least for Azure). So we can either leave it as it is, or we can do an if constexpr to pick the curl transport on non-Windows platforms and use the default version on Windows. Given it works fine on Windows as is, I'm inclined to leave it.

Member

> there's no option for specifying a certificate chain so it'd end up like GCS in that we only have the one switch on Windows.

As said in #4087 (comment) Windows will not honor ca_file on curl.

> or we can do an if constexpr to pick the curl transport on not Windows and use the default version on Windows?

I tried to find a comparison between libcurl and WinHTTP but to no avail. I found https://curl.se/libcurl/wininet.html which is for WinINet, another HTTP API. I am split on whether to use curl on Windows. Using it increases consistency among platforms, but not using it keeps the Azure SDK's defaults. Also the AWS SDK uses WinHTTP on Windows.

Contributor Author

Windows not honoring the ca_file isn't really an issue since we're really only concerned that it works on Linux, specifically in Docker containers that might not have the ca_bundle package installed.

@ihnorton Have you got any good arguments for or against libcurl on Windows?

Member

I guess lean toward WinHTTP, but not a strong preference

Contributor Author

I've updated the Azure adapter to use WinHTTP on Windows and the cURL adapter on !Windows. I ended up having to use the preprocessor though since the WinHTTP approach ends up attempting to include winhttp.h which makes things not work on !Windows.
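
A rough sketch of why the preprocessor is needed here. An `if constexpr` cannot guard an `#include`, and outside a template both of its branches are still compiled, so any code that names WinHTTP types has to be removed before compilation on other platforms. The snippet below does not use the Azure SDK; `kTransportName` and `selected_transport` are hypothetical names that only show the selection pattern.

```cpp
#include <string>

#ifdef _WIN32
// Real code would include the WinHTTP transport header here; that header
// pulls in winhttp.h, which does not exist on other platforms.
static const char* const kTransportName = "winhttp";
#else
// Real code would include the cURL transport header here.
static const char* const kTransportName = "curl";
#endif

// Returns the name of the transport compiled in for this platform.
std::string selected_transport() {
  return kTransportName;
}
```

The trade-off is the one discussed in this thread: behavior differs per platform (only non-Windows builds can honor `ssl.ca_file` for Azure), but each platform keeps the SDK's default transport.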

@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch 5 times, most recently from e4ac6d8 to 815bf79 Compare July 20, 2023 20:57
@davisp davisp force-pushed the pd/sc-2768/unified-ca-path-and-file branch from 815bf79 to e517a90 Compare July 20, 2023 21:50
@ihnorton ihnorton requested a review from teo-tsirpanis July 28, 2023 14:32
Member

@teo-tsirpanis teo-tsirpanis left a comment

Azure VFS looks good, thanks!

Comment on lines +1132 to +1134
if (ssl_cfg.verify() == false) {
transport_opts.IgnoreUnknownCertificateAuthority = true;
}
Member

Suggested change
- if (ssl_cfg.verify() == false) {
-   transport_opts.IgnoreUnknownCertificateAuthority = true;
- }
+ transport_opts.IgnoreUnknownCertificateAuthority = !ssl_cfg.verify();

Comment on lines +1155 to +1157
if (ssl_cfg.verify() == false) {
transport_opts.SslVerifyPeer = false;
}
Member

Suggested change
- if (ssl_cfg.verify() == false) {
-   transport_opts.SslVerifyPeer = false;
- }
+ transport_opts.SslVerifyPeer = ssl_cfg.verify();

@ihnorton ihnorton merged commit 36f0c07 into dev Jul 28, 2023
@ihnorton ihnorton deleted the pd/sc-2768/unified-ca-path-and-file branch July 28, 2023 17:37
davisp added a commit to davisp/TileDB that referenced this pull request Aug 14, 2023