Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conn_pool: making TCP upstreams pluggable #13548

Merged
merged 14 commits into from
Nov 5, 2020

Conversation

alyssawilk
Copy link
Contributor

Risk Level: Medium (some refactory)
Testing: existing tests pass
Docs Changes: n/a
Release Notes: n/a
Fixes #13185

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13548 was opened by alyssawilk.

see: more, trace.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor, thanks for this work to reduce differences between various upstream connection factories.

Please look at CI since there's some doc generation and spellcheck errors.

hdrs = [
"config.h",
],
security_posture = "robust_to_untrusted_downstream",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this extension interact with upstreams? Worth a comment explaining the interaction (if there's any) or saying that there should be no upstream interaction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Enovy in general is considered not robust to untrusted upstreams.
@htuch is this documented somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the work @yanavlasov is doing in #12278, this is about to change shortly. Do you think you think there are untrusted upstream issues in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

htuch: I think part of the issue is that there's currently no way to indicate that an extension is robust to untrusted downstreams and agnostic to upstream input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's agnostic then it's robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to tag this whatever you folks decide - it's no different from the current TCP proxy code robustness which as I understand today is considered robust to downstream and not to upstream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov do you want to make a call on this one based on upstream hardening progress?

@@ -1029,6 +1030,29 @@ TEST_F(TcpProxyTest, DEPRECATED_FEATURE_TEST(HalfCloseProxy)) {
upstream_callbacks_->onEvent(Network::ConnectionEvent::RemoteClose);
}

// Test with an explicitly configured upstream.
TEST_F(TcpProxyTest, DEPRECATED_FEATURE_TEST(ExplicitFactory)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. GenericConnectionPoolProto is the proto that you added in this PR, but it seems that this test is for a feature that is being deprecated.

* @param cm the cluster manager to get the connection pool from
* @param hostname the hostname to use if doing connect tunneling, absl::nullopt otherwise.
* @param context the load balancing context for this connection.
* @param upstream_callbacks the callbacks to provide to the connection if succesfully created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spell check error from CI:

    • @param upstream_callbacks the callbacks to provide to the connection if succesfully created.
    • @param upstream_callbacks the callbacks to provide to the connection if successfully created.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Comment on lines 447 to 451
factory = &Envoy::Config::Utility::getAndCheckFactory<GenericConnPoolFactory>(
cluster.info()->upstreamConfig().value());
} else {
auto* cluster = cluster_manager_.get(cluster_name);
if (!cluster) {
return false;
}
// TODO(snowp): Ideally we should prevent this from being configured, but that's tricky to get
// right since whether a cluster is invalid depends on both the tcp_proxy config + cluster
// config.
if ((cluster->info()->features() & Upstream::ClusterInfo::Features::HTTP2) == 0) {
ENVOY_LOG(error, "Attempted to tunnel over HTTP/1.1, this is not supported. Set "
"http2_protocol_options on the cluster.");
return false;
}

generic_conn_pool_ = std::make_unique<HttpConnPool>(cluster_name, cluster_manager_, this,
config_->tunnelingConfig()->hostname(),
*upstream_callbacks_);
if (generic_conn_pool_->valid()) {
generic_conn_pool_->newStream(this);
return true;
} else {
generic_conn_pool_.reset();
}
factory = &Envoy::Config::Utility::getAndCheckFactoryByName<GenericConnPoolFactory>(
"envoy.filters.connection_pools.tcp.generic");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAndCheckFactoryByName may throw exception on invalid type_config

Is such "non-exist" factory config validated by XDS already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. This does look like data plane code with no obvious try/catch blocks in sight. The most concerning one is the getAndCheckFactory on line 447 above which is a function of config.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to lookup by name providing you can guarantee that the filter will be there; it will throw otherwise. The config thing becomes an issue only when you are actually loading config.

I wonder if we can make this more robust going forward by having the factory functions ASSERT that they are in a control plane context, using some main thread singleton. There's probably static annotations etc. we could use if we wanted to get fancy, but I can see how to build this dynamic check.

Also, Quic has this same pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the factory lookup functions in a non control plane setting is not legal. I urge we invest in adding the appropriate ASSERTs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to make it non-fatal and added a test. PTAL!

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some remaining nits. Looks good otherwise.

@@ -230,6 +230,33 @@ class Utility {
return *factory;
}

/**
* Get a Factory from the registry with a particular name or return nullptr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: worth adding a @return statement to the comment?

@@ -442,14 +442,19 @@ Network::FilterStatus Filter::initializeUpstreamConnection() {
}

bool Filter::maybeTunnel(Upstream::ThreadLocalCluster& cluster, const std::string& cluster_name) {
std::cerr << " HERE\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Remove debug logging.

* @param info supplies the stream info object associated with the upstream connection.
* @param upstream supplies the generic upstream for the stream.
* @param host supplies the description of the host that will carry the request.
* @param upstream_local_address supplies the local address of the upstream connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: documentation and argument names disagree: upstream_local_address vs local_address

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

@snowp or @mattklein123 up for second pass?

@snowp snowp self-assigned this Oct 26, 2020
antoniovicente
antoniovicente previously approved these changes Oct 26, 2020
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks good for the most part, just a few comments

*
* @param callbacks callbacks to communicate stream failure or creation on.
*/
virtual void newStream(GenericConnectionPoolCallbacks* callbacks) PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this take a ref instead of a pointer?

source/common/config/utility.h Outdated Show resolved Hide resolved
*/
virtual GenericConnPoolPtr
createGenericConnPool(const std::string& cluster_name, Upstream::ClusterManager& cm,
absl::optional<std::string> hostname,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const ref

}

absl::optional<std::string> hostname =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a string copy, which means two string copies here since hostname is passed by value to createGenericConnPool. Might not matter in this context

A higher level comment: should we do the factory lookup + partially populate the arguments during filter factory creation? That way we avoid factory lookup and string copies etc in the data path

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

@snowp I think I finally got CI sorted, PTAL!

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Just a few small comments

*
* @param disable true if the stream should be read disabled, false otherwise.
* @return returns true if the disable is performed, false otherwise
(e.g. if the connection is closed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing *

* @param config the tunneling config, if doing connect tunneling.
* @param context the load balancing context for this connection.
* @param upstream_callbacks the callbacks to provide to the connection if successfully created.
* @return may be null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would this be null?

@@ -230,6 +231,33 @@ class Utility {
return *factory;
}

/**
* Get a Factory from the registry with a particular name or return nullptr.
* @param name string identifier for the particular implementation. Note: this is a proto string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is just a regular string?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alyssawilk
Copy link
Contributor Author

@htuch @mattklein123 @lizan I can has API stamp?

@mattklein123 mattklein123 self-assigned this Nov 5, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API LGTM

@repokitteh-read-only repokitteh-read-only bot removed the api label Nov 5, 2020
@alyssawilk alyssawilk merged commit 6a044a3 into envoyproxy:master Nov 5, 2020
@alyssawilk alyssawilk deleted the tcp_api branch June 10, 2021 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tunnel TCP over HTTP/2 POST
6 participants