-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental gRPC server #16534
base: main
Are you sure you want to change the base?
Experimental gRPC server #16534
Conversation
❌ Gradle check result for b5fddb1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 24a40ad: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 3bcd5bb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for b2a1078: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for ee85066: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for eda1c04: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
eda1c04
to
5038cf3
Compare
❌ Gradle check result for 5038cf3: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 1e5d354: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 00d4eb1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 6151622: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for b8d88d2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for b404cee: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 41e70eb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for e985043: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for bbabecd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 853b206: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for da16c66: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
da16c66
to
1937f5f
Compare
❌ Gradle check result for c49cbc0: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 74680e8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
plugins/transport-grpc/src/main/proto/opensearch/main_action.proto
Outdated
Show resolved
Hide resolved
❌ Gradle check result for 6de6037: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 62d4522: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 7d90b9c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Can we get the build issues sorted out and get a passing gradle check? You should be able to run |
server/src/main/java/org/opensearch/common/util/FeatureFlags.java
Outdated
Show resolved
Hide resolved
7d90b9c
to
ac33ea2
Compare
❌ Gradle check result for ac33ea2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@finnegancarroll This looks good to me. Let's get this across the finish line. Just a few things remain:
- mark the PR as not draft
- fix the license issue by running
./gradlew updateSHAs
- get the build passing by removing the empty integration tests
- confirm the full
./gradlew check
passes on the PR
.../src/internalClusterTest/java/org/opensearch/transport/grpc/OpenSearchGrpcIntegTestCase.java
Outdated
Show resolved
Hide resolved
if (FeatureFlags.isEnabled(FeatureFlags.GRPC_ENABLE_SETTING) == false) { | ||
throw new IllegalArgumentException("transport-grpc is experimental and feature flag must be enabled before use"); | ||
} | ||
this.transport = new Netty4GrpcServerTransport(clusterService.getSettings(), Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curios why do you need access to clusterService.getSettings()
here? I would suspect you could move instantiation to getHttpTransports
and use provided settings
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will definitely need the client
instance, which is why initialization is happening here. An earlier revision that had a hand-rolled example of the MainAction implemented as gRPC showed this. We probably could stash off the client instance and then do initialization in the getHttpTransports()
method though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could stash off the client instance and then do initialization in the getHttpTransports() method though.
Yeah. exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta Sorry, I remember now. We have to return the transport
instance here to allow the core to manage its lifecycle (it implements LifecycleComponent). This is why we have to do the weird initialization here but grab the needed network dependency in the other callback.
The core doesn't need to know anything specific about this component, but it does need to start and stop it at node startup/shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fully addressable by #16534 (comment)
// This is something of a hack to capture the network service via the HTTP transport extension | ||
// point, but unless/until a specific gRPC extension point is needed in the future this will work. | ||
transport.setNetworkService(networkService); | ||
return Collections.emptyMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we have to use GRPC + H2 (which means TLS), so the right hook would be getSecureHttpTransports
, also please return the transport so it could be configured:
return Collections.emptyMap(); | |
return Collections.singleton("grpc", transport); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gRPC class isn't actually a HttpServerTransport
so it can't be returned here. The whole HttpServerTransport
framework is really a low-level transport that the server builds the REST framework on top of. The gRPC framework is both the low-level transport and the application protocol and doesn't really fit in the same way as HttpServerTransport
.
Agree we'll probably need to hook into the secure version but figured we could punt that until TLS is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we'll probably need to hook into the secure version but figured we could punt that until TLS is implemented.
Hm ... so basically the larger question here is:
- we need new kind of transports support hooks, and
- possibility to configure more than one (since those are complementary and not a replacement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think there will be a lot more to come here, but I think this approach is okay to let us start chipping away at it. This basic structure allows the plugin to "own" pretty much everything about the new transport and the core doesn't need to really know anything. The only exception here was enabling the security manager to allow the port to be opened and I'm definitely open to better options there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions provided (#16534 (comment)), very simple non-intrusive APIs to allow transport activation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like the idea of not touching the plugin interface until it was truly necessary and we had more clarity on what would be needed. Do you think it is better to add a new interface right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is experimental anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. Just chatted with Finn and we'll add this interface.
* @param settings the {@link Settings} instance to read the gRPC settings from | ||
*/ | ||
private static void addSocketPermissionForGrpc(final Permissions policy, final Settings settings) { | ||
final String grpcRange = GrpcTransportSettings.SETTING_GRPC_PORT.get(settings).getPortRangeString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, the GRCP plugin policy has to include the permission, it may need placeholders support for port ranges though (but it could be implemented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was the hangup? The policy must be hard-coded but the specific port needs to be user-configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do allow placeholders (for code bases fe), we could implement the placeholders for ports (or open up some limited subset fe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to get the specific value from the configuration at runtime at startup. Do we have any existing examples like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like addSocketPermissionForPluginTransports
and allow each plugin to consume from the available pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to get the specific value from the configuration at runtime at startup. Do we have any existing examples like that?
Yes, we do, check please Security::addSocketPermissionForTransportProfiles
Supplier<RepositoriesService> repositoriesServiceSupplier | ||
) { | ||
if (FeatureFlags.isEnabled(FeatureFlags.GRPC_ENABLE_SETTING) == false) { | ||
throw new IllegalArgumentException("transport-grpc is experimental and feature flag must be enabled before use"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception should only trigger when the http.type
includes to grcp
, installing the plugin without activation should cause no harm though
@@ -402,6 +405,10 @@ static void addFilePermissions(Permissions policy, Environment environment) thro | |||
private static void addBindPermissions(Permissions policy, Settings settings) { | |||
addSocketPermissionForHttp(policy, settings); | |||
addSocketPermissionForTransportProfiles(policy, settings); | |||
|
|||
if (FeatureFlags.isEnabled(GRPC_ENABLE_SETTING)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the bootstrap should know nothing about GRPC optional plugin
Signed-off-by: Finn Carroll <carrofin@amazon.com>
The transport-grpc plugin starts a gRPC server which is injected into the Node as a lifecycle component and gated behind an experimental feature flag. This server runs on a configurable port range in parallel to the transport-http server which still hosts the familiar REST API. Only two initial services are registered to the server, 'list' and 'Health/Check'. Add initial housekeeping for the new plugin and its dependencies: - Add boilerplate netty transport security policy to plugin - Populate plugins/transport-grpc/licenses with new gRPC dependency licenses & hashes - Initial server start/stop unit test - Initial integ test case example - Guava removed as a forbidden dependecy - Bump project wide gRPC version 1.68.0 -> 1.68.2 - Update discovery-gce/repository-gcs grpc sha1 to accommodate version bump - Add socket permissions for gRPC server in Security.java Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
* @param publishInetAddress address published for the server | ||
* @return Resolved port. If publishPort is negative and no port can be resolved return publishPort. | ||
*/ | ||
public static int resolvePublishPort(int publishPort, List<TransportAddress> boundAddresses, InetAddress publishInetAddress) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could generalize the implementation with TcpTransport.resolvePublishPort
under Transport
interface?
❌ Gradle check result for 542656a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@finnegancarroll this is great draft but I think we are largely missing the integration with core at the moment. I believe this is by design, have no issues with that, but we have to at least draft the API hooks for generic transports exposure and activation. Something very basic would be fine:
|
542656a
to
4d47a70
Compare
❌ Gradle check result for 4d47a70: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
gRPC server implemented as a network plugin.
Running with transport-grpc plugin
Note: The gRPC server is additionally gated by a feature flag: "opensearch.experimental.feature.grpc.enabled"
This is currently enabled by default as some additional details with setting feature flags with gradle and providing port permissions need to be sorted out.
Testing (grpcurl):
The gRPC server supports a few basic services.
Default service for listing available services + health check/ping service.
Related Issues
RFCs detailing generation of protobuf files from api-specification - 1 2.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.