Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental gRPC server #16534

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

finnegancarroll
Copy link
Contributor

@finnegancarroll finnegancarroll commented Oct 31, 2024

Description

gRPC server implemented as a network plugin.

Running with transport-grpc plugin

Note: The gRPC server is additionally gated by a feature flag: "opensearch.experimental.feature.grpc.enabled"
This is currently enabled by default as some additional details with setting feature flags with gradle and providing port permissions need to be sorted out.

./gradlew run -PinstalledPlugins="['transport-grpc']"

Testing (grpcurl):

The gRPC server supports a few basic services.

Default service for listing available services + health check/ping service.

grpcurl -plaintext localhost:9400 list
grpcurl -plaintext localhost:9400 grpc.health.v1.Health/Check

Related Issues

RFCs detailing generation of protobuf files from api-specification - 1 2.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for b5fddb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 24a40ad: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll mentioned this pull request Nov 4, 2024
1 task
Copy link
Contributor

github-actions bot commented Nov 7, 2024

❌ Gradle check result for 3bcd5bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 7, 2024

❌ Gradle check result for b2a1078: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 8, 2024

❌ Gradle check result for ee85066: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for eda1c04: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 5038cf3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 1e5d354: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 00d4eb1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 6151622: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for b8d88d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for b404cee: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 41e70eb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for e985043: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for bbabecd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 853b206: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for da16c66: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll changed the base branch from main to feature/grpc November 23, 2024 00:17
@finnegancarroll finnegancarroll marked this pull request as ready for review November 23, 2024 00:24
Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for c49cbc0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for 74680e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for 6de6037: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for 62d4522: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for 7d90b9c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Copy link
Member

andrross commented Dec 9, 2024


FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':plugins:transport-grpc:dependencyLicenses'.
> Missing SHA for guava-32.1.1-jre.jar. Run "gradle updateSHAs" to create them

Can we get the build issues sorted out and get a passing gradle check? You should be able to run ./gradlew precommit locally to find a lot of these problems before pushing your commits.

Copy link
Contributor

❌ Gradle check result for ac33ea2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@finnegancarroll This looks good to me. Let's get this across the finish line. Just a few things remain:

  • mark the PR as not draft
  • fix the license issue by running ./gradlew updateSHAs
  • get the build passing by removing the empty integration tests
  • confirm the full ./gradlew check passes on the PR

if (FeatureFlags.isEnabled(FeatureFlags.GRPC_ENABLE_SETTING) == false) {
throw new IllegalArgumentException("transport-grpc is experimental and feature flag must be enabled before use");
}
this.transport = new Netty4GrpcServerTransport(clusterService.getSettings(), Collections.emptyList());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curios why do you need access to clusterService.getSettings() here? I would suspect you could move instantiation to getHttpTransports and use provided settings instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will definitely need the client instance, which is why initialization is happening here. An earlier revision that had a hand-rolled example of the MainAction implemented as gRPC showed this. We probably could stash off the client instance and then do initialization in the getHttpTransports() method though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably could stash off the client instance and then do initialization in the getHttpTransports() method though.

Yeah. exactly

Copy link
Member

@andrross andrross Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta Sorry, I remember now. We have to return the transport instance here to allow the core to manage its lifecycle (it implements LifecycleComponent). This is why we have to do the weird initialization here but grab the needed network dependency in the other callback.

The core doesn't need to know anything specific about this component, but it does need to start and stop it at node startup/shutdown.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fully addressable by #16534 (comment)

// This is something of a hack to capture the network service via the HTTP transport extension
// point, but unless/until a specific gRPC extension point is needed in the future this will work.
transport.setNetworkService(networkService);
return Collections.emptyMap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we have to use GRPC + H2 (which means TLS), so the right hook would be getSecureHttpTransports, also please return the transport so it could be configured:

Suggested change
return Collections.emptyMap();
return Collections.singleton("grpc", transport);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gRPC class isn't actually a HttpServerTransport so it can't be returned here. The whole HttpServerTransport framework is really a low-level transport that the server builds the REST framework on top of. The gRPC framework is both the low-level transport and the application protocol and doesn't really fit in the same way as HttpServerTransport.

Agree we'll probably need to hook into the secure version but figured we could punt that until TLS is implemented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we'll probably need to hook into the secure version but figured we could punt that until TLS is implemented.

Hm ... so basically the larger question here is:

  • we need new kind of transports support hooks, and
  • possibility to configure more than one (since those are complementary and not a replacement)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think there will be a lot more to come here, but I think this approach is okay to let us start chipping away at it. This basic structure allows the plugin to "own" pretty much everything about the new transport and the core doesn't need to really know anything. The only exception here was enabling the security manager to allow the port to be opened and I'm definitely open to better options there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions provided (#16534 (comment)), very simple non-intrusive APIs to allow transport activation

Copy link
Member

@andrross andrross Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the idea of not touching the plugin interface until it was truly necessary and we had more clarity on what would be needed. Do you think it is better to add a new interface right away?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is experimental anyway, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. Just chatted with Finn and we'll add this interface.

* @param settings the {@link Settings} instance to read the gRPC settings from
*/
private static void addSocketPermissionForGrpc(final Permissions policy, final Settings settings) {
final String grpcRange = GrpcTransportSettings.SETTING_GRPC_PORT.get(settings).getPortRangeString();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong, the GRCP plugin policy has to include the permission, it may need placeholders support for port ranges though (but it could be implemented).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was the hangup? The policy must be hard-coded but the specific port needs to be user-configurable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do allow placeholders (for code bases fe), we could implement the placeholders for ports (or open up some limited subset fe)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to get the specific value from the configuration at runtime at startup. Do we have any existing examples like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like addSocketPermissionForPluginTransports and allow each plugin to consume from the available pool?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to get the specific value from the configuration at runtime at startup. Do we have any existing examples like that?

Yes, we do, check please Security::addSocketPermissionForTransportProfiles

Supplier<RepositoriesService> repositoriesServiceSupplier
) {
if (FeatureFlags.isEnabled(FeatureFlags.GRPC_ENABLE_SETTING) == false) {
throw new IllegalArgumentException("transport-grpc is experimental and feature flag must be enabled before use");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception should only trigger when the http.type includes to grcp, installing the plugin without activation should cause no harm though

@@ -402,6 +405,10 @@ static void addFilePermissions(Permissions policy, Environment environment) thro
private static void addBindPermissions(Permissions policy, Settings settings) {
addSocketPermissionForHttp(policy, settings);
addSocketPermissionForTransportProfiles(policy, settings);

if (FeatureFlags.isEnabled(GRPC_ENABLE_SETTING)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the bootstrap should know nothing about GRPC optional plugin

Signed-off-by: Finn Carroll <carrofin@amazon.com>
The transport-grpc plugin starts a gRPC server which is injected into the Node as a lifecycle component
and gated behind an experimental feature flag. This server runs on a configurable port range in parallel to the transport-http server which still hosts the familiar REST API. Only two initial services are registered to the server, 'list' and 'Health/Check'.

Add initial housekeeping for the new plugin and its dependencies:
- Add boilerplate netty transport security policy to plugin
- Populate plugins/transport-grpc/licenses with new gRPC dependency licenses & hashes
- Initial server start/stop unit test
- Initial integ test case example
- Guava removed as a forbidden dependecy
- Bump project wide gRPC version 1.68.0 -> 1.68.2
- Update discovery-gce/repository-gcs grpc sha1 to accommodate version bump
- Add socket permissions for gRPC server in Security.java

Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
* @param publishInetAddress address published for the server
* @return Resolved port. If publishPort is negative and no port can be resolved return publishPort.
*/
public static int resolvePublishPort(int publishPort, List<TransportAddress> boundAddresses, InetAddress publishInetAddress) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could generalize the implementation with TcpTransport.resolvePublishPort under Transport interface?

Copy link
Contributor

❌ Gradle check result for 542656a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator

reta commented Dec 13, 2024

@finnegancarroll this is great draft but I think we are largely missing the integration with core at the moment. I believe this is by design, have no issues with that, but we have to at least draft the API hooks for generic transports exposure and activation. Something very basic would be fine:

public interface ServerTransport extends LifecycleComponent {
 ....
}
    default Map<String, Supplier<ServerTransport>> getServerTransports(
        Settings settings,
        ThreadPool threadPool,
        BigArrays bigArrays,
        PageCacheRecycler pageCacheRecycler,
        CircuitBreakerService circuitBreakerService,
        NamedXContentRegistry xContentRegistry,
        NetworkService networkService,
        HttpServerTransport.Dispatcher dispatcher,
        ClusterSettings clusterSettings,
        SecureHttpTransportSettingsProvider secureHttpTransportSettingsProvider,
        Tracer tracer
    ) {
         ....
    }

@finnegancarroll finnegancarroll changed the base branch from feature/grpc to main December 13, 2024 19:05
Copy link
Contributor

❌ Gradle check result for 4d47a70: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants