Pub/Sub GetIAMPolicy is using QUIC only. #12671

Closed
yilinhan opened this issue Apr 30, 2024 · 18 comments

@yilinhan

yilinhan commented Apr 30, 2024

Environment details

  • OS: Windows
  • .NET version: 8.0
  • Package name and version: Google.Cloud.PubSub.V1 3.12.0

Description:

I assumed the Google APIs would fall back to another protocol if QUIC is blocked, but that doesn't happen when calling Pub/Sub GetIAMPolicy.
Other APIs, such as ListTopics and CreateSubscription, work as expected.

Steps to reproduce

  1. Ban QUIC protocol on network level.
  2. Run GetTopicIamPolicy as in the sample below:
    https://cloud.google.com/pubsub/docs/samples/pubsub-get-topic-policy?hl=en#pubsub_get_topic_policy-csharp
  3. Result:
    Error: Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: The connection timed out from inactivity. (pubsub.googleapis.com:443) QuicException: The connection timed out from inactivity.", DebugException="System.Net.Http.HttpRequestException: The connection timed out from inactivity. (pubsub.googleapis.com:443)") ---> System.Net.Http.HttpRequestException: The connection timed out from inactivity. (pubsub.googleapis.com:443)
@jskeet
Collaborator

jskeet commented Apr 30, 2024

Well, our client libraries use gRPC over HTTP/2.0 by default - with no automatic fallback. I don't know enough about the details of how QUIC relates to gRPC to know whether "gRPC over HTTP/2.0 without QUIC" is even a sensible phrase.

You can use HTTP/1.1 + JSON for at least some APIs (see the "transport selection" documentation), but PublisherServiceClient at least doesn't claim to support it, I'm afraid (and I imagine streaming calls might fail).

Assigning to @amanda-tarafa who may have some other ideas (and who is still working, whereas I'm done for the day).

Other api works as expected, such as ListTopics, CreateSubscriptions.

That's really surprising. I'd expect all calls to fail...

@jskeet jskeet assigned amanda-tarafa and unassigned jskeet Apr 30, 2024
@amanda-tarafa
Contributor

I'll try to see if I can reproduce tomorrow, but I don't expect this to be related to the library at all.

One thing to try is to see whether you can reproduce the issue using Grpc.Core instead of Grpc.Net.Client, which is what is used by default. To use Grpc.Core, follow the Transport Selection documentation.

@jskeet
Collaborator

jskeet commented May 1, 2024

The error message is interesting too: "QuicException: The connection timed out from inactivity"

@yilinhan do you get the error immediately, or does it take time? (If you could add some logging to clarify that, it would help.)
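As an illustration of the kind of logging that would answer this, here is a minimal sketch that wraps any async call, times it with Stopwatch, and reports either success or the exception type. The TimedCall class and its Outcome record are hypothetical helpers written for this example, not part of any Google or gRPC library.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not part of any Google library): times an async
// operation and records whether it succeeded or which exception it threw.
public static class TimedCall
{
    public sealed record Outcome(string Name, bool Succeeded, TimeSpan Elapsed, string? Error);

    public static async Task<Outcome> RunAsync(string name, Func<Task> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await operation();
            stopwatch.Stop();
            Console.WriteLine($"{name}: succeeded after {stopwatch.Elapsed.TotalSeconds:F1}s");
            return new Outcome(name, true, stopwatch.Elapsed, null);
        }
        catch (Exception ex)
        {
            stopwatch.Stop();
            // A failure that only arrives after several seconds suggests a
            // connection timeout rather than an immediate rejection.
            Console.WriteLine($"{name}: failed after {stopwatch.Elapsed.TotalSeconds:F1}s with {ex.GetType().Name}: {ex.Message}");
            return new Outcome(name, false, stopwatch.Elapsed, ex.GetType().Name);
        }
    }
}
```

In the reporter's app, the operation passed in would be something like () => publisher.IAMPolicyClient.GetIamPolicyAsync(request); the helper itself makes no assumptions about which RPC is being timed.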

Looking at this again, I hadn't spotted that this is actually creating an IAM policy client, and that potentially does support REST - you'd just need to create it differently. But I wonder whether there's some problem with how we're creating the IAM policy client from the publisher client.

Do you see this problem in a standalone console app, and if so, are you doing anything custom in terms of how you're creating the client? If we were to create a separate console app for you to try some alternative settings, would you easily be able to test that?

@yilinhan
Author

yilinhan commented May 1, 2024

The error message is interesting too: "QuicException: The connection timed out from inactivity"

@yilinhan do you get the error immediately, or does it take time? (If you could add some logging to clarify that, it would help.)

It takes about 10 seconds, looks like a timeout.

Looking at this again, I hadn't spotted that this is actually creating an IAM policy client, and that potentially does support REST - you'd just need to create it differently. But I wonder whether there's some problem with how we're creating the IAM policy client from the publisher client.

To clarify, I am using this library (PubSub.V1) to create topics, create subscriptions and streaming-pull messages from topics, and all of these work fine except the IAM client calls made through PublisherServiceApiClient.IAMPolicyClient.

Do you see this problem in a standalone console app, and if so, are you doing anything custom in terms of how you're creating the client? If we were to create a separate console app for you to try some alternative settings, would you easily be able to test that?

Does "console app" mean using the gcloud CLI? With QUIC blocked, I ran gcloud pubsub topics get-iam-policy <topic name> in the console, and it returned the correct policy.
I don't pass any extra arguments when creating the publisher API client (new PublisherServiceApiClientBuilder { ChannelCredentials = xxx }.Build();).
I can easily reproduce the issue by blocking QUIC.

@jskeet
Collaborator

jskeet commented May 1, 2024

Console app means using gcloud cli?

No, it means a normal .NET console app - as in the result of running "dotnet new console" and then writing some code.

I can easily reproduce the issue by blocking QUIC

Please could you tell us more about the context in which you're running the code, and how you're blocking QUIC? (If that blocking is easy for us to do as well, I'm happy to try to reproduce it. But I can imagine there are multiple ways of doing it, which could have subtly different results.)

@yilinhan
Author

yilinhan commented May 1, 2024

I haven't tried using the SDK in a .NET console app.

I am running a backend .NET application behind a VPN/firewall, and QUIC can be blocked as described in these links:
https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClarCAC
https://community.fortinet.com/t5/FortiGate/Technical-Tip-Block-QUIC-Protocol/ta-p/197661

@jskeet
Collaborator

jskeet commented May 1, 2024

Right, unfortunately that's firewall-vendor-specific - so while you may well be able to test easily, I won't be able to :(

@yilinhan
Author

yilinhan commented May 1, 2024

I don't control the network firewall from my local PC, so I can't test that approach myself.
However, could you try blocking UDP ports 443 and 80 on your local machine?
That should block QUIC access to pubsub.googleapis.com:443, since QUIC runs over UDP.
Windows example:
https://www.hostwinds.com/tutorials/how-to-open-or-block-ports-using-windows-firewall

@amanda-tarafa
Contributor

I'll see if I can reproduce. I'll get back here when I know more.

@jskeet
Collaborator

jskeet commented May 2, 2024

Possibly related comment on grpc/grpc-dotnet#2404:

You can work around this issue by creating a delegating handler that sets HttpRequestMessage.VersionPolicy to exact instead of greater than. Then the gRPC client will only try to negotiate a HTTP/2 connection and will ignore HTTP/3 upgrade.

This is undoubtedly fiddly to configure (and I'm going to follow up in terms of next steps for simplifying this), and I've no idea why it would only affect the IAM RPCs, but it's at least a potential way forward.

@amanda-tarafa
Contributor

I've attempted to reproduce this and haven't been able to. I went as far as blocking all incoming and outgoing UDP connections, and the Pub/Sub RPCs, including the IAM RPCs, worked fine.

I must say, I occasionally saw:

Grpc.Core.RpcException: 'Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Net.Sockets.SocketException: No such host is known.")'

But when that happened, I re-enabled UDP and ran the code once, successfully; then I disabled UDP again and the code ran successfully, repeatedly. So I think this was just a DNS cache refresh failing because UDP was blocked (DNS lookups normally use UDP).

@yilinhan
Author

yilinhan commented May 8, 2024

Here is the code I tested:

using System;
using System.Linq;
using Google.Cloud.PubSub.V1;

static void GetTopicIamPolicy()
{
    // Arrange
    string projectId = "xxx";
    string topicId = "xxxx";
    string credentialPath = "path";
    Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credentialPath);
    PublisherServiceApiClient publisher = PublisherServiceApiClient.Create();

    var topics = publisher
        .ListTopics(new ListTopicsRequest { Project = "projects/" + projectId })
        .ToList(); // works

    TopicName topicName = TopicName.FromProjectTopic(projectId, topicId);

    Google.Cloud.Iam.V1.Policy policy = publisher.IAMPolicyClient.GetIamPolicy(
        new Google.Cloud.Iam.V1.GetIamPolicyRequest { ResourceAsResourceName = topicName }
    ); // throws an exception when QUIC is blocked
}
Block QUIC:
ListTopics() returned correct results; GetIamPolicy() timed out with an exception.

Unblock QUIC:
ListTopics() and GetIamPolicy() both returned correct results.

It could be that the network vendor blocks QUIC differently, but it is interesting that one call works and the other does not.

I also attached the network screenshots here:
[screenshot]

@jskeet
Collaborator

jskeet commented May 9, 2024

I have a theory about your example - please could you try swapping round the two calls? Given another issue we've been looking at, I wouldn't be surprised if "whichever request is made first" works, and then the subsequent one fails.

If you could test that for us, it would really help us understand what's happening.

@yilinhan
Author

yilinhan commented May 9, 2024

I have a theory about your example - please could you try swapping round the two calls? Given another issue we've been looking at, I wouldn't be surprised if "whichever request is made first" works, and then the subsequent one fails.

If you could test that for us, it would really help us understand what's happening.

You are right, the order matters, although it isn't simply that the subsequent call fails (see cases 3, 4 and 5 below).

  1. ListTopics() / IAMPolicyClient.GetIamPolicy() => works, fails
  2. ListTopicSubscriptions() / IAMPolicyClient.GetIamPolicy() => works, fails
  3. ListTopics() / ListTopicSubscriptions() / IAMPolicyClient.GetIamPolicy() => works, works, works
  4. IAMPolicyClient.GetIamPolicy() / ListTopics() => works, works
  5. IAMPolicyClient.GetIamPolicy() / ListTopicSubscriptions() => works, works

For case 3, the second call, ListTopicSubscriptions(), takes much longer to respond, but the result is correct. Maybe the protocol is switched at that point, so the following GetIamPolicy() call succeeds.
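To rerun ordering cases like these systematically, a minimal sketch of a harness that executes named steps in a fixed order, keeps going after a failure (case 3 shows later calls can still succeed), and prints a one-line summary in the same "works, fails" style. OrderingExperiment and RunCaseAsync are hypothetical names; the actual Pub/Sub calls and client setup are left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical harness (not from the thread): runs named RPC steps in order
// and summarises each as "works" or "fails".
public static class OrderingExperiment
{
    public static async Task<string> RunCaseAsync(IReadOnlyList<(string Name, Func<Task> Step)> steps)
    {
        var results = new List<string>();
        foreach (var (name, step) in steps)
        {
            try
            {
                await step();
                results.Add("works");
            }
            catch (Exception ex)
            {
                // Don't stop: a later step may still succeed.
                Console.WriteLine($"{name} failed: {ex.GetType().Name}");
                results.Add("fails");
            }
        }
        return string.Join(" / ", steps.Select(s => s.Name)) + " => " + string.Join(", ", results);
    }
}
```

Each step would wrap one RPC, for example () => publisher.ListTopicsAsync(request).ReadPageAsync(10); the harness only records the outcome and order.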

@jskeet
Collaborator

jskeet commented May 13, 2024

Please could you try this workaround code? The intention is that this forces the use of HTTP/2.0 everywhere, so QUIC being prohibited won't cause a problem.

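using Google.Api.Gax.Grpc;
using Google.Cloud.PubSub.V1;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

var publisher = new PublisherServiceApiClientBuilder
{
    GrpcAdapter = GrpcNetClientAdapter.Default.WithAdditionalOptions(ExactVersionHandler.ModifyGrpcChannelOptions)
}.Build();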

...

/// <summary>
/// Delegating handler which enforces that messages are sent with HttpVersionPolicy.RequestVersionExact.
/// </summary>
internal class ExactVersionHandler : DelegatingHandler
{
    internal ExactVersionHandler(HttpMessageHandler handler) : base(handler)
    {
    }

    /// <summary>
    /// Convenience method to be used from GrpcNetClientAdapter.WithAdditionalOptions.
    /// </summary>
    internal static void ModifyGrpcChannelOptions(Grpc.Net.Client.GrpcChannelOptions options) =>
        options.HttpHandler = new ExactVersionHandler(new SocketsHttpHandler { EnableMultipleHttp2Connections = true });

    // Note: gRPC never calls the synchronous method.
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        request.VersionPolicy = HttpVersionPolicy.RequestVersionExact;
        return base.SendAsync(request, cancellationToken);
    }
}
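A handler like this can be checked in isolation, without a firewall that blocks QUIC. The sketch below uses only System.Net.Http: it repeats the same technique under the hypothetical name ForceExactVersionHandler and chains it to a hypothetical CapturingHandler stub that records the request instead of sending it, so you can see that an HTTP/2 request marked RequestVersionOrHigher leaves the handler as RequestVersionExact.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Same technique as ExactVersionHandler: force RequestVersionExact so an
// HTTP/2 request is not upgraded to HTTP/3 (QUIC).
public sealed class ForceExactVersionHandler : DelegatingHandler
{
    public ForceExactVersionHandler(HttpMessageHandler inner) : base(inner) { }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        request.VersionPolicy = HttpVersionPolicy.RequestVersionExact;
        return base.SendAsync(request, cancellationToken);
    }
}

// Hypothetical test stub: records what it receives instead of using the network.
public sealed class CapturingHandler : HttpMessageHandler
{
    public Version? SeenVersion { get; private set; }
    public HttpVersionPolicy? SeenPolicy { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        SeenVersion = request.Version;
        SeenPolicy = request.VersionPolicy;
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }
}

public static class VersionPolicyProbe
{
    // Sends an HTTP/2 request whose policy would normally allow an upgrade
    // and returns what the innermost handler observed.
    public static async Task<(Version? Version, HttpVersionPolicy? Policy)> ProbeAsync()
    {
        var capture = new CapturingHandler();
        using var invoker = new HttpMessageInvoker(new ForceExactVersionHandler(capture));
        using var request = new HttpRequestMessage(HttpMethod.Post, "https://pubsub.googleapis.com/")
        {
            Version = HttpVersion.Version20,
            VersionPolicy = HttpVersionPolicy.RequestVersionOrHigher,
        };
        using var response = await invoker.SendAsync(request, CancellationToken.None);
        return (capture.SeenVersion, capture.SeenPolicy);
    }
}
```

Because the stub never opens a connection, this only confirms the policy rewrite; whether the real connection then stays on HTTP/2 still depends on SocketsHttpHandler and the network path.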

@yilinhan
Author

Please could you try this workaround code? The intention is that this forces the use of HTTP/2.0 everywhere, so QUIC being prohibited won't cause a problem.

The issue is resolved with this workaround.

@jskeet
Collaborator

jskeet commented May 13, 2024

Hooray! I suggest you use this workaround for now then; we hope to make this simpler to use (e.g. via the GAX GrpcChannelOptions) over time, but for the moment this is probably going to be the simplest approach.

@jskeet
Collaborator

jskeet commented Sep 30, 2024

FYI: Google.Cloud.PubSub.V1 v3.18.0 was released last week; it uses Grpc.Net.Client v2.66.0, which uses HTTP/2.0 by default, so the workaround is no longer required.
