HTTP2 and gRPC support #2539

Merged (18 commits) on Jan 30, 2019

tanzeeb
Contributor

@tanzeeb tanzeeb commented Nov 24, 2018

This PR adds support for HTTP/2 and gRPC services:

  1. For each revision, the container port name is used to determine the appropriate protocol, as described in the runtime contract. The default is HTTP/1.1.
  2. For each revision, the port name of the k8s service is determined by the revision's protocol: http2 for h2c and http for http1. The default is http.
  3. The activator is served on two separate ports, one for each protocol.
  4. When a revision is scaled-to-zero, the ClusterIngress will route to the appropriate activator port based on the revision's protocol.
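
The four points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Protocol` type, the function names, and the two activator port numbers are all hypothetical, and a real implementation might reject unknown port names instead of defaulting.

```go
package main

import "fmt"

// Protocol is a hypothetical stand-in for the revision protocol.
type Protocol string

const (
	ProtocolHTTP1 Protocol = "http1"
	ProtocolH2C   Protocol = "h2c"
)

// Assumed activator ports, one per protocol; the real values may differ.
const (
	activatorHTTP1Port = 80
	activatorH2CPort   = 81
)

// protocolFromPortName maps a container port name to a protocol.
// Anything other than "h2c" falls back to HTTP/1.1 in this sketch.
func protocolFromPortName(name string) Protocol {
	if name == string(ProtocolH2C) {
		return ProtocolH2C
	}
	return ProtocolHTTP1
}

// servicePortName returns the k8s Service port name for a protocol.
func servicePortName(p Protocol) string {
	if p == ProtocolH2C {
		return "http2"
	}
	return "http"
}

// activatorPort picks the activator port to route to when the revision
// is scaled to zero.
func activatorPort(p Protocol) int {
	if p == ProtocolH2C {
		return activatorH2CPort
	}
	return activatorHTTP1Port
}

func main() {
	for _, name := range []string{"", "http1", "h2c"} {
		p := protocolFromPortName(name)
		fmt.Printf("container port %q -> protocol %s, service port %q, activator port %d\n",
			name, p, servicePortName(p), activatorPort(p))
	}
}
```

Keeping the three mappings as separate small functions mirrors the list: each one can change independently if, say, a `grpc` port name is added later.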

Manual Test Plan

Fixes #813
Fixes #706
Fixes #707

Release Note

* Support gRPC and HTTP/2 requests

Edit: Added implementation details.

@knative-prow-robot knative-prow-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Nov 24, 2018
Contributor

@knative-prow-robot knative-prow-robot left a comment

@tanzeeb: 1 warning.

In response to this:

This PR adds support for HTTP/2 and gRPC services.

Fixes #813
Fixes #706
Fixes #707

Release Note

* Support gRPC and HTTP/2 requests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mattmoor
Member

I think we definitely want an e2e test for this.

Contributor

@markusthoemmes markusthoemmes left a comment

Thanks a lot @tanzeeb for having separately reviewable commits with meaningful commit messages. That helped a lot while stepping through this change! 🙏

The change overall looks good to me, I left a few comments throughout but nothing major at all. Great job!

@tanzeeb
Contributor Author

tanzeeb commented Nov 26, 2018

/test pull-knative-serving-upgrade-tests
/test pull-knative-serving-integration-tests

@tanzeeb
Contributor Author

tanzeeb commented Nov 27, 2018

/hold

Thanks for reviewing @mattmoor @markusthoemmes

Will fix the broken tests and add some e2e tests.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Nov 27, 2018
@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch from 1398d27 to 442563d Compare November 27, 2018 15:57
@tanzeeb
Contributor Author

tanzeeb commented Dec 5, 2018

Update:

TL;DR: This PR breaks for all applications that only support HTTP 1.1. Turns out I did a really bad job of 1) testing this feature, and 2) bisecting the test failure (sorry revision.timeoutSeconds, it wasn't your fault 😞).

Background:

Istio uses the Kubernetes Service port name of the upstream service (in our case, the Revision) to determine the request protocol. Changing the Kubernetes Service port name from http to http2 instructs the Envoy IngressGateway to make HTTP 2 requests to the service:

	// ServicePortName = "http"
	ServicePortName = "http2"

This will convert all requests, including HTTP 1.1 requests, into HTTP 2. This works very well for apps that support HTTP 2, but breaks all other apps. If we keep the port name as http, all requests, including HTTP 2 requests, will get converted to HTTP1.
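
To make the rule concrete, here is a toy Go model of the behaviour described above. It is only a sketch under the assumption that the port name alone decides the upstream protocol; the `server` type and both function names are made up for illustration, and gRPC specifics are deliberately left out.

```go
package main

import "fmt"

// upstreamMajor returns the HTTP major version the gateway is described as
// using toward the service, based only on the Service port name.
func upstreamMajor(portName string) int {
	switch portName {
	case "http2", "grpc":
		return 2
	default:
		return 1
	}
}

// server lists the HTTP major versions a hypothetical app can speak.
type server map[int]bool

// reachable reports whether the app can answer behind the given port name.
// The client's own protocol does not appear here: per the description,
// every request is converted to the upstream protocol.
func reachable(s server, portName string) bool {
	return s[upstreamMajor(portName)]
}

func main() {
	http1Only := server{1: true}
	http2Only := server{2: true}
	for _, port := range []string{"http", "http2"} {
		fmt.Printf("port %q: HTTP/1.1-only app ok=%v, HTTP/2-only app ok=%v\n",
			port, reachable(http1Only, port), reachable(http2Only, port))
	}
}
```

Note that `reachable` takes no client-side argument at all, which is exactly why a single fixed port name cannot serve both kinds of app.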

The full compatibility matrix looks like this:

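Port name: http2 / grpc:

| | HTTP 1.1 request | HTTP 2 request | Unary gRPC request | Streaming gRPC request |
|---|---|---|---|---|
| HTTP 1.1 server | | | | |
| HTTP 2 server | | | | |
| gRPC server | | | | |

Port name: http:

| | HTTP 1.1 request | HTTP 2 request | Unary gRPC request | Streaming gRPC request |
|---|---|---|---|---|
| HTTP 1.1 server | | | | |
| HTTP 2 server | | | | |
| gRPC server | | | | |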

Problem

In this PR, I was hoping to use http2 to support all http-ish protocols. This won't work. Istio will always convert the request to the upstream service protocol. Envoy has a feature to use the client protocol instead of the upstream protocol, but this breaks Istio so it is not an option.

Potential Solution?

We have to dynamically select http, http2 or grpc for the K8s Service port name.

In a previous PR there was hesitation around specifying the protocol in the revision spec. I can't think of a way around it...

Update from @evankanderson:

See the runtime spec for the current way to do this, using the container.ports[0].name field.

Edit: Corrected the charts

@evankanderson
Member

Thanks for the in-depth investigation. It looks like istio/istio#6158 is not very conclusive -- istio/istio#6611 reverts the behavior, but I'm wondering whether unspecified port name should be equivalent to envoy's USE_DOWNSTREAM_PROTOCOL. Unfortunately, it looks like this would require adding an enum value to the supported Pilot Protocols.

In particular, it's possible to have a docker container which answers HTTP1, HTTP via h2c, and gRPC via h2c -- there's no particular reason why the container would need to support HTTP1, but it might be convenient for testing (distro curl may not support the -2 flag, for example). It would be nice to be able to request "attempt http2 but fall back to http" in the Istio configuration, which is different than USE_DOWNSTREAM_PROTOCOL (more of a USE_BEST_PROTOCOL).

@tanzeeb
Contributor Author

tanzeeb commented Dec 6, 2018

It would be nice to be able to request "attempt http2 but fall back to http" in the Istio configuration, which is different than USE_DOWNSTREAM_PROTOCOL (more of a USE_BEST_PROTOCOL).

This would solve all of our problems 😃

@evankanderson
Member

evankanderson commented Dec 7, 2018 via email

@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch from 442563d to 0c252ec Compare December 7, 2018 23:46
@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch from 0c252ec to 58f2b1f Compare December 19, 2018 02:37
@mattmoor mattmoor added this to the Serving 0.4 milestone Jan 3, 2019
@tcnghia tcnghia mentioned this pull request Jan 3, 2019
@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch 3 times, most recently from 17987fc to 9641f41 Compare January 9, 2019 23:27
@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch from d7ab918 to 1811bc0 Compare January 30, 2019 21:34
@tanzeeb tanzeeb force-pushed the http2-and-grpc-support branch from 1811bc0 to 1700fbc Compare January 30, 2019 21:36
@knative-metrics-robot

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/activator/activator.go | Do not exist | 100.0% | |
| pkg/activator/util/io.go | Do not exist | 100.0% | |
| pkg/activator/util/rewinder.go | 92.9% | 100.0% | 7.1 |
| pkg/apis/serving/v1alpha1/revision_types.go | 87.0% | 87.9% | 0.9 |

@evankanderson
Member

I'm comfortable with a quick followup e2e test, but note that until there is an e2e test for this functionality, it's likely to backslide and be broken by accident.

@tanzeeb
Contributor Author

tanzeeb commented Jan 30, 2019

Hi folks,

Thanks for the feedback so far. I'm happy to tackle the e2e tests next, but in the meantime, here's a manual test plan:

Test Plan

Pre-requisites

  1. Get the app: go get github.com/evankanderson/sia
  2. Create a Service (sia.yaml):
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: sia
  namespace: default
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: github.com/evankanderson/sia
            ports:
              - name: h2c   # set this to `http1` to test http1
                containerPort: 8080
  3. Install grpcurl and curl with HTTP/2 support (--http2-prior-knowledge)

Test HTTP 1.1

  1. ko apply -f sia.yaml with port name http1
  2. curl -H 'Host: sia.default.example.com' http://${ip}:80 --http1.1
  3. Check the logs (kubectl logs <sia pod> user-container), should say "Got GET on 1 with ..."
  4. Repeat steps 2-3 after scale-to-zero

Test HTTP 2.0

  1. ko apply -f sia.yaml with port name h2c
  2. curl -H 'Host: sia.default.example.com' http://${ip}:80 --http2-prior-knowledge
  3. Check the logs (kubectl logs <sia pod> user-container), should say "Got GET on 2 with ..."
  4. Repeat steps 2-3 after scale-to-zero

Test Unary gRPC

  1. ko apply -f sia.yaml with port name h2c
  2. echo '{"thing":"SOMETHING"}' | grpcurl -plaintext -proto doer/doer.proto -authority sia.default.example.com -format json -d @ ${ip}:80 doer.Doer/DoIt
  3. Should see { "words": "Did: SOMETHING" }
  4. Check the logs (kubectl logs <sia pod> user-container), should say "Got POST on 2 with application/grpc"
  5. Repeat steps 2-4 after scale-to-zero

Test Streaming gRPC

  1. ko apply -f sia.yaml with port name h2c
  2. echo '{"thing":"SOMETHING"}{"thing":"SOMETHING ELSE"}' | grpcurl -plaintext -proto doer/doer.proto -authority sia.default.example.com -d @ ${ip}:80 doer.Doer/KeepDoing
  3. Should see { "words": "Did: SOMETHING" }{ "words": "Did: SOMETHING ELSE" }
  4. Check the logs (kubectl logs <sia pod> user-container), should say "Got POST on 2 with application/grpc"
  5. yes '{"thing":"SOMETHING"}' | grpcurl -plaintext -proto doer/doer.proto -authority sia.default.example.com -d @ ${ip}:80 doer.Doer/KeepDoing
  6. Should see endless stream of { "words": "Did: SOMETHING" }
  7. Repeat steps 2-6 after scale-to-zero
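
For anyone who wants to reproduce the log lines without the sia image, a handler along these lines prints the same kind of message. This is a hypothetical sketch, not sia's actual code: in particular, printing the Content-Type after "with" is an assumption based on the gRPC log line above, and the `simulate` helper only fakes the protocol version in-process instead of going through a real HTTP/2 connection.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// describe formats a line in the style of the expected log output:
// method, HTTP major version, and (by assumption) the Content-Type.
func describe(r *http.Request) string {
	return fmt.Sprintf("Got %s on %d with %s", r.Method, r.ProtoMajor, r.Header.Get("Content-Type"))
}

// handler echoes the description so a client can see which protocol arrived.
func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, describe(r))
}

// simulate builds an in-memory request with the given major version and
// runs it through handler, returning the response body.
func simulate(method string, major int, contentType string) string {
	req := httptest.NewRequest(method, "/", nil)
	req.ProtoMajor = major // httptest.NewRequest defaults to HTTP/1.1
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(simulate("GET", 1, "text/plain"))
	fmt.Print(simulate("POST", 2, "application/grpc"))
}
```

Checking `r.ProtoMajor` in the user container is the quickest way to tell whether the request arrived as HTTP/1.1 or HTTP/2 after passing through the gateway and activator.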

@tanzeeb
Contributor Author

tanzeeb commented Jan 30, 2019

I'm comfortable with a quick followup e2e test, but note that until there is an e2e test for this functionality, it's likely to backslide and be broken by accident.

Understood. It's happened a few times while I was working on the PR, so e2e tests are definitely a top priority.

I'd still like to get this PR in before that, so that folks have a chance to play with the functionality. There's been a lot of theoretical discussion on how this will impact other areas, such as autoscaling (eg. #2916), and it'd be helpful for those discussions to be informed by an actual implementation.

@evankanderson
Member

/approve

@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: evankanderson, tanzeeb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jan 30, 2019
@evankanderson
Member

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm label (Indicates that a PR is ready to be merged.) Jan 30, 2019
@knative-prow-robot knative-prow-robot merged commit d92cc73 into knative:master Jan 30, 2019
ZhiminXiang pushed a commit to ZhiminXiang/serving-1 that referenced this pull request Feb 7, 2019
* Use http.DefaultTransport dialer settings in h2c.DefaultTransport

* Support streaming in activator/util.Rewinder

* Use custom timeout handler in queue-proxy

The http.TimeoutHandler will buffer the response body in memory
until either the request completes or the request times out.

This works well for HTTP, but is a problem for HTTP2 and gRPC
streaming requests, where responses should be written as each
sub-request is processed.

This commit enforces the timeout by processing the request in a separate
goroutine and panicking with http.ErrAbortHandler if the timeout is
reached. The http.ErrAbortHandler is a canary error used by the net/http
and x/net/http2 packages to gracefully end the connection without dumping
the stack trace.

* Use http2 instead of http as the k8s service port names

Istio uses port names in k8s services to determine protocols supported
by the service.

This change allows kservices to support HTTP/2 and gRPC traffic.

* Enforce max content length for streaming requests in activator

* Don't read original reader again after rewinder is closed

* Use chan struct{} instead of chan bool in queue-proxy timeoutHandler

* Move LimitReadCloser from activator handler to activator util and add test coverage

* Add/fix godoc comments for activator/util

* Explicitly set Dialer config defaults in h2c.DefaultTransport

* Remove timeoutHandler from queue-proxy

* Revert k8s Service port name from http2 to http

Changing it to http2 for all services breaks services which only support
http1. Support for http2 will require selectively setting the port name
to http2 only for services that explicitly support it.

* Run activator on two ports, to support activating both http1 and h2c targets

* Add RevisionProtocolType to API

* Dynamically select 'http' or 'http2' for k8s service port name

* Dynamically route to the http1 or h2c activator service port based on the revision protocol

* Add test coverage for activator.ServicePort

* Print errors in EnforceLengthHandler tests
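
The approach described in the "Use custom timeout handler in queue-proxy" commit message above (later removed by another commit in this list) can be sketched roughly like this. It is an illustration only, not the queue-proxy code: `withTimeout`, `aborted`, and `demo` are hypothetical names, and the sketch ignores two things a real handler must deal with, namely stopping the inner handler from writing after the abort and handling panics raised inside it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withTimeout runs next in its own goroutine and aborts the request by
// panicking with http.ErrAbortHandler if next has not returned within d.
// Unlike http.TimeoutHandler, it does not buffer the response body, so
// streamed writes reach the client as they happen.
func withTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		done := make(chan struct{}) // closed when next returns
		go func() {
			defer close(done)
			next.ServeHTTP(w, r)
		}()
		timer := time.NewTimer(d)
		defer timer.Stop()
		select {
		case <-done:
		case <-timer.C:
			panic(http.ErrAbortHandler)
		}
	})
}

// aborted reports whether serving one request through h ends in
// http.ErrAbortHandler. The net/http server recovers this panic itself;
// it is recovered by hand here because the handler is called directly.
func aborted(h http.Handler) (timedOut bool) {
	defer func() {
		if rec := recover(); rec != nil {
			if rec != http.ErrAbortHandler {
				panic(rec)
			}
			timedOut = true
		}
	}()
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/", nil))
	return false
}

// demo returns the outcome for a fast handler and for a handler that does
// not finish before the timeout.
func demo() (fastAborted, slowAborted bool) {
	fast := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	release := make(chan struct{})
	defer close(release) // let the slow goroutine exit once the demo is done
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		<-release
	})
	return aborted(withTimeout(fast, time.Second)), aborted(withTimeout(slow, 20*time.Millisecond))
}

func main() {
	fast, slow := demo()
	fmt.Printf("fast handler aborted: %v, slow handler aborted: %v\n", fast, slow)
}
```

The `done` channel is a `chan struct{}` because it only signals completion and carries no value, matching the later "Use chan struct{} instead of chan bool" commit.
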
dgerd pushed a commit to dgerd/serving that referenced this pull request May 7, 2019
This change makes numerous cleanups to the runtime contract in an
attempt to improve the readability of the document and make the document
more useful for the intended audience.

* Moves developer facing statements to a new `runtime-user-guide`.
Focuses `runtime-contract` on operator/platform-provider.
* Add links to Conformance tests that test Runtime Contract statements.
* Corrects, updates, or removes statements to more accurately represent
today's Knative runtime.
* Updates to informative or removes most untestable statements
* Copies in important OCI runtime requirements we previously referenced
* Removes reference to OCI specification that didn't bring new
requirements.

Ref: knative#2539, knative#2973, knative#4014, knative#4027
@dgerd dgerd mentioned this pull request May 7, 2019
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.