
Improve concurrent request bucketing in queue-proxy. #1091

Merged: 5 commits into knative:master on Jun 13, 2018

Conversation

@markusthoemmes (Contributor)

Fixes #1060

Proposed Changes

Instead of counting all requests that arrived in a certain bucket as concurrent, the queue proxy now reports the actual maximum concurrency that was observed within that bucket.

If, for example, 3 requests arrive at once, the maximum concurrency for that bucket is 3. If a fourth arrives while those 3 are still open, the maximum concurrency becomes 4. Completing a request decrements the concurrency immediately (rather than draining the outgoing request queue on quantization, which caused the over-counting described above).
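
For illustration, here is a minimal sketch of this reporting scheme, assuming a single goroutine consumes arrival/completion events and a ticker closes each bucket. The names are illustrative, not the queue-proxy's actual identifiers:

package queuestats

import "time"

// ReqEvent marks a request arriving (ReqIn) or completing (ReqOut).
type ReqEvent int

const (
    ReqIn ReqEvent = iota
    ReqOut
)

// ReportMaxConcurrency tracks in-flight requests and, on every tick,
// reports the maximum concurrency observed during that bucket.
func ReportMaxConcurrency(events <-chan ReqEvent, ticks <-chan time.Time, report chan<- int32) {
    var concurrency, max int32
    for {
        select {
        case e := <-events:
            switch e {
            case ReqIn:
                concurrency++
                if concurrency > max {
                    max = concurrency
                }
            case ReqOut:
                concurrency-- // decrement immediately on completion
            }
        case <-ticks:
            report <- max
            max = concurrency // the next bucket starts at the current in-flight count
        }
    }
}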

Release Note

Improved concurrent request bucketing of the queue-proxy to report more accurate values.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.



@google-prow-robot google-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 7, 2018
- reqInChan = make(chan queue.Poke, requestCountingQueueLength)
- reqOutChan = make(chan queue.Poke, requestCountingQueueLength)
+ reqInChan = make(chan queue.Poke)
+ reqOutChan = make(chan queue.Poke)
@markusthoemmes (Contributor Author)


I'm fairly new to Go, so please bear with me: my understanding is that we shouldn't buffer here anymore. In an extreme case, buffering could lead to reporting a higher concurrency than what we actually see in the container (since In and Out are different channels). Maybe it makes sense to move away from "Poke" and instead have "In" and "Out" events pushed through the same channel?

@dprotaso (Member) commented Jun 8, 2018

Without a buffer, the side effect is that HTTP requests being proxied through the queue will block waiting for something to read from this channel. Looking at the code, this would most likely happen when sending stats to the autoscaler is on a slow network.

It might be worth making queue.Poke be, or contain, a timestamp. Then, when aggregating, we'd check whether the time falls within our bucket interval and maybe drop the poke if it doesn't.

@markusthoemmes (Contributor Author)

@dprotaso the channel for sending stats is still buffered, though. My understanding is that this only makes incrementing/decrementing the concurrency counter blocking, which might be okay?

@josephburnett (Contributor)

I don't think we should allow request handling to block on stat reporting. So I think the buffered channel should remain. We could push both in and out through the same channel, although I don't think it makes much of a difference. Order is not critical here.

We can think about what happens when stat reporting gets way behind. Right now, concurrency "happens" when the stat reporter sees it. The problem with making the Poke a Timestamp is that we have to keep in and out balanced. Otherwise concurrency won't return to zero. E.g. if we disregard one "in" because it's late, we must remember to disregard one "out". Which one?
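
To make that balancing concern concrete, here is a hypothetical sketch (illustrative names, not actual queue-proxy code) of the timestamped-Poke idea and its failure mode: a late "in" gets dropped, but its matching "out" still decrements the counter, so the count drifts and never returns to zero.

package queuestats

import "time"

// Poke is a hypothetical timestamped variant of queue.Poke;
// In marks a request arrival, otherwise a completion.
type Poke struct {
    In   bool
    Time time.Time
}

// countDroppingLate drops pokes that fall before the bucket start.
// Dropping a late "in" while still processing its matching "out"
// unbalances the counter, which can end up below zero.
func countDroppingLate(pokes <-chan Poke, bucketStart time.Time) int32 {
    var concurrency int32
    for p := range pokes {
        if p.Time.Before(bucketStart) {
            continue // a late "in" is dropped here...
        }
        if p.In {
            concurrency++
        } else {
            concurrency-- // ...but its matching "out" still decrements
        }
    }
    return concurrency
}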

@markusthoemmes (Contributor Author)

/assign @josephburnett

@rootfs (Contributor) commented Jun 7, 2018

@markusthoemmes needs to sign a CLA.

@markusthoemmes (Contributor Author)

I signed the CLA!

@googlebot

CLAs look good, thanks!

@google-prow-robot google-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 8, 2018
@josephburnett (Contributor) left a comment

This looks good. My only concern is removing the channel buffering.

@markusthoemmes (Contributor Author)

/retest

- // Ticks with every request completed
- ReqOutChan chan Poke
+ // Ticks with every request arrived/completed respectively
+ ReqChan chan interface{}
@josephburnett (Contributor) commented Jun 13, 2018

Nit: chan interface{} seems too open to me. You can throw anything on the channel and the compiler won't tell you there's a problem, but the binary will crash when the switch statement has no match.

How about an enumerated type instead?

type StatEvent int

const (
    ReqIn StatEvent = iota
    ReqOut
)

ReqChan chan StatEvent

switch event {
case ReqIn:
case ReqOut:
}
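
For context, a hypothetical wrapper that would feed such typed events from the request path, assuming the enumerated type above and a sufficiently buffered channel so the sends don't block request handling. This is a sketch, not the PR's actual code:

package queuestats

import "net/http"

// CountingHandler reports request arrival and completion on reqChan
// while delegating to the next handler in the chain.
func CountingHandler(reqChan chan<- StatEvent, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        reqChan <- ReqIn                       // request arrived
        defer func() { reqChan <- ReqOut }()   // request completed
        next.ServeHTTP(w, r)
    })
}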

@josephburnett (Contributor)

/lgtm
/approve

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2018
@mattmoor (Member) left a comment

/approve

for ./config/

@google-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: josephburnett, markusthoemmes, mattmoor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2018
@google-prow-robot google-prow-robot merged commit baf4b24 into knative:master Jun 13, 2018
skonto added a commit to skonto/serving that referenced this pull request Apr 12, 2022