Dedicated TChannel Client Per Callee Service #778
Conversation
ServerTChannelRouter   *TChannelRouter
TChannelSubLoggerLevel zapcore.Level
Tracer                 opentracing.Tracer
JSONWrapper            jsonwrapper.JSONWrapper
The diff here might make it hard to understand what has changed in the struct. Refer below:
Channel -> ServerTChannel
TChannelRouter -> ServerTChannelRouter
+ TChannelSubLoggerLevel
+ ClientTChannels
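For reference, a sketch of how the relevant Gateway fields could look after this change. The field names come from the mapping above and the surrounding diff; the exact types of ServerTChannel and ClientTChannels are inferred, and all other fields are omitted.

```go
package zanzibar // assumption: the fields live on zanzibar's runtime Gateway

import (
	"github.com/uber/tchannel-go"
	"go.uber.org/zap/zapcore"
)

// Sketch only: relevant Gateway fields after the rename.
type Gateway struct {
	ServerTChannel         *tchannel.Channel            // was: Channel
	ServerTChannelRouter   *TChannelRouter              // was: TChannelRouter (zanzibar's own router type)
	TChannelSubLoggerLevel zapcore.Level                // new
	ClientTChannels        map[string]*tchannel.Channel // new: one dedicated channel per callee service
}
```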
@@ -800,6 +803,7 @@ func (gateway *Gateway) setupTChannel(config *StaticConfig) error {
	if !ok {
		return errors.Errorf("unknown sub logger level for tchannel server: %s", levelString)
	}
+	gateway.TChannelSubLoggerLevel = level
Required in case dedicated.tchannel.client: true, in which case we need this config while creating clients in the init phase.
qq: How is logger level related to TChannel?
Here's how:
- When we create a new TChannel, we need to specify a tchannel.Logger.
- To pass this, we create a new TChannelLogger value.
- To create this value, we create a sublogger based on gateway.logger, which essentially clones the original logger with a different level (since we might not want verbose TChannel logs).
- The level we pass to this logger (debug|info|...) is determined by gateway.TChannelSubLoggerLevel.
- gateway.TChannelSubLoggerLevel is populated by subLoggerLevel.tchannel: <> in the YAML.
func NewTChannelLogger(logger *zap.Logger) tchannel.Logger {}
func (gateway *Gateway) SubLogger(name string, level zapcore.Level) *zap.Logger {}
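Putting those two signatures together, a minimal sketch of the flow (the wrapper method and the sublogger name "tchannel" are illustrative assumptions; only SubLogger and NewTChannelLogger come from the signatures above):

```go
// Sketch: build the tchannel.Logger described above by cloning the gateway
// logger at the configured level and wrapping it for TChannel.
func (gateway *Gateway) tchannelSubLogger() tchannel.Logger {
	// The level is populated from subLoggerLevel.tchannel in the YAML.
	subLogger := gateway.SubLogger("tchannel", gateway.TChannelSubLoggerLevel)
	return NewTChannelLogger(subLogger)
}
```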
@@ -62,7 +62,7 @@ func TestCallMetrics(t *testing.T) {
	},
)

-	numMetrics := 12
+	numMetrics := 13
This is because we added a new gauge: tchannel.clients.running.
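For illustration, a minimal sketch of how such a gauge could be reported with tally; the metric name comes from the comment above, while the helper function, scope, and call site are assumptions:

```go
import (
	"github.com/uber-go/tally"
	"github.com/uber/tchannel-go"
)

// Sketch: report how many dedicated client channels are currently running.
func reportRunningClients(scope tally.Scope, clients map[string]*tchannel.Channel) {
	scope.Gauge("tchannel.clients.running").Update(float64(len(clients)))
}
```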
runtime/gateway.go
@@ -422,7 +425,7 @@ func (gateway *Gateway) Shutdown() {
	swg.Add(1)
	go func() {
		defer swg.Done()
-		if err := gateway.shutdownTChannelServer(ctx); err != nil {
+		if err := gateway.shutdownTChannelServerAndClients(ctx); err != nil {
			ec <- errors.Wrap(err, "error shutting down tchannel server")
The error string can also be improved
Updated
service.shadow.env.override.enable: true
dedicated.tchannel.client: true
qq: How have we handled the default behaviour? Does it remain the same? Should the example be for default / old behaviour or new behaviour?
- Default behaviour: same as before.
- The Example-gateway example has the new behaviour.
- The Selective-gateway example has the default behaviour.
Refactor log string
var channel *tchannel.Channel
// If dedicated.tchannel.client : true, each tchannel client will create a
// dedicated connection with local Muttley, else it will use a shared connection
nit: let's not mention "local Muttley" here.
Agreed. Missed it here. Changed.
} else {
	channel = createNewTchannelForClient(deps, serviceName)
	channel.Peers().Add(ip + ":" + strconv.Itoa(int(port)))
	gateway.ClientTChannels[serviceName] = channel
Is this ok given one service can have multiple clients?
Didn't get it. The client is the local tchannel client for each backend.
As discussed offline, this is the required behaviour, e.g. presentation services where multiple tchannel clients talk to the same presentation backend.
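For context, a rough sketch of the branch this snippet sits in, assuming a hypothetical dedicatedClient boolean derived from the dedicated.tchannel.client flag; the shared-connection branch is an assumption, while the else branch mirrors the diff above:

```go
// Sketch only: pick between the shared server channel (default behaviour)
// and a dedicated channel per callee service, keyed by serviceName in the
// gateway's ClientTChannels map.
if !dedicatedClient {
	// Default behaviour: reuse the shared gateway channel.
	channel = gateway.ServerTChannel
} else {
	channel = createNewTchannelForClient(deps, serviceName)
	channel.Peers().Add(ip + ":" + strconv.Itoa(int(port)))
	gateway.ClientTChannels[serviceName] = channel
}
```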
runtime/gateway.go
for serviceName, clientTchannel := range gateway.ClientTChannels {
	go func(service string, ch *tchannel.Channel) {
		gateway.Logger.Info(fmt.Sprintf("Closing tchannel client for [%v]", service))
		ch.Close()
This could lead to a situation where the client is closed but the server is not, leading to failures in pending outbound calls during shutdown (similar to #764).
Can we close all the servers first, wait for their server.close() to complete, and then shut down the clients?
+1
Good catch. Updating
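Following up on that, a sketch of the ordering being suggested: drain the TChannel server first, then close the dedicated client channels. Only the name shutdownTChannelServerAndClients appears in the diff; the body, the shutdownTChannelServer helper, and the wait group are assumptions.

```go
// Sketch of shutdownTChannelServerAndClients with server-first ordering.
func (gateway *Gateway) shutdownTChannelServerAndClients(ctx context.Context) error {
	// 1. Stop the server and wait for in-flight requests to drain, so that
	//    pending outbound calls still have live client channels (see #764).
	if err := gateway.shutdownTChannelServer(ctx); err != nil {
		return err
	}

	// 2. Only then close the per-callee-service client channels, in parallel.
	var wg sync.WaitGroup
	for serviceName, clientTchannel := range gateway.ClientTChannels {
		wg.Add(1)
		go func(service string, ch *tchannel.Channel) {
			defer wg.Done()
			gateway.Logger.Info(fmt.Sprintf("Closing tchannel client for [%v]", service))
			ch.Close()
		}(serviceName, clientTchannel)
	}
	wg.Wait()
	return nil
}
```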
@@ -180,7 +194,35 @@ func {{$exportName}}(deps *module.Dependencies) Client {
	}
}

func initializeDynamicChannel(deps *module.Dependencies, headerPatterns []string, altChannelMap map[string]*tchannel.SubChannel, re ruleengine.RuleEngine) ([]string, ruleengine.RuleEngine) {
func createNewTchannelForClient(deps *module.Dependencies, serviceName string) *tchannel.Channel {
This doesn't depend on the template, so let's move this out so you don't have to generate this several times during the code-gen step.
probably someplace here: https://github.com/uber/zanzibar/blob/master/runtime/tchannel_client.go
That would lead to an explosion of parameters in tchannel_client.go, as it needs a tracer, sub logger, etc.
I found a better place for it in gateway.go, which also cleans up the defaultDependencies struct a bit and avoids repeated codegen. Thanks for the input.
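A sketch of what the gateway-level helper could look like after that move; only the function name and the reuse of the gateway's tracer and sublogger level are grounded in the discussion, while the receiver, gateway.ServiceName, the sublogger name, and the error handling are assumptions:

```go
// Sketch: create a dedicated TChannel for one callee service, reusing the
// tracer and sublogger level already held by the gateway so the generated
// client code needs no extra parameters.
func (gateway *Gateway) createNewTchannelForClient(serviceName string) *tchannel.Channel {
	// Clone the gateway logger at the configured TChannel level for this client.
	subLogger := gateway.SubLogger("tchannel.client."+serviceName, gateway.TChannelSubLoggerLevel)
	// The channel is created under the gateway's own service name; peers for
	// the callee are added by the caller (see the Peers().Add call above).
	channel, err := tchannel.NewChannel(gateway.ServiceName, &tchannel.ChannelOptions{
		Logger: NewTChannelLogger(subLogger),
		Tracer: gateway.Tracer,
	})
	if err != nil {
		gateway.Logger.Fatal("Failed to create dedicated tchannel client", zap.Error(err))
	}
	return channel
}
```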
Summary
Internal JIRA: https://t3.uberinternal.com/browse/EDGE-326 (ERD attached in the ticket)
Release 1.0.0 - 2021-08-05
Changed
- BREAKING: gateway.Channel has been renamed to gateway.ServerTChannel to distinguish between client and server TChannels.
- BREAKING: gateway.TChannelRouter has been renamed to gateway.ServerTChannelRouter.
Added
- A new boolean flag dedicated.tchannel.client is introduced. When set to true, each TChannel client creates a dedicated connection instead of registering on the default shared server connection. Clients need not do anything, as the default behaviour is to use a shared connection. (#778)
- A new field has been added to the gateway to propagate the TChannelSubLoggerLevel for clients with dedicated connections.
Reviewer Notes
Description
This PR adds / changes the following:
- Changes the way Edge TChannel clients connect with the sidecar. Instead of using a common gateway channel and router, we create dedicated channels for each client. This is done based on the flag dedicated.tchannel.client: true.
- Shutdown hooks now include the shutdown of clients in a separate goroutine.
- Refactors the existing gateway channel and router to more readable names.
- Does not change templates for TChannel endpoints; they still get registered on the common server router. Only clients get new routers.
- Fixes some formatting inconsistencies and typos.
Motivation & Context
Currently, TChannel clients communicate with the sidecar as shown below.
Maintaining a dedicated connection ensures that, in the event of a misbehaving downstream, requests to other "healthy" tchannel clients do not time out. Only the backlogged requests in the queue for the unhealthy service are affected if the timeout expires. This leads to a more fault-tolerant request layer in Edge; as Edge is a Tier 0 service, this offers significant platform stability.
Since each queue would be separate and multiple router-sidecar goroutines would be reading requests in parallel, we expect good improvements in request latency.
Slight, though insignificant, improvements may be seen in CPU (better utilization) and memory (reduced pressure from requests in the queue).
How was this tested?
- Tested this version of Zanzibar in Uber's Edge Gateway.
- Tested that conditional client generation works and that, when dedicated.tchannel.client: false | nil, the existing behaviour remains unchanged.
Benchmarks
Verdict: No considerable change in op/sec. Zero change in memory allocs
Branch
Master