What would you like to be added:
Add sharded (lookup-based) cluster mode to Yorkie servers.
Why is this needed:
To overcome the current Yorkie cluster mode's limitations and make the Yorkie server capable of handling large amounts of workload in production environments. For more information, follow the design below.
Introducing Lookup Based Cluster Mode to Yorkie Servers
Introduction to Lookup Based Cluster Mode
In a production environment, server clusters are necessary to handle large amounts of workload while ensuring high availability, reliability, and scalability.
The current Yorkie server supports a server cluster mode based on broadcasting, pub/sub, and distributed locks between servers. This architecture can handle a certain amount of workload, but it has limitations.
Broadcast and pub/sub overhead: Because all servers have to communicate with each other through broadcast and pub/sub to share workload state, complexity, latency, and performance overhead grow as more servers are added to handle workloads.
Distributed lock overhead: Because all servers have to compete to acquire a lock on the same workload data, complexity, latency, and performance overhead grow as more servers are added to handle workloads.
Because of these limitations, the broadcast-based cluster mode is not sufficient for production environments.
The root cause of these limitations is that workloads are distributed across all servers in the cluster, so additional synchronization across all servers is needed.
To solve this, we can assign each server to process the same workloads, so that multiple servers do not access the same data, and put a lookup system in front to route the same workloads to the same servers.
(Example of lookup cluster mode in Yorkie)
By introducing lookup-based cluster mode, we can reduce or remove the additional overhead needed for workload synchronization, and become capable of handling large amounts of workload while ensuring high availability, reliability, and scalability.
Now that we have discussed what lookup cluster mode is and why it is needed, let's discuss how to design and implement it.
LookUp Based Cluster Mode Design
There are several considerations in designing lookup-based cluster mode.
1. Workload Unit & Server Mapping Strategy & Hash Function
More details will be updated as we discuss.
First, to assign each server to process the same workloads, we need to define which data type is the workload unit in Yorkie, and then determine a mapping strategy to evenly distribute that workload unit.
Because the document is the basic unit of collaborative editing, we can define the document as the workload unit in Yorkie. We can confirm that the document is the workload unit because the server cluster has had to communicate to share and sync each document's state and data.
This means that if we assign each server to process the same documents, we no longer need to share and sync state and data across the server cluster.
But what about the project as the workload unit? We could use the project as the workload unit because a document is a subset of a project.
However, there is a problem in this scenario: when user A is in project A and wants to attach to a document in project B, servers still need to communicate to share that document's state and data. A project can also serve as the workload unit, but the documents in a given project may generate heavy workloads (traffic), which would burden the single server handling that project.
Now that we have chosen the document as the workload unit, let's determine a mapping strategy to evenly distribute document workloads.
Choosing the mapping strategy for the lookup system is crucial, because it determines the performance, efficiency, and scalability of the cluster. A cluster with the best possible hardware and infrastructure can still be bottlenecked by the choice of key.
There are many mapping strategies, but a commonly used algorithm for mapping based on a request's metadata (the document) is consistent hashing.
Consistent hashing is a technique used to map a range of input values (such as request metadata) to a corresponding range of output values (such as server IDs).
The basic idea of consistent hashing is to assign each server to a point on a circular ring or continuum of hash values. When a request arrives, its hash value is computed, and the server responsible for that hash value is determined by finding the next server on the ring, starting from the point where the hash value falls.
This ensures that nearby hash values will be assigned to the same server, providing a degree of consistency in server assignments.
As you can see above, the computed hash value is mapped to the closest server in the clockwise direction, so k0 (key 0) is mapped to s4 (server 4). Even when s4 fails, k0 can be remapped to s0 (server 0); this mechanism is helpful when servers scale out/in or fail over.
Therefore, by setting the workload unit to the document and using consistent hashing for server mapping, we can evenly distribute workloads while keeping server assignments consistent. There can be more specific implementations of the ring hash, but I will skip those for now.
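To make this concrete, below is a minimal sketch of a ring-hash lookup in Go. The hash function (FNV-1a), the virtual-node count, and the server names are placeholder choices for illustration, not Yorkie's actual implementation.

```go
package lookup

import (
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring maps hashed keys to servers on a hash ring. Each server is placed on
// the ring multiple times (virtual nodes) so that keys are spread more evenly.
type Ring struct {
	hashes  []uint32          // sorted hash points on the ring
	servers map[uint32]string // hash point -> server name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing builds a ring with the given servers and virtual nodes per server.
func NewRing(servers []string, vnodes int) *Ring {
	r := &Ring{servers: make(map[uint32]string)}
	for _, s := range servers {
		for i := 0; i < vnodes; i++ {
			h := hashOf(s + "#" + strconv.Itoa(i))
			r.hashes = append(r.hashes, h)
			r.servers[h] = s
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Lookup returns the server responsible for the given document key:
// the first ring point clockwise from the key's hash.
func (r *Ring) Lookup(docKey string) string {
	h := hashOf(docKey)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.servers[r.hashes[i]]
}
```

Because adding or removing a server only changes the ring points that server owns, only the keys falling on those points are remapped; the rest keep their previous assignment, which is what makes scale out/in and failover cheap.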
Now that we have determined the mapping strategy as consistent hashing with the document as the hash function parameter, let's discuss which document field we should use as the actual hash function parameter, and which hash function to use.
Based on the current implementation of Yorkie, we can use two fields: Key and ID.
Key: Key is internally a name string, and the current Yorkie API uses this field.
ID: ID is an ObjectID generated by the database (e.g. mongoDB). This field is not used as a parameter in the current Yorkie API; it is only used internally by the Yorkie server and DB.
For now, let's use Key as the hash function parameter, because Key is what the Yorkie API currently uses.
Next, let's choose a hash function. There are many hash functions we could use (e.g. MD5, SHA-1, MurmurHash, Jenkins hash). To achieve an even (fine-grained) distribution over Key (a name string), we need to benchmark the candidate hash functions with Key as the parameter and decide which one to use. This will be done in further updates.
2. API Compatibility & LookUp Strategy
More details will be updated as we discuss.
Second, we need to consider API compatibility with lookup cluster mode. All APIs in Yorkie are compatible with this cluster mode except one: watchDocuments(), which is responsible for multi-document watching.
If a client is watching multiple documents handled by the same server, we can use that single server's watchDocuments() API with no problem.
But if a client is watching multiple documents handled by multiple servers, we need to change the API to watchDocument(), or introduce a different strategy into the cluster mode. This will come up again when we discuss the lookup strategy.
Now let's change topics and talk about the lookup strategy.
In modern cloud environments, servers (containers) are created dynamically. Therefore, a new kind of lookup system is needed to discover dynamically created servers. We call this new lookup system service discovery.
Considering Yorkie's production environment, it is better to implement dynamic service discovery than a static lookup system.
There are two types of service discovery: client-side discovery and server-side discovery.
(Client side discovery)
In client-side discovery, services register themselves with the service registry. The client then asks the service registry where to connect and receives the location of a service. The client connects to the service based on the location provided by the service registry.
In this scenario, we can use a leader/follower node strategy. In leader/follower, the leader node is responsible for routing to follower nodes based on the service registry's information. If the leader dies, another follower becomes the leader based on a consensus among the nodes (this can be achieved with etcd's Raft consensus algorithm).
(Server side discovery)
On the other hand, in server-side discovery, there is a proxy server (load balancer) in front of the services. After the services register themselves with the service registry, the proxy server gets the service locations from the registry. When a client connects to the proxy server, the proxy routes it to the proper service based on the service registry's information.
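For illustration only: if a registry such as etcd were used, each server could register itself under a TTL lease and keep it alive, so its entry disappears automatically when the server dies. The key layout, TTL, and addresses below are arbitrary choices for this sketch.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// registerServer registers this server's address in etcd under a TTL lease,
// and keeps the lease alive so the entry expires if the server crashes.
func registerServer(ctx context.Context, endpoints []string, addr string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}

	lease, err := cli.Grant(ctx, 10) // 10-second TTL
	if err != nil {
		return err
	}
	if _, err := cli.Put(ctx, "/yorkie/servers/"+addr, addr, clientv3.WithLease(lease.ID)); err != nil {
		return err
	}

	// Keep the lease alive in the background; when this stops, the key expires.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		return err
	}
	go func() {
		for range ch {
			// drain keep-alive responses
		}
	}()
	return nil
}

func main() {
	if err := registerServer(context.Background(), []string{"localhost:2379"}, "10.0.0.1:8080"); err != nil {
		log.Fatal(err)
	}
	select {} // keep the server process running
}
```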
Now let’s bring API problem and relate to service discovery.
In client side discovery, it is hard to establish stream connection with two services at once if we use watchdocuments() API. Therefore, API will be changed to watchdocument(). This will cause client’s network code more complex then before.
In server side discovery, we can aggregate multiple stream connection from multiple services and provide client single stream connection. Therefore, we can still use watchdocuments() API, which will keep our API compatibility. Also we can keep our client’s network code simple.
For now, it would be better to use server side discovery(proxy) then client side discovery to our lookup system design considering the benefits of using proxy, like routing, accomplish API compatiability by stream aggregating, protocol conversion, logging, etc.
Also, using leader/follower strategy(client side discovery) may cause overheads of lock aquire between servers to become leader, and this overhead will increase as number of servers increase(more servers try to aquire lock).
3. Server Addition/Removal Strategy
More details will be updated as we discuss.
Lastly, we need to consider the circumstances of servers scaling up or down, or of system failures. In these scenarios, servers need to be added or removed.
To handle these changes successfully, we need a server addition/removal strategy.
We need to think about this carefully because we are using a hashed key to route to servers, so we have to keep the previous hashed-key-to-server mapping as stable as possible when servers are added or removed.
As we discussed earlier, the answer to this problem is again consistent hashing. With a ring hash, adding or removing a server only remaps the keys adjacent to that server on the ring, so most previous key mappings stay the same. For more information, see 1. Workload Unit & Server Mapping Strategy & Hash Function.
To conclude, we can maintain mapping consistency on server addition/removal by using consistent hashing.
4. System Design & Architecture
More details will be updated as we discuss.
Considering all the factors mentioned above, we can now design the lookup system architecture.
(Yorkie Lookup Cluster Mode Architecture)
Yorkies: Yorkies is a router (proxy) responsible for two tasks.
Routing based on the request: Yorkies receives requests from clients and routes each one to a server based on the request's computed hash key and the ring hash algorithm.
Stream aggregation: Yorkies receives two watch API streams from different servers and aggregates them into one stream.
Yorkie service registry: The service registry is responsible for storing metadata and configuration settings of the Yorkie servers, much like mongoDB's config servers.
Yorkie Service(s): Services that process specific workloads. There can be two types of service structure.
Single Server: A single Yorkie server processes the workload.
Cluster Server (Broadcast Mode): A Yorkie cluster processes heavy workloads. We can reuse the broadcast-based Yorkie cluster mode to build this cluster.
mongo: mongoDB is responsible for sharding and persisting data. There are several components in the mongo system.
mongos: mongos is the router responsible for routing data to the correct shard based on the data's hash key.
shard: mongod is the actual database responsible for persisting incoming data.
config servers: Config servers are responsible for storing metadata and configuration settings for the shards. mongos uses this information for routing.
From this architecture, we can implement a solid lookup-based cluster mode capable of handling large amounts of workload while ensuring high availability, reliability, and scalability.
LookUp Based Cluster Mode Implementation
1. Tasks Needed for Lookup Cluster Mode Implementation
Based on the architecture above, this is the list of tasks needed to implement lookup cluster mode in Yorkie.
Tasks for Yorkie services (Servers)
Self-registration API so a server can register itself with the service registry
DB schema reconstruction for collection sharding
Tasks for mongoDB (Database)
Administration of mongoDB sharding configuration
Tasks for Yorkies (Proxy/Router)
Configuration for polling server endpoints from the service registry
Configuration for ring hash and a custom hash function
Configuration for protocol conversion between HTTP and gRPC
Configuration for stream aggregation/multiplexing to aggregate watchDocuments() API streams
Tasks for Yorkie Service Registry (Service Registry)
Configuration for service discovery management
API for Yorkie services to register themselves
Tasks 3 and 4 can be combined and easily implemented using Kubernetes and Istio. More specifically, by using Istio's envoy proxy sidecar and Istio Pilot as the service registry, we can easily implement tasks 3 and 4.
(Istio Architecture)
Now that we have listed the tasks needed to implement lookup cluster mode, let's talk about how we are actually going to implement it.
2. Lookup Cluster Mode Implementation (docker-compose)
This is a PoC of our design with docker-compose, so the implementations below are for PoC/prototype purposes only.
1. Implementing Yorkies
There are several open source tools that can be used to implement Yorkies (the proxy): envoy, HAProxy, nginx, etc.
First, let's consider using envoy. envoy is an open source L7 proxy which can perform the Yorkies tasks mentioned above. envoy supports a consistent hashing (ring hash) load balancing algorithm to route to a specific service, stream filtering and multiplexing features, and a gRPC HTTP/2 bridging feature, which is needed for web-based gRPC from web clients. For the PoC, envoy is configured to perform ring hash load balancing based on a hash key computed from the request's x-api-key header.
2. Implementing Service Registry
There are several open source tools that can be used to implement the service registry: etcd, zookeeper, eureka, etc. But to use envoy's EDS (endpoint discovery service), we have to implement our own EDS server using the envoy EDS client. This xds-server stores the addresses of the services and provides an endpoint for fetching the available services, and envoy performs ring hash routing combined with the xds-server's service registry through EDS.
3. Creating a mongoDB Sharded Cluster
We are using mongoDB as Yorkie's data store. To keep up with the Yorkie server cluster's large query volume, we need to shard mongoDB and build a cluster as well. For the PoC, the mongoDB docker-compose file consists of 2 mongos, 3 shards, and 1 config server.
After creating the mongoDB containers, we also need to configure the mongod's to create the sharded cluster and shard the collections, as sketched below.
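A minimal sketch of that configuration, run through mongosh against a mongos, might look like the following; the replica set names, hosts, database, and shard key are illustrative and should follow the DB schema reconstruction task above.

```js
// Add each shard replica set to the cluster (run on mongos).
sh.addShard("shard-rs-1/shard1:27017")
sh.addShard("shard-rs-2/shard2:27017")
sh.addShard("shard-rs-3/shard3:27017")

// Enable sharding on the Yorkie database.
sh.enableSharding("yorkie-meta")

// Shard the documents collection on a hashed document key
// so documents spread evenly across the shards.
sh.shardCollection("yorkie-meta.documents", { key: "hashed" })
```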
3. Lookup Cluster Mode Implementation (K8s & Istio)
This is a PoC of our design with Kubernetes and Istio, so the implementations below are for PoC/prototype purposes only.
1. PoC Implementation Architecture
Below is the internal implementation architecture for lookup cluster mode based on K8s and Istio.
Based on this architecture, let’s implement our design using K8s and Istio.
1. Implementing Yorkies and Service Registry & Server Mapping Strategy(Hash Function)
We can easily implement Yorkies and the Service Registry by using Istio's Ingress Gateway (envoy) and envoy sidecar injection as Yorkies, and Istio Pilot as the service registry. When we deploy the Yorkie cluster pods, Istio automatically registers the pods and configures traffic routes for us.
We can define a Gateway, VirtualService, and DestinationRule to configure the envoy inside the ingress gateway and the envoy sidecars, as sketched below.
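A minimal sketch of those resources might look like the following; the hosts, ports, and service names are placeholders rather than the PoC's actual manifests.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: yorkie-gateway
spec:
  selector:
    istio: ingressgateway          # use Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: yorkie
spec:
  hosts:
  - "*"
  gateways:
  - yorkie-gateway
  http:
  - route:
    - destination:
        host: yorkie               # Kubernetes Service fronting the Yorkie pods
        port:
          number: 11101            # illustrative port
```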
In particular, we can configure consistent hashing in the DestinationRule, which makes it easy to implement a ring-hash-based lookup system in our cluster.
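For example, a DestinationRule sketched like this would make envoy ring-hash requests on the x-api-key header; the values other than the header name are illustrative.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: yorkie
spec:
  host: yorkie                     # same Service as in the VirtualService above
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-api-key  # requests with the same key keep reaching the same pod
        ringHash:
          minimumRingSize: 1024
```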
As you can see, we can specify which HTTP header name to use as the hash function parameter. For the PoC, we used x-api-key, which is already included in the HTTP headers.
Also, we can use Istio's grpc-web appProtocol to configure envoy's gRPC-web filter for the HTTP → gRPC protocol conversion needed by the web SDK, and we can configure a corsPolicy in the VirtualService to set up CORS so gRPC-web can be accessed externally.
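Sketches of both settings follow; the port, origins, and headers are illustrative, and the VirtualService extends the one sketched earlier.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: yorkie
spec:
  selector:
    app: yorkie
  ports:
  - name: grpc-web
    port: 8080
    appProtocol: grpc-web          # lets Istio apply envoy's gRPC-web filter on this port
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: yorkie
spec:
  hosts:
  - "*"
  gateways:
  - yorkie-gateway
  http:
  - corsPolicy:
      allowOrigins:
      - regex: ".*"                # permissive for the PoC only
      allowMethods: ["POST", "GET", "OPTIONS"]
      allowHeaders: ["content-type", "x-grpc-web", "x-api-key"]
    route:
    - destination:
        host: yorkie
        port:
          number: 8080
```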
Like this, if we use K8s/Istio for the actual infrastructure, many of the considerations around the server mapping strategy (hash function) can be resolved.
When we use Istio's ringHash in ConsistentHashLB, Istio performs consistent hashing over both the server endpoints and the requests for us.
Internally, Istio uses Ketama hashing for endpoint hashing and xxHash for HTTP header hashing. Both are well-known algorithms for evenly distributing hash values.
Therefore, it is a good idea to use Istio's consistent hashing, test it repeatedly by measuring the workload distribution, and optimize the document key (hash parameter).
2. Implementing Stream Aggregation
When we use K8s/Istio, many implementations become easier, but there is a problem with implementing the proxy that aggregates streams.
If we use a proxy server like envoy, it does not seem possible to aggregate several streams into one using envoy configuration alone.
Therefore, we need to build an aggregator server, written in Go, to perform the stream aggregation. The sketch below assumes gRPC-generated stream types (the package path, type names, and fields are illustrative) and forwards every upstream response to the single downstream stream.
import (
	"io"
	"sync"

	"golang.org/x/sync/errgroup"

	yorkie "example.com/yorkie/api" // illustrative: the gRPC-generated API package
)

// aggregateStreams forwards every response from two upstream WatchDocuments
// streams to the single downstream stream returned to the client.
func aggregateStreams(stream1, stream2 yorkie.YorkieService_WatchDocumentsClient, out yorkie.Aggregator_AggregateStreamsServer) error {
	// A gRPC stream must not be written to from multiple goroutines at once.
	var mu sync.Mutex

	g, _ := errgroup.WithContext(out.Context())
	forward := func(in yorkie.YorkieService_WatchDocumentsClient) func() error {
		return func() error {
			for {
				res, err := in.Recv()
				if err == io.EOF {
					// This upstream stream has closed.
					return nil
				}
				if err != nil {
					return err
				}
				// Send the upstream response to the client; the sketch assumes
				// the aggregated stream reuses the WatchDocuments response type.
				mu.Lock()
				err = out.Send(res)
				mu.Unlock()
				if err != nil {
					return err
				}
			}
		}
	}

	g.Go(forward(stream1))
	g.Go(forward(stream2))

	// Wait for both streams to close (or for the first error).
	return g.Wait()
}

func (s *server) AggregateStreams(out yorkie.Aggregator_AggregateStreamsServer) error {
	ctx := out.Context()

	// client1 and client2 are gRPC clients to the two Yorkie servers
	// that own the documents being watched.
	stream1, err := s.client1.WatchDocuments(ctx, &yorkie.WatchDocumentsRequest{})
	if err != nil {
		return err
	}
	stream2, err := s.client2.WatchDocuments(ctx, &yorkie.WatchDocumentsRequest{})
	if err != nil {
		return err
	}

	return aggregateStreams(stream1, stream2, out)
}
After that, we can register the aggregator servers to K8s or to Istio's ServiceEntry, and use Istio's envoy sidecar or Egress Gateway to send the stream responses to the aggregator server.
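If the aggregator runs outside the mesh, a ServiceEntry sketch like the one below could make it addressable from the sidecars or the egress gateway; the host and port are hypothetical.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: watch-aggregator
spec:
  hosts:
  - aggregator.yorkie.internal     # hypothetical aggregator address
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 8090
    name: grpc
    protocol: GRPC
```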
3. Considering Server Behavior on Server Addition/Removal
As we discussed above, we can use consistent hashing to resolve this problem. But there are still some considerations.
When we use Istio, Istio Pilot detects server addition/removal, or even failure, from the envoy sidecar of each pod, performs the endpoint updates, and routes traffic to the new endpoints.
Once Istio has routed traffic to a new endpoint, we need to think about how to keep the workload consistent when it is now processed by a new server.
Currently the Yorkie server persists data in two places.
in-memory: holds client metadata (peer awareness) and docEvent data for WatchDocuments().
mongoDB: holds all other data.
We can keep the consistency of data stored in the DB, but what about the in-memory data? We need to consider how to move in-memory data from the previous server to the new server.
(WIP)
4. Yorkie Cluster PoC/Prototype Repository
Below are the repositories for the PoC implementations mentioned above. There are two implementations of lookup cluster mode:
docker-compose
K8s & Istio with minikube
https://github.com/krapie/yorkie-cluster
5. Actual Infrastructural Implementation
The actual implementation will be on Kubernetes & Istio, as mentioned above.