Service discovery #557

Merged
merged 1 commit into from
Mar 12, 2019

jskswamy
Contributor

@jskswamy jskswamy commented Aug 28, 2018

Introduce a separate command to register stolon slave and master information for service discovery. This will potentially resolve "Allow clients to connect to standby replicas in RO mode" and improve performance by connecting to the master directly without the need for stolon-proxy.

This commit contains a proof of concept for registering the stolon master and proxy to consul for service discovery.

@sgotti
Member

sgotti commented Aug 28, 2018

@jskswamy I'm not sure I understand the rationale behind this but the proxy cannot be removed, it's the critical point that avoids clients talking with the wrong primary in case of partitioning. It's explained in the architecture doc and the FAQ.

@wchrisjohnson
Contributor

@sgotti this isn’t my PR!!

@sgotti
Member

sgotti commented Aug 28, 2018

@wchrisjohnson sorry, I meant @jskswamy

@jskswamy
Contributor Author

jskswamy commented Aug 28, 2018

Introduce a separate command to register stolon slave and master information for service discovery. This will potentially resolve "Allow clients to connect to standby replicas in RO mode" and improve performance by connecting to the master directly without the need for stolon-proxy.

We ran benchmark tests connecting directly to the master, without the proxy, and saw a significant improvement in Postgres performance.

Test result of running benchmark tests by connecting to proxy

Postgres Host: Proxy IPs
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
latency average = 10.460 ms
tps = 956.015189 (including connections establishing)
tps = 960.124996 (excluding connections establishing)

Test result of running benchmark tests by connecting directly to the master

Postgres Host: Master IP 
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
latency average = 5.250 ms
tps = 1904.913493 (including connections establishing)
tps = 1909.669783 (excluding connections establishing)

This commit has proof of concept for registering the stolon master and proxy to consul for service discovery by introducing a new command stolon-register

Usage:
  stolon-register [flags]

Flags:
      --cluster-name string             cluster name
      --debug                           enable debug logging
  -h, --help                            help for stolon-register
      --kube-resource-kind string       the k8s resource kind to be used to store stolon clusterdata and do sentinel leader election (only "configmap" is currently supported)
      --log-color                       enable color in log output (default if attached to a terminal)
      --log-level string                debug, info (default), warn or error (default "info")
      --metrics-listen-address string   metrics listen address i.e "0.0.0.0:8080" (disabled by default)
      --register-backend string         register backend type (consul) (default "consul")
      --register-endpoints string       a common-delimited list of store endpoints (use https scheme for tls communication) defaults: http://127.0.0.1:8500 for consul (default "http://127.0.0.1:8500")
      --store-backend string            store backend type (etcdv2/etcd, etcdv3, consul or kubernetes)
      --store-ca-file string            verify certificates of HTTPS-enabled store servers using this CA bundle
      --store-cert-file string          certificate file for client identification to the store
      --store-endpoints string          a comma-delimited list of store endpoints (use https scheme for tls communication) (defaults: http://127.0.0.1:2379 for etcd, http://127.0.0.1:8500 for consul)
      --store-key string                private key file for client identification to the store
      --store-prefix string             the store base prefix (default "stolon/cluster")
      --store-skip-tls-verify           skip store certificate verification (insecure!!!)
      --version                         version for stolon-register

Example usage for the following cluster configuration

=== Active sentinels ===

ID        LEADER
094261bb  true
9e95d704  false
f9683b9a  false

=== Active proxies ===

No active proxies

=== Keepers ===

UID       HEALTHY  PG LISTENADDRESS  PG HEALTHY  PG WANTEDGENERATION  PG CURRENTGENERATION
00660a8c  true     127.0.0.1:5434    true        2                    2
3544bfc9  true     127.0.0.1:5437    true        2                    2
8faab17a  true     127.0.0.1:5436    true        2                    2
9a73a450  true     127.0.0.1:5435    true        4                    4

=== Cluster Info ===

Master: 9a73a450

===== Keepers/DB tree =====

9a73a450 (master)
├─00660a8c
├─3544bfc9
└─8faab17a

stolon-register --cluster-name stolon --store-backend consul
2018-08-28T23:15:03.509+0530    INFO    cmd/register.go:126     successfully registered master stolon with uid a0a37028
2018-08-28T23:15:03.510+0530    INFO    cmd/register.go:138     successfully registered slave stolon with uid ac670a1d
2018-08-28T23:15:03.511+0530    INFO    cmd/register.go:138     successfully registered slave stolon with uid b5b5f468
2018-08-28T23:15:03.511+0530    INFO    cmd/register.go:138     successfully registered slave stolon with uid 852945a0

Consul exposes both the master and the slaves via its DNS:

dig +short @localhost -p 8600 slave.stolon.service.consul
127.0.0.1
dig +short @localhost -p 8600 master.stolon.service.consul
127.0.0.1
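
The same lookup can be sketched from a Go client using only the standard library. This is a minimal sketch, assuming a local Consul agent serving DNS on 127.0.0.1:8600 as in the dig example; `consulDNSName` and `newConsulResolver` are hypothetical helper names, not part of stolon. What the resolved addresses point at (keepers or proxies) depends on what was registered.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// consulDNSName builds a Consul tag-based DNS name of the form
// <tag>.<service>.service.consul, matching the names queried with dig above.
func consulDNSName(role, cluster string) string {
	return fmt.Sprintf("%s.%s.service.consul", role, cluster)
}

// newConsulResolver returns a resolver that sends every DNS query to the
// given Consul DNS address instead of the system resolver.
func newConsulResolver(dnsAddr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's resolver so the custom Dial function is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, dnsAddr)
		},
	}
}

func main() {
	// Assumption: Consul DNS is reachable on the local agent's default DNS port.
	resolver := newConsulResolver("127.0.0.1:8600")

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	name := consulDNSName("master", "stolon")
	addrs, err := resolver.LookupHost(ctx, name)
	if err != nil {
		fmt.Printf("lookup of %s failed: %v\n", name, err)
		return
	}
	fmt.Printf("%s resolves to %v\n", name, addrs)
}
```

LookupHost returns addresses only; if the port is also needed, an SRV lookup against the same resolver would be the natural extension.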

@jskswamy
Contributor Author

@sgotti this does not replace stolon-proxy; it's an additional feature for stolon.

@sgotti
Member

sgotti commented Aug 28, 2018

@jskswamy you said: "improve performance by connecting to the master directly without the need for stolon-proxy." But clients should only use the stolon-proxy and shouldn't be encouraged to bypass it.

If you want to improve performance (though my tests on Linux, since we only support Linux, are quite different from yours), then let's do it by improving the stolon-proxy (have you tried compiling it with go1.11, which implements splice on Linux?). Please open a new issue with your benchmarks and how to reproduce them so we can track it.

@jskswamy
Contributor Author

jskswamy commented Aug 28, 2018

I'll raise a separate issue with the performance results after compiling it with go1.11. Is there any reason why the client shouldn't connect directly to the master?

This PR needs the following features before it can be merged to master:

  • Remove stale master and proxy from consul
  • Watch for changes and update consul

@jskswamy
Contributor Author

@sgotti I've created this PR to get your opinion on adding a feature to stolon that registers the master and slaves (not the proxy) in consul/zookeeper for service discovery.

@sgotti
Member

sgotti commented Aug 28, 2018

@jskswamy I'll happily review this PR and the idea behind registering to a service discovery system, I just don't agree on the part regarding bypassing the stolon-proxy.

Have you read the architecture doc and the FAQ? They should explain the reason quite well. Take a look also at the integration tests, which test various partitioning cases. If something isn't clear, feel free to ask (also on the gitter channel).

@jskswamy
Contributor Author

I agree that this solution will not forcibly close connections. In that case, we can register the proxy as master and slave in consul for service discovery, and work on improving the proxy's performance if there is still a significant difference after testing it with go1.11.

}
}

// A HTTPClient has necessary method to make http calls
Member

Can't you just use consul client lib (I'm not asking for libkv since I'd like to ditch it)? In this way it'll be easier to handle context, tls etc... It's already vendored (perhaps needs an update to the latest version).

Contributor Author

sure, will make necessary changes to use consul client lib instead of HTTP calls


// Package mock_store is a generated GoMock package.
package mock_store

Member

The store mock could be moved to the project internal dir so it could be used also by other packages.

Contributor Author

I've introduced gomock, and store.go is generated with it. I'd like to hear your thoughts on adopting it in the project.

Member

@jskswamy no problem adopting it. So far we have focused on creating a lot of integration tests to cover real-world scenarios, so we didn't have the need to mock the store.

@sgotti
Member

sgotti commented Aug 31, 2018

@jskswamy Thanks for the PR!

I like the idea of having a separate command. I just did a very quick review of two pieces. Here are some more questions:

  • Perhaps it should just become a stolonctl subcommand (stolonctl register) to avoid building a new binary (like proposed in Allow clients to connect to standby replicas in RO mode #132 (comment))

  • If I understand correctly, register is currently a one-shot command. I was thinking of something like a daemon that stays active (possibly as multiple instances for HA) and continuously (at intervals, for the moment) reads the clusterdata and reacts to changes in an idempotent way. That way it will automatically register/unregister when a master/standby changes or is removed, and also unregister everything when the cluster is deleted/reinitialized. It should be idempotent so that, with multiple instances running, the same operation isn't performed twice when not needed.
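
The idempotent reconcile step described above can be sketched as a pure diff between the desired state and what is currently registered. This is a minimal sketch, not the code from this PR: the `Service` type and the `diff` function are hypothetical, and a real daemon would call something like this from a periodic loop and then apply the result to the service discovery backend.

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a hypothetical, simplified view of one registered endpoint.
type Service struct {
	ID      string // e.g. a keeper UID
	Role    string // "master" or "slave"
	Address string // host:port
}

// diff compares the desired services (derived from clusterdata) with the ones
// currently registered and returns what must be registered and deregistered.
// When both sides already match, both results are empty, so repeated or
// concurrent runs have nothing to do.
func diff(desired, registered []Service) (toRegister, toDeregister []Service) {
	current := make(map[string]Service, len(registered))
	for _, s := range registered {
		current[s.ID] = s
	}
	wanted := make(map[string]bool, len(desired))
	for _, s := range desired {
		wanted[s.ID] = true
		if cur, ok := current[s.ID]; !ok || cur != s {
			toRegister = append(toRegister, s) // new, or role/address changed
		}
	}
	for _, s := range registered {
		if !wanted[s.ID] {
			toDeregister = append(toDeregister, s) // stale entry
		}
	}
	sortByID(toRegister)
	sortByID(toDeregister)
	return toRegister, toDeregister
}

// sortByID orders services by ID so the output is deterministic.
func sortByID(s []Service) {
	sort.Slice(s, func(i, j int) bool { return s[i].ID < s[j].ID })
}

func main() {
	desired := []Service{
		{ID: "9a73a450", Role: "master", Address: "127.0.0.1:5435"},
		{ID: "00660a8c", Role: "slave", Address: "127.0.0.1:5434"},
	}
	registered := []Service{
		{ID: "00660a8c", Role: "master", Address: "127.0.0.1:5434"}, // former master
		{ID: "8faab17a", Role: "slave", Address: "127.0.0.1:5436"},  // keeper no longer present
	}
	reg, dereg := diff(desired, registered)
	fmt.Println("register:", reg)
	fmt.Println("deregister:", dereg)
}
```

Keeping the comparison separate from the backend calls makes the step easy to unit test and means a second instance that sees the same state computes an empty change set.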

@jskswamy
Contributor Author

jskswamy commented Sep 1, 2018

  • I like the idea of making register a subcommand of stolonctl; I will make the necessary changes and update the PR.
  • Yes, the current register command is one-shot and idempotent. I'm working on making it a daemon that watches the store at regular intervals and updates service discovery.

@jskswamy jskswamy force-pushed the service-discovery branch 2 times, most recently from 4bb5da3 to 63ef03b Compare October 11, 2018 16:46
@jskswamy jskswamy force-pushed the service-discovery branch 4 times, most recently from 9a94327 to bcd799c Compare October 25, 2018 20:00
@jskswamy
Contributor Author

Made all the requested changes

  • Use consul client library instead of http calls
  • Make register part of stolonctl sub command
  • Run a daemon and idempotently update service discovery for any change in master/slave

The following minor tweaks still need to be handled. @sgotti, could you kindly review the changes?

  • Read from consul catalog instead of consul agent for the registered services
  • Add support for TLS

@jskswamy jskswamy changed the title [WIP] Service discovery Service discovery Oct 28, 2018
@jskswamy
Contributor Author

@sgotti Please go through the changes. I've made the following pending changes as well:

  • Read from consul catalog instead of consul agent for the registered services
  • Add support for TLS

Member

@sgotti sgotti left a comment

@jskswamy Great work! I haven't tested it, so this is just a quick review. See the inline comments plus some other comments here:

  • I checked out your PR and noticed that there were some unneeded files (mitchellh/mapstructure) and some changed consul client files in the vendor directory. Probably you forgot to run go mod vendor and commit the new vendor files.

  • Are the unit tests enough for you to test it or do you also need some integration tests (with a real consul)?

internal/mock/store/store.go (outdated, resolved)
cmd/stolonctl/cmd/register.go (resolved)
@jskswamy
Contributor Author

Yes, the unit tests are enough and they cover all the scenarios. Please review the code and let me know if integration tests need to be added.

@sgotti
Member

sgotti commented Feb 12, 2019

@jskswamy Sorry for being late, I'll review it in the next few days. Thanks!

@sgotti
Member

sgotti commented Feb 28, 2019

@jskswamy I haven't tested it directly but it LGTM. Before merging please:

  • squash everything into a single commit
  • run go mod vendor again, since it looks like the vendor dir is not in sync with the go.mod contents and contains unneeded files.

@jskswamy jskswamy force-pushed the service-discovery branch 3 times, most recently from f56eb3d to 3c4a7f2 Compare March 1, 2019 15:24
…n for service discovery

Co-authored-by: Dinesh B <dineshudt17@gmail.com>
Co-authored-by: Abdul Rahman K <kadkab.abdul@gmail.com>
@jskswamy jskswamy force-pushed the service-discovery branch from 3c4a7f2 to 4730d0a Compare March 1, 2019 19:38
@jskswamy
Contributor Author

jskswamy commented Mar 1, 2019

@sgotti

  • I've squashed all the changes into a single commit
  • synced up vendor dir
  • added necessary documentation

@sgotti
Member

sgotti commented Mar 12, 2019

@jskswamy Thanks for your great work! Merging.

@sgotti sgotti merged commit d7f0ba2 into sorintlab:master Mar 12, 2019
@ansersolutions

Hello all.

I have followed your conversation regarding this PR. It is 2 years old now... I was wondering whether this is already in current Stolon?
I am fairly new to Stolon, but I think that RO replica access is the cherry on top that was missing.
From what I have tested, it seems that it is not active by default. Is it available with additional setup? Where can I find more info regarding this feature, or any equivalent that allows access to all PG instances for RO access?

Thanks,

Fernando

Successfully merging this pull request may close these issues.

Allow clients to connect to standby replicas in RO mode
4 participants