
FR: DNS cache sync between multiple blocky instances #344

Closed
kwitsch opened this issue Nov 15, 2021 · 16 comments · Fixed by #365
Labels
🔨 enhancement New feature or request

Comments

@kwitsch
Collaborator

kwitsch commented Nov 15, 2021

If blocky is deployed as multiple instances for load balancing and/or failover, the caches will most likely diverge.
This causes spikes in response time when clients switch instances.

I'd like to propose an external second-level cache for blocky.
Redis would be a logical choice, as it is already used in similar scenarios (unbound's cache db module).

If activated, this feature would:

  • populate the blocky cache from redis during startup
  • query redis after a cache miss
  • update the redis entry on cache insertion/update
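The lookup order described above could look roughly like this (a minimal sketch; all names are illustrative and not blocky's actual API, and an in-memory stand-in takes the place of a real Redis client):

```go
package main

import "fmt"

// SecondLevelCache abstracts the shared cache (e.g. Redis).
// Interface and method names are hypothetical, for illustration only.
type SecondLevelCache interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// memStore is an in-memory stand-in for the shared cache.
type memStore map[string]string

func (m memStore) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m memStore) Put(k, v string)             { m[k] = v }

// Resolve checks the local cache first, then the shared second level,
// and finally falls back to an upstream lookup. Every new entry is
// mirrored to the second level so other instances can warm up from it.
func Resolve(local map[string]string, shared SecondLevelCache,
	upstream func(string) string, q string) string {
	if v, ok := local[q]; ok {
		return v // local hit: fastest path, unchanged from today
	}
	if v, ok := shared.Get(q); ok {
		local[q] = v // populate the local cache from the shared level
		return v
	}
	v := upstream(q)
	local[q] = v
	shared.Put(q, v) // update the shared entry on insertion
	return v
}

func main() {
	shared := memStore{}
	upstream := func(q string) string { return "10.0.0.1" }

	// Instance A resolves via upstream; instance B then hits the shared level.
	fmt.Println(Resolve(map[string]string{}, shared, upstream, "example.com."))
	fmt.Println(Resolve(map[string]string{}, shared, upstream, "example.com."))
}
```

The point of the interface is that a single-instance setup could plug in a purely local implementation while multi-instance setups configure a Redis-backed one.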
@0xERR0R
Owner

0xERR0R commented Nov 16, 2021

Hey,

which cache do you mean: the cache with black/whitelists or the cache with DNS responses (positive/negative)?

@kwitsch
Collaborator Author

kwitsch commented Nov 16, 2021

Hi,
This would be the DNS response cache.
It was inspired by the section "Cache DB Module Options" in the unbound manual.

@0xERR0R
Owner

0xERR0R commented Nov 16, 2021

I think this would be a nice feature, but it should also run without redis. I don't like the current cache implementation (which I forked and patched: https://github.com/0xERR0R/go-cache). Maybe it is possible to run something redis-compatible in memory for a single instance, and optionally with an external redis for multiple instances, so that we only implement against one API.

@0xERR0R 0xERR0R added the 🔨 enhancement New feature or request label Nov 16, 2021
@kwitsch
Collaborator Author

kwitsch commented Nov 16, 2021

My idea was to keep the current cache and add the new redis cache as a separate (optional) resolver between caching and parallel_best.
As it depends on an external service it will certainly be slower than the in-memory cache, but potentially much faster than an internet request.
Removing the internal blocky cache would decrease resolution performance a lot in my opinion.
Therefore I wouldn't suggest that.

An alternative solution may be to let blocky broadcast cache insertions to the other instances.
When such a broadcast is received, the same cache insertion is done on the receiving end.
I guess this would be even better, since no separate resolver, server, or request would be necessary.
The drawback of this solution would be potentially a little more network traffic.

@kwitsch kwitsch changed the title FR: redis second level cache FR: DNS cache sync between multiple blocky instances Nov 16, 2021
@0xERR0R
Owner

0xERR0R commented Nov 16, 2021

I think at runtime only one cache should exist: either the external redis or the internal one. My idea was to use something like an "embedded redis", maybe a kind of in-memory cache that is compatible with the redis API (I'm not sure something like that exists, but I hope so). In that case we could implement caching against the redis API, and the user can either configure an external redis or use the "internal" one.

@kwitsch
Collaborator Author

kwitsch commented Nov 17, 2021

This would have the contrary effect to my proposal. 😅

Speed in my home environment where DNS resolution is done locally behind blocky:

  • after clean start: ~35ms (no cache)
  • blocky miss, unbound hit: ~15ms (unbound cache)
  • blocky hit: ~7ms (blocky cache)

Replacing the internal blocky cache would slow responses down, as network requests take ~3ms.

My network infrastructure:
[image: network infrastructure diagram]

Currently it takes a few hours to populate the blocky cache enough to get the ~7ms times.
I'm trying to speed this up. 😅

@0xERR0R
Owner

0xERR0R commented Nov 17, 2021

wow, interesting infrastructure! Off-topic, just curious: are the 3 blocky instances running on different pieces of hardware for redundancy/load balancing? So each client has 3 DNS resolvers (blocky instances) configured? And why are you using unbound and not an external upstream resolver in blocky?

I think in your case, using redis will not improve your speed. Redis can improve blocky's startup, but the redis cache must be maintained at runtime, and this will bring some overhead.

@kwitsch
Collaborator Author

kwitsch commented Nov 17, 2021

Everything is dockerized in a swarm environment with 3 managers. Every manager has a blocky container on it.
Therefore a whole manager could go offline without decreasing DNS performance.
Both unbound containers are deployed with a constraint that only one ever runs on a single node.
Unless more than one manager is down, DNS resolution won't be affected by node failure.

All three blocky instances are distributed as DNS resolvers to the other hardware in the network.
The unbound resolvers are fully recursive to minimize external communication, for privacy.

The whole setup should provide high resolution speed with as little downtime as possible.

For comparison, Google (8.8.8.8) and Cloudflare (1.1.1.1) resolution speed tends to be ~45ms.

@kwitsch
Collaborator Author

kwitsch commented Nov 19, 2021

@0xERR0R
I'm still pondering over it.
A broadcast channel to harmonize DNS cache insertions seems most beneficial to me.

Every cache insertion would be broadcast in parallel with the insertion itself.
Received broadcasts would be inserted without triggering a notify.

Pro:

  • all instances stay self-contained (no external services necessary)
  • instance cache population without actual usage (fallback instances)
  • no performance decrease compared to the current cache

Con:

  • no warm start (if all instances start at the same time)
  • network traffic increases with instance count
  • higher memory consumption through cache redundancy

@0xERR0R
Owner

0xERR0R commented Nov 23, 2021

What do you mean with "broadcast channel"? Do you want to "connect" blocky instances to each other?

@kwitsch
Collaborator Author

kwitsch commented Nov 23, 2021

I'm currently considering a UDP socket, as it's designed for exactly that.
The highest IP in a subnet is usually the broadcast address.

For example:
Network: 192.168.0.0/24 -> broadcast address: 192.168.0.255
Sync port in config: 11112
blocky instance 1: 192.168.0.2
blocky instance 2: 192.168.0.3
This would result in UDP messages sent to 192.168.0.255:11112.

The instances themselves wouldn't know of each other or how many other instances are listening.
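Deriving the broadcast address in the example above can be sketched with Go's standard library (the function name and error handling here are my own, not an existing blocky API; IPv4 only):

```go
package main

import (
	"fmt"
	"net"
)

// BroadcastAddr derives the subnet's broadcast address (highest IP)
// from a CIDR, as the sync target for cache-insertion broadcasts.
func BroadcastAddr(cidr string) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return nil, fmt.Errorf("IPv4 network required: %s", cidr)
	}
	mask := net.IP(ipnet.Mask).To4()
	bcast := make(net.IP, 4)
	for i := range bcast {
		bcast[i] = ip[i] | ^mask[i] // set all host bits to 1
	}
	return bcast, nil
}

func main() {
	b, err := BroadcastAddr("192.168.0.0/24")
	if err != nil {
		panic(err)
	}
	fmt.Printf("sync target: %s:11112\n", b) // 192.168.0.255:11112
}
```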

@0xERR0R
Owner

0xERR0R commented Nov 23, 2021

Ok, understood. Why not redis with Pub/Sub? It could be managed by redis: all subscribed blocky instances would get cache insertion propagation, and if one instance restarts, it can get the cache from redis.

Your approach needs its own protocol and relies on the network infrastructure (all blocky instances being in the same subnet).

@kwitsch
Collaborator Author

kwitsch commented Nov 23, 2021

The sync could surely be done with redis.
I tried to think of a simple solution without multiple caches.

I really wouldn't like running another service in the blocky container, or losing the cache inside it.

It seems like I'm a little stuck there.
Could you elaborate on your suggested solution?

@0xERR0R
Owner

0xERR0R commented Nov 23, 2021

Ok, these are my thoughts; they should be verified (maybe it doesn't work this way):

Blocky 1 -------- redis --------- blocky 2 (or even more)

Blocky 1 inserts a key into its cache and propagates it (async) to redis (publish over the channel "cache").
Blocky 2 is subscribed to the channel "cache" and receives cache insertions from blocky 1. Blocky 2 updates its own cache. Blocky 2 propagates its own cache inserts to redis.

On instance startup, blocky loads the cache from redis.

Redis is optional. If it is not configured, each blocky instance is independent.

We could also use redis pub/sub for other things, for example disabling blocking: blocky 1 receives the REST request, disables blocking, and propagates the change to redis. All other blocky instances disable blocking too.

@kwitsch
Collaborator Author

kwitsch commented Nov 23, 2021

Ah ok, I get it.
That seems like a more efficient solution than my first proposal.
I will look into this some time later this week.
Thanks for the input!

@kwitsch
Collaborator Author

kwitsch commented Nov 25, 2021

It seems that redis streams are the better option for this feature request, as they store the message history.
Pub/Sub is the simpler solution communication-wise, but would require more logic in blocky itself because the sync messages aren't stored in redis.

Currently i would prefer a redis stream solution but I'll look further into it 😅


Edit 1:

I looked further into it and changed my point of view.
Redis streams won't fit the need, as the key/value data in a stream isn't really queryable.

I will try implementing the pub/sub approach.


Edit 2:

Started development in repository 344.
May take some time to finish as my time is somewhat limited at the moment.
