Use strict order for upstreams #355

Open
agneevX opened this issue Dec 3, 2021 · 25 comments
Labels
🔨 enhancement New feature or request

@agneevX
Contributor

agneevX commented Dec 3, 2021

Currently blocky...

Blocky picks 2 random resolvers from the list for each query and returns the answer from the fastest one. This improves your network speed and increases your privacy - your DNS traffic will be distributed over multiple providers.

This works very well, but is not desirable when you want to use a known resolver as primary all the time and want to use a secondary resolver only as backup.

I propose adding an option to query the first resolver in the list, then falling back to secondary and so on... after any of the following:

  • Timeout is reached (half of the upstreamTimeout value maybe)
  • REFUSED is returned. Google does this for some queries containing ECS data. More on that issue here.
@kwitsch
Collaborator

kwitsch commented Dec 3, 2021

I also would like an option to configure the upstream policy.

Maybe we could implement a configuration enum like:

  • parallel_best (old behavior, default)
  • strict (behavior as described in the first post)
  • random (one request to a random resolver in the list)

This would mirror the first two options in adguard home. (I don't get the third option and never used it 😅)
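
Such an enum might look like the sketch below. The type and function names (`UpstreamStrategy`, `ParseUpstreamStrategy`) are assumptions made for illustration; only the three string values come from the list above.

```go
package main

import "fmt"

// UpstreamStrategy is a hypothetical config enum; the string values
// follow the names proposed in this thread.
type UpstreamStrategy int

const (
	ParallelBest UpstreamStrategy = iota // existing behavior, default
	Strict
	Random
)

// ParseUpstreamStrategy maps a config string to a strategy. An empty
// string selects the default so that existing configs keep working.
func ParseUpstreamStrategy(s string) (UpstreamStrategy, error) {
	switch s {
	case "", "parallel_best":
		return ParallelBest, nil
	case "strict":
		return Strict, nil
	case "random":
		return Random, nil
	default:
		return ParallelBest, fmt.Errorf("unknown upstream strategy %q", s)
	}
}

// String returns the config name of the strategy.
func (s UpstreamStrategy) String() string {
	switch s {
	case Strict:
		return "strict"
	case Random:
		return "random"
	default:
		return "parallel_best"
	}
}
```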

@0xERR0R
Owner

0xERR0R commented Dec 7, 2021

The current implementation was designed to combine privacy with performance:

  • blocky picks 2 random resolvers (weighted: upstream resolvers with errors are penalized) and returns the answer from the fastest one
  • If you define 10 upstream resolvers, each receives only about 20% of your DNS traffic

We can provide additional "strategies", like strict, random or random weighted based on upstream resolver response time.

@0xERR0R 0xERR0R added the 🔨 enhancement New feature or request label Dec 7, 2021
@0xERR0R
Owner

0xERR0R commented Dec 10, 2021

Maybe we can also implement a "hyperlocal" mode, in which blocky works as a recursive resolver and doesn't rely on any upstream resolver. That means blocky would recursively ask the corresponding name servers and cache the results. This would significantly improve privacy, but would probably be slow for queries with many subdomains.

Any thoughts?

@agneevX
Contributor Author

agneevX commented Dec 10, 2021

What do you mean by "corresponding name server"? Do you mean something like Unbound?

@0xERR0R
Owner

0xERR0R commented Dec 10, 2021

Yes, like unbound, but in blocky. In this case we can reuse blocky's cache and provide additional prometheus metrics.

@agneevX
Contributor Author

agneevX commented Dec 10, 2021

A few things I've observed when I used to use Unbound to query root name servers:

  • Queries to root name servers take far too long. >1000ms queries are fairly common.
  • It's fully unencrypted and uses ports 53/udp and 53/tcp. My ISP immediately hijacks that and redirects to their BIND server.

@kwitsch
Collaborator

kwitsch commented Dec 10, 2021

I don't think that it would be feasible to include a recursive dns server option.
Most users won't use it as forward dns servers are more common.
Therefore it would most likely just increase binary size.

In my setup there are multiple unbound instances as upstream resolvers for blocky.
Even I wouldn't use an internal recursive option, as this would reduce my fault tolerance and configuration options.

@reitermarkus

I'm currently trying to migrate from Pi-Hole to Blocky, since it is much better suited for running on K8s, but this issue is currently blocking me from doing so, unless I'm missing another option. I want the LanCache DNS server to always be preferred if it is available.

My current setup, with Pi-Hole using strict order, looks like this:

Router --- Pi-Hole --- LanCache --- Unbound
              \_______________________/

With Blocky, I think currently the only options would be

Router --- LanCache --- Blocky --- Unbound

or

Router --- Blocky --- LanCache --- Unbound

with LanCache being a SPOF since both Blocky and Unbound have multiple replicas.

@agneevX
Contributor Author

agneevX commented Feb 14, 2022

@reitermarkus is that the Steam LAN thing?

If so, it should not be a problem if LC answers queries faster than your other upstreams.

@reitermarkus

Yes, it's for caching Steam games, among other things.

Well, my other upstream is Unbound running in the same cluster, so it's quite likely that LanCache will not be significantly faster, if at all.

@agneevX
Contributor Author

agneevX commented Feb 14, 2022

I'm not sure about blocky, but I know AdGuard Home has a Fastest IP feature that does exactly what you want.

@0xERR0R
Owner

0xERR0R commented Feb 14, 2022

Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work if you can figure out which DNS names are used (steamcontent.com for example for steam, maybe others?) Did you try this approach?

@reitermarkus

reitermarkus commented Feb 14, 2022

AdGuard Home has a Fastest IP feature that does exactly what you want.

I had a look at AdGuard Home before finding Blocky, but it has the same issue as Pi-Hole: no easy way to have multiple replicas.

Conditional DNS configuration (https://0xerr0r.github.io/blocky/configuration/#conditional-dns-resolution) could work

That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?

@0xERR0R
Owner

0xERR0R commented Feb 14, 2022

That depends: Will conditional DNS fall back to using the default upstream when LanCache DNS is down?

No, blocky will ask your lancache instance, and if it returns NXDOMAIN, there is no fallback. Is that not the desired behaviour? lancache will either return the IP of the local cache or the origin IP.

@reitermarkus

The problem is that if LanCache is down, I cannot resolve any cached domains. Basically, I want to be able to download game updates even if LanCache is down for whatever reason.

Currently, this works by having LanCache as first DNS server, and if it is down, fall back to the next, i.e. downloads fall back to using the uncached upstream IP.

@0xERR0R
Owner

0xERR0R commented Feb 14, 2022

Is this how pihole works? If one upstream DNS is down, it tries the second one (and not round-robin)? That means, if you query for example for "google.com", the pihole will ask your lancache instance first. Does lancache return NXDOMAIN, or will it resolve this query properly (by using some external resolver)?

@reitermarkus

Is this the way how pihole works?

Not by default, but since it uses DNSmasq, I can configure it to use strict order.

the pihole will ask your lancache instance first, does lancache return NXDOMAIN or will it resolve this query properly (by using some external resolver)?

LanCache will resolve it, using Unbound as upstream. And the same Unbound server acts as the fallback DNS server in Pi-Hole.

So in case LanCache is running:

Pi-Hole -> LanCache -> Unbound

In case LanCache is down:

Pi-Hole -> Unbound

@0xERR0R
Owner

0xERR0R commented Feb 14, 2022

ok, got it. The requested "strict order resolution" will solve this challenge. With conditional mapping, you won't get the fallback resolution.

@github-actions
Contributor

github-actions bot commented Aug 4, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions
Contributor

github-actions bot commented Nov 3, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@0xERR0R 0xERR0R removed this from the 0.20 milestone Nov 4, 2022
@DerRockWolf
Contributor

DerRockWolf commented Jul 28, 2023

Hi all, I would like to contribute both the strict & random (non weighted) resolvers.

Where should we add the new upstreamStrategy config field? Should we start by adding it as a global enum that configures the strategy for all upstream groups?
If there are use cases for a group-scoped enum, we could still discuss adding it later on.

@0xERR0R
Owner

0xERR0R commented Jul 28, 2023

Hi all, I would like to contribute both the strict & random (non weighted) resolvers.

That sounds good! 👍

Currently, we have the "upstream" section and the related "UpstreamTimeout" option. The "upstream" section is not a nested struct but only a map (for historical reasons). It would be better to keep all upstream-related configuration in a separate structure, but that would introduce breaking changes. So I think it is better, for the moment, to introduce a new top-level config enum "upstreamStrategy" and to refactor the "ParallelBestResolver" so that the resolver selection logic is extracted, for example into a separate interface. That way we can implement more strategies later.

@ThinkChaos
Collaborator

ThinkChaos commented Jul 28, 2023

Currently, we do have the "upstream" section and related "UpstreamTimeout". The "upstream" section is not a nested struct, but only a map (historical reasons). It would be better to have all upstream related configurations in a separate structure, but in this case we'll introduce breaking changes.

I've got local changes to allow having more config there, and be back-compat. Basically I also renamed it to upstreams instead of upstream, so we can use our standard option deprecation flow.
The main goal of those changes is to have parallel init for upstreams (#835). It's almost done so I could make a PR soon. But I think I can even split the config change so we can merge that quicker and @DerRockWolf can use that as a base.

EDIT: so if you, @DerRockWolf, have already started some work, don't worry too much about the config, just add something to the big Config struct, and moving your struct into the one I created should be easy :)

refactor the "ParallelBestResolver" to extract the resolver choose logic for example in a separate interface. So we can implement more strategies later.

Related to #1001

@ThinkChaos
Collaborator

Bad weather gave me a bit of extra time today, so I opened #1086 with just the config change.

@DerRockWolf
Contributor

@agneevX my PR (#1093) implementing the strict strategy doesn't tackle:

REFUSED is returned. Google does this for some queries containing ECS data.

The "upstream resolver" contacting the upstream DNS server only returns err if it didn't get a reply. The responses are returned as received, regardless of the DNS message response codes.

This is also currently the case for the parallel_best resolver. If google DNS replies REFUSED and wins the race, blocky will return the answer from google.

We would need to implement custom handling based on the DNS response codes.
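
A minimal sketch of what such response-code handling could look like is shown below. The RCODE numbers are the standard ones from RFC 1035; the policy itself (fall back on SERVFAIL and REFUSED, accept NXDOMAIN as a valid negative answer) and the function names are assumptions for illustration, not behavior from the PR.

```go
package main

// Standard DNS RCODE values (RFC 1035).
const (
	RcodeNoError  = 0
	RcodeServFail = 2
	RcodeNXDomain = 3
	RcodeRefused  = 5
)

// ShouldFallback reports whether a reply with the given RCODE should be
// treated like a failed upstream. NXDOMAIN is deliberately not included:
// it is a valid answer saying that the name does not exist.
func ShouldFallback(rcode int) bool {
	switch rcode {
	case RcodeServFail, RcodeRefused:
		return true
	default:
		return false
	}
}

// FirstAcceptable returns the index of the first reply that does not
// trigger a fallback, or -1 if every reply does.
func FirstAcceptable(rcodes []int) int {
	for i, rc := range rcodes {
		if !ShouldFallback(rc) {
			return i
		}
	}
	return -1
}
```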


6 participants