
add support for pure randomization of DNS upstream server #95

Open
monperrus opened this issue Apr 6, 2018 · 15 comments
@monperrus
Dear stubby team,

I want my DNS requests to be spread over different servers. The round_robin_upstreams option is a first step, but it is weak: an attacker who can inject fake DNS requests can steer all the interesting requests they want to spy on to the same server.

To overcome this problem, what about adding an option for pure randomization when selecting the server:

# Instructs stubby to randomly distribute queries across all available name servers. 
randomize_upstreams: 1
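To illustrate the proposal, here is a minimal Python sketch (hypothetical, not stubby's actual code) of what pure randomization would mean: pick an upstream uniformly at random for every query, with no state carried between queries, unlike round robin. The addresses are placeholders from the TEST-NET range.

```python
import secrets

def pick_random_upstream(upstreams: list[str]) -> str:
    # Uniformly random, independent choice per query; secrets gives a
    # CSPRNG so the sequence of choices is not predictable.
    return secrets.choice(upstreams)

upstreams = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
server = pick_random_upstream(upstreams)
```

Because each choice is independent, an attacker injecting extra queries cannot shift which server the next real query goes to, which is the point of the request.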
@saradickinson
Contributor

Hi @monperrus, thanks for the request.
We've thought about adding this but haven't done so yet, because we are not convinced of the benefit of sending queries to different servers purely for privacy reasons (we certainly find it more performant, which is why it is the default). One counter-argument: over time (days, weeks), any resolver you use, whether in round robin or with a random distribution, will likely see your entire query profile and so acquire enough information to profile you, because we are creatures of habit and tend to visit the same sites.
Also, if an attacker has the capability to inject fake DNS queries into your DNS-over-TLS connections in this way then they already have access to your local DNS resolution.

I'm not saying we won't add this feature (we probably will), but I'd like to see more research on the threat analysis of spreading queries across multiple servers and how that varies with the number of servers. Without a sophisticated algorithm based on the query content, I think in the end you either trust a server you are using to see some or all of your queries, or you don't...

@monperrus
Author

monperrus commented Apr 9, 2018 via email

@saradickinson
Contributor

Sadly I don't have a clear answer for you at the moment, apart from limiting your server selection to only those you trust to see all your queries... it is possible that a resolver that sees just a slice of your DNS queries could generate a profile by correlating lookups (if it cared enough).

As an aside.... This is one of the arguments for using DNS-over-HTTPS (DoH) within browser tabs: if you do all the DNS lookups for a site to the DNS resolver of that site then you don't leak anything they don't already know.

Also - have you seen the proposal for Oblivious DNS? It seems we need something like this to have a technical solution for end-to-end privacy for DNS.

@ArchangeGabriel
Contributor

@saradickinson That would be a solution for web browsers, but not all DNS traffic is browser-generated. ;)

ODNS OTOH looks promising, but they are missing encryption of the answer (at least in their abstract).

@saradickinson
Contributor

BTW - I believe there will be an IETF I-D submitted soon on ODNS for discussion in DPRIVE.

@monperrus
Author

This is one of the arguments for using DNS-over-HTTPS (DoH) within browser tabs: if you do all the DNS lookups for a site to the DNS resolver of that site then you don't leak anything they don't already know.

I'm not sure I understand. AFAIU, with DoH, all your traffic goes to the same server, which itself may be compromised. Could you elaborate on what you mean by "if you do all the DNS lookups for a site to the DNS resolver of that site"?

@ArchangeGabriel
Contributor

@monperrus After a bit of thinking, I'm not sure anymore what @saradickinson has in mind here. I first thought the idea was to ask the website's authoritative DNS server directly, but this supposes already knowing it, which could be done using qname minimization to reduce leaked information.

But then you're no longer using a stub resolver; you're using a plain recursive resolver that communicates with authoritative DNS servers directly. I would love to do that (and did before Stubby existed), but of course cannot, because most of them don't speak DNS-over-TLS (or HTTPS).

@saradickinson
Contributor

@monperrus @ArchangeGabriel Hi both, I was referring to some of the discussion around early use cases for DoH, where it was proposed that for a given website/application there could be a discovery mechanism to determine whether the host offered DoH; if so, that host's resolver would be used for DNS queries by that website/application (e.g. your Twitter app uses the Twitter resolver for all its queries). However, the only current implementation I know of is in Firefox, and that does indeed currently send all queries to the (single) configured DoH resolver (or 'DNS API server', as it is described in the DoH draft).

@monperrus
Author

monperrus commented Apr 16, 2018 via email

@monperrus
Author

FYI, just published a post on this topic: https://www.monperrus.net/martin/randomization-encryption-dns-requests

@saradickinson
Contributor

Nice - thanks!

@monperrus
Author

A very interesting implementation choice in @dimkr's nss-tls:

When nss-tls is configured like this, it pseudo-randomly chooses one of the servers, for each name lookup. The choice of the server is consistent: if the same domain is resolved twice (e.g. for its IPv4 and IPv6 addresses, respectively), nss-tlsd will use the same DoH server for both queries. If nss-tlsd is restarted, it will keep using the same DoH server to resolve that domain. This contributes to privacy, since every DoH server sees only a portion of the user's browsing history.
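The behavior described above can be sketched in a few lines of Python (a hypothetical illustration, not nss-tls's actual code, which is C and uses GLib's g_str_hash): hash the domain name with a stable hash so the same name always maps to the same server, even across restarts.

```python
import hashlib

def pick_server(name: str, servers: list[str]) -> str:
    # SHA-256 is stable across runs (unlike Python's built-in hash()),
    # so the same domain always maps to the same server, and each
    # server only ever sees its own slice of the query stream.
    digest = hashlib.sha256(name.lower().encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that `% len(servers)` means adding or removing a server remaps almost every domain, a point picked up later in the thread.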

(notifications: @nharrand, @rudametw)

@tc287

tc287 commented Sep 8, 2021

I'd like to see more research on the threat analysis of spreading queries across multiple servers

I'm not a security professional, but in my head, the "best" attack on round-robin goes something like this:

  • Attacker hosts a popular public resolver and also popular websites.
  • Attacker's site uses JavaScript to query some random names on attacker-controlled domains.
    • Initially, this is used to work out the position of attacker-controlled servers in the list of resolvers (and the total number of servers N).
    • The attacker's site now issues N-1 DNS lookups every time the attacker's server receives a non-attacker-controlled query, so that (under round robin) the next real query also lands on the attacker's resolver.

The benefit to the attacker is seeing 100% of "real" DNS queries, instead of e.g. 50%, but at the cost of e.g. doubling the amount of DNS traffic with a pattern that is very easy to spot (each attacker-controlled query would happen shortly after a "real" query). While I don't know the motives of these hypothetical attackers, I'd imagine that the risk of the attack being discovered outweighs the benefit of seeing a greater fraction of queries.

For ~deterministic random selection, there are a bunch of considerations:

  • Should queries for subdomains of the same "organizational domain" (e.g. www.something.example, cdn.something.example) go to the same server?
    • If so, how would you determine what an "organizational domain" is? (Public suffix list? Ew...)
    • What about sites whose CDN is on an unrelated domain (e.g. somethingcdn.example or randomnumber.cdncompany.example)?
  • Should different resolver addresses controlled by the same entity be treated as belonging in the same "shard"? (e.g. IPv4 vs. IPv6, primary/secondary IPs)
  • How should failover work?
  • How stable is the random choice when resolvers are added/removed?
  • Does the choice of resolver leak information about the domain being resolved?

A potential algorithm goes something like this:

  • Group resolvers by their controlling entity, and give the group a name. (The name is arbitrary but shouldn't change when different resolvers in the group are added/removed.)
  • Use something like the Public Suffix List to determine the "organizational domain".
  • Generate a cryptographically random salt at install/first run.
  • Calculate h(salt, organizational_domain, group_name) for each group, pick the group with the smallest hash and a random server in that group. For failover, pick the next-smallest.
    • h() could just be SHA-256, but it's probably possible to do better without leaking information about the domain being resolved.
    • Alternatively, use a more efficient method of consistent hashing.
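A minimal Python sketch of the algorithm above, assuming SHA-256 for h() and rendezvous-style selection (group names and the salt here are made up for illustration):

```python
import hashlib

def rank(salt: bytes, org_domain: str, group: str) -> int:
    # h(salt, organizational_domain, group_name): the NUL separator
    # keeps (domain, group) pairs from colliding after concatenation.
    h = hashlib.sha256(salt + org_domain.lower().encode() + b"\x00" + group.encode())
    return int.from_bytes(h.digest()[:8], "big")

def order_groups(salt: bytes, org_domain: str, groups: list[str]) -> list[str]:
    # Sort groups by hash: the first entry is the primary, the rest are
    # failover candidates in order (pick the next-smallest on failure).
    # Adding or removing one group only remaps the domains that had
    # chosen that group, unlike a plain hash % n scheme.
    return sorted(groups, key=lambda g: rank(salt, org_domain, g))

salt = b"generated-once-at-install"  # per-install random salt (assumption)
order = order_groups(salt, "something.example", ["groupA", "groupB", "groupC"])
primary, first_failover = order[0], order[1]
```

Because the salt is secret and per-install, an outside observer cannot predict which domain maps to which group, and the mapping stays stable as long as the group list and salt do.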

(FWIW, nss-tls uses g_str_hash (session->request.name) % nresolvers which effectively re-randomizes when you add an additional resolver. I'm not sure how it handles failover.)

@dimkr

dimkr commented Sep 9, 2021

(FWIW, nss-tls uses g_str_hash (session->request.name) % nresolvers which effectively re-randomizes when you add an additional resolver. I'm not sure how it handles failover.)

That's true, it re-randomizes when you add an extra server. However, it is assumed that the user changes the server list only rarely, or just once (when nss-tls is installed), and nss-tls doesn't fall back to another server if the pseudo-randomly chosen server fails to resolve a domain.
