checking interest in merging code to help stop malware #596

mabrafooMSFT · 2019-06-19T10:37:54Z

How familiar are you with the codebase?:

{7}

I am reaching out to see whether there is any interest in adding support to help stop malware. The code I am referring to can be found at the link below. I started working on this idea 4 years ago and, thanks to krisives, now have a demo of it.
https://github.com/krisives/dnsmasq-sqlite

Pi-hole obviously supports blacklists, but it does not support massive whitelists. Right now, if a DNS request is made to any of the potentially malicious domains found here, https://phishstats.info/phish_score.txt, it will only be blocked if the domain is on a blacklist. https://github.com/krisives/dnsmasq-sqlite works by importing the 10 million domains from here: https://www.domcop.com/top-10-million-domains or a much larger set of 400 million from here: http://commoncrawl.org/connect/blog/ (searchable here http://wwwranking.webdatacommons.org/ )

As an example, if you select any domain from the top of this list: https://phishstats.info/phish_score.txt , Pi-hole will only block it if the domain is found on a recently updated blacklist. Since krisives's code requires a domain to have some amount of popularity in order to be resolved, it likely wouldn't resolve the domain, because it doesn't trust domains with an extremely low harmonic page rank score.

Note: because blocking a domain could create a serious problem for someone using Pi-hole in a business network, I am leaning more toward alerting rather than blocking until edge cases are more thoroughly tested. Let me know if this is something that anyone would want added to Pi-hole.

DL6ER · 2019-06-19T12:35:44Z

Thank you for your message. We will discuss your idea within our team. However, as I'm the main developer of FTL, I will comment on your request from my perspective right away.

The idea discussed here is to generate a database containing the Top 10 million domains and to look up each queried domain in that database.

Not found in database: The query is immediately answered with NXDOMAIN and is not forwarded to an upstream server.
Found in database: The query is forwarded to an upstream server.

Performance measurements for the proposed 10 million domain database. The measurements were taken on a Raspberry Pi 3B+ without any other load:

Creation of domains.db: 7min 36sec (excluding downloading time for roughly 100 MB of raw data)
Size of domains.db: 606M
Lookup time of google.com: ~ 20 msec
Lookup time of tagesschau.de: ~ 50 msec
Lookup time of a non-existing domain (abffgktl.cfv): ~ 50 msec

Issues I see with this implementation:

Even though the individual lookup time is not very high on its own, I expect the database lookups to severely degrade the performance of FTL. You should be aware that FTL has the goal to perform flawlessly on low- to mid-end devices even when being deployed in large environments seeing up to tens of millions of queries coming from up to thousands of clients every day.
The main performance hit comes from the fact that the database lookups are sequential and block the DNS service.
I tried to query a few (sub)domains from my own company and they all failed.
This is somewhat expected, as they are surely not within the downloaded domain list. I even doubt they will be on the Top 400 million list you mention as an alternative. They are simply not publicly known enough, and there is no need for them to be.
Hence your suggestion to only warn people about possibly suspicious domains, without any immediate consequences for the current query.
I checked the result for some of the domains that are blocked on my Pi-hole (like google-analytics.com) and see that they are all included. One might argue that they are not "suspicious", so this point is just an observation, not a judgment.

Summary: This approach uses a (huge!) local database to compare requested domains against a given list, trying to figure out whether they are suspicious. Even if there seems to be evidence that something might be suspicious, this either cannot have much of a consequence or will cause many perfectly legit domains to malfunction if they just happen to not be widely used enough.

My advice: This should not be implemented into FTL.
However, this does not mean that it should not be used. Anyone can use it on their own if they like. A possible solution is to set up the modified dnsmasq you reference on another device (or on the same device with a custom port) and to direct pihole-FTL to use it as its upstream server. This way, you can use both without having to make any modifications to pihole-FTL.

Please do not take this as an easy dismissal of your idea. I downloaded, reviewed and tried the modified variant you proposed; I just don't see it being suitable for integration in our project. Other members of our team may or may not want to leave additional comments here.

Thanks for your suggestion and for using the Pi-hole!

krisives · 2019-06-19T15:24:28Z

Thanks, @DL6ER, for taking a look at the proposal.

> Even though the individual lookup time is not very high on its own, I expect the database lookups to severely degrade the performance of FTL. You should be aware that FTL has the goal to perform flawlessly on low- to mid-end devices even when being deployed in large environments seeing up to tens of millions of queries coming from up to thousands of clients every day.

It's worth mentioning that this only happens if the cache is empty. Care was taken so that dnsmasq's cache continues to be used. SQLite is also doing its own caching of the index, so I don't think it's fair to assume that adding N queries will scale the time by a factor of N.

> The main performance hit comes from the fact that the database lookups are sequential and block the DNS service.

The proof-of-concept implementation is synchronous for simplicity. In theory, the code could be modified to do the DB lookup asynchronously and alert dnsmasq once it's ready.

> I tried to query a few (sub)domains from my own company and they all failed.
> This is somewhat expected, as they are surely not within the downloaded domain list. I even doubt they will be on the Top 400 million list you mention as an alternative. They are simply not publicly known enough, and there is no need for them to be.
> Hence your suggestion to only warn people about possibly suspicious domains, without any immediate consequences for the current query.

Part of this could be because the current code looks up subdomains very strictly, since again this is a proof of concept. Possibly only the primary domain (foo.com) should be looked up instead of the subdomain (bar.foo.com), in which case maybe your company is in the popular domain list.
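To make the primary-domain idea concrete, here is a minimal sketch in C. It is not the dnsmasq-sqlite code: `in_popular_list`, `is_known_domain` and the hard-coded table are hypothetical stand-ins for whatever lookup is actually used against the domain database.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the real lookup (e.g. a query against domains.db). */
static const char *const popular_domains[] = { "foo.com", "example.org" };

static bool in_popular_list(const char *name)
{
    size_t n = sizeof popular_domains / sizeof popular_domains[0];
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(popular_domains[i], name) == 0)
            return true;
    }
    return false;
}

/* Try the full name first, then each parent domain in turn. */
bool is_known_domain(const char *query)
{
    const char *candidate = query;
    /* A candidate without a dot is a bare TLD (or empty), so stop there. */
    while (strchr(candidate, '.') != NULL) {
        if (in_popular_list(candidate))
            return true;
        /* Drop the leftmost label: "bar.foo.com" -> "foo.com". */
        candidate = strchr(candidate, '.') + 1;
    }
    return false;
}
```

With this sketch, `is_known_domain("bar.foo.com")` succeeds because `foo.com` is in the table, while a name with no listed parent is rejected. Shared suffixes such as `co.uk` are only safe here as long as they do not appear in the list themselves.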

> I checked the result for some of the domains that are blocked on my Pi-hole (like google-analytics.com) and see that they are all included. One might argue that they are not "suspicious", so this point is just an observation, not a judgment.

The goal of these changes wasn't to change the blacklisting behavior, which remains the same, but instead to augment the whitelist behavior, since to my knowledge there is no scalable way to add tens of millions of domains to the whitelist in dnsmasq. Attempts to do so caused dnsmasq to take 30+ minutes to start and to never respond to a DNS query. In other words, you can still load all the gravity blacklists in addition to this functionality.

(Please let me know if my understanding of that is wrong)

> cannot have much of a consequence

I don't think that's accurate. Go take some known recently discovered malware and check which domain it's using for exfiltration and you'll likely find a domain that is very new and not on these popular lists.

> will cause many perfectly legit domains to malfunction if they just happen to not be widely used enough.

Long term our goal is to provide a UI for managing this so that organizations can fine-tune the allowed domains to their needs.

> My advice: This should not be implemented into FTL.

I understand. Our needs may not directly overlap with the needs of Pi-Hole.

> I downloaded, reviewed and tried the modified variant you proposed

Thanks a ton for that. I agree that your assessment isn't flippant or misguided.

If any of you guys have any suggestions or comments on the idea or implementation please let me know.

AzureMarker · 2019-06-19T15:28:47Z

This kind of blocking could be achieved with FTL itself by using a regex to block all domains (.*) and whitelisting the domains from the list you mentioned. However, the current implementation of the whitelist check takes O(n) time because it does a loop through all whitelisted items, so it would not be performant at that scale.

I agree with @DL6ER's comments otherwise.

krisives · 2019-06-19T15:33:15Z

> This kind of blocking could be achieved with FTL itself by using a regex to block all domains (.*) and whitelisting the domains from the list you mentioned. However, the current implementation of the whitelist check takes O(n) time because it does a loop through all whitelisted items, so it would not be performant at that scale.

My attempts to do that with 10 million domains caused dnsmasq to never respond to a DNS query in time.

mabrafooMSFT · 2019-06-19T22:10:59Z

Great feedback. Thanks everyone, especially DL6ER for taking time to kick the tires. My main question here really was "Is FTL interested in adding support for any size whitelists that can be accessed in N time?" It's cool if this is not something that is a priority for your project. With so many projects on GitHub you would never have known this beta support has been added to a fork of dnsmasq. If you ever need support for large whitelists, feel free to use code from us. Thanks for creating an awesome solution to counter the bastards that are ruining the internet with too many ads!

AzureMarker · 2019-06-19T23:33:55Z

> Is FTL interested in adding support for any size whitelists that can be accessed in N time?

This is already done, so perhaps you meant O(1) time (via a hashmap)?
This is the code that is used to check if a domain is in the whitelist (simple O(n) loop):

FTL/regex.c, lines 50 to 63 in 9e8273b:

    bool __attribute__((pure)) in_whitelist(char *domain)
    {
        bool found = false;
        for(int i=0; i < whitelist.count; i++)
        {
            // strcasecmp() compares two strings ignoring case
            if(strcasecmp(whitelist.domains[i], domain) == 0)
            {
                found = true;
                break;
            }
        }
        return found;
    }

> My attempts to do that with 10 million domains caused dnsmasq to never respond to a DNS query in time.

Perhaps FTL never finished loading the large whitelist into memory, or the scaling factor on the O(n) lookup was too large for the machine you were testing on. We have not tested FTL with such a large whitelist before, so this is interesting.

mabrafooMSFT · 2019-06-19T23:37:03Z

Yea, I meant using a hashmap / O(1).
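
For comparison with the O(n) loop quoted above, here is a minimal sketch of what an expected O(1) check could look like. This is not FTL code: `struct domain_set`, `domain_set_add`, `domain_set_contains` and the fixed `TABLE_SIZE` are all assumptions made for illustration, using a plain open-addressing hash table with FNV-1a hashing.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/* Illustrative size: must be a power of two and larger than the entry count. */
#define TABLE_SIZE 1024u

struct domain_set {
    const char *slots[TABLE_SIZE];  /* NULL marks an empty slot */
};

/* FNV-1a over the lower-cased name, so lookups ignore case like strcasecmp(). */
static uint32_t hash_domain(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (uint32_t)tolower((unsigned char)*s);
        h *= 16777619u;
    }
    return h;
}

/* Insert with linear probing. Stores the pointer, not a copy of the string.
   Returns false only if the table is full. */
bool domain_set_add(struct domain_set *set, const char *domain)
{
    uint32_t start = hash_domain(domain) & (TABLE_SIZE - 1u);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t idx = (start + i) & (TABLE_SIZE - 1u);
        if (set->slots[idx] == NULL) {
            set->slots[idx] = domain;
            return true;
        }
        if (strcasecmp(set->slots[idx], domain) == 0)
            return true;  /* already present */
    }
    return false;
}

/* Probe from the hashed slot until a match or an empty slot is found. */
bool domain_set_contains(const struct domain_set *set, const char *domain)
{
    uint32_t start = hash_domain(domain) & (TABLE_SIZE - 1u);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t idx = (start + i) & (TABLE_SIZE - 1u);
        if (set->slots[idx] == NULL)
            return false;
        if (strcasecmp(set->slots[idx], domain) == 0)
            return true;
    }
    return false;
}
```

A zero-initialized `struct domain_set` is an empty set. For 10 million domains the table would need to be sized and allocated dynamically, and the memory cost of keeping every name in RAM would have to be weighed against the lookup speed.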

mabrafooMSFT · 2019-06-19T23:50:44Z

btw, I assume both of you probably have a decent history of your DNS requests, since you are some of the main contributors to the Pi-hole project. I encourage you to look into how often your DNS queries are outside the top 10 or 100 million.

As you know, each day many domains are added, but many also expire. Using massive whitelists will not stop all malware and ransomware, but it will stop (or detect) a significant percentage, large enough, IMHO, to justify educating others about it as a way to reduce the chance of getting malware.

I used to think there were many billions of domains registered, but that is not the case.
https://www.internetlivestats.com/total-number-of-websites/

Also, direct IP communication is a separate issue that should be monitored on corp networks and matched with DNS requests in order to detect traffic that didn't involve a DNS request at all, but that is a pcap based solution and beyond what pi-hole can help with.

dschaper · 2019-06-20T03:25:48Z

Getting away from the technical aspects of things, here's my take on the functional side.

Pi-hole by default allows access to all domains. Users are required to either manually add domains they do not want to visit or subscribe to lists that others curate of domains they have found to be not beneficial and can be blocked.

This approach, and please correct me if I am wrong, assumes that every domain is bad by default and only a select list of domains is "safe" to visit. This list of safe domains is curated by someone else. This potentially opens a hole in users' defenses.

With Pi-hole's current approach, if a list adds a domain, accidentally or otherwise, the worst that can happen is that the domain is blocked and no access is granted. It may be frustrating, but it is not inherently dangerous. With this approach, if a domain is listed unintentionally, then everyone using that master whitelist has that site accessible from their networks. This is where the hole comes into play. It also plays into the user having a false sense of security and relying on others. (Insert the Tommy Boy Guarantee clip here.)

DL6ER · 2019-06-20T21:50:27Z

> My attempts to do that with 10 million domains caused dnsmasq to never respond to a DNS query in time.

We will look into implementing the whitelist check using a prepared statement against the whitelist table. This way we could take advantage of SQLite's indexing system, vastly reducing the lookup time for huge whitelists.
There are a few technical details to investigate, including but not limited to concurrency. Not only does pihole-FTL have to be able to access the database, but gravity will also want to write to it occasionally. This might be a problem when the database is updated while FTL wants to query the whitelist, etc.
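
As a rough illustration of why an ordered index helps, the sketch below swaps SQLite out for a sorted array searched with the standard `bsearch()`. It is a stand-in for an index lookup, not the SQLite API and not FTL code; `in_sorted_whitelist` and `compare_domain` are hypothetical names. The point is only that a lookup touches O(log n) entries instead of all n.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <strings.h>

/* Comparator for bsearch(): the key is a C string, the element is a
   pointer to one. Comparison ignores case, like strcasecmp(). */
static int compare_domain(const void *key, const void *element)
{
    return strcasecmp((const char *)key, *(const char *const *)element);
}

/* `sorted` must already be ordered case-insensitively, in the same way an
   index keeps its keys ordered. */
bool in_sorted_whitelist(const char *const *sorted, size_t count,
                         const char *domain)
{
    return bsearch(domain, sorted, count, sizeof sorted[0],
                   compare_domain) != NULL;
}
```

For 10 million entries a binary search needs about 24 comparisons per lookup, compared with up to 10 million for the linear loop.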

RYAN-dot-LOCAL · 2019-07-09T16:57:24Z

Are there plans to add support for default deny? Comparing rfc1035.c in the development branch with this file https://github.com/krisives/dnsmasq-sqlite/blob/b7001b4ff638da9e5085af341653555749459934/src/rfc1035.c
it doesn't look like there is any default deny/alert support.

If a site like outbrain.com is not in any blacklist but also NOT in the whitelist, default deny would block/alert for outbrain.com queries. Not sure if that is the plan, but it would be a useful way to make use of these changes.

btw, Pi-hole seems to be getting even more popular online lately. Well done guys/gals!

AzureMarker · 2019-07-09T17:57:03Z

@4688-is-great I explained how this can be done in this comment: #596 (comment)

DL6ER · 2019-07-09T18:00:55Z

> If a site like outbrain.com is not in any blacklist but also NOT in the whitelist, default deny would block/alert for outbrain.com queries. Not sure if that is the plan, but it would be a useful way to make use of these changes.

This is already possible. Add a regex rule that blocks everything and then selectively whitelist domains.

However, the current Pi-hole lacks sufficient performance for huge whitelists. This will be added with Pi-hole v5.0 which will perform quickly even for millions of whitelist domains. We're even looking into adding regex-based whitelisting which will be even more powerful.

RYAN-dot-LOCAL · 2019-07-12T14:59:42Z

I am looking forward to trying out the development branch of v5.0 at home on a separate network so that my children don't kill me if anything breaks as I test it out. Keep in mind I don't expect any support from your team. But could you clarify if the following 2 commands are (or will be) correct to enable the v5.0 code?
$ pihole checkout ftl development
$ pihole checkout core development

DL6ER · 2019-07-12T18:56:34Z

Use

pihole checkout dev

for this.

mabrafooMSFT · 2019-07-13T04:03:50Z

Just installed the dev branch tonight. Really cool to be able to use Pi-hole with a 10 million domain whitelist. The code seems to work perfectly so far after I ran pihole --regex '.*' to block everything not in the whitelist table.

These are only my thoughts. These are NOT instructions for any devs...

One thing I discovered will not be a surprise:
www.website.com doesn't work if only website.com is in the whitelist table. I would like to have the option of both behaviors.
CASE 1
Both one.two.trustedDomain.com and three.four.trustedDomain.com work, because trustedDomain.com is in the whitelist table with the "TrustSubDomains" column set to 1.
CASE 2
However, maybe I use five.six.trustedSUBdomain.com and don't want seven.eight.trustedSUBdomain.com to be allowed automatically; in that case trustedSUBdomain.com is in the whitelist with the "TrustSubDomains" column set to 0 or null.
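
The two cases can be sketched in C as follows. Nothing here exists in Pi-hole: `struct whitelist_entry`, its `trust_subdomains` field (mirroring the proposed "TrustSubDomains" column) and `is_allowed` are hypothetical, and the linear scan is only for readability. The sketch also assumes that in CASE 2 the wanted subdomain gets its own whitelist row.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical row of a whitelist table with the proposed column. */
struct whitelist_entry {
    const char *domain;
    bool trust_subdomains;
};

/* True if `query` is a strict subdomain of `domain`, i.e. it ends in
   "." followed by `domain`. */
static bool is_subdomain_of(const char *query, const char *domain)
{
    size_t qlen = strlen(query);
    size_t dlen = strlen(domain);
    if (qlen <= dlen + 1)
        return false;
    const char *tail = query + (qlen - dlen);
    /* The character before the suffix must be a dot, so that
       "nottrustedDomain.com" is not accepted for "trustedDomain.com". */
    return tail[-1] == '.' && strcasecmp(tail, domain) == 0;
}

bool is_allowed(const struct whitelist_entry *entries, size_t count,
                const char *query)
{
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(entries[i].domain, query) == 0)
            return true;  /* an exact match always counts */
        if (entries[i].trust_subdomains &&
            is_subdomain_of(query, entries[i].domain))
            return true;  /* CASE 1: subdomains are trusted */
    }
    return false;         /* CASE 2: untrusted subdomains end up here */
}
```

With rows for `trustedDomain.com` (flag 1), `trustedSUBdomain.com` (flag 0) and `five.six.trustedSUBdomain.com` (flag 0), the first three names from the cases above are allowed and `seven.eight.trustedSUBdomain.com` is not.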

Popular domains often have many subdomains. Many of those subdomains are not found in domain ranking lists because they exist only to make a particular website work. These "website plumbing" subdomains are not found on other websites because there is no point in linking to them.

I haven't looked into how blacklists are handled. I assume subdomain.domain.com is blocked if domain.com is in the blacklist table. The devil is in the details! I will stop here. No need to rant on to you guys who know 100 times more about this stuff.

dschaper · 2019-07-13T04:08:10Z

Thanks for the comments, doesn't sound rant-y in the least to me. As developers, we see things in our way and implement them how we see them. It helps a ton to have outside information and comments from people outside the developers' circle, and to come up with use-cases that we may not have considered.

krisives · 2019-07-13T04:14:37Z

Great work, Pi-hole team. You guys are awesome.

AzureMarker · 2019-07-13T06:09:55Z

@herdingcatz Your use case is solved by regex whitelisting. See #612.

The blacklist and whitelist block or allow domains exactly as they are specified. If you want to block a range of domains, you need to use regex. In v4.3.1 we support regex blacklisting. The PR I linked will add support for regex whitelisting.

mabrafooMSFT · 2019-07-14T02:01:06Z

I still need to test more, but this code seems to work. This took me a LONG time to write, and I am sure there are mistakes and more efficient ways to write it. Note that this code is part of the dnsmasq_interfaces.c file. Sorry for sharing via a sketchy site like pastebin; it's the fastest way to share it from this computer. https://pastebin.com/s5q6Zp5d

In a nutshell, if paypal.com is whitelisted, then this code will allow any subdomain of it, such as www.paypal.com or a.b.c.d.e.f.g.h.i.paypal.com. This will automate away the subdomain headaches that come with trying to build a default deny DNS solution.

Maybe this code will be useful for the project. Either way, I will use it!

mabrafooMSFT · 2019-07-15T15:38:20Z

Hopefully this post makes sense. These are simply my observations, not demands :)

This tweet from @JayTHL, who is one of the world's most active malware website hunters, is useful for explaining a typical Monday morning in 2019.
https://twitter.com/JayTHL/status/1150779826602729474

As you can see, he is alerting his followers to block 800+ new domains that he identified today. This is why I believe that default deny is the best approach to protect medium or small organizations. As I said before, alerting is probably better at first because something could break if an important domain (or subdomain) is accidentally blocked because it wasn't on a whitelist. A hacker can create a domain in seconds. It takes time for the malicious domain to be identified, added to a malware list and imported into pi-hole.

Ransomware is costing businesses about 80 billion per year, and it will likely grow to 800 billion per year. That is a crazy number, considering that it exceeds the GDP of any country outside the world's top 20 by GDP.

There are challenges when you use a massive whitelist to sinkhole 50+% of the internet but still want to block malware and advertising domains that happen to be in the whitelist.

Massive whitelists are only a list of popular domains. That list will contain advertising domains. This concept is backwards from how Pi-hole is currently implemented, because the original approach was to use the whitelist on a case-by-case basis to allow a domain that is on some blocklist but that you need for some one-off reason.

One of the issues I am running into in my testing is how to do all three of these things:
1- Use a massive whitelist of domains.
2- Block anything that is not on the whitelist (pihole --regex '.*')
3- Block an advertising or malware domain that is accidentally included in the whitelist.

#3 could be solved by cleaning up the whitelist when a blacklist is updated or adding logic in dnsmasq_interfaces.c. Both have pros and cons.
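
The second option, extra logic at query time, amounts to fixing a precedence order between the lists. A minimal sketch of one such order follows. It is only one possible choice and not a description of how Pi-hole behaves; `enum verdict` and `decide` are hypothetical names, and the two flags stand for the results of the blacklist and whitelist lookups.

```c
#include <stdbool.h>

enum verdict { VERDICT_ALLOW, VERDICT_BLOCK };

/* Assumed precedence: an explicit blacklist hit beats the whitelist, and
   anything on neither list is blocked (default deny). */
enum verdict decide(bool on_blacklist, bool on_whitelist)
{
    if (on_blacklist)
        return VERDICT_BLOCK;  /* item 3: blacklisted despite the whitelist */
    if (on_whitelist)
        return VERDICT_ALLOW;  /* item 1: listed in the massive whitelist */
    return VERDICT_BLOCK;      /* item 2: the catch-all regex */
}
```

Whichever order is chosen, it has to be stated clearly, because a domain on both lists can only follow one of them.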

DL6ER · 2019-07-15T15:42:27Z

We should really continue this discussion on our forum https://discourse.pi-hole.net as it is a much better platform for this. I will close your issue here, as it is no longer (or has never been) an actual FTL issue but rather a feature request.

dschaper · 2019-07-15T15:45:12Z

At some point you have to have an ultimate "right". Either the whitelist is the ultimate right answer or the blacklist is the ultimate right list. Item 3 will start you on a cycle that just never ends.

Block everything. Allow things on a whitelist. Something on whitelist should be blocked. Add thing to blacklist. User actually wants to see that site, add back to whitelist. ...

You need to be able to say "If you add to whitelist, it will resolve." or "If this is not on whitelist, it will not". It gets confusing to users to say "If you add to whitelist, it will resolve, unless it really shouldn't and then it won't."

DL6ER added the Discussion label Jun 19, 2019

DL6ER mentioned this issue Jun 23, 2019: Use whitelist table in gravity.db directly #600 (merged)

DL6ER added the Fixed in next release label Jul 9, 2019

DL6ER closed this as completed Jul 15, 2019
