
Add per-client rate-limiting #1052

Merged: 1 commit from new/rate_limiting into development on Feb 14, 2021

Conversation

DL6ER
Member

DL6ER commented Feb 3, 2021

By submitting this pull request, I confirm the following:

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

How familiar are you with the codebase?:

10


Add per-client rate-limiting. Rate-limited queries are answered with a REFUSED reply and are not processed any further by FTL. Even though they are logged in pihole.log, they will not contribute to the overall statistics nor enter the Query Log or the database.
This implements a real rate limit and ensures that abnormally behaving clients hammering FTL with thousands of queries per second cannot cause a denial-of-service failure.

Rate-limiting is fully configurable; it defaults to allowing no more than 1000 queries in 60 seconds. Both numbers can be changed by the user.

Note that rate-limiting happens on a per-client basis: other clients can continue to use FTL while rate-limited clients are short-circuited.

Rate-limiting can be disabled by setting RATE_LIMIT=0/0.
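
For reference, the corresponding settings in /etc/pihole/pihole-FTL.conf (the file and the RATE_LIMIT=queries/seconds syntax are described further down in this thread and in the FTL documentation):

# allow at most 1000 queries per client within 60 seconds (the default)
RATE_LIMIT=1000/60

# to disable rate-limiting entirely, use instead:
# RATE_LIMIT=0/0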

One might argue that rate-limiting is best realized with a firewall. However, we do not want to touch users' firewalls, and this effectively does the same thing (arguably better, because we don't simply drop packets but reply with a proper REFUSED message).

… seconds.

Signed-off-by: DL6ER <dl6er@dl6er.de>
@PromoFaux
Member

it defaults to allowing no more than 1000 queries in 60 seconds.

I wonder, would a default of off be better?

@DL6ER
Member Author

DL6ER commented Feb 4, 2021

I guess we want this protection to be enabled by default. Use cases are clients going crazy because of some defect and/or DNS loops between a router and the Pi-hole. We can add a warning to the Pi-hole diagnosis system if you like so users are aware of this.

My first attempt showed rate-limited queries in the Query Log; however, this would not effectively help against a DoS attack as the memory needed to hold them would still grow quickly.

We should rather stress in the changelog that this is something you can disable. The numbers I put in should not even come close to being triggered by any correctly working client: 10,000 queries per minute still allow 14.4 million queries per client in 24 hours. We may even want to reduce this number, but off by default seems wrong to me.

DL6ER added the PR: Approval Required (Open Pull Request, needs approval) label on Feb 6, 2021
@pralor-bot

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/updated-dietpi-and-pihole-now-high-cpu-usage-and-lost-connection/44069/2

DL6ER merged commit 991f128 into development on Feb 14, 2021
DL6ER deleted the new/rate_limiting branch on February 14, 2021 14:41
DL6ER mentioned this pull request on Feb 15, 2021
PromoFaux mentioned this pull request on Feb 16, 2021
@LordSimal

It would be nice if this new rate-limiting setting could be adjusted via the web GUI.
I was wondering why my backup script hadn't been working for the last two weeks and only recently stumbled upon this new feature.
For now, I have added the setting to the .conf file as described in the docs here, and everything works again.
https://docs.pi-hole.net/ftldns/configfile/#rate_limit

@DL6ER
Member Author

DL6ER commented Mar 4, 2021

@LordSimal Why is your backup script getting rate-limited in the first place? And do you consider this healthy behavior?

@LordSimal

I use rclone to sync my server data to PCloud. I can't decide/change how rclone connects to the server. I guess every request, and therefore every file/folder, creates a new DNS query.
I know the above comments say you can adjust that rate-limiting behaviour per client, but where/how do I do that?

@DL6ER
Member Author

DL6ER commented Mar 4, 2021

Rate-limiting is only measured and applied per client. You cannot set different levels for different devices. The proper fix for this behavior seems to be a local DNS cache on the server, like the one modern Ubuntu ships with (systemd-resolved).

@LordSimal

Well I am running on Ubuntu 20 so I will look into that.

Still, I think it would be nice if there were at least a notice in the backend when a client is being rate-limited (not only in debug mode).
Or at least notify the user about the new feature when updating (which is a bit late now, I would say).

On the client side I only got "server misbehaving" (which is OK, of course), and in pihole.log the query was logged as REFUSED. Could that at least be changed to REFUSED/RATELIMITED or something that points to this feature?
Or perhaps a notification of some sort to the system admin that a specific client is being rate limited right now?

@jfb-pihole
Member

"Or at least notify the user about that new feature when updating (which now is a bit late i would say)"

In all of our release announcements, we have stressed the need to read the release notes for each new release. This new feature, in particular, was thoroughly discussed in the V5.7 release notes:

https://pi-hole.net/2021/02/16/pi-hole-ftl-v5-7-and-web-v5-4-released/#page-content

@LordSimal

I'm sorry, I didn't look at the blog/release notes 🙇🏻

@anno006

anno006 commented Sep 24, 2021

Or/and first make a web GUI settings page and make it possible to set this per client before making such a drastic change. So next time, make sure to have everything ready before you implement stuff. A router, for example pfSense, can use Pi-hole too, and the router makes a lot of DNS queries on behalf of other devices..... smart!!!

@DL6ER
Member Author

DL6ER commented Sep 25, 2021

@anno006 Smart people read the release notes or blog posts before updating. We mentioned the option to disable rate-limiting and you could have done so even before updating to ensure there would have been no downtime.

In your case, your router (pfSense) seems to be configured incorrectly, which is the cause of this. When using ECS (EDNS0 Client Subnet), Pi-hole can tell your clients apart even when the router makes all the queries on their behalf. We are preparing a pfSense + Pi-hole step-by-step guide to make this more obvious in the future.

As you can see above, this feature was merged and released more than half a year ago. Since then, it has always turned out that there was an underlying problem (a rogue client) whenever rate-limiting was triggered, even in setups where the router is the single point of contact. The default value is rather permissive: 1000 queries every 60 seconds equals up to 1.4 million queries a day. If your network is larger than this, you should configure your router accordingly to stop things like this from happening.

@anno006

anno006 commented Sep 27, 2021

@anno006 Smart people read the release notes or blog posts before updating. We mentioned the option to disable rate-limiting and you could have done so even before updating to ensure there would have been no downtime.

In your case, your router (pfSense) seems to be configured incorrectly, which is the cause of this. When using ECS (EDNS0 Client Subnet), Pi-hole can tell your clients apart even when the router makes all the queries on their behalf. We are preparing a pfSense + Pi-hole step-by-step guide to make this more obvious in the future.

As you can see above, this feature was merged and released more than half a year ago. Since then, it has always turned out that there was an underlying problem (a rogue client) whenever rate-limiting was triggered, even in setups where the router is the single point of contact. The default value is rather permissive: 1000 queries every 60 seconds equals up to 1.4 million queries a day. If your network is larger than this, you should configure your router accordingly to stop things like this from happening.

Really smart people accept their mistakes and learn from them instead of firing back. As I said, it is possible to overrun your limit. Next time, make the step-by-step guide before activating...

@DL6ER
Member Author

DL6ER commented Sep 27, 2021

It wasn't my intention to "backfire". Not in the slightest. Rereading my own post, I can see how you could have read it differently. You have my sincere apologies for that. I just wanted to point out that you can improve your router configuration, which also helps with rate-limiting.

I can also assure you that we have no issues with undoing a change if it turns out to be incorrect. We have at least a few thousand Pi-hole users out there and I've heard fewer than five complaints about this. Typically, users were even thankful because it revealed misbehaving clients and they were able to do something about it. And I have seen at least one report where this helped a user keep the rest of their network responsive when one client went nuts and started sending millions of queries per minute.

What do you consider to be the mistake exactly? Rate-limiting in the first place, or do you consider the default value too low? Would it still be too low in your network if pfSense were configured such that clients can be told apart? Or is the problem rather that the value is not modifiable via the web dashboard? If the latter, please take into account that the vast majority of advanced settings are currently not editable on the web interface.

@anno006

anno006 commented Sep 28, 2021

I did search for the terms you posted, but this is also pretty new. It has the same problem: not implemented yet, or at least only understandable/findable for the people involved in the development, not for people like me. Don't get me wrong, I don't dislike it, only the way you implemented it.

  1. "Rate-limiting in the first place, or do you consider the default value too low?" You have a good point in having a form of rate-limiting, so no argument there. I can see that for some situations the value is too low.
  2. "Would it still be too low in your network if pfSense were configured such that clients can be told apart?" I did read the forum posts about EDNS, and once it is a bit further in development and I can implement it, this will be a good workaround/solution.
  3. "Or is the problem rather that the value is not modifiable via the web dashboard?" If the default value is, for whatever reason, not working or not good, then it is indeed best to have it in the web UI. Or I could even put in commands, but I need to know which. The only info I can find is https://docs.pi-hole.net/ftldns/configfile/ and I miss where I need to change the RATE_LIMIT value.
  4. "If the latter, please take into account that the vast majority of advanced settings are currently not editable on the web interface" I thought so, but this is the first time I have actually run into a problem that is so new there is almost no information to find, except all of your (cryptic) posts...

I do like what I read about EDNS and your explanation that it can help with misbehaving clients. I will look out for the step-by-step guide.

And lastly, some background info about my system: I am running pfSense (on my own hardware), run some VMs for home automation and Pi-hole, and have about 200,000 queries in 24 hours.

@DL6ER
Member Author

DL6ER commented Sep 28, 2021

The only info I can find is https://docs.pi-hole.net/ftldns/configfile/ and I miss where I need to change the RATE_LIMIT value.

The link to the documentation is available directly on the web dashboard. The place to put the options is described at the very top of the linked document:

You can create a file /etc/pihole/pihole-FTL.conf that will be read by FTLDNS on startup.

Possible settings (the option shown first is the default):
[...]

I'm open to suggestions how to improve the situation.

I can see that for some situations the value is too low.

Yes. We've had a lot of discussions about the default value on our Discourse forum and this is what we settled on. The motivation behind this is that Pi-hole is typically employed in a regular household with typically not more than a dozen devices. If Pi-hole is run on a Raspberry Pi and only one of the clients goes near the rate limit of 1.4 million queries a day, the Pi-hole will continue to work. However, if only two clients come close to this query rate, the Pi-hole will eventually stop working correctly because the memory will be used up. Hence, we didn't want to set the default even higher. Of course, there are much larger networks and Pi-hole may be running on much beefier hardware with more than 1 GB of memory available, but this is more of an edge case than a default setup. This is just to give some background on why we chose this value.

I will look out for the step-by-step guide.

You are not the first one asking for this; we're currently preparing a blog post about it. @dschaper might be able to give some more info about its current status (I do not use pfSense myself).

I am running pfSense (on my own hardware), run some VMs for home automation and Pi-hole, and have about 200,000 queries in 24 hours.

So this means roughly 140 queries per minute, about 7 times lower than the default limit, so we're talking about peak load here. We could easily change the default from 1,000 queries in 1 minute to, say, 10,000 queries in 10 minutes. Even though this would not change the upper limit, it'd relax the peak-load issue. However, this would also mean that clients will only be rate-limited later and blocking them will take longer. This is the price to pay. @dschaper thoughts?

@dschaper
Member

I have most of the guide written for OPNsense. If pfSense uses dnsmasq, then it's a few lines of configuration. A bonus is that if you can run unbound as well, the Sense box stays the DHCP server and sole DNS server for the network while it forwards queries, along with ECS/EDNS0 information, to Pi-hole. No need for Conditional Forwarding.

The basic idea is to add

add-mac
add-subnet=32,128

as a dnsmasq config snippet and then adjust the DNS servers accordingly after including DHCP lease names in dnsmasq. (You may also need to check whether rebind protection is something pfSense does; I know OPNsense has it on by default.)
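
For clarity, a commented version of that snippet (dnsmasq option names as above; the forwarding address is a placeholder you would replace with your Pi-hole's IP):

# pass the requesting client's MAC address along with forwarded queries
add-mac
# pass the requesting client's address as EDNS Client Subnet (full IPv4 /32, IPv6 /128)
add-subnet=32,128
# forward all queries to the Pi-hole (placeholder address; adjust to your setup)
server=192.168.1.2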

@dschaper
Member

@anno006 Please read https://pi-hole.net/2021/09/30/pi-hole-and-opnsense/ and see if this works on pfSense as well. Thanks!

@Raineer

Raineer commented Jan 15, 2022

I welcome enlightenment on what metrics were used to decide on the default 1000/1m rate limit setting.

Same here. I've hit it twice today doing nothing out of the ordinary. One was building a package in Arch (the script happened to download ImageMagick libraries in many different chunks), the other was downloading .ts chunks of a video stream to concatenate. These were both serial requests yet, apparently, easily triggered this default.

The error was presented in the GUI, I was able to google a fix, and I'm up and running. But I'd caution the many in this thread who assume that hitting the default 1000/60 limit can only mean a mistake. This default is too conservative by at least an order of magnitude.

@dschaper
Member

I'd counter that Arch is broken if it's querying 1000 different hosts to get ImageMagick libraries in one minute. At the very least it should be caching some of the domain queries.

@DL6ER
Member Author

DL6ER commented Jan 15, 2022

I welcome enlightenment on what metrics were used to decide on the default 1000/1m rate limit setting.

Same here

Please have a look at my reply to the initial request for enlightenment.

This is too conservative by at least an order of magnitude.

I disagree for the reasons already mentioned above:

If you were to, say, tenfold the per-client limit (10000/1m), you'd immediately end up at roughly 1 GB of memory per client, on top of the operating system, which would already be too much for some systems.
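
Working backwards from these figures (the per-query memory cost is only implied here, so treat it as a rough estimate):

10,000 queries/min × 1,440 min/day ≈ 14.4 million queries kept in memory per client and day
1 GB ÷ 14.4 million queries ≈ 70 bytes per stored query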

Maybe we can add a step in the installer script that asks the user to specify a custom limit (or accept the default) so you can choose what is appropriate for your setup.

@bretonium

I disagree for the reasons already mentioned above:

If you were to, say, tenfold the per-client limit (10000/1m), you'd immediately end up at roughly 1 GB of memory per client, on top of the operating system, which would already be too much for some systems.

I appreciate the level of care taken about system resources, especially in the context of the RPi's constrained RAM.

However, I feel that the approach taken here is not the commonly accepted one. We have many well-known mechanisms in the OS to tackle a lack of memory. If I don't have enough RAM, the OOM killer kicks in, and it is loud, and it is known where to look for its signs, and it will probably yell at me as soon as I SSH into the RPi. If I am OK with some slowness, I add swap. If I want to limit RAM, I use cgroups. And those who care about system resources monitor them with Grafana or something else.

Firefox does not refuse to open the 101st tab. A messenger does not refuse to accept the n-th message. A terminal does not refuse to execute the m-th command. They just try to do it.

I still think that rate-limiting should be opt-in instead of opt-out.

@DL6ER
Member Author

DL6ER commented Jan 15, 2022

The comparison isn't valid because Firefox spawns individual processes that can be killed independently without taking the entire application down. Neither do messengers, etc. keep all their data in memory. Pi-hole intentionally keeps all queries in memory so you can do quick filtering and querying. In the end, Pi-hole always aims to work fast even on low-end hardware, and this means we cannot do a ton of disk lookups over the slow SD interface.

I also disagree about OOM being "loud". You will not notice a single bit of its action if you are not connected to a live terminal that shows "Killed." or if you are not used to reading the system logs manually. I would argue that Pi-hole's rate-limiting is a lot louder (for the inexperienced user) as it is shown prominently on the dashboard and in the log files, too. I perfectly understand that power users want to have more control and that giving them this control via a config option isn't obvious enough.

This and

I still think that rate-limiting should be opt-in instead of opt-out.

is why I suggested

Maybe we can add a step in the installer script that asks the user to specify a custom limit (or accept the default) so you can choose what is appropriate for your setup.

This would explicitly ask you what to do. It would be neither opt-in nor opt-out; instead, it would ask explicitly. This ensures we are not making any assumptions about what is good for the user.

@okaestne

I also just found out about this feature, by restoring my last Firefox session with something like 15 tabs. Reading this thread, I agree with the rate-limiting itself, even if it might be too conservative by default.
The actual reason I comment is: how long does the rate limit last? How to configure it?

@yubiuser
Member

The actual reason I comment is: how long does the rate limit last? How to configure it?

https://docs.pi-hole.net/ftldns/configfile/#rate_limit

@okaestne

The actual reason I comment is: how long does the rate limit last? How to configure it?

https://docs.pi-hole.net/ftldns/configfile/#rate_limit

I meant the time of being limited.

@PromoFaux
Member

It's all there in the docs.

...
The default settings for FTL's rate-limiting are to permit no more than 1000 queries in 60 seconds.
...
For this setting, both numbers, the maximum number of queries within a given time, and the length of the time interval (seconds) have to be specified.
...
For instance, if you want to set a rate limit of 1 query per hour, the option should look like RATE_LIMIT=1/3600

@okaestne

It's all there in the docs.

Correct me if I'm wrong, but all I see there is the time window and the number of queries needed to get limited. But I want to set the time of being rate limited. Even after 5 minutes I was unable to use anything on my PC because of being limited. Therefore, I would like to know how to configure how long a client stays in the "rate limited" state, or at least how to un-limit it again, not the thresholds for getting limited in the first place.

@yubiuser
Member

yubiuser commented Jan 16, 2022

@okaestne

I agree it's not that obvious and we need to update the documentation. Since #1199, rate-limiting works the following way:

  1. FTL uses a fixed counting interval (the rate-limiting interval, 1 minute by default)
  2. if a client exceeds the set limit, it will be blocked until the end of the counting interval (FTL will log something like Rate-limiting 10.0.1.39 for at least 44 seconds in /var/log/pihole-FTL.log)
  3. if the client exceeds the limit while being blocked, it will be blocked in the next interval as well.

So far, there is no way to

set the time of being rate limited

It's always until the end of the interval, or until the end of the next interval if the limit is reached again while being blocked. If a client keeps exceeding the limit, it will stay blocked indefinitely.
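
To make these rules concrete, here is a minimal fixed-window sketch in C. It is illustrative only and not FTL's actual implementation: the names, the simple client table, and the exact bookkeeping around when a block is lifted are simplifications for this example.

/* Minimal sketch of the per-client fixed-window scheme described above.
 * Illustrative only -- NOT FTL's actual code. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_CLIENTS         256
#define RATE_LIMIT_COUNT    1000  /* queries allowed ...       */
#define RATE_LIMIT_INTERVAL 60    /* ... per this many seconds */

struct client {
    char ip[46];      /* textual IPv4/IPv6 address               */
    unsigned count;   /* queries counted in the current interval */
    bool blocked;     /* true while the client is being refused  */
};

static struct client clients[MAX_CLIENTS];
static unsigned num_clients;

/* Look up a client by address, creating an entry on first sight. */
static struct client *get_client(const char *ip)
{
    for (unsigned i = 0; i < num_clients; i++)
        if (strcmp(clients[i].ip, ip) == 0)
            return &clients[i];
    if (num_clients == MAX_CLIENTS)
        return NULL;
    struct client *c = &clients[num_clients++];
    snprintf(c->ip, sizeof(c->ip), "%s", ip);
    return c;
}

/* Called for every incoming query; returns true if it should be REFUSED. */
bool should_refuse(const char *ip)
{
    struct client *c = get_client(ip);
    if (c == NULL)
        return false;            /* table full: fail open in this sketch */
    c->count++;
    if (!c->blocked && c->count > RATE_LIMIT_COUNT) {
        c->blocked = true;       /* over the limit: refuse from now on */
        c->count = 0;            /* from here on, count the queries made while blocked */
    }
    return c->blocked;
}

/* Called once per counting interval (every RATE_LIMIT_INTERVAL seconds). */
void end_of_interval(void)
{
    for (unsigned i = 0; i < num_clients; i++) {
        /* A block persists only if the client also exceeded the limit while
         * it was blocked; otherwise it is lifted at the interval boundary. */
        clients[i].blocked = clients[i].blocked && clients[i].count > RATE_LIMIT_COUNT;
        clients[i].count = 0;
    }
}

int main(void)
{
    const char *ip = "192.168.178.10";
    for (int q = 0; q < 2500; q++)       /* interval 1: a noisy client */
        should_refuse(ip);
    printf("blocked in interval 1: %d\n", should_refuse(ip));  /* 1 */
    end_of_interval();   /* 1500 queries made while blocked (> limit) -> block persists */
    printf("blocked in interval 2: %d\n", should_refuse(ip));  /* 1 */
    end_of_interval();   /* only 1 query in interval 2 (<= limit) -> block is lifted    */
    printf("blocked in interval 3: %d\n", should_refuse(ip));  /* 0 */
    return 0;
}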

@okaestne

okaestne commented Jan 16, 2022

@yubiuser

Ah, I see, thanks. So systemd-resolved likely needs to be reconfigured to not retry so often..

[2022-01-16 13:09:46.579 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 1007 queries
[2022-01-16 13:10:46.819 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 11928 queries
[2022-01-16 13:11:46.055 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 5976 queries
[2022-01-16 13:12:46.295 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 14856 queries
[2022-01-16 13:13:46.535 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 1176 queries
[2022-01-16 13:14:46.775 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 1392 queries
[2022-01-16 13:15:46.011 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 1824 queries
[2022-01-16 13:16:46.251 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 7176 queries
[2022-01-16 13:17:46.491 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 6955 queries
[2022-01-16 13:18:46.731 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 5933 queries

Edit: Maybe instead of blocking all requests altogether, can't we just reject all requests above the limit?

@DL6ER
Member Author

DL6ER commented Jan 16, 2022

If a client surpasses the limit at any point, it gets blocked. This means it was able to make 1000 queries within a minute and is only blocked thereafter. A client is unblocked as soon as it has made fewer than 1000 queries per minute.

[2022-01-16 13:09:46.579 1314/T1319] Still rate-limiting 192.168.178.10 as it made additional 1007 queries

Here your client was only slightly above the limit and almost got de-limited. The client could immediately have done 1000 queries thereafter.

Maybe instead of blocking all requests altogether, can't we just reject all requests above the limit?

Because this doesn't look like a proper solution. Imagine the typical use case of rate-limiting, which could be a DNS loop between, e.g., the router and the Pi-hole, or a client going rogue. Allowing 1000 queries per minute leads to significant load on the Pi-hole - consider also the ever-growing queries database eventually eating up all space on disk.
Now take the example from your log above (11928 queries within 60 seconds, and this is not even the worst number). Let us simplify a bit by assuming the queries were uniformly distributed over time, i.e., roughly 200 queries were made per second. With your suggestion we would allow these queries (5 seconds until we hit the limit of 1,000) and then block for the remaining 55 seconds.
Your Internet would be working 8% of the time and some content could be downloaded. I guess that's a situation that is much harder to understand and debug for the user than a straight loss of Internet connectivity altogether.
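
Spelled out: at roughly 200 queries per second the 1,000-query limit is reached after about 5 seconds, and the client is then blocked for the remaining ~55 seconds, so it gets answers for only 5/60 ≈ 8% of the time.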

@yubiuser
Member

I adjusted my above answer slightly as it was a bit incorrect regarding the interval. The interval steps correspond to the set rate-limiting interval. They also do not align with wall-clock minutes but are relative to when FTL finished starting (start of the daemon plus a possible delay from DELAY_STARTUP).

Edit: Maybe instead of blocking all requests altogether, can't we just reject all requests above the limit?

This was the behavior before #1199. We intentionally changed it, because it seemed a bit inconsistent to block a client going wild (constantly generating queries) after 1000 queries, but then allow another 1000 queries, block again, unblock, block....

@vseven

vseven commented Jan 25, 2022

What would you suggest? The current limit equates to rate-limiting when a single client issues more than 1.4 million queries a day. I don't think it should be increased.

I route all our clients to our in-house DNS and then have the Pi-hole set up as the first forwarder address. So in my case the in-house DNS server, serving 150+ clients, can easily hit 1000 requests in 60 seconds and does so often. I increased my rate limit in the config file to 2000/60 and it rarely hits that, but it would be nice if this were a GUI setting, as others have mentioned.

@MarkusNiGit

Feedback from user land:
Setup:
2 Pi-holes in HA
Synology 820+
I enabled Cloud Sync on the Synology, a software package delivered by Synology and part of every box they ship. It took me 4 days to figure out the source of the slowdown/stoppage of Cloud Sync. Turns out there was a little exclamation point blinking on my Pi-hole dashboard. It took me a day to read up on what the "rate limit" message meant. The dashboard, btw, gives no indication that this is not a one-time occurrence; in my case there were further messages, going on for days, that the limit was still active because the Synology kept retrying within 60 seconds. I had to go digging through logs via SSH into the Pi-hole.
I was finally able to bump up the limit to 4000/60 after I read all the documentation I could get my hands on. From the "still blocking the Synology" message I was able to deduce what "my" appropriate limit is.
The Synology now runs like a charm and is able to keep several thousand files in sync between my cloud and on-prem.
It took me another day to find this GitHub thread where this variable/functionality originated.
A couple of thoughts:

  • this situation was caused by a professional piece of hardware, with a professional piece of software on it (Synology, after all), not some fly-by-night rogue client.
  • Even for a somewhat advanced user this was way too much work to identify the source of the problem. And the source of the problem is NOT my Synology box but Pi-hole. In this case, simply raising the artificial limit to 4000/60 solved the issue and was easily handled by a Raspberry Pi 4 w/2GB.
  • It took way too long to find steps for problem resolution: the warning message in the Pi-hole diagnosis did not let me know 1) that it was an ongoing problem and all outbound DNS requests coming from the Synology were being blocked (I finally caught it when my son told me he couldn't get on the Minecraft server anymore), and 2) how to fix it in case I wanted to.

My take-away here is that this is a suboptimal failsafe for a problem that, granted, I barely understand, but it keeps off-the-rack software from well-known vendors from working as designed, AND discoverability of the source of the issue is super low, AND removing the blocker is super complicated.
The team here doesn't owe me anything, and I owe the use of Pi-hole to this team of creators. My recommendation as a user would be: please address the three main issues outlined.

@MarkusNiGit

MarkusNiGit commented Feb 14, 2022

I should also add:

  • This was the initial sync, so the transfer rate was higher than the "normal changes" of cloud sync over time.
  • I did not expect that the rate limit, rather than throttling the client, would flat-out stop it completely from making any further DNS queries. Perhaps my observation here was wrong.
  • Based on what I am seeing, the rate limit was triggered with cached DNS query results, not new ones. See the picture after I increased the limit; it shows 78% cached DNS query results:
    [screenshot: dashboard statistics showing 78% cached query replies]

@DL6ER
Member Author

DL6ER commented Feb 15, 2022

@MarkusNiGit Sorry for the inconvenience this feature caused.

We've been discussing ways to improve the user experience, but this is difficult. For instance, the jumping triangle used to be larger and jump more heavily in a previous version, but users complained it was too catchy, so it was reduced in size to be less obtrusive. This may or may not be the reason why you didn't see it before.

My initial idea to add a dialogue to the installer that explicitly asks you to configure the rate limit was discussed but then dismissed because it had a couple of drawbacks, the main one being that there are no dialogues on upgrades, only on fresh installs, as the former is a semi-automated process that shouldn't require user interaction.

So I meant to add a dedicated section to the dashboard's settings page but, you know, family duties took over and, eventually, I forgot about it, so I'll use this as a reminder to hopefully get this done soonish.

this situation was caused by a professional piece of hardware, with a professional piece of software on it (Synology, after all), not some fly-by-night rogue client

I know you said that a few times but I disagree on this bit. Let me say why: Synology might be a bigger company (about 600 employees according to Wikipedia), but this doesn't mean they are a global player like Google and similar. While the latter surely have a quality assurance center, I don't think this is true for Synology. At least not for their software or, at least, not for this part of it. Why do I say this? Because opening a dedicated connection per file (which is what I deduce from your explanation) is not good software design by any means. Imagine you are synchronizing a large number of really small files: the TCP three-way handshake may easily be more traffic than the synchronization itself, even more so when TLS (HTTPS) is also in use. On top come the thousands of DNS queries that, in a typical setup without a Pi-hole (i.e., without a caching DNS resolver), bounce back and forth between you and, say, Google DNS and cause additional delays and traffic all over the place. None of this tells me that this is good software.

snip

Concerning your three points: I disagree on the quality of the software and explained why (point 1); point 2 is unfortunate, but it seems we cannot have a solution that suits all users; point 3: I'll work on this as said above.

@MarkusNiGit

I used the word "professional" and not "quality" to describe the Synology solution deliberately. I cannot judge the quality of the software, though even I can tell that multiple DNS requests per transferred file might appear inefficient; but there might be very good reasons for it, such as restrictions on the API put in place by the cloud provider. My point is that you cannot judge either whether this is "good software", and this vendor has created facts by selling the solution as it is currently designed.

Based on your answer you seem to be equally concerned about "rogue clients" and enforcing quality on the software of vendors that sell professional solutions.
While I find it admirable that you are introducing a protection for the ecosystem, I wonder how many hits you are getting for each: a) number of rogue clients stopped vs. b) number of professional pieces of software stopped from functioning. And what is the right ratio? Is one rogue client acceptable? Is one professional piece of software and hardware stopped from functioning completely acceptable?
BTW, this sounds way more combative than I intend it to be. I think we agreed, at a minimum, on increasing discoverability and ease of change for the end user.

@wolph

wolph commented Feb 28, 2022

This "feature" cost me quite a bit of time to track down... for the future it would be nice if features like these would be asked about during an upgrade instead of being silently enabled.

Either that or only enable features like these on new installations.

I understand that 1000 queries per minute seems like a reasonable limit, but pihole has enough power users and/or quirky setups that can run into issues.

@jfb-pihole
Member

"it would be nice if features like these would be asked about during an upgrade instead of being silently enabled."

This is why we write and publish detailed release notes, and in our release post we remind readers to read the notes prior to upgrading.

https://pi-hole.net/blog/2021/02/16/pi-hole-ftl-v5-7-and-web-v5-4-released/

At user request, activation of rate limits was added to the diagnostic messages, and this change was also covered in release notes:

https://pi-hole.net/blog/2021/09/11/pi-hole-ftl-v5-9-web-v5-6-and-core-v5-4-released/

A later release did some tweaking to rate limits:

https://pi-hole.net/blog/2021/10/23/pi-hole-ftl-v5-11-web-v5-8-and-core-v5-6-released/

@wolph

wolph commented Feb 28, 2022

"it would be nice if features like these would be asked about during an upgrade instead of being silently enabled."

This is why we write and publish detailed release notes, and in our release post we remind readers to read the notes prior to upgrading.

I know, and I did read them. I just never assumed I would hit the limit because at first glance it seemed reasonable.

Only after a bunch of random issues over the last few days did I notice that Pi-hole was the culprit.

That's the issue with these types of changes. Everything might seem fine and dandy initially but break things down the line at some seemingly random moment when some service or cron job suddenly does a burst of requests.

@FreedomFaighter

bf52156 seems to have altered the default query rate limit for Pi-hole. If this should be opened as a new issue I can do that, but on what basis was that number chosen? I can't see how this limit would even support a normal network of a household of four. This is probably the same concern others have expressed in this pull request.

@yubiuser
Member

@FreedomFaighter

The commit you linked did not change any rate limit but introduced it in the first place. Before, there was no rate limit. This is also not a new commit, but over 1 year old. No need to open a new issue about it.

@lexcyn

lexcyn commented Sep 2, 2022

Just chiming in: I think this should be an option in the web GUI, or opt-in. I was pulling my hair out over why my network was grinding to a halt; if I rebooted things it would work for a while and then stop. Well, it turns out the corporate VPN I am using (I work from home, no way around it) likes to randomly spam thousands of DNS requests for literally every network endpoint at random times, which rate-limits me and blocks all further connections from my main router. I ended up disabling rate-limiting on both my Pi-holes to resolve it, but it would be nice if there were just a toggle to turn it on/off, so I could turn it back on after my VPN is finished having its fit.

@jfb-pihole
Member

"turns out the corporate VPN I am using (I work from home, no way around it) likes to randomly spam thousands of DNS requests for literally every network endpoint at random times"

It seems unusual that a corporate VPN would send DNS queries to a local DNS server, and not send the DNS through the VPN. Consider discussing this with corporate IT and moving the DNS traffic to the tunnel.

@lexcyn

lexcyn commented Sep 2, 2022

"turns out the corporate VPN I am using (I work from home, no way around it) likes to randomly spam thousands of DNS requests for literally every network endpoint at random times"

It seems unusual that a corporate VPN would send DNS queries to a local DNS server, and not send the DNS through the VPN. Consider discussing this with corporate IT and moving the DNS traffic to the tunnel.

I agree - this has happened before, but our network team could not figure out what was going on. We have split tunneling enabled for our VPN client, so internet traffic goes through our local connection but internal traffic goes through the VPN, and in this case the DNS queries start 'bleeding' to local, causing the massive spike in queries. This morning there were over 260k requests after I removed rate-limiting, so now I will re-enable it and hope it doesn't happen again - but still, it would be nice if there were a GUI option! If you're curious, we are using GlobalProtect by Palo Alto.

@D33M0N

D33M0N commented Jan 4, 2024

Apparently steam-for-linux manages to hit the default limit of 1000/60 easily while downloading games. Sadly, the "per client" blocking kills the Pi-hole server entirely for everyone when you run it on the same workstation as your Steam client.
Could you exclude the server itself (127.0.0.1) from the per-client blocking?

@yubiuser
Member

yubiuser commented Jan 4, 2024

Not sure why we abandoned #1468 but maybe we could revive it at some point when all the other PRs are merged.
