Block Bad Website Bots and Spiders #4860
A piped way to add bad crawlers to the blocked list; they are slowing the site down and consuming most of the traffic. The other way would be rDNS.
Sorry, I'm terrible with git.
Hi, instead of adding it to .htaccess, it would be better to add it here: Line 2288 in b247caf
This function is used by the cache plugin, which checks whether the client is a bot. If it is, it will only serve cached pages; if there is no cache for that page, it serves a blank page. For example, I have customers that use Semrush; if I add that, they will have problems for sure.
Just as an idea: If we wanted to make blocking such bots optional (e.g., adding the ability to configure which bots to block through an admin page or similar), something like this could be done through
..to ensure that a specific file is executed for every request, no matter which PHP file is actually requested. Such a file could run whatever blocking routines are necessary, pulling data from the database, set by the admin page, etc and etc. Possibly some slightly increased overhead when dealing with a small list of bots, but possibly also decreased overhead if the list of bots becomes much larger (since Anyway, just ideas, and would need further exploration regardless. :-)
Sorry for the late reply.
Hi, do you think it is worth blocking bots at all? @Maikuolan I didn't know we could do that, but this may fail for PHP-FPM (I guess). Maybe a separate script that does not start the session or connect to the database, just to block the undesired connections.
@DanielnetoDotCom
The video metadata is cached; this is required, otherwise it may slow the site down a lot.
Agree
Agree 100%. I just need to think of an option so you can choose what to block; sometimes you may want to allow some bots to access your page.
What about this update? If you add this to your configuration.php, you will stop bots before connecting to the database or opening the session.
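For readers without the diff in front of them, here is a minimal sketch of what such an early check could look like. It is not the project's actual code: the function name `findBlockedBot`, the variable layout, and the bot names in the list are assumptions chosen for illustration. The only idea taken from the thread is that the check runs before any session or database work.

```php
<?php
// Hypothetical sketch: names are illustrative, not the project's real implementation.

/**
 * Return the first entry of $stopBotsList contained in the user agent
 * (case-insensitive substring match), or null when nothing matches.
 */
function findBlockedBot(string $userAgent, array $stopBotsList): ?string
{
    foreach ($stopBotsList as $needle) {
        // Skip empty entries, which would otherwise be meaningless needles.
        if ($needle !== '' && stripos($userAgent, $needle) !== false) {
            return $needle;
        }
    }
    return null;
}

// Example names only; adjust the list to the bots you actually want to stop.
$stopBotsList = ['AhrefsBot', 'MJ12bot', 'DotBot'];

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$found = findBlockedBot($userAgent, $stopBotsList);
if ($found !== null) {
    // Stop here: no session has been started and no database connection opened yet.
    http_response_code(403);
    exit("Bot found: {$found}");
}
```

Because the check only reads the User-Agent header, it stays cheap, but it also only stops clients that identify themselves honestly.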
adding some more bots
Letting know what bot was found
A more complete list maybe
Would that match exactly (e.g., like
We definitely can do that, it is definitely possible, but good point. I'm also not sure how it would play with fpm. Another possible problem is that
Correct, this is just a general sample; you can be more specific in your stopBotsList.
The way it is implemented now, we do not need to modify the .htaccess.
@DanielnetoDotCom Thank you for the effort. I'm testing it out. I removed 'bot' and 'crawler', since googlebot contains 'bot' and yandexCrawler contains 'crawler'. I wrote the names of the bots to block directly. The legitimate bots do respect the robots.txt files, so this function is useful only against those that ignore it. Thank you!
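A small illustration of the point about generic entries, assuming the list is matched as case-insensitive substrings (an assumption; the thread does not show the matching code). The user-agent string is Googlebot's usual one, and `MJ12bot` is just an example of a specific name.

```php
<?php
// Illustrative only: shows why a generic needle catches legitimate crawlers too.
$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// A generic needle such as 'bot' is a substring of "Googlebot".
$genericMatch = stripos($googlebot, 'bot') !== false;

// A specific name only matches the crawler it names.
$specificMatch = stripos($googlebot, 'MJ12bot') !== false;

var_dump($genericMatch, $specificMatch);
```

With substring matching, the shorter the entry, the more user agents it catches, so listing full bot names keeps the block list predictable.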
Great, feel free to suggest new bot names.
Allow creating a whitelist to not block some bots
OK, now you can create a whitelist so that some bots are not stopped.
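As a rough sketch of how a whitelist could sit in front of the stop list: the function names `matchesAny` and `shouldStopBot` and both example lists are hypothetical, and the real option names in the project may differ. The assumption here is that the whitelist takes priority over the stop list.

```php
<?php
// Hypothetical sketch; function and variable names are assumptions.

/** True when any non-empty needle occurs in the user agent (case-insensitive). */
function matchesAny(string $userAgent, array $needles): bool
{
    foreach ($needles as $needle) {
        if ($needle !== '' && stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

/** Decide whether a request should be stopped, giving the whitelist priority. */
function shouldStopBot(string $userAgent, array $stopBotsList, array $whiteList): bool
{
    // The whitelist wins: an allowed bot is never stopped.
    if (matchesAny($userAgent, $whiteList)) {
        return false;
    }
    return matchesAny($userAgent, $stopBotsList);
}

// Example lists only: generic stop entries plus bots the site owner wants to keep.
$stopBotsList = ['bot', 'crawler', 'spider'];
$whiteList = ['Googlebot', 'SemrushBot'];
```

Checking the whitelist first means generic entries such as 'bot' can stay in the stop list without locking out the crawlers a site depends on.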
I am just thinking about whether this should be the default configuration (enabled by default) for new installations.
Oh, this is great!
I want it if it is working well. I need to stop all the bot traffic I can. It is getting bad now.