feat(plugin): Add new plugin ua-restriction for bot spider restriction #4587
apisix/plugins/bot-restriction.lua
Outdated
additionalProperties = false,
}

local plugin_name = "bot-restriction"
The name `bot-restriction` is confusing. The plugin just checks the UA. What about renaming it to `ua-restriction`?
I don't think so. This plugin is for spider detection and includes the most common spider UAs.
Real bot detection in the industry is more than spider detection and a UA check.
The plugin is for detecting common spiders such as BaiduSpider and 360Spider, plus some dev tools. For the feature you mentioned, we would have to use a product from a professional security company.
I found a similar function in KrakenD.
Don't believe the nonsense. This approach can only keep out script kiddies. Against a real attacker, checking the UA is definitely not enough.
> We have to use the product of professional security company to do the feature you mentioned.
Exactly. A real bot detection system should be as professional as theirs, instead of just doing UA checks and declaring that this solves the problem. People will laugh at APISIX. We should provide a mechanism that a professional security company can use to build a gateway, not declare that we are a security gateway.
Would it be more suitable to rename the plugin to `ua-restriction` and remove the hard-coded UA list?
Of course, yes.
Got it
apisix/plugins/bot-restriction.lua
Outdated
}

-- List taken from https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
local well_known_bots = {
I think we should not hard-code the UA list, as it cannot be kept up to date. It would be better to provide a mechanism for checking the UA rather than a ready-made tool.
Most spider or bot UAs contain "bot", "spider", "crawler", or the name of a dev HTTP client. The regular expressions cover the most common cases and will not need to be updated very frequently.
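The keyword heuristic described in this comment can be sketched in plain Lua as follows. This is only an illustration: the `keywords` list and the `looks_like_bot` helper are hypothetical names, not code from the PR, and plain substring matching stands in for the regular expressions the plugin takes from ua-parser.

```lua
-- Hypothetical keyword list; the PR's real list comes from ua-parser's regexes.yaml.
local keywords = { "bot", "spider", "crawler" }

-- Returns true when the User-Agent contains any keyword, ignoring case.
local function looks_like_bot(user_agent)
    local ua = string.lower(user_agent)
    for _, keyword in ipairs(keywords) do
        -- 4th argument "true" makes find() treat the keyword as a literal string
        if string.find(ua, keyword, 1, true) then
            return true
        end
    end
    return false
end

print(looks_like_bot("Mozilla/5.0 (compatible; Baiduspider/2.0)"))    -- true
print(looks_like_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/89.0")) -- false
```

A check like this is cheap, but it only catches clients that identify themselves honestly, which is the limitation the reviewers raise above.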
But it will need updating, won't it? Better to require users to choose their own list instead of shipping a stale one.
Users can extend the list through the whitelist or blacklist configuration to cover UAs not listed in our package.
So why ship a stale one and ask the user to update it?
We can support other plugins for client restriction by UA, IP, or other information. This plugin is just for users who do not want to add a bunch of UA regex rules.
This plugin only exists to simplify usage.
If users do not want to use the plugin, they can use another restriction plugin and modify the rules as they want.
t/plugin/bot-restriction.t
Outdated
}
}

location /disable {
The `/disable` location is unused.
done
t/plugin/bot-restriction.t
Outdated
|
||
|
||
=== TEST 6: set blacklist | ||
|
Can the blank line be removed?
t/plugin/bot-restriction.t
Outdated
|
||
|
||
=== TEST 7: hit route and user-agent in blacklist | ||
|
Ditto
apisix/plugins/bot-restriction.lua
Outdated
end
-- ignore multiple instances of request headers
if type(user_agent) == "table" then
    return
Why ignore the UA?
This corner case occurs when a client sends multiple User-Agent headers, in which case `user_agent` becomes a table. Almost no bot or HTTP client sends requests like this, so I think ignoring it is the better choice.
So if they send an additional UA header, the check can be bypassed? This is not a good idea, especially in an open source project...
OK, I will check the table case.
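The bypass discussed here exists because OpenResty hands a repeated request header to Lua as a table of strings rather than a single string. One way to close the gap is to normalise the value and test every entry, roughly as sketched below. The helpers `is_denied` and `should_reject` are hypothetical, plain substring matching stands in for the plugin's regex matching, and letting a request without a User-Agent pass is an assumption, not something the thread settles.

```lua
-- Hypothetical stand-in for the plugin's matcher: literal substring match.
local function is_denied(ua, denylist)
    for _, entry in ipairs(denylist) do
        if string.find(ua, entry, 1, true) then
            return true
        end
    end
    return false
end

-- user_agent may be nil, a string, or a table of strings (repeated header).
local function should_reject(user_agent, denylist)
    if user_agent == nil then
        return false -- assumption: a missing header is not rejected in this sketch
    end
    if type(user_agent) == "string" then
        user_agent = { user_agent }
    end
    -- reject as soon as any one of the supplied values is denied
    for _, value in ipairs(user_agent) do
        if type(value) == "string" and is_denied(value, denylist) then
            return true
        end
    end
    return false
end

print(should_reject("Baiduspider/2.0", { "spider" }))                 -- true
print(should_reject({ "curl/7.0", "Baiduspider/2.0" }, { "spider" })) -- true
print(should_reject({ "curl/7.0", "Firefox/89.0" }, { "spider" }))    -- false
```

With this shape, adding a second harmless User-Agent header no longer skips the check.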
apisix/plugins/bot-restriction.lua
Outdated
end
local match, err = lrucache_useragent(user_agent, conf, match_user_agent, user_agent, conf)
if err then
    return
Better to log the err?
OK
Provide a mechanism but not the tool to check the UA
apisix/plugins/bot-restriction.lua
Outdated
type = "array",
minItems = 1
},
blacklist = {
What about `allowlist` and `blocklist`? We should avoid using these sensitive words.
ok
| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
| denylist | array[string] | optional | | | List of User-Agent of denylist. |
| message | string | optional | Not allowed. | [1, 1024] | Message of deny reason. |
We should clarify that `[1, 1024]` is the length.
for _, v in ipairs(user_agent) do
    if type(v) == "string" then
        match = lrucache_useragent(v, conf, match_user_agent, v, conf)
        if match > MATCH_ALLOW then
Why not use `match > MATCH_DENY`?
I think this check should compare against `MATCH_ALLOW`. If the UA is in the deny list, the result is `MATCH_DENY`. Otherwise, the result is `MATCH_ALLOW` or `MATCH_NONE`.
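The three-state result described in this reply can be sketched as below. The numeric values are assumptions: the thread only implies that `MATCH_DENY` compares greater than `MATCH_ALLOW`, so that `match > MATCH_ALLOW` singles out denied requests. Checking the allowlist first is also an assumption (the test titles for a UA present in both lists suggest it passes), and `contains` is a hypothetical helper using literal substring matching in place of the plugin's regexes.

```lua
-- Assumed ordering: only MATCH_DENY is greater than MATCH_ALLOW.
local MATCH_NONE  = 0
local MATCH_ALLOW = 1
local MATCH_DENY  = 2

-- Hypothetical helper: true when any list entry occurs in the User-Agent.
local function contains(list, ua)
    for _, entry in ipairs(list or {}) do
        if string.find(ua, entry, 1, true) then
            return true
        end
    end
    return false
end

local function match_user_agent(ua, conf)
    -- assumption: the allowlist takes precedence over the denylist
    if contains(conf.allowlist, ua) then
        return MATCH_ALLOW
    end
    if contains(conf.denylist, ua) then
        return MATCH_DENY
    end
    return MATCH_NONE
end

local conf = { allowlist = { "my-bot" }, denylist = { "bot" } }
print(match_user_agent("my-bot/1.0", conf) > MATCH_ALLOW)    -- false: allowed
print(match_user_agent("other-bot/1.0", conf) > MATCH_ALLOW) -- true: denied
print(match_user_agent("Firefox/89.0", conf) > MATCH_ALLOW)  -- false: no match
```

Under this ordering, `match > MATCH_ALLOW` and `match == MATCH_DENY` select the same requests, whereas `match > MATCH_DENY` would never be true.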
apisix/plugins/ua-restriction.lua
Outdated
{required = {"allowlist"}},
{required = {"denylist"}},
},
minProperties = 1,
Is `minProperties` useless here?
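The reasoning behind this question can be checked with a small sketch. It assumes the two `required` entries sit under an `anyOf` keyword (the hunk does not show the enclosing key), and it uses a hand-rolled check of that one rule rather than a real JSON Schema validator. `satisfies_any_of` and `count_properties` are hypothetical helpers.

```lua
-- Assumed shape of the schema fragment under review.
local schema = {
    type = "object",
    properties = {
        allowlist = { type = "array", minItems = 1 },
        denylist  = { type = "array", minItems = 1 },
    },
    anyOf = {
        { required = { "allowlist" } },
        { required = { "denylist" } },
    },
}

-- Checks only the anyOf/required rule: at least one branch has all its keys present.
local function satisfies_any_of(conf)
    for _, rule in ipairs(schema.anyOf) do
        local ok = true
        for _, key in ipairs(rule.required) do
            if conf[key] == nil then
                ok = false
            end
        end
        if ok then
            return true
        end
    end
    return false
end

local function count_properties(conf)
    local n = 0
    for _ in pairs(conf) do
        n = n + 1
    end
    return n
end

-- Any conf that passes the anyOf rule already has at least one property.
for _, conf in ipairs({ { allowlist = { "a" } }, { denylist = { "b" } } }) do
    assert(satisfies_any_of(conf) and count_properties(conf) >= 1)
end
print(satisfies_any_of({})) -- false: rejected without needing minProperties
```

Because each branch requires a key to be present, an empty object is already rejected, so `minProperties = 1` adds no further constraint under this assumption.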
@arthur-zhang
I will update soon.
t/plugin/ua-restriction.t
Outdated
=== TEST 16: hit route and user-agent in both allowlist and denylist, part 1
Suggested change:
- === TEST 16: hit route and user-agent in both allowlist and denylist, part 1
+ === TEST 16: hit route and user-agent in both allowlist and denylist, pass(part 1)
t/plugin/ua-restriction.t
Outdated
=== TEST 17: hit route and user-agent in both allowlist and denylist, part 2
ditto
done
"denylist": [
    "foo"
],
"disable": true
What does `disable` mean here? I don't see it in the plugin properties doc.
`disable` is a built-in property of the plugin schema.
Let's merge master to make CI pass.
It seems the CI system is not working.
Err, since the server has changed again...
## Name

The `ua-restriction` can restrict access to a Service or a Route by either `allowlist` or `denylist` `User-Agent`.
Broken sentence? What does "by either `allowlist` or `denylist` `User-Agent`" mean?
| Name | Type | Requirement | Default | Valid | Description |
| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
Suggested change:
- | allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
+ | allowlist | array[string] | optional | | | A list of allowed `User-Agent` headers. |
| Name | Type | Requirement | Default | Valid | Description |
| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
| denylist | array[string] | optional | | | List of User-Agent of denylist. |
Suggested change:
- | denylist | array[string] | optional | | | List of User-Agent of denylist. |
+ | denylist | array[string] | optional | | | A list of denied `User-Agent` headers. |
...
```

Requests from bot User-Agent:
Suggested change:
- Requests from bot User-Agent:
+ Requests with the bot User-Agent:
t/plugin/ua-restriction.t
Outdated
=== TEST 8: hit route and user-agent in denylist with multiple
With multiple what?
What this PR does / why we need it:
Support bot and spider restriction, e.g. Baidu spider, go-httpclient, etc.
Pre-submission checklist: