Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(plugin): Add new plugin ua-restriction for bot spider restriction #4587

Merged
merged 14 commits into from
Jul 21, 2021
Merged

feat(plugin): Add new plugin ua-restriction for bot spider restriction #4587

merged 14 commits into from
Jul 21, 2021

Conversation

arthur-zhang
Copy link
Contributor

@arthur-zhang arthur-zhang commented Jul 12, 2021

What this PR does / why we need it:

Support bot spider restriction, eg Baidu spider、go-httpclient etc

Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

conf/config-default.yaml Outdated Show resolved Hide resolved
additionalProperties = false,
}

local plugin_name = "bot-restriction"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot-restriction is confusing. It just checks the UA. What about renaming it to ua-restriction?

Copy link
Contributor Author

@arthur-zhang arthur-zhang Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so,this plugin is for spider detection,and include most common spider ua.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A real bot-detection in the industry area is not just spider detection and UA check.

Copy link
Contributor Author

@arthur-zhang arthur-zhang Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plugin is for the common BaiduSpider、360Spider and some dev tools detection. We have to use the product of professional security company to do the feature you mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found similar function in krakend

https://www.krakend.io/docs/throttling/botdetector/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't believe the nonsense bullshit. This way can only kick out script boys. But for the real hacker, checking the UA is definitely not enough.

We have to use the product of professional security company to do the feature you mentioned.

That's it. A real bot detection system should be as professional as them instead of just doing UA checks and declaring this solves the problem. People will laugh at APISIX. We should provide a mechanism that the professional security company can use to build a gateway, but not declare we are a security gateway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it more suitable to change plugin name to ua-restriction and remove the hard-coded ua list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

}

-- List taken from https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
local well_known_bots = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not hard code the UA list, as it could not be updated in time. It would be better to provide a mechanism but not the tool to check the UA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most spider bot UA contains “bot” or “spider” or “crawler” or dev http client, the regex expression covers most common case,it will not update very frequently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it will be updated, isn't it? Better to require the user to choose their list instead of shipping a stale one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User can update the list in whitelist or blacklist configuration to support the ua not listed in our package

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why ship a stale one and ask the user to update it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can support other plugin for client restrction like ua, ip, or other infomartion. This plugin is just for users not want to add bunch of ua regex rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is only for simplify usage.
If user do not want to use the plugin, they can use other restriction plugin and modify the rules the want.

}
}

location /disable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the /disable is unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



=== TEST 6: set blacklist

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blank line can be removed?



=== TEST 7: hit route and user-agent in blacklist

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

end
-- ignore multiple instances of request headers
if type(user_agent) == "table" then
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore the UA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this corner case is that the user-agent become table when send multiple user-agent. Almost all the bot or http-client will not send request like this,i think we ignore it is a better choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if they send other UA, the check can be bypassed? This is not a good idea, especially in an open source project...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will check the table

end
local match, err = lrucache_useragent(user_agent, conf, match_user_agent, user_agent, conf)
if err then
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to log the err?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member

@spacewander spacewander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide a mechanism but not the tool to check the UA

type = "array",
minItems = 1
},
blacklist = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about allowlist and blocklist, we should avoid using these sensitive words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
| denylist | array[string] | optional | | | List of User-Agent of denylist. |
| message | string | optional | Not allowed. | [1, 1024] | Message of deny reason. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clarify that [1, 1024] is the length.

for _, v in ipairs(user_agent) do
if type(v) == "string" then
match = lrucache_useragent(v, conf, match_user_agent, v, conf)
if match > MATCH_ALLOW then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use match > MATCH_DENY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this check should be MATCH_ALLOW, if ua in deny list, the result should be MATCH_DENY, otherwise, the result is MATCH_ALLOW or MATCH_NONE

{required = {"allowlist"}},
{required = {"denylist"}},
},
minProperties = 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minProperties is useless?

@spacewander
Copy link
Member

@arthur-zhang
Would you like to update the PR? Thanks!

@spacewander spacewander added the wait for update wait for the author's response in this issue/PR label Jul 19, 2021
@arthur-zhang
Copy link
Contributor Author

@arthur-zhang
Would you like to update the PR? Thanks!

I will update soon.




=== TEST 16: hit route and user-agent in both allowlist and denylist, part 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
=== TEST 16: hit route and user-agent in both allowlist and denylist, part 1
=== TEST 16: hit route and user-agent in both allowlist and denylist, part 1
Suggested change
=== TEST 16: hit route and user-agent in both allowlist and denylist, part 1
=== TEST 16: hit route and user-agent in both allowlist and denylist, pass(part 1)




=== TEST 17: hit route and user-agent in both allowlist and denylist, part 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"denylist": [
"foo"
],
"disable": true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does disable mean here? I don't see it in the plugin properties doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disable is built-in properties of plugin schema

@spacewander spacewander removed the wait for update wait for the author's response in this issue/PR label Jul 20, 2021
@spacewander
Copy link
Member

Let's merge master to make CI pass

@arthur-zhang
Copy link
Contributor Author

Seems CI system is not working

@spacewander
Copy link
Member

Err, since the server is changed again...

@arthur-zhang
Copy link
Contributor Author

Err, since the server is changed again...

image

apisix/plugins/ua-restriction.lua Show resolved Hide resolved

## Name

The `ua-restriction` can restrict access to a Service or a Route by either `allowlist` or `denylist` `User-Agent`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken sentence? What does the "by either allowlist or denylist User-Agent" mean?


| Name | Type | Requirement | Default | Valid | Description |
| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
| allowlist | array[string] | optional | | | A list of allowed `User-Agent` headers . |

| Name | Type | Requirement | Default | Valid | Description |
| --------- | ------------- | ----------- | ------- | ----- | ---------------------------------------- |
| allowlist | array[string] | optional | | | List of User-Agent of allowlist. |
| denylist | array[string] | optional | | | List of User-Agent of denylist. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| denylist | array[string] | optional | | | List of User-Agent of denylist. |
| denylist | array[string] | optional | | | A list of denied `User-Agent` headers. |

...
```

Requests from bot User-Agent:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Requests from bot User-Agent:
Requests with the bot User-Agent:

t/plugin/ua-restriction.t Show resolved Hide resolved
t/plugin/ua-restriction.t Show resolved Hide resolved



=== TEST 8: hit route and user-agent in denylist with multiple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With multiple what?

t/plugin/ua-restriction.t Show resolved Hide resolved
@spacewander spacewander changed the title feat(plugin): Add new plugin bot-restriction for bot spider restriction feat(plugin): Add new plugin ua-restriction for bot spider restriction Jul 21, 2021
@spacewander spacewander merged commit 71bc27c into apache:master Jul 21, 2021
@arthur-zhang arthur-zhang deleted the feat-bot-detection branch July 21, 2021 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants