-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add blocking by hostname patterns (--block-hostname) #1532
Conversation
Hey, thanks for making this PR! I've only skimmed it, so this is not a proper PR review, but it looks very good to me from this initial glance! Regarding the linter errors, you can safely add I didn't spot any architectural-level issues, but we'll probably have to delay a full, implementation-details-level, code review at least a week or two, sorry... 😞 We're currently very focused on polishing #1007, merging it in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for that!!!
As @na-- we don't really have the capacity to code review this in full, but I also took 10 minutes to look at it.
I don't see any blockers so far. I left some small comments, and would really like it if we have a bigger
test with:
- more matching hostnames
- a few not matching, preferably ones that are close to what is blocklisted so if
*.k6.io
is blocked .. thatk6.io
isn't and like if we matchtest.k6.io
something.test.k6.io
isn't matched and so on :D
Thanks again for this PR, and hopefully we will be able to code review this in full in two weeks :D
Thanks @na--, @mstoykov for taking time out of what seems to be a hectic handful of weeks to check up on this PR. It's much appreciated!
No worries at all, thank you for taking time to provide even ten minutes' review. Besides, I don't think there's a huge rush on this seeing as it is a >1 year old issue now 👍
I'll do so then. Thanks ^_^
Roger that, I'll add some more tests in with the next commit then. Also, I'll implement all the specific changes in your review up top. Take it easy! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the super late review, and thank you again for making this PR!
I just realized that domains are supposed to be case-insensitive... 😞 So, I think you should just use something like strings.ToLower()
or unicode.ToLower()
everywhere, both when you're adding domains to the blocklist, and when checking domains against it.
Hey, thank you for reviewing the changes and leaving feedback! Great suggestion on the map. I'll modify the implementation of the trie and add those tests over this weekend 👍 |
// UnmarshalJSON forms a HostnameTrie from the provided hostname pattern | ||
// list. | ||
func (t *HostnameTrie) UnmarshalJSON(data []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, you should also implement the MarshalJSON()
method 😞 k6 needs to be able to marshal all of the options in a JSON format when it's making an archive bundle.
Try running k6 archive --archive-out test.tar --block-hostname="*" github.com/loadimpact/k6/samples/http_get.js
and then k6 run test.tar
. You'd get an error like this:
ERRO[0000] json: cannot unmarshal object into Go struct field Options.options.blockHostnames of type []string
this is because the metadata.json
in the archive will have "blockHostnames": {},
in it, which is not the array this code expects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it! Will go ahead and implement this. Is there anything else I've missed in my changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, having MarshalJSON()
that produces an array should take care of the issue, since your code can already deal with arrays (after the envconfig issue is worked around). I think you don't need to use k6 archive
every time though, k6 inspect --block-hostname="*" github.com/loadimpact/k6/samples/http_get.js
should mostly do the same things but give you faster feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a unit test that makes an archive on the fly and runs it, for example take a look at https://github.com/loadimpact/k6/blob/4c39bcccf606aa3b4324ec5d4f3adfdfcf7d2fa5/js/runner_test.go#L179-L227
Or some of the other tests that make use of Runner.MakeArchive()
and run the result: https://github.com/loadimpact/k6/blob/4c39bcccf606aa3b4324ec5d4f3adfdfcf7d2fa5/js/runner.go#L110-L112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the latest fixes! This is very close to merging, I just noticed a few remaining issues when I checked out the code and played with the binary compiled from it. Also, can you rebase your branch on the latest k6 master
, there have been some substantial changes there since this was published.
@@ -242,6 +341,9 @@ type Options struct { | |||
// Blacklist IP ranges that tests may not contact. Mainly useful in hosted setups. | |||
BlacklistIPs []*IPNet `json:"blacklistIPs" envconfig:"K6_BLACKLIST_IPS"` | |||
|
|||
// Block hostname patterns that tests may not contact. | |||
BlockedHostnames *HostnameTrie `json:"blockHostnames" envconfig:"K6_BLOCK_HOSTNAMES"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is somewhat problematic. Because of a bug in the envconfig
library, it initializes BlockedHostnames
, so the if opts.BlockedHostnames != nil
check in Apply()
will always be true. So, you're never able to set the blocked hostnames in the JS script, because Apply
will always overwrite them, even if it's with an empty list.
Try running this script with k6:
import http from "k6/http";
export let options = {
throw: true,
blockHostnames: ["*.io"],
};
export default function () {
http.get("https://test.k6.io/");
}
k6 will not block the domain and the request will happen! 😞 The proper fix for this would be to replace the envconfig library with something saner, but that's a huge refactoring task we've been postponing for a long time... 😭 For now, maybe we shouldn't use a pointer here, and instead have the pointer HostnameTrie
struct? Or, maybe remove envconfig:"K6_BLOCK_HOSTNAMES"
and add support for environment variables whenever we fix envconfig?
See kelseyhightower/envconfig#113 and #1560 and other envconfig issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
envconfig strikes again. I will experiment with best options here since having configuration through environment variables is desirable for this sort of thing. Worst comes to worst, I will remove the envconfig support.
// HostnameTrie is a tree-structured list of hostname matches with support | ||
// for wildcards exclusively at the start of the pattern. Items may only | ||
// be inserted and searched. Internationalized hostnames are valid. | ||
type HostnameTrie struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that you don't have other properties in this struct, you might be able to have something like type HostnameTrie map[rune]HostnameTrie
, which might resolve the envconfig issue I posted below. Though I'm not sure if this wouldn't cause other issues 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing something similar when I was working on 372f140 and ran into some issues. Will try again in the next commit 👍
Codecov Report
@@ Coverage Diff @@
## master #1532 +/- ##
==========================================
- Coverage 75.23% 74.86% -0.37%
==========================================
Files 150 164 +14
Lines 10924 14027 +3103
==========================================
+ Hits 8219 10502 +2283
- Misses 2238 2993 +755
- Partials 467 532 +65
Continue to review full report at Codecov.
|
} | ||
|
||
if _, wild := t.children['*']; wild { | ||
return "*", true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to allow "*" as a valid pattern? Right now this blocks any requests from being made, which doesn't seem useful or necessary. It also blocks requests to IPs, which seems confusing:
WARN[0000] Request Failed error="Get \"http://127.0.0.1:8000/\": hostname (127.0.0.1) is in a blocked pattern (*)"
I guess it would be useful to block IP ranges, but that seems out of scope for this PR, and it steps on the current K6_BLACKLIST_IPS
option, so that would need addressing.
So we should probably check whether host
is an IP in dialer.go
, and skip hostname-based blocking if it is. This will have a minor conflict with #1489, but it should be easily resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to allow "*" as a valid pattern? Right now this blocks any requests from being made, which doesn't seem useful or necessary.
I suppose it would be useless - I sought to keep the implementation as literally as possible to what was discussed in #972. I can change this, if deemed necessary.
It also blocks requests to IPs, which seems confusing ... I guess it would be useful to block IP ranges, but that seems out of scope for this PR, and it steps on the current K6_BLACKLIST_IPS option, so that would need addressing.
I agree. I see a few potential solutions to this problem:
- Distinguish the dialing process for hostnames and IPs
- Allow this option to subsume IP blocking
- Resolve all hostnames to their IPs when reading options, rather than blocking hostnames directly on-dial.
The latter seems more appealing in my opinion, but that would fail to address the naive blocking method currently employed. Another issue I see is that any of these solutions, the two options will end up stepping on top of one another, like you said. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 3
is not possible really, as hostnames can have multiple IPs and they can change over the course of the test. While currently, k6 caches resolved IPs indefinitely, this is going to change (actually pretty soon if everything works all right ;) ) .
I do think that 1
is the correct way, and I also think that maybe we can leave it for ... after you merge it and we merge #1489 . As far as I can see this will have a slight performance impact, but nothing big.
The other problem I propose is fixed before the actual trie implementation, but in the config checking - we just don't let the pattern to be .. just *
. Maybe we even should require it to be at least 4 5 characters as I am pretty sure there are no longer any 1 letter domains ... actually they are apparently reserved
This is an implementation of the
--block-hostname
option as described in #972.It provides:
.
K6_BLOCK_HOSTNAMES
, the JSON fieldblockHostnames
, and the CLI option--block-hostname
Implementation details
The trie is implemented as typename
HostnameTrie
in lib/options.go. It is designed to handle internationalized hostnames and the aforementioned limited wildcard syntax.The only thing notable is that a rune is held at each node instead of a byte to address internationalized hostnames.
The provided interface for the type is:
(*HostnameTrie) Insert(s string) error
(*HostnameTrie) Contains(s string) (bool, string)
(*HostnameTrie) UnmarshalJSON([]byte) error
(*HostnameTrie) UnmarshalText([]byte) error
A pointer to this type is passed to the Dialer as BlockedHostnames and to the Options as the same.
golintci warnings
There are two golintci warnings in this PR, being a line that's too long in cmd/options.go and use of a global variable in lib/options.go.
The long line is just the description of the option. I figured it wasn't too big a deal since all option descriptions were about as long.
The global variable I would like to remove, but eliminating it would mean either:
So it might be a necessary evil.
Other notes
Since the interface for inserting and checking strings against the hostname trie lends itself to using the string type, but the trie uses runes, type conversions are carried out on function call. This makes the code less elegant, but the interface cleaner.
This PR closes #972.