-
Notifications
You must be signed in to change notification settings - Fork 672
Conversation
I have just tested this and can confirm that it fixes #494. However, I am somehow seeing much slower responses once in a while, could this by any chance introduce performance regression that could cause a few seconds of delay to some lookups? |
I have just redone the same test from scratch with one machine with clean WeaveDNS log.
|
To me it seems like reverse queries take place during regular lookup... |
I can also confirm that repeating the same 5 quicker with the currently published version (b00be09) has no such delay.
|
I will double check the patch, looking for the delay @errordeveloper noticed... |
@errordeveloper I've been trying to find that delay but I have not been able to reproduce it. For a name registered in a peer like
and I do not see anything wrong in the logs. Could you repeat the tests with dig instead of nslookup? nslookup uses its own internal DNS client library, and that could lead to unexpected results... Anyway, I have updated the PR with some cleanups in the code, specially in the cache... |
Looks like it's more specific to busybox implementation of nslookup, which is quite weird. |
fine by me |
I will double check a couple and things anyway, just in case I'm missing something... |
@inercia what's the plan for the test tool - should it not live in the weave repo, particularly now we're making progress on CI? |
@awh My plan was to integrate this tool in some CI job, but it was developed in a rush and I took a short path: I used Mininet for creating virtual hosts and, even when I was trying to avoid Docker so I could run it in a hosted CI service, I didn't realize that Mininet requires OpenVSwitch and OpenVSwitch can not be installed either. So we could either a) update the tool for launching WeaveDNS instances while controlling their binding ports and so, or b) move to a CI service that allows Docker instances and launch WeaveDNS in a container... @errordeveloper If you could test the latest version, that would be great. Could you also look for possible patterns in the delays you observed? For example, if they happen every 30secs, they could be related to cache expiration... |
@inercia my gut tells me it's just that version of nslookup doing an additional reverse lookup... |
There is a lot of change in this commit, the (vast) majority of which does not address #494! After a careful reading I discern the following logical changes:
Apologies if I've misread, it was quite difficult to unpick everything! I would have broken these out as separate commits, each commented as per my bullets above, if for no other reason than reviewing it was extremely difficult; arguably most of it belongs in one or more separate PRs anyway. Finally (and this is a general comment, to be addressed in the longer term perhaps), I'm not a great fan of passing a |
@awh Apologies if you found the PR a bit messy. The PR evolved from a simple fix to a general cleanup in the code, trying to remove some parts we do not plan to use in the near future, as well as some internal API improvements. The fix is limited to the fact of using different lookups sets depending on the value stored in the cache. And I would consider enhancements those unreported bugs you mentioned. For example, it is not imperative to set the authoritative flag in our mDNS responses (in fact, we have never done that), but I think it would more appropriate. I would rather not add typing to cache entries: I think that would involve introspection for every cache access and (I'm not an expert in Go's implementation but) that would probably have an unnecessary cost. I prefer the nil response (as the response is... hmmm... nil!) and a specific TTL. Please let me know if you find particular issues you consider not appropriate for this PR, and please assign comments to specific code lines, so they disappear if the line is modified or disappears... |
Would be great if we can pinpoint what exactly the problem was... |
Thanks a lot for testing this, @errordeveloper. As you said, it would be nice to know what was exactly the problem with nslookup, but we can live with that... In the (near) future, I will improve our integration tests for DNS, so this kind of bugs should be easier to catch... Anyway, I will split this PR in several smaller PRs, as @awh and @rade suggested... |
…he value stored in the cache.
Yes, a single 2nd PR should do. For now at least. |
Ok, I'll do that... |
Fixes #494 by using a different lookups set depending on the value stored in the cache.
PS: these changes have been tested with this new WeaveDNS test tool.