-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(resty.dns.client): fix the UDP sockets leak #10691
Conversation
82f6e1c
to
6e1abe8
Compare
kong/resty/dns/client.lua
Outdated
@@ -702,6 +702,14 @@ local function parseAnswer(qname, qtype, answers, try_list) | |||
return true | |||
end | |||
|
|||
local function close_socks(resolver) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an upstream PR for this? if not can we send one (for lua-resty-dns
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no upstream PR committed yet, actually I think it might not be lua-resty-dns
s business, the sockets could be destroyed in 2.8, and I don't know why not being destroyed in 3.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, then maybe we should look for the root-cause first, before fixing it.
are we keeping them cached somewhere? such that they stay alive instead of getting GC'ed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also; an upstream PR for always closing the sockets would be a sanity item imo. So would still make sesne to me.
since we're now fiddling with that libs internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems not being cached somewhere, the resolver is created in ngx.timer
, I'm not sure if the timer matters. However the logic of dns queries is similar between in 2.x and 3.x, so I have no idea why it just happens in 3.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ADD-SP and I did further investigations and found that this is caused by timer-ng lib. After replacing the timer-ng with the native timer to schedule the dns query, this issue never happens. I'm not sure but it seems like there are some differences between timer-ng and native time over garbage collection.
Even if the issue is caused by timer-ng, it still makes sense to me that the sockets should be released explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very strange, we try to make timer-ng run full GC after each handler and this issue does not appear.
So there is no leak, but it looks like the GC has slowed down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tieske I committed an upstream PR: openresty/lua-resty-dns#64
Can we just take this PR as a workaround so that this issue could be fixed quickly? If the PR for resty-dns is released, we can revert this PR and make a better one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on OpenResty's claim at here, there are three ways to release socket objects:
- Call the
close()
method explicitly. - Released by the Lua GC.
- Current client HTTP request finishes processing.
Since lua-resty-dns does not explicitly close socket objects or expose the close()
method to users, the relevant socket objects can only be released through methods 2 and 3.
The reason why the native timer is not problematic is that once the timer ends, the fake request is considered finished (effectively fulfilling the condition of method 3). However, timer_ng reuses the existing ngx_timer to execute tasks, so the expected condition of method 3 did not occur and it was only dependent on method 2. I believe this is why those UDP socket objects were leaked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just take this PR as a workaround so that this issue could be fixed quickly? If the PR for resty-dns is released, we can revert this PR and make a better one.
Yes, that'd work. Please create an item to track this. eg a PR based on the proposed changes in resty-dns, or an internal Jira ticket, whatever works for you.
f711900
to
f238569
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Thijs mentioned it seems more appropriate to add a new close()
method to the upstream project lua-resty-dns, and then we call it here.
But overall this looks great; nice test BTW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I saw an upstream PR already out there, and this is just a temporary solution as it will switch to calling the convenient close function after the upstream one merged and released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the better way is write a patch to lua-resty-dns
until they merge our fix and release a new version.
I don't want to do something in the core that should be done by an external dependency, which would reduce maintainability.
@@ -145,5 +145,60 @@ for _, strategy in helpers.each_strategy() do | |||
assert.response(r).has.status(503) | |||
end) | |||
end) | |||
describe("udp sockets", function() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, we should write tests for all fixes, but this issue is caused by third-party dependencies and I'm not sure how we should do it.
I think it's too heavy for introducing an integration test for this purpose, I'd prefer to use unit tests to cover it.
The perfect thing to do would be to have lua-resty-dns
cover the issue, but until then I think it would be better to use unit tests instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that even if we change this pr to a patch, the test case is still coverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the issue can be covered, but does it make more sense to move the tests to unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that makes more sense, just a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is NOT A BLOCKER for this PR.
@Tieske FYI, we plan to write a patch to the |
makes sense to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec/02-integration/05-proxy/05-dns_spec.lua
If someone who has no context for this issue read this test, it is hard to understand why we write this test.
So could you please add some comments to explain more?
df98961
to
524a86e
Compare
build/openresty/patches/lua-resty-dns-0.22_01-destory_resolver.patch
Outdated
Show resolved
Hide resolved
My 2cts: the upstream is moving slowly, so the patch is going to exist for quite a while probably. I personally find patches cumbersome to maintain, and the code is less clear, because the code is not in-line. You already need to know the patch exists to be able to reason about what you actually read in the code. That said; it's not my call, so feel free to ignore. |
In `resty.dns.client` module a resolver is created for domain query. When `resolver:new()` is called, a UDP socket is created. In 2.x, the UDP socket is released quickly, while in 3.x, it has to take several seconds to release. If the ttl of dns reponses is shorter than the time it waits, the leak will happen. In this PR, a forcible release is added. FTI-4962
Co-authored-by: Thijs Schreijer <thijs@thijsschreijer.nl>
Co-authored-by: Zijing Zhang <50045289+pluveto@users.noreply.github.com>
Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com>
e759f88
to
5cf355a
Compare
…cessary (#10783) * fix(loggly&tcp-log&udp-log&reports): close sockets explicitly when necessary For those sockets created inside the schedule of timer-ng, the sockets must be closed explicitly. For more details, please refer to: [#10691](#10691) Co-authored-by: Chrono <chrono_cpp@me.com> Co-authored-by: Jun Ouyang <ouyangjun1999@gmail.com>
#10691) * fix(resty.dns.client): fix the leak of UDP sockets in resty.dns.client In `resty.dns.client` module a resolver is created for domain query. When `resolver:new()` is called, a UDP socket is created. In 2.x, the UDP socket is released quickly, while in 3.x, it has to take several seconds to release. If the ttl of dns reponses is shorter than the time it waits, the leak will happen. In this PR, a forcible release is added. FTI-4962 FTI-5803 --------- Co-authored-by: Thijs Schreijer <thijs@thijsschreijer.nl> Co-authored-by: Zijing Zhang <50045289+pluveto@users.noreply.github.com> Co-authored-by: Vinicius Mignot <vinicius.mignot@gmail.com>
Summary
In
resty.dns.client
module whenresolver:new()
is called, a UDP socket is created. In 2.x, the UDP socket is released quickly, while in 3.x, it has to take several seconds to release. If the ttl of dns responses is shorter than the time it waits, the leak will happen. In this PR, a forcible release is added.FTI-4962
Checklist
Full changelog
Issue reference
FTI-4962