catz: <defunct> processes when under load & forking #298

Open
pettai opened this issue Oct 3, 2023 · 5 comments

Comments

@pettai

pettai commented Oct 3, 2023

I'm testing the catalog zones branch #294

I've been running it on smaller name servers and it works, apart from the previously reported issue #288.

**UPDATE: This is still present on 4.10.1 running on arm64, but thanks to @wtoorop's updated work on catz support, there are usually fewer defunct processes, and they go away pretty quickly.**

A new thing I've discovered is that when NSD is put under production load and forks very frequently due to zone updates, the catalog zones branch of NSD gets <defunct> processes during the fork cycles. Example output:

nsd       663232       1  0 10:52 ?        00:00:22 /usr/sbin/nsd -d -P
nsd       663233  663232 81 10:52 ?        00:47:23 /usr/sbin/nsd -d -P
nsd       869581  663233  5 11:51 ?        00:00:00 [nsd: server 1] <defunct>
nsd       869582  663233  5 11:51 ?        00:00:00 [nsd: server 2] <defunct>
nsd       869584  663233  5 11:51 ?        00:00:00 [nsd: server 3] <defunct>
nsd       869585  663233  8 11:51 ?        00:00:00 [nsd: server 4] <defunct>
nsd       869586  663233  6 11:51 ?        00:00:00 [nsd: server 5] <defunct>
nsd       869588  663233  9 11:51 ?        00:00:00 [nsd: server 6] <defunct>
nsd       869589  663233  5 11:51 ?        00:00:00 [nsd: server 7] <defunct>
nsd       869590  663233  5 11:51 ?        00:00:00 [nsd: server 8] <defunct>
nsd       869591  663233  5 11:51 ?        00:00:00 [nsd: server 9] <defunct>
nsd       869592  663233  5 11:51 ?        00:00:00 [nsd: server 10] <defunct>
nsd       869602  663233  5 11:51 ?        00:00:00 [nsd: server 11] <defunct>
nsd       869606  663233 11 11:51 ?        00:00:00 [nsd: server 12] <defunct>
nsd       869607  663233 16 11:51 ?        00:00:00 [nsd: server 13] <defunct>
nsd       869608  663233 10 11:51 ?        00:00:00 [nsd: server 14] <defunct>
nsd       869609  663233 11 11:51 ?        00:00:00 [nsd: server 15] <defunct>
nsd       869610  663233 11 11:51 ?        00:00:00 [nsd: server 16] <defunct>
nsd       869611  663233 11 11:51 ?        00:00:00 [nsd: server 17] <defunct>
nsd       869612  663233 10 11:51 ?        00:00:00 [nsd: server 18] <defunct>
nsd       869613  663233 11 11:51 ?        00:00:00 [nsd: server 19] <defunct>
nsd       869614  663233 11 11:51 ?        00:00:00 [nsd: server 20] <defunct>
nsd       869615  663233 11 11:51 ?        00:00:00 [nsd: server 21] <defunct>
nsd       869616  663233 22 11:51 ?        00:00:00 [nsd: server 22] <defunct>
nsd       869617  663233 10 11:51 ?        00:00:00 [nsd: server 23] <defunct>
nsd       869618  663233 11 11:51 ?        00:00:00 [nsd: server 24] <defunct>
nsd       869619  663233  9 11:51 ?        00:00:00 [nsd: main] <defunct>
nsd       869620  663233  2 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869621  663233  1 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869622  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869623  663233  5 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869624  663233  1 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869625  663233  3 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869626  663233  1 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869627  663233  1 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869628  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869629  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869657  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869671  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869678  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869679  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869680  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869681  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869696  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869699  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869701  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869707  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869710  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869712  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869713  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869714  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869717  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869719  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869720  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869721  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
nsd       869726  663233  0 11:51 ?        00:00:00 /usr/sbin/nsd -d -P

I can't find any hints in the log when using verbosity 3, but NSD doesn't seem to suffer from an operational POV.
Its siblings (name servers) that are running the "regular" branch haven't shown this behavior at all, so I thought it would be good to mention this for the catalog zones developer(s).
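
To keep an eye on this over time, the listing above can be summarized per parent process. The following is a minimal sketch, not part of NSD: it assumes `ps -ef`-style columns (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and that a zombie's command ends in `<defunct>`, as in the output above. The function name `count_defunct_by_parent` is hypothetical.

```python
from collections import Counter


def count_defunct_by_parent(ps_output: str) -> Counter:
    """Count <defunct> entries per parent PID in `ps -ef`-style text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD...
        # Skip the header and anything that does not look like a process row.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        if fields[-1] == "<defunct>":
            counts[int(fields[2])] += 1
    return counts


# A few rows copied from the listing above.
sample = """\
nsd       663233  663232 81 10:52 ?        00:47:23 /usr/sbin/nsd -d -P
nsd       869581  663233  5 11:51 ?        00:00:00 [nsd: server 1] <defunct>
nsd       869619  663233  9 11:51 ?        00:00:00 [nsd: main] <defunct>
nsd       869620  663233  2 11:51 ?        00:00:00 /usr/sbin/nsd -d -P
"""

defunct = count_defunct_by_parent(sample)
print(dict(defunct))
```

Running it periodically against real `ps -ef` output would show whether the zombies pile up under one parent or get reaped between fork cycles.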

@k0ekk0ek

k0ekk0ek commented Oct 3, 2023

@pettai, thank you for taking the time to test. It's really appreciated! At this point though, the catalog zone code is broken. I'm in the process of reworking it, but it'll take me some time before I get it to a point where it makes sense to properly test it (there are more edge cases than you'd expect). I would very much like to work with you on testing the feature once we reach that point. Shall I ping/mention you on the PR at that time?

(I'm sorry I can't do more at this point, but I doubt much of the code will stay the same)

@pettai
Author

pettai commented Nov 1, 2023

@k0ekk0ek have you gotten any further with the catz code?

@k0ekk0ek

k0ekk0ek commented Nov 1, 2023

Hi @pettai! Yes and no. I was under the impression that I would be able to make quick work of this, but I identified some issues with the current implementation(1) and the specification(2) itself.

  1. Transfers are handled in nsd, but that can cause race conditions as "dynamic" zones (ones added via nsd-control) are normally added via a task from xfrd (a process that is not recycled and that tracks zone state). It's much more reliable if catalog zones are handled in a similar way, i.e. the logic behind adding/deleting zones is handled in xfrd and zones are configured in nsd like they are when added via nsd-control.
  2. The specification allows multiple catalog zones to contain the same member zone, which can lead to inconsistencies that cannot be fixed automatically (other implementations have that limitation too), and we ended up spending significant time trying to come up with a solution. The one we chose (for now) is to only support a single catalog zone as a source of authority. That's also more in line with how the configuration files work, i.e. users have a single configuration file (possibly with includes, but there's at least a single entity that specifies (the order of) parameters for NSD) and the catalog zone is then authoritative, whereas if multiple catalog zones are configured, member zone state spans multiple zones. IMHO the latter is not in line with how the DNS is designed(?). As a consequence, we will support the group property (mapped to patterns), but not the coo property.
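
To illustrate the ambiguity in point 2, here is a minimal sketch (not NSD code) that finds member zones listed by more than one catalog zone. The catalog and member names are made up, and `find_conflicting_members` is a hypothetical helper; it only shows why a consumer has no automatic way to pick an owner once the same member shows up twice.

```python
from collections import defaultdict


def find_conflicting_members(catalogs: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return member zones that are listed by more than one catalog zone."""
    owners = defaultdict(list)
    # Iterate catalogs in sorted order so the owner lists are deterministic.
    for catalog in sorted(catalogs):
        for member in catalogs[catalog]:
            owners[member].append(catalog)
    return {member: cats for member, cats in owners.items() if len(cats) > 1}


# Hypothetical catalog zones and their member zones.
catalogs = {
    "catalog1.invalid.": {"example.com.", "example.net."},
    "catalog2.invalid.": {"example.net.", "example.org."},
}

conflicts = find_conflicting_members(catalogs)
print(conflicts)
```

With a single catalog zone as the only source of authority, the returned mapping is empty by construction, which is the property the chosen approach relies on.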

We'd actually appreciate your input on point 2.

Lastly, @wtoorop has taken up the job.

@pettai
Author

pettai commented Nov 1, 2023

Feedback regarding 2):
It sounds sane not to allow multiple catalog zones to feed NSD with the same zone to start with, but how can NSD decide which of the catalog zones is the authoritative one for a specific zone if it's in two or more different catalog zones?
Some sort of allow/deny filter configuration mechanism on the catalog zones' member zones seems appropriate. And I assume that zones from the catalog zone(s) will always have lower precedence than zones configured in the configuration file(s).
Regarding the coo property, I don't see that we would need that now or anytime soon.
(We just need basic support for a few catalog zones to ease zone management for our customers and ourselves...)

@k0ekk0ek

k0ekk0ek commented Nov 2, 2023

Configured zones will indeed always have higher precedence.
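
As a rough sketch of that precedence rule (an illustration only, not how NSD implements it), merging the two sources could look like this. The function `effective_zones` and the pattern names are hypothetical; each mapping goes from zone name to the pattern it would be configured with.

```python
def effective_zones(configured: dict[str, str], catalog_members: dict[str, str]) -> dict[str, str]:
    """Merge zone sources; explicitly configured zones win on a name clash."""
    merged = dict(catalog_members)  # start with catalog-provided member zones
    merged.update(configured)       # configured zones override catalog members
    return merged


# Hypothetical zone name -> pattern name mappings.
configured = {"example.com.": "from-config"}
catalog_members = {"example.com.": "from-catalog", "example.net.": "from-catalog"}

zones = effective_zones(configured, catalog_members)
print(zones)
```

Here `example.com.` keeps its configured pattern, while `example.net.` is taken from the catalog zone.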

As for the multiple zones problem, we intend to only allow a single catalog zone to be configured (for the time being). A filter feels like more work than simply configuring the zones(?) The coo property aims to solve the transfer of ownership, but it's incomplete (in my opinion), e.g. what happens if we want to transfer ownership back, or what happens if the operator decides he shouldn't have transferred it back at all? My reasoning is that the DNS is hierarchical; multiple sources of authority with conflicting views cannot be merged. At least, not without additional information(?) Requiring the serial to be specified with the member zone might alleviate some of the pain, but I haven't given this nearly enough thought yet.

Of course, it's also entirely possible I'm not seeing the complete picture.
