"state" should be something else than "ok" then served-serial and commit-serial are not equal #419

Open
pettai opened this issue Jan 6, 2025 · 8 comments
pettai commented Jan 6, 2025

This is an NSD behavior follow-up to #417.

nsd-control zonestatus reports zone state: ok even though served-serial is several updates behind the current commit-serial.

Example: https://github.com/NLnetLabs/nsd/issues/417

Generally, perhaps it shouldn't be regarded as a fault state (since the new zone data is supposed to be installed, but hasn't been yet).
So perhaps something like state: old-serial or similar would indicate that the zone currently isn't on par with the latest zone data NSD knows about.

This would make it much simpler to discover when the NSD update process isn't working as expected (as per #417).

@pettai pettai changed the title state should something else than "ok" then served-serial and commit-serial are not equal "state" should report something else than "ok" then served-serial and commit-serial are not equal Jan 6, 2025
@pettai pettai changed the title "state" should report something else than "ok" then served-serial and commit-serial are not equal "state" should be something else than "ok" then served-serial and commit-serial are not equal Jan 6, 2025
@wtoorop wtoorop self-assigned this Jan 7, 2025
Member

wtoorop commented Jan 7, 2025

Good idea. I'll implement that shortly.

wtoorop added a commit that referenced this issue Jan 7, 2025
As requested in Issue #419
"old-serial" is printed as state with nsd-control when the served serial is older than the one received by transfer.
Also, state "future-serial" is printed if the served serial is newer than the one received by transfer.
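
For illustration, the state selection described in this commit message could be sketched roughly as below. This is not NSD's actual code: it assumes the two serials are compared with RFC 1982 serial number arithmetic, and the names serial_compare and display_state are made up for this example.

```python
SERIAL_MOD = 1 << 32    # SOA serials are 32-bit unsigned values
SERIAL_HALF = 1 << 31   # half of the serial number space


def serial_compare(a: int, b: int) -> int:
    """Compare two serials in the style of RFC 1982.

    Returns -1 if a is older than b, 0 if equal, 1 if a is newer than b.
    Raises ValueError when the serials are exactly half the number space
    apart, because RFC 1982 leaves that comparison undefined.
    """
    if a == b:
        return 0
    distance = (b - a) % SERIAL_MOD
    if distance < SERIAL_HALF:
        return -1
    if distance > SERIAL_HALF:
        return 1
    raise ValueError("serial comparison is undefined for this pair")


def display_state(served_serial: int, commit_serial: int) -> str:
    """Hypothetical helper: pick the state string from the two serials."""
    order = serial_compare(served_serial, commit_serial)
    if order < 0:
        return "old-serial"      # served data is behind the committed transfer
    if order > 0:
        return "future-serial"   # served data is ahead of the committed transfer
    return "ok"
```

With this sketch, display_state(100, 105) gives "old-serial", and a wrapped pair such as display_state(4294967295, 1) is also treated as "old-serial" because the comparison is done modulo 2^32.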
Author

pettai commented Jan 8, 2025

your fix worked for us 👍

@pettai pettai closed this as completed Jan 9, 2025
@pettai pettai reopened this Jan 13, 2025
Author

pettai commented Jan 13, 2025

Apparently, this can happen for catzones too:

root@sunic:~# nsd-control zonestatus catz.catalog.
zone:	catz.catalog.
	catalog: consumer
	state: ok
	served-serial: "1734425182 since 2025-01-13T08:45:02"
	commit-serial: none
	wait: "80155 sec between attempts"

This happened after a restart of NSD due to the previous bug.
And none of the catz zones were served in this state either.

After a manual nsd-control transfer catz.catalog. the catz-zones started to be served.

Member

wtoorop commented Jan 13, 2025

Ack, served-serial: none and commit-serial: something is okay, but not the other way around. But there are some retries implemented already. I'll have to consider this...
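
As a rough monitoring aid, the rule above (a served serial without a commit serial is the suspicious combination) could be checked from the zonestatus text like this. It is only a sketch that assumes the output keeps the "key: value" layout shown in this thread; parse_zonestatus, serial_of and is_suspicious are names invented for the example, not part of NSD.

```python
from typing import Dict, Optional

# Sample taken from the zonestatus output quoted earlier in this thread.
SAMPLE = (
    "zone:\tcatz.catalog.\n"
    "\tcatalog: consumer\n"
    "\tstate: ok\n"
    '\tserved-serial: "1734425182 since 2025-01-13T08:45:02"\n'
    "\tcommit-serial: none\n"
    '\twait: "80155 sec between attempts"\n'
)


def parse_zonestatus(text: str) -> Dict[str, str]:
    """Split each 'key: value' line at the first colon and drop the quotes."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def serial_of(value: Optional[str]) -> Optional[int]:
    """Return the leading serial of '1734425182 since ...', or None for 'none'."""
    if value is None or value == "none":
        return None
    return int(value.split()[0])


def is_suspicious(fields: Dict[str, str]) -> bool:
    """True when a serial is served but no commit serial is recorded."""
    served = serial_of(fields.get("served-serial"))
    commit = serial_of(fields.get("commit-serial"))
    return served is not None and commit is None
```

Feeding it the text of `nsd-control zonestatus catz.catalog.` from the earlier comment, is_suspicious(parse_zonestatus(SAMPLE)) is True even though the reported state is ok.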

Author

pettai commented Jan 13, 2025

In case I was unclear: the issue was not just commit-serial: none, but the fact that NSD didn't serve any of the member zones of the catalog zone. Only after I ran nsd-control transfer catz.catalog. (no serial update) did all the catalog member zones get loaded and served, and that happened immediately.
I'm guessing that catz.catalog. was loaded from disk upon the restart of NSD, but none of the zones from the catalog zone were loaded. (Note the annotation after catalog: consumer below, which wasn't present after the NSD restart.)

zone:	catz.catalog.
	catalog: consumer (serial: 1734425182, # members: 414)
	state: ok
	served-serial: "1734425182 since 2025-01-13T19:45:01"
	commit-serial: "1734425182 since 2025-01-13T19:45:01"
	wait: "80100 sec between attempts"
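
The missing annotation after catalog: consumer could be used as a heuristic check. The sketch below assumes the line keeps the exact "(serial: N, # members: M)" shape shown above, which is taken only from this output and may differ between NSD versions; parse_catalog_line is a hypothetical helper.

```python
import re
from typing import Dict, Optional, Union

# Matches "catalog: consumer" with an optional "(serial: N, # members: M)" part.
CATALOG_RE = re.compile(
    r"^\s*catalog:\s*(\w+)"
    r"(?:\s*\(serial:\s*(\d+),\s*#\s*members:\s*(\d+)\))?\s*$"
)


def parse_catalog_line(line: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Return role, serial and member count, or None if this is not a catalog line.

    serial and members are None when the annotation is absent, which is the
    condition observed after the restart described in this comment.
    """
    match = CATALOG_RE.match(line)
    if match is None:
        return None
    role, serial, members = match.groups()
    return {
        "role": role,
        "serial": int(serial) if serial is not None else None,
        "members": int(members) if members is not None else None,
    }
```

A monitoring script could alert when a consumer catalog zone reports members as None, since in this case that coincided with no member zones being served.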

There were no log entries mentioning the catz.catalog. zone either.
Usually we see:

Jan 13 09:04:19 sunic nsd[3443194]: zone swupki.se read with success

But nothing happened with the catz.catalog. zone until I manually started a transfer of it...

Jan 13 09:14:22 sunic nsd[3443189]: control cmd:  transfer catz.catalog.
Jan 13 09:14:22 sunic nsd[3443189]: xfrd: zone catz.catalog. committed "received update to serial 1734425182 at 2025-01-13T09:14:22 from 192.71.XX.YYY"
Jan 13 09:14:22 sunic nsd[3443194]: zone catz.catalog. received update to serial 1734425182 at 2025-01-13T09:14:22 from 192.71.XX.YYY of 21975 bytes in 0.000623 seconds
Jan 13 09:14:26 sunic nsd[3443189]: zone catz.catalog. received update to serial 1734425182 at 2025-01-13T09:14:22 from 192.71.XX.YYY of 21975 bytes in 0.000623 seconds
Jan 13 09:14:26 sunic nsd[3443189]: Adding '00265b000a898dab.zones.catz.catalog.' PTR 'example.se'
[...]

I realize this issue might only be about the state: ok reporting.
But I wanted to bring it up now that we saw it, since nsd-control zonestatus reported this as ok, yet NSD didn't work as expected.

Member

wtoorop commented Jan 23, 2025

I realize this issue might only be about the state: ok reporting.
But I wanted to bring it up now that we saw it, since nsd-control zonestatus reported this as ok, yet NSD didn't work as expected.

That is peculiar. I tried this myself, but for me it did start adding the zones, even though the catalog zone itself was read from disk:

[2025-01-23 15:34:25.231] nsd[411416]: notice: nsd starting (NSD 4.11.2)
[2025-01-23 15:34:25.235] main[411417]: info: zone catalog1.invalid read with success
[2025-01-23 15:34:25.252] main[411417]: notice: nsd started (NSD 4.11.2), pid 411416
[2025-01-23 15:34:25.253] xfrd[411416]: info: zone catalog1.invalid read with success
[2025-01-23 15:34:25.253] xfrd[411416]: info: Adding '0270267e.zones.catalog1.invalid.' PTR 'zone105.invalid'

I did notice however that all the zones in the catalog are then freshly transferred from the primary, even though they did have zone files on disk, and even though their state was properly recorded in the xfrd.state file. This needs fixing, but I'd rather create a new issue for it.

May I ask what the refresh, retry and expire timer values of the catz.catalog. SOA record are?

Author

pettai commented Jan 24, 2025

May I ask what the refresh, retry and expire timer values of the catz.catalog. SOA record are?

It seems to be what Knot generates by default:

; zone catz.catalog. written by NSD 4.11.1 on Mon Jan 20 18:47:01 2025
; received update to serial 1734425182 at 2025-01-20T18:45:01 from xx.yyy.zzz.aaa TSIG verified with key XXX
$ORIGIN catalog.
catz	0	IN	SOA	invalid. invalid. (
		1734425182 3600 600 2147483646 0 )
	0	IN	NS	invalid.
$ORIGIN catz.catalog.
version	0	IN	TXT	"2"
$ORIGIN zones.catz.catalog.
[...]
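
To read the timers out of the SOA record above: the numeric fields of SOA RDATA are, in order, serial, refresh, retry, expire and minimum. A small sketch for pulling them from the parenthesised part of the zone file line (SoaTimers and parse_soa_numbers are names made up for this example):

```python
from typing import NamedTuple


class SoaTimers(NamedTuple):
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_numbers(text: str) -> SoaTimers:
    """Extract the five numeric SOA fields from text such as
    '1734425182 3600 600 2147483646 0 )', ignoring the parentheses."""
    tokens = text.replace("(", " ").replace(")", " ").split()
    numbers = [int(token) for token in tokens if token.isdigit()]
    if len(numbers) != 5:
        raise ValueError("expected exactly five numeric SOA fields")
    return SoaTimers(*numbers)


timers = parse_soa_numbers("1734425182 3600 600 2147483646 0 )")
expire_days = timers.expire // 86400  # whole days until the zone would expire
```

For the record above this gives refresh 3600, retry 600, expire 2147483646 and minimum 0, so the zone effectively never expires on its own (the expire value is 24855 whole days).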

Should I open a separate issue for this then? I can copy and paste the details over to it.
(Note that not all NSD nodes have problems during restarts; most of them load all catz zones.)

Member

wtoorop commented Jan 25, 2025

Should I open a separate issue for this then? I can copy and paste the details over to it.
(Note that not all NSD nodes have problems during restarts; most of them load all catz zones.)

Yes, I would prefer that. Then I can merge PR #420.
