[nsd-users] startup race leading to NOT IMPL on AXFR?

Phil Pennock nsd-users+phil at spodhuis.org
Thu Mar 9 03:25:26 UTC 2017


Folks, running nsd 4.1.14 as the hidden master for the domain
"exim.org", noticed something weird after rebooting for kernel security
updates.  Have checked doc/ChangeLog for 4.1.15 and don't see anything
relevant.

Seems to be a race in startup which can result in returning NOT IMPL?

System is Ubuntu (14.04.5 LTS), has static IPv4 and IPv6 assigned as
fixed values in /etc/network/interfaces but the default route is learnt
via route advertisements.  Because I need to wait for some read-only
bind-mounts constructing the chroot to be available, I have
/etc/init/nsd.override anyway to specify the start-up condition and it's
guarded on `... and net-device-up IFACE!=lo`.

We have unbound listening on localhost:53, so nsd is being given
explicit IPs to bind.

Since the last-reboot-before-today I also have a
`/etc/network/if-up.d/010wait-for-ipv6` which is trying to wait on
IPv6 being configured, as Upstart seems to emit as soon as IPv4 is
ready, before even the statically configured IPv6 is ready.

In previous reboots, nsd would fail to start because it couldn't bind
the addresses for serving, because they weren't yet available.  Because
the wait-script is buggy (I now know what's needed to fix it) we only
got a "bit" of a delay, so the first attempt nsd failed as before, but
the third attempt succeeded: the IPv6 address was present, so nsd could
start.  But routing was not yet present.

[2017-03-09 02:42:02.037] nsd[1081]: notice: nsd starting (NSD 4.1.14)
[2017-03-09 02:42:02.050] nsd[1081]: info: setup SSL certificates
[2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/db.exim.org is not modified
[2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/db.testdns.exim.org is not modified
[2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/unsigned.invalid254.testdns.exim.org is not modified
[2017-03-09 02:42:02.277] nsd[1087]: notice: nsd started (NSD 4.1.14), pid 1081
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone exim.org: could not send notify #1 to 2a02:898:31::53:0
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone testdns.exim.org: could not send notify #1 to 2a02:898:31::53:0
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable
[2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone invalid254.testdns.exim.org: could not send notify #1 to 2a02:898:31::53:0

Seeing this, I check that things are working with a NOTIFY:

[2017-03-09 02:46:57.186] nsd[1081]: info: new control connection from 127.0.0.1
[2017-03-09 02:46:57.210] nsd[1081]: info: remote control connection authenticated
[2017-03-09 02:46:57.247] nsd[1081]: info: control cmd:  notify testdns.exim.org
[2017-03-09 02:46:57.248] nsd[1081]: info: remote control operation completed
[2017-03-09 02:46:57.745] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001
[2017-03-09 02:46:57.908] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer
[2017-03-09 02:46:58.740] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001
[2017-03-09 02:46:58.901] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer
[2017-03-09 02:46:59.743] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001
[2017-03-09 02:46:59.906] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer

I check that server's logs; it's "NSD 4.1.7" from OS vendor packages,
and it sees:

[2017-03-09 02:46:57.649] nsd[1692]: error: xfrd: zone testdns.exim.org received error code NOT IMPL from [hidden-master]

... and similarly before that for the startup notifies: they did seem to
make it out in the end, but the secondary had been unable to transfer.

I restart nsd 4.1.14 on the hidden master and the problem disappears.


I can't prove the two are connected in causation, but what I see is
"server experiences issues sending on IPv6 at startup" is followed by
"secondary gets NOT IMPL on transfer".

Are they linked?  Is there an issue here?

(Meanwhile, I'm going to defer nsd startup until we also have an IPv6
route).

Thanks,
-Phil



More information about the nsd-users mailing list