[nsd-users] NSD as slave leaves a child process behind

Ville Mattila vmattila at csc.fi
Mon Sep 14 16:25:26 CEST 2009


Hi,

We've run into problems with NSD v3.2.3 (on Red Hate Enterprise Linux
5.4 x86_64) failing to kill one of it's children processes while NSD
is reloading after it has received an update to a zone from master.
Everything seems to running fine, but 'nsdc stop' and 'nsdc patch'
etc. really don't work because the reload updates NSD the pid file but
the child that was left behind really is handling all the queries
and zone updates.

Our NSD is a slave to some thousands of zones and because of the way we
automatically create NSD configuration (we configure NSD to use every
other NS of the zone as master even though typically only one of the other
NS servers of a zone actually allows an AXFR for us), our NSD typically
has n*10 zone transfers in SYN_SENT state destined to fail after connect
timeout and for a lot of AXFR requests our NSD gets REFUSED response.

I'd appreciate help hunting down the cause for this problem;  we need to
get 'nsdc stop && nsdc start' working to be able to restart NSD reliably
after generating new nsd.conf automatically.  (Of course I could write
a script which kills NSD some other way instead of 'nsdc stop' but I'd
prefer fixing NSD/nsdc/our environment.)

-----
Here's how the problem evolves on our systems:

1. Initially (after e.g. server reboot) NSD is running fine.  Three nsd
processes (with default server-count setting), one of which (pid 20925
here) is the parent to the other two (20933 and 20934).  The parent NSD
owns all tcp LISTEN and udp sockets, and child 20933 is the only one
trying to do some AXFRs from masters:

   % fgrep 'nsd[20925]' /var/log/messages
   Sep 14 14:59:50 isar nsd[20925]: nsd started (NSD 3.2.3), pid 20925

   % ps -ef | grep nsd.conf
   nsd      20925     1 56 14:59 ?        00:00:04 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
   nsd      20933 20925 11 14:59 ?        00:00:00 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
   nsd      20934 20925  1 14:59 ?        00:00:00 /usr/sbin/nsd -c /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf

   % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
   tcp        0      0 128.214.248.137:53          0.0.0.0:*      LISTEN  20925/nsd
   tcp        0      0 193.167.245.84:53           0.0.0.0:*      LISTEN  20925/nsd
   tcp        0      0 195.148.12.100:53           0.0.0.0:*      LISTEN  20925/nsd
   tcp        0      0 2001:708:10:70::55:1:53     :::*           LISTEN  20925/nsd
   tcp        0      0 2001:708:10:55::53:53       :::*           LISTEN  20925/nsd
   udp        0      0 128.214.248.137:53          0.0.0.0:*              20925/nsd
   udp        0      0 193.167.245.84:53           0.0.0.0:*              20925/nsd
   udp        0      0 195.148.12.100:53           0.0.0.0:*              20925/nsd
   udp        0      0 2001:708:10:70::55:1:53     :::*                   20925/nsd
   udp        0      0 2001:708:10:55::53:53       :::*                   20925/nsd

   % sudo netstat -anp -A inet,inet6 | grep nsd | grep SYN_SENT
   (master server IP addresses obfuscated as masterX)
   tcp        0      1 128.214.248.137:57130       masterA:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:53660       masterA:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:37449       masterB:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:50400       masterC:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:48250       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:35477       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:34076       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:48529       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:48535       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:35901       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:37632       masterF:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:47745       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:32931       masterF:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:49364       masterG:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:37719       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:36117       masterF:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:37209       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:45596       masterG:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:60195       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:54687       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:54131       masterF:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:60484       masterE:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:55516       masterD:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:50900       masterF:53 SYN_SENT    20933/nsd
   tcp        0      1 128.214.248.137:36627       masterG:53 SYN_SENT    20933/nsd


2. Soon NSD receives an update to one of the zones and reloads, and
updates the pid file:

   (from /var/log/messages:)
   Sep 14 15:01:55 isar nsd[20925]: signal received, reloading...
   Sep 14 15:01:55 isar nsd[21072]: memory recyclebin holds 535424 bytes

   % cat $(nsd-checkconf -o pidfile /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf)
   21072

3. Now observe the child pid 20933 still is running directly under init process (pid 1).
The child 20933 now owns the LISTEN sockets.

   % ps -ef | grep nsd.conf
   nsd      20933     1  0 14:59 ?        00:00:01 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
   nsd      21072     1  2 15:01 ?        00:00:00 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf
   nsd      21073 21072  1 15:01 ?        00:00:00 /usr/sbin/nsd /v/net/ns-secondary.funet.fi/etc/nsd/nsd-isar.conf

   % sudo netstat -anp | grep nsd | grep -v -E '(LISTEN|ESTABLISHED)'
   tcp        0      0 128.214.248.137:53          0.0.0.0:*	LISTEN	20933/nsd
   tcp        0      0 193.167.245.84:53           0.0.0.0:*	LISTEN	20933/nsd
   tcp        0      0 195.148.12.100:53           0.0.0.0:*	LISTEN	20933/nsd
   tcp        0      0 2001:708:10:70::55:1:53     :::*		LISTEN	20933/nsd
   tcp        0      0 2001:708:10:55::53:53       :::*		LISTEN	20933/nsd
   udp        0      0 128.214.248.137:53          0.0.0.0:*		20933/nsd
   udp        0      0 193.167.245.84:53           0.0.0.0:*		20933/nsd
   udp        0      0 195.148.12.100:53           0.0.0.0:*		20933/nsd
   udp        0      0 2001:708:10:70::55:1:53     :::*			20933/nsd
   udp        0      0 2001:708:10:55::53:53       :::*			20933/nsd

Steps 2-4 repeat over and over: process 20933 keeps on running and
responding to DNS queries.

I'm not sure if process 20933 should terminate in reloads but for sure
nsdc cannot find it because nsdc depends on pidfile being correct.
-----

Regards,
-- 
Ville Mattila, System Specialist, Funet network, CSC
PO Box 405, FIN-02101 Espoo, Finland, fax +358 9 457 2302
CSC is the Finnish IT Center for Science, http://www.csc.fi/, email:
ville.mattila at csc.fi


More information about the nsd-users mailing list