no response from nsd 2.3.4 with more than 12 server processes

dr. W.C.A. Wijngaards wouter at NLnetLabs.nl
Fri Jun 2 11:10:36 UTC 2006


Hi Jad,

I've tried to reproduce this, and on an AMD Linux 2.6.16 system I get
replies using 9 servers, but not with 10 servers (with the
freshly released nsd 2.3.5, by the way).

The code does nothing special with the number of servers it forks. Each
server select()s on the port. With 10 servers, none of them comes
out of select(). With 9 or fewer, one comes out of select(), handles
the UDP message, and goes back into select().
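
The fork-then-select() pattern described above can be sketched roughly as
follows. This is only an illustration in Python, not NSD's actual C code:
the function name run_workers, the "reply:" payload, the loopback address
and the timeout are all assumptions made for the example, and it makes no
attempt to reproduce the hang itself.

```python
import os
import select
import socket


def run_workers(num_workers, timeout=1.0):
    """Fork num_workers children that all select() on one shared UDP socket.

    The parent then sends a single datagram and returns the reply it gets
    (or None if no worker answered within the timeout).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))   # ephemeral port, chosen by the kernel
    sock.setblocking(False)       # several workers may wake up for one packet
    addr = sock.getsockname()

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: wait in select() on the inherited socket.
            try:
                readable, _, _ = select.select([sock], [], [], timeout)
                if readable:
                    try:
                        data, peer = sock.recvfrom(512)
                        sock.sendto(b"reply:" + data, peer)
                    except BlockingIOError:
                        pass      # another worker took the datagram first
            finally:
                os._exit(0)       # never fall through into the parent's code
        pids.append(pid)

    # Parent: act as the client, created after forking so that the
    # children do not inherit this socket.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    client.sendto(b"query", addr)
    try:
        reply, _ = client.recvfrom(512)
    except socket.timeout:
        reply = None

    for pid in pids:
        os.waitpid(pid, 0)
    client.close()
    sock.close()
    return reply
```

Calling run_workers(3) sends one query at three workers; whichever worker
leaves select() first and wins the recvfrom() answers it, and the others
either lose that race or time out.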

There is no immediate solution; I cannot unblock select(). Both
Solaris and Linux show this behavior.

So, I can reproduce your problem, and I will be looking into it. Entered
as http://www.nlnetlabs.nl/bugs/show_bug.cgi?id=134
Thank you for the report.

As for your problem with killing them off: when you start it twice,
i.e. you start one while the old one is still running, the new NSD
detects the old NSD but continues its attempt to start anyway. It
overwrites the pidfile with its own pid, then fails because it cannot
bind to the port (which is in use by the old NSD, since both start on
the same port). It then exits and unlinks the pidfile, so the old NSD is
left running without a pidfile. I cannot easily fix this, as there is
only one pidfile and two NSDs running.
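
The order of events above can be walked through with a small simulation.
This is a sketch under stated assumptions, not NSD's code: start_instance
and demo are hypothetical names, the bind failure is faked with a boolean
instead of a real socket, and the pids are the ones from the quoted log.

```python
import os
import tempfile


def start_instance(pidfile, pid, port_in_use):
    """Mimic the startup order described above for one instance.

    Returns True if the instance keeps running, False if it exits.
    """
    if os.path.exists(pidfile):
        with open(pidfile) as f:
            old_pid = f.read().strip()
        # The new instance notices the old one but carries on anyway.
        print(f"warning: nsd is already running as {old_pid}, continuing")

    # Overwrite the pidfile with our own pid.
    with open(pidfile, "w") as f:
        f.write(f"{pid}\n")

    if port_in_use:
        # Stand-in for a failed bind(): exit and unlink the pidfile,
        # even though the old instance is still running.
        os.unlink(pidfile)
        return False
    return True


def demo():
    """Start twice on the same (simulated) port and report what is left."""
    with tempfile.TemporaryDirectory() as directory:
        pidfile = os.path.join(directory, "nsd.pid")
        first_running = start_instance(pidfile, 26320, port_in_use=False)
        pidfile_after_first = os.path.exists(pidfile)
        second_running = start_instance(pidfile, 26337, port_in_use=True)
        pidfile_after_second = os.path.exists(pidfile)
    return (first_running, pidfile_after_first,
            second_running, pidfile_after_second)
```

demo() returns (True, True, False, False): the first instance is still
running, but after the second start attempt the pidfile is gone, which
matches the "nsd is not running" answer from nsdc stop in the report.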

Thank you for the report,
	Wouter

jad at nominet.org.uk wrote:
> Hi,
> 
> I just thought I would try and load test nsd 2.3.4 on a Sun T2000 running 
> Solaris 10 01/06 but I am having a few problems.
> 
> If I specify the number of servers to be more than 12 I get no response 
> from nsd when I try to query it. With 12 or less servers it works fine. I 
> am specifying the number of servers by editing nsdc and adding -N 12 to 
> the flags line. The server I am using has 32 virtual processors so in the 
> end I want to use -N 32. For example after starting with -N 13, dig gives 
> the following
> 
> dig @localhost jadjadjad.example.com soa
> ; <<>> DiG 9.2.4 <<>> @localhost jadjadjad.example.com soa
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> 
> with -N 12 I get a response.
> 
> Has anyone else tried running nsd with so many servers?
> 
> BTW - I also noticed that if I start nsd using nsdc start and then try and 
> start it again nsdc correctly reports that nsd is already running. However 
> for some reason the pid file gets removed. For example
> bash-3.00# /opt/nsd/sbin/nsdc start
> bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
> -rw-r--r--   1 nsd      other          6 May 31 15:27 /opt/nsd/etc/nsd/nsd.pid
> bash-3.00# /opt/nsd/sbin/nsdc start
> [1149085664] nsd[26337]: warning: nsd is already running as 26320, continuing
> bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
> /opt/nsd/etc/nsd/nsd.pid: No such file or directory
> bash-3.00# /opt/nsd/sbin/nsdc stop
> nsd is not running
> bash-3.00# ps -ef | grep nsd
>      nsd 26324 26320   0 15:27:35 ?           0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
>     root 26343 26314   0 15:27:59 pts/2       0:00 grep nsd
>      nsd 26326 26320   0 15:27:35 ?           0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
> ...
> 
> Thanks
> John

