[nsd-users] Xfrd scalability problem

Dan Durrer dan at vitalwerks.com
Thu Mar 11 20:33:51 UTC 2010


I only see the one nsd process in top. After about a day it started returning responses for the zones. However, when I tried an update, I saw that nsd received the notify and wrote the changes, but it still hasn't updated what it's serving yet. I'll try the patch a little later today when I get some free time.




On Mar 11, 2010, at 10:03 AM, Martin Svec wrote:

> Dan, I think you hit the same problem with xfrd as I did. Which of
> your nsd processes uses 100% CPU? There should be at least three -- a
> main process, a child process, and an xfrd process. Xfrd is usually the
> one whose RES memory usage in "top" differs from the others.
> 
> If you want to be sure, try the attached patch. It logs every
> netio_dispatch call with some useful numbers:
> "netio dsp: nnnnnn, sum: xxxx, fds: yyyy, tms: zzzz", where xxxx = total
> number of handlers scanned, yyyy = handlers with a socket, zzzz =
> handlers with a timeout. If your log is continually flooded with
> thousands of these lines whose xxxx is roughly equal to the number of
> your zones, you have hit the scalability issue.
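> 
> For illustration only (these numbers are made up, not taken from a real
> run), a slave with ~60000 configured zones but only a few transfers in
> progress would log lines like "netio dsp: 158342736, sum: 60007, fds: 3,
> tms: 5", i.e. tens of thousands of handlers scanned on every call even
> though only a handful of them actually have a socket or timeout assigned.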
> 
> Be careful! Do not try the patch on a server that processes any queries.
> The patch logs at least one line for every DNS query too ;-))
> 
> I guess that you run NSD as a slave... If so, could you please send me a
> few typical "netio dsp" lines from your log? I'm curious what the yyyy
> and zzzz numbers will be for you.
> 
> Martin
> 
> 
> 
> Dan Durrer wrote:
>> I have also been trying to run some tests using 60k+ zones. I grabbed a very recent snapshot of these zones from BIND, so there shouldn't be too many zones that need updating. But it's been 30 minutes or more and all zones seem to be returning SERVFAIL. I see some zone transfer traffic in the logs. CPU on the nsd process shows 99.9%, with 3.3% memory usage. CentOS, 8 GB RAM, quad-core 5500. I also applied the memory patch posted earlier this month against 3.2.4. In BIND we use the serial-query-rate option; the default value is too low for how often our zones change. Does an option like that exist in NSD?
>> 
>> Any help would be appreciated. The performance of NSD on a single zone is phenomenal: 112k qps on this hardware.
>> Dan
>> 
>> 
>> On Feb 28, 2010, at 11:30 AM, Martin Švec wrote:
>> 
>> 
>>> Hello again,
>>> 
>>> I think that the xfrd daemon suffers from a scalability problem with respect to the number of zones. For every zone, xfrd adds a netio_handler to the linked list of handlers. Then every netio_dispatch call sequentially scans the entire list for "valid" file descriptors and timeouts. With a large number of zones this scan is pretty expensive and superfluous, because almost none of the zone file descriptors/timeouts are assigned at any given time.
>>> 
>>> The problem is most obvious during "nsdc reload". Because the server_reload function sends the SOA info of every zone to xfrd, xfrd performs a full scan of the linked list for each zone, so the overall complexity of a reload is O(n^2) -- with 65000 zones that is on the order of 65000 x 65000, i.e. roughly 4 x 10^9 list nodes visited. Just try "nsdc reload" with 65000 zones and you'll see the xfrd daemon consume 100% CPU for several _minutes_! However, I guess the scalability problem is not limited to reload, because _every_ socket communication with xfrd goes through the same netio_dispatch. Here is the "perf record" result of the xfrd process during a reload:
>>> 
>>> # Overhead  Command        Shared Object  Symbol
>>> # ........  .......  ...................  ......
>>> #
>>>  98.69%      nsd  /usr/sbin/nsd        [.] netio_dispatch
>>>   0.06%      nsd  [kernel]             [k] unix_stream_recvmsg
>>>   0.05%      nsd  /usr/sbin/nsd        [.] rbtree_find_less_equal
>>>   0.04%      nsd  [kernel]             [k] kfree
>>>   0.04%      nsd  [kernel]             [k] copy_to_user
>>> 
>>> Then "perf annotate netio_dispatch" shows that the heart of the problem is indeed the loop that scans the linked list (because of gcc optimizations, the line numbers are only approximate):
>>> 
>>> 48.24% /work/nsd-3.2.4/netio.c:158
>>> 45.41% /work/nsd-3.2.4/netio.c:158
>>> 2.14% /work/nsd-3.2.4/netio.c:172
>>> 2.14% /work/nsd-3.2.4/netio.c:156
>>> 1.81% /work/nsd-3.2.4/netio.c:172
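>>> 
>>> To make the cost concrete, here is a toy, self-contained model of that
>>> scan (my own sketch for illustration, not NSD code): one handler per
>>> zone in a linked list, almost none of them with an assigned file
>>> descriptor, and one full list scan per zone during a reload:
>>> 
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> 
>>> struct handler {
>>>     int fd;                 /* -1 = no socket assigned (the common case) */
>>>     struct handler *next;
>>> };
>>> 
>>> /* One simulated netio_dispatch: walk every handler, count usable fds. */
>>> static unsigned scan_all(const struct handler *list, unsigned *fds)
>>> {
>>>     unsigned scanned = 0;
>>>     *fds = 0;
>>>     for (const struct handler *h = list; h; h = h->next) {
>>>         ++scanned;
>>>         if (h->fd >= 0)
>>>             ++*fds;
>>>     }
>>>     return scanned;
>>> }
>>> 
>>> int main(void)
>>> {
>>>     enum { NZONES = 65000 };
>>>     struct handler *zones = calloc(NZONES, sizeof *zones);
>>>     struct handler *list = NULL;
>>>     unsigned long long visited = 0;
>>>     unsigned fds = 0;
>>> 
>>>     if (!zones)
>>>         return 1;
>>>     for (int i = 0; i < NZONES; ++i) {
>>>         zones[i].fd = -1;   /* no transfer in progress for this zone */
>>>         zones[i].next = list;
>>>         list = &zones[i];
>>>     }
>>> 
>>>     /* Reload sends one SOA message per zone and each of them triggers
>>>      * a full scan: NZONES * NZONES node visits, roughly 4e9 for 65000
>>>      * zones, which already costs a noticeable amount of pure CPU time. */
>>>     for (int i = 0; i < NZONES; ++i)
>>>         visited += scan_all(list, &fds);
>>> 
>>>     printf("nodes visited in one simulated reload: %llu (fds: %u)\n",
>>>            visited, fds);
>>>     free(zones);
>>>     return 0;
>>> }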
>>> 
>>> I wonder why the linked list in xfrd contains the netio_handlers of _all_ zones. Wouldn't it be better to dynamically add/remove zone handlers only when their file descriptors/timeouts are assigned/cleared? And perhaps to replace the linked list with a more scalable data structure? (Or is NSD intentionally designed to serve only a small number of zones? ;-))
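>>> 
>>> A minimal sketch of that idea (again only an illustration with
>>> hypothetical names such as activate_handler, not a patch against NSD):
>>> the dispatch loop only ever sees handlers that currently have a file
>>> descriptor, and zones are linked in and out as transfers start and
>>> finish:
>>> 
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> 
>>> struct handler {
>>>     int fd;                 /* -1 while the zone has nothing to do */
>>>     struct handler *next;   /* link used only while on the active list */
>>> };
>>> 
>>> struct dispatcher {
>>>     struct handler *active; /* only handlers that need dispatching */
>>> };
>>> 
>>> /* A zone opens a transfer socket: link its handler into the list. */
>>> static void activate_handler(struct dispatcher *d, struct handler *h, int fd)
>>> {
>>>     h->fd = fd;
>>>     h->next = d->active;
>>>     d->active = h;
>>> }
>>> 
>>> /* The transfer finished: unlink the handler and clear its descriptor. */
>>> static void deactivate_handler(struct dispatcher *d, struct handler *h)
>>> {
>>>     for (struct handler **p = &d->active; *p; p = &(*p)->next) {
>>>         if (*p == h) {
>>>             *p = h->next;
>>>             h->fd = -1;
>>>             return;
>>>         }
>>>     }
>>> }
>>> 
>>> /* Dispatch now costs O(zones with work to do), not O(configured zones). */
>>> static unsigned dispatch_scan(const struct dispatcher *d)
>>> {
>>>     unsigned visited = 0;
>>>     for (const struct handler *h = d->active; h; h = h->next)
>>>         ++visited;
>>>     return visited;
>>> }
>>> 
>>> int main(void)
>>> {
>>>     enum { NZONES = 65000 };
>>>     struct handler *zones = calloc(NZONES, sizeof *zones);
>>>     struct dispatcher d = { NULL };
>>> 
>>>     if (!zones)
>>>         return 1;
>>>     /* Only two of the 65000 zones are transferring at this moment. */
>>>     activate_handler(&d, &zones[10], 42);
>>>     activate_handler(&d, &zones[20], 43);
>>>     printf("handlers visited per dispatch: %u of %d zones\n",
>>>            dispatch_scan(&d), NZONES);
>>> 
>>>     deactivate_handler(&d, &zones[10]);
>>>     deactivate_handler(&d, &zones[20]);
>>>     free(zones);
>>>     return 0;
>>> }
>>> 
>>> (In a real implementation a doubly-linked list, or the rbtree that NSD
>>> already uses elsewhere, would make the removal O(1); the point is only
>>> that the scan touches active handlers instead of every configured zone.)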
>>> 
>>> Best regards
>>> Martin Svec
>>> 
>>> 
>>> 
>> 
>> 
> 
> diff -urN nsd-3.2.4/netio.c nsd-3.2.4-new/netio.c
> --- nsd-3.2.4/netio.c	2010-02-28 18:41:31.433799257 +0100
> +++ nsd-3.2.4-new/netio.c	2010-02-28 18:46:31.237102213 +0100
> @@ -129,6 +129,10 @@
> 	int rc;
> 	int result = 0;
> 	
> +	unsigned int total_handlers = 0;
> +	unsigned int fd_hits = 0;
> +	unsigned int tm_hits = 0;
> +	
> 	assert(netio);
> 
> 	/*
> @@ -155,7 +159,9 @@
> 
> 	for (elt = netio->handlers; elt; elt = elt->next) {
> 		netio_handler_type *handler = elt->handler;
> +		++total_handlers;
> 		if (handler->fd >= 0 && handler->fd < (int)FD_SETSIZE) {
> +			++fd_hits;
> 			if (handler->fd > max_fd) {
> 				max_fd = handler->fd;
> 			}
> @@ -171,6 +177,7 @@
> 		}
> 		if (handler->timeout && (handler->event_types & NETIO_EVENT_TIMEOUT)) {
> 			struct timespec relative;
> +			++tm_hits;
> 
> 			relative.tv_sec = handler->timeout->tv_sec;
> 			relative.tv_nsec = handler->timeout->tv_nsec;
> @@ -187,6 +194,8 @@
> 		}
> 	}
> 
> +	log_msg(LOG_INFO, "netio dsp: %u, sum: %u, fds: %u, tms: %u.", (unsigned) (uintptr_t) netio, total_handlers, fd_hits, tm_hits);
> +
> 	if (have_timeout && minimum_timeout.tv_sec < 0) {
> 		/*
> 		 * On negative timeout for a handler, immediatly



