[nsd-users] Xfrd scalability problem

W.C.A. Wijngaards wouter at NLnetLabs.nl
Mon Mar 1 08:16:17 UTC 2010



Hi Martin,

Thanks for the perf measurements; I was not aware of this.  I wrote that
code some time ago and decided against optimizing xfrd in this way,
because the netio handler is also used by the server processes.  Those
processes listen on only a limited number of sockets, so the simple list
is more efficient for them.  If this is the only bottleneck for a large
number of zones, then it may be relatively easy to fix.

Best regards,
   Wouter

On 02/28/2010 08:30 PM, Martin Švec wrote:
> Hello again,
> 
> I think that the xfrd daemon suffers from a scalability problem with
> respect to the number of zones. For every zone, xfrd adds a
> netio_handler to the linked list of handlers. Then, every
> netio_dispatch call sequentially scans the entire list for "valid"
> file descriptors and timeouts. With a large number of zones, this scan
> is quite expensive and superfluous, because almost all zone file
> descriptors/timeouts are usually not assigned. The problem is most
> obvious during "nsdc reload". Because the server_reload function sends
> the SOA info of every zone to xfrd, and xfrd performs a full scan of
> the linked list for each one, the resulting complexity of a reload is
> O(n^2). Just try "nsdc reload" with 65000 zones and you'll see that
> the xfrd daemon consumes 100% CPU for several _minutes_! However, I
> suspect the scalability problem is not limited to reload, because
> _every_ socket communication with xfrd goes through the same
> netio_dispatch. Here is the "perf record" result of the xfrd process
> during a reload:
> 
> # Overhead  Command        Shared Object  Symbol
> # ........  .......  ...................  ......
> #
>    98.69%      nsd  /usr/sbin/nsd        [.] netio_dispatch
>     0.06%      nsd  [kernel]             [k] unix_stream_recvmsg
>     0.05%      nsd  /usr/sbin/nsd        [.] rbtree_find_less_equal
>     0.04%      nsd  [kernel]             [k] kfree
>     0.04%      nsd  [kernel]             [k] copy_to_user
> 
> Then, "perf annotate netio_dispatch" shows that the heart of the
> problem is indeed the loop scanning the linked list (because of gcc
> optimizations, the line numbers are only approximate):
> 
> 48.24% /work/nsd-3.2.4/netio.c:158
> 45.41% /work/nsd-3.2.4/netio.c:158
> 2.14% /work/nsd-3.2.4/netio.c:172
> 2.14% /work/nsd-3.2.4/netio.c:156
> 1.81% /work/nsd-3.2.4/netio.c:172
> 
> I wonder why the linked list in xfrd contains the netio_handlers of
> _all_ zones. Wouldn't it be better to dynamically add/remove zone
> handlers only when their file descriptors/timeouts are
> assigned/cleared? And perhaps replace the linked list with a more
> scalable data structure? (Or is NSD intentionally designed to serve
> only a small number of zones? ;-))
> 
> Best regards
> Martin Svec
> 
> 
> _______________________________________________
> nsd-users mailing list
> nsd-users at NLnetLabs.nl
> http://open.nlnetlabs.nl/mailman/listinfo/nsd-users



