Maintained by: NLnet Labs

[Unbound-users] Release 0.7.2

W.C.A. Wijngaards
Wed Jan 30 11:48:19 CET 2008

Hi Alexander,

Alexander Gall wrote:
| Wouter,
| Here's an update on my testing of 0.7.2 (I had the flu last week, back
| to work now :-)

Thank you very much for the test. I'll examine your logs below.

| Total run time so far is about 62 hours with a bit over 10 million
| answered queries.  I was confident enough to let unbound run for
| almost two days just now, which should have been enough to test most
| of the TTL and DNSSEC validation expiration logic I guess.

Good to know that validation was tested too, and that it works :-)

| Everything else is left at the default values.  Please let me know if
| you want me to test anything non-default in particular.

| I did not notice any problems at all and didn't get any reports from
| our users either.  I count this as a success :-)

10 million real user queries is not something that I can try myself.

| unbound also correctly dealt with the situation that it had a trust
| anchor defined for (from the set of trust anchors
| distributed by RIPE NCC
| <>),
| but the corresponding DNSKEY is missing from the zone:
| [1201660228] unbound[16620:0] info: failed to prime trust anchor --
| could not fetch DNSKEY rrset < DNSKEY IN>

Ah, yes, in such a situation you can wait for to fix their
zone and after 900 seconds (the default bogus TTL) unbound will pick that up, or
instead change your config and kill -HUP unbound (this also clears the

| Operational experience: I was able to integrate unbound into our
| anycast caching system without problems.  This allows me to run BIND
| and unbound in parallel on different anycast instances just as I had
| planned to do.  All of this is looking very good.

Oh, this is really nice. It would be interesting to know of any noticeable
differences between BIND and unbound. Apart from version.bind CH TXT, of

Your log file. Thank you for sharing the statistics.
* There are many TCP connect errors. I now assume that someone
configured a zone as NS, with a nameserver on
their local subnet. Unbound cannot contact that nameserver, and tries
(finally) to use TCP on it; which gives this error.

I think the log file should not be cluttered with zone administration
mistakes by others. I can demote this particular error to a higher
verbosity level (2), or I can print the address that failed and then you
(the operator) or a script can pick those up and block them
(do-not-query-address: in the config file).

I think I'll demote the error message, as it does not bother the
resolver operator. What do you think? Would you like to have the
addresses printed to the log anyway?
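
If the failing address were printed, a script could collect the repeat
offenders. What follows is only a minimal sketch in Python. It assumes
a hypothetical log line that ends in "tcp connect: <reason> for
<address>"; that format is invented here for illustration and is not
what unbound 0.7.2 prints. The function name and the failure threshold
are also assumptions. Only the do-not-query-address: option name comes
from the config file.

```python
import ipaddress
import re

# Hypothetical log format: the failing address follows the word "for"
# at the end of a "tcp connect" error line. This is an assumption.
LINE_RE = re.compile(r"tcp connect: .* for (\S+)$")


def blocklist_lines(log_lines, min_failures=3):
    """Return do-not-query-address lines for addresses that failed often."""
    counts = {}
    for line in log_lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # not a valid IP address, skip it
        counts[addr] = counts.get(addr, 0) + 1
    # Sort by text form so IPv4 and IPv6 addresses can be mixed safely.
    return [
        f"do-not-query-address: {addr}"
        for addr, n in sorted(counts.items(), key=lambda kv: str(kv[0]))
        if n >= min_failures
    ]
```

The threshold keeps a single transient failure from blocking a server;
the resulting lines would still need review before they go into the
config file.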

* You have 93% cache hits with the default 4+4 MB cache (4 MB for rrset
data, 4 MB for message data), which means unbound caps memory usage at
about 20-30 MB total for the process. For 10 million queries, this is
impressive. You could try to increase the cache size to improve cache
hits, but it doesn't seem worth the effort.
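
To put the 93% in perspective, the arithmetic can be sketched in Python
as below. The function names are hypothetical and the inputs are the
round numbers from this thread, not values read from unbound itself.

```python
def cache_hit_ratio(cache_hits, cache_misses):
    """Fraction of queries answered from cache."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0
    return cache_hits / total


def recursed_queries(total_queries, hit_ratio):
    """Number of queries that had to go out to the network."""
    return round(total_queries * (1 - hit_ratio))


# Roughly 10 million queries at 93% cache hits.
print(recursed_queries(10_000_000, 0.93))
```

So only about 700,000 of the 10 million queries needed recursion.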

* The requestlist (this is the to-do list of pending recursive queries)
stays nice and small as well. If the computer were unable to bear the
load, this number would shoot up as requests come in faster than they
could be handled.

* For the histogram, onlookers please note that the average reply time
printed a) does not include the cache responses (which are better
measured in qps than in seconds per query) and b) is skewed by really
large upper numbers, caused by unbound retrying very hard for a couple
of records (remote server down). In a newer unbound the median is
printed as well, which is a nicer way to summarize recursion speed.
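
To illustrate why the median is less sensitive to a few very slow
lookups, here is a minimal Python sketch that estimates both numbers
from a bucketed histogram. It is not unbound's own computation: the
bucket layout, the midpoint approximation for the mean and the linear
interpolation for the median are all assumptions made for the example,
and the counts are made up.

```python
def mean_and_median(buckets):
    """buckets: list of (lower_sec, upper_sec, count), sorted by lower bound."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    # Mean: approximate every query by the midpoint of its bucket.
    mean = sum((lo + hi) / 2 * count for lo, hi, count in buckets) / total
    # Median: walk the buckets until half of the queries are covered,
    # then interpolate linearly inside that bucket.
    half = total / 2
    seen = 0
    for lo, hi, count in buckets:
        if count and seen + count >= half:
            return mean, lo + (hi - lo) * (half - seen) / count
        seen += count


# Made-up data: most queries are fast, one took between 16 and 32 seconds.
example = [(0.0, 0.1, 90), (0.1, 1.0, 9), (16.0, 32.0, 1)]
mean, median = mean_and_median(example)
print(f"mean={mean:.4f}s median={median:.4f}s")
```

In this made-up histogram a single slow query pulls the mean up to
about a third of a second, while the median stays below 0.1 seconds.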

* There is a significant bump on the lower end of your histogram, at 32
microseconds. I assume this is because a lot of recursion requests are
due to a CNAME, for example where a CNAME is used to load-balance with
DNS. Consequently, I need to pay attention to CNAME processing when I
do optimization; good to know.

Thank you,

Best regards,
~   Wouter
