Bugzilla – Bug 804
Unbound fails to resolve after connectivity loss.
Last modified: 2016-09-06 16:07:52 CEST
If there is a connectivity failure that causes timeouts or ICMP no route messages to the default gateway, the unbound daemon will permanently lose the ability to resolve hosts until it is restarted. The behaviour does not seem consistent, and it seems to happen more often when connectivity is rapidly flapping between timing out and being available.
To clarify, the loss of the ability to resolve continues _after_ connectivity is restored and working again.
Hi Martian67, Yes, unbound is waiting 15 minutes to spare the other servers the traffic. There is documentation on how this happens here: http://unbound.net/documentation/info_timeout.html If this happens a lot for you, change the infra-host-ttl timeout from 15 minutes to something (a lot) lower, like 60 seconds. Best regards, Wouter
DNS resolving continues to be broken far beyond 15 minutes; it fails to work indefinitely until the daemon is restarted. Given the transient nature of this issue, it's hard to replicate on demand. Do you have any debugging commands I should run when it occurs again?
I can confirm this behavior--increasing or decreasing infra-host-ttl has no positive effect. Some more detail may be found at https://www.reddit.com/r/openbsd/comments/4jw4sx/the_day_some_of_the_dns_stopped.
I think I'm frequently running into this problem too, and would like to know if this is going to be 'fixed' in the next build and if perhaps a pre-release version of such a build could be made available (for my platform) for testing.
Hi, Could the exponential backoff and rapid flapping result in marking all servers as permanently down? And thus cause (hours-long) timeouts? How could unbound detect this rapid flapping or connectivity loss (or connectivity resumption)? Right now detection is on a server-by-server basis. Best regards, Wouter
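To make the question above concrete, here is a toy model of per-server exponential backoff with a TTL on the stored state. It is only a sketch: the class `InfraCache`, its methods, and the constants are assumptions made for illustration and are not taken from Unbound's source. It shows how a run of timeouts can leave a server marked as blocked until its entry expires.

```python
# Toy model of per-server backoff; names and constants are illustrative
# assumptions, not Unbound's actual implementation.
INITIAL_RTO_MS = 376      # assumed starting retransmit timeout
BLOCK_RTO_MS = 120_000    # assumed threshold at which a server counts as down
INFRA_HOST_TTL_S = 900    # 15 minutes, the default discussed in this thread


class InfraCache:
    def __init__(self, ttl_s=INFRA_HOST_TTL_S):
        self.ttl_s = ttl_s
        self.entries = {}  # server name -> (rto_ms, expires_at)

    def _get(self, server, now):
        """Return the entry for a server, recreating it if missing or expired."""
        entry = self.entries.get(server)
        if entry is None or now >= entry[1]:
            entry = (INITIAL_RTO_MS, now + self.ttl_s)
            self.entries[server] = entry
        return entry

    def on_timeout(self, server, now):
        """Double the timeout after a failure, capped at the block threshold."""
        rto, expires = self._get(server, now)
        self.entries[server] = (min(rto * 2, BLOCK_RTO_MS), expires)

    def on_reply(self, server, now):
        """A successful reply resets the backoff and refreshes the entry."""
        self.entries[server] = (INITIAL_RTO_MS, now + self.ttl_s)

    def is_blocked(self, server, now):
        """A server is skipped while its timeout sits at the threshold."""
        rto, _ = self._get(server, now)
        return rto >= BLOCK_RTO_MS
```

In this model, an outage that times out every known server often enough blocks all of them at once, and nothing is retried until the entries expire. Because the state is kept per server, the model has no notion of "the whole network was down", which is the detection gap raised in the comment.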
Hi, There is a fix in the code repository that should alleviate some of the issue, but it has not solved the underlying cause. It removes the 'waiting for empty list' entries from the requestlist. That should make things work again after an outage. Queries shouldn't get into that state in the first place, but the error is now logged when it happens, rather than only once unbound has ground to a halt. Logs showing how that happens (with high verbosity) may help. Best regards, Wouter
Hi, Fixed the state machines getting stuck in waiting for empty_list. The failures caused by the outage left the counter for num_target_queries not properly reset, which caused the other queries to remain stuck in the waiting state. This should resolve the issue, I think. Best regards, Wouter
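The kind of bug described above can be illustrated with a small toy state machine. This is a hedged sketch, not Unbound's C code: only the name `num_target_queries` comes from the comment, while the class `ParentQuery`, its methods, and the `buggy` switch are hypothetical. A parent query waits until its counter of outstanding target queries reaches zero; if a failed target is never accounted for, the parent waits forever.

```python
class ParentQuery:
    """Toy parent query that waits for its target sub-queries to finish."""

    def __init__(self):
        self.num_target_queries = 0
        self.state = "idle"

    def start_targets(self, count):
        """Launch `count` target queries and wait for all of them."""
        self.num_target_queries = count
        self.state = "waiting"

    def target_done(self, ok, buggy=False):
        """Record that one target finished, successfully or not."""
        # In the buggy variant only successful targets are counted, so a
        # failed target leaves the counter above zero indefinitely.
        if ok or not buggy:
            self.num_target_queries -= 1
        if self.num_target_queries == 0:
            self.state = "ready"


# Buggy accounting: one target fails during the outage, the parent stays stuck.
stuck = ParentQuery()
stuck.start_targets(2)
stuck.target_done(ok=True, buggy=True)
stuck.target_done(ok=False, buggy=True)

# Correct accounting: failures are counted too, so the parent moves on.
fixed = ParentQuery()
fixed.start_targets(2)
fixed.target_done(ok=True)
fixed.target_done(ok=False)
```

After this runs, `stuck.state` is still `"waiting"` while `fixed.state` is `"ready"`, which matches the symptom that resolution stayed broken even after connectivity returned.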
(In reply to Wouter Wijngaards from comment #8) So, for heaven's sake (or whatever you value), PLEASE make available a compiled executable (for Windows - btw: why is there no 64bit version?) asap.
Hi, 32bit versions here: http://www.nlnetlabs.nl/~wouter/unbound-1.5.10_20160825.zip and unbound_setup_1.5.10_20160825.exe At the same place I have also put 64bit compiled versions (untested, but I set the compiler flag to 64bit): unbound-1.5.10rc7.zip unbound_setup_1.5.10rc7.exe Best regards, Wouter
(In reply to Wouter Wijngaards from comment #10) 32bit version installed and so far running without any problems on one machine. 64bit version reports "fatal error: could not read config file". It appears that the 64bit version expects its config file to be found in 'Program Files (x86)\Unbound'. That's funny, because the 32bit version, along with its config file, is quite happy living under 'Program Files\Unbound' ...
(Somehow I can't edit my recent post:) I double-checked registry entries for both the 32bit and 64bit versions. Both have 'ImagePath' as 'REG_EXPAND_SZ' '"C:\Program Files\Unbound.exe" -c "C:\Program Files\Unbound\service.conf" -w service', but the 64bit version somehow ignores the path specified for 'service.conf'.
More observations about the 64bit Windows version: (1) placing a copy of 'service.conf' in 'C:\Program Files (x86)\Unbound' is sufficient to get the 64bit version running, even though it is installed under 'Program Files\Unbound'. (2) while the 32bit version of Unbound feels content with occupying around 1.4MB of memory under low load, the 64bit version, under comparable load, consumes a whopping 65.4MB! That's about 50 times as much and definitely does not look right.
More on the 64bit Windows version: Memory footprint readjusted itself from the initially reported 65.4MB to a perfectly acceptable 3.7MB after one hour. So, except for it expecting to find 'service.conf' under 'Program Files (x86)\Unbound' [that should be easy to fix], the 64bit Windows version is doing fine now.
(In reply to drahnier from comment #14) Update: The 'service.conf' issue appears to be with 'unbound-checkconf', and not with 'unbound' itself. 'unbound-checkconf' requires a full path name as argument and, if that is not given, uses 'C:\Program Files (x86)\Unbound' as the prefix for 'service.conf', completely disregarding the fact that the install directory is 'Program Files\Unbound' and that it is the 64bit version of 'unbound' which is running. IMHO, 'unbound-checkconf' should look for 'service.conf' in the directory where 'unbound' is installed first, before applying any defaults ... Btw: The 64bit version is doing just fine.
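The lookup order suggested above could look roughly like the following sketch. It is not how 'unbound-checkconf' is implemented; the function `find_config` and all of its parameters are hypothetical and only illustrate the idea of preferring an explicit argument, then the install directory, then a compiled-in default.

```python
from pathlib import Path


def find_config(argument, install_dir, default_dir, name="service.conf"):
    """Pick a config file path using the order proposed in the comment.

    argument    -- path given on the command line, or None
    install_dir -- directory the running executable was installed in
    default_dir -- compiled-in fallback directory
    """
    # 1. An explicit path always wins.
    if argument:
        return Path(argument)
    # 2. Prefer the install directory, then the default, if the file exists.
    for directory in (install_dir, default_dir):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    # 3. Nothing found: report the default location so the error is familiar.
    return Path(default_dir) / name
```

With this ordering, a 64bit install under 'Program Files\Unbound' would pick up its own 'service.conf' without needing a copy under 'Program Files (x86)\Unbound'.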
Haven't seen the issue manifest itself with this fix; seems like it's been fixed.
Hi, The original bug has been fixed! Thanks for the debug information. Best regards, Wouter