Bug 689

Summary: unbound-anchor hangs for 50 minutes
Product: unbound Reporter: Tomas Hozza <thozza>
Component: server Assignee: unbound team <unbound-team>
Status: ASSIGNED ---    
Severity: major CC: cathya, wouter
Priority: P5    
Version: 1.5.4   
Hardware: x86_64   
OS: Linux   

Description Tomas Hozza 2015-07-20 12:54:58 CEST

We have a bug in Fedora [1] in which a user reports that the installation of the unbound-libs package, during which we call unbound-anchor to update the root key, hangs for 50 minutes. I'm not able to reproduce it, but the user is. I asked him to provide a backtrace from the process while it hangs:

~~~ snip
unbound-anchor -a /var/lib/unbound/root.anchor -c /etc/unbound/icannbundle.pem

Which also hung.

I got this bt each time with gdb:

#0  0x00003fff8e0a4958 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00003fff8e76a824 in 00000043.plt_call.bufferevent_disable () from /lib64/libevent-2.0.so.5
#2  0x00003fff8e74b61c in .event_base_loop () from /lib64/libevent-2.0.so.5
#3  0x00003fff8e74cec4 in .event_base_dispatch () from /lib64/libevent-2.0.so.5
#4  0x00003fff8e8eaa6c in 0000106d.plt_call.PyTuple_New () from /lib64/libunbound.so.2
#5  0x00003fff8e965fd4 in .ub_resolve () from /lib64/libunbound.so.2
#6  0x0000000045d746b0 in 000002e1.plt_call.ub_ctx_create ()
#7  0x00003fff8dfa438c in .generic_start_main.isra () from /lib64/libc.so.6
#8  0x00003fff8dfa45b4 in .__libc_start_main () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

unbound-anchor is a C binary. Therefore it is strange that there is the "PyTuple_New" call, since it is part of the Python bindings in libunbound.

The machine has Internet access.

Do you, as upstream, have any idea why and how unbound-anchor could hang for such a long time and still finish successfully in the end?

Thanks in advance.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1228445
Comment 1 Wouter Wijngaards 2015-07-20 13:04:01 CEST
Hi Tomas,

It should not hang.  As for the Python call: if unbound is reading a config file (with the -C option perhaps?) and that config contains a python section, it would be loaded and processed.

But that does not explain why it would hang.  The backtrace is in ub_resolve, so it could be trying to resolve data.iana.org (but why? Mostly it would try to get the . DNSKEY).  So it must be trying to do a . DNSKEY query.

Does the user have options like ssl-upstream set that the unbound-anchor gets via a config file in some way?

I would like to be able to reproduce it, I wonder what external cause makes it behave in this way.

Best regards,
Comment 2 Tomas Hozza 2015-07-20 13:15:39 CEST
This is happening during the installation. This means that there is no unbound.conf installed, but even if there were, ssl-upstream is not set, so it should use the default value (which seems to be "no").

I was thinking that maybe the user's resolver from DHCP is not able to provide DNSSEC data and libunbound is trying to do the validation.

I could ask for a packet dump of the communication. If you have any ideas about what else would be helpful, I can ask for it.
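
One way to check the resolver hypothesis without a full packet dump would be to send the DHCP-provided resolver a ". DNSKEY" query with the EDNS0 DO bit set and look at the response header. The following is only a rough sketch using the Python standard library; the function names, the transaction ID, the 4096-byte payload size and the 5-second timeout are all assumptions made for illustration, it is IPv4/UDP only, and it does not validate anything.

```python
import socket
import struct

DNSKEY = 48      # RR type DNSKEY
OPT = 41         # EDNS0 OPT pseudo-RR type
DO_BIT = 0x8000  # "DNSSEC OK" flag inside the OPT TTL field


def build_root_dnskey_query(txid, payload_size=4096):
    """Build a '. IN DNSKEY' query carrying an EDNS0 OPT record with DO set."""
    # Header: id, flags (RD=1), 1 question, 0 answers, 0 authority, 1 additional
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 1)
    # Question: root name (single zero byte), type DNSKEY, class IN
    question = b"\x00" + struct.pack("!HH", DNSKEY, 1)
    # OPT RR: root name, type 41, class = UDP payload size,
    # TTL = extended rcode / version / flags, RDLENGTH = 0
    opt = b"\x00" + struct.pack("!HHIH", OPT, payload_size, DO_BIT, 0)
    return header + question + opt


def parse_header(response):
    """Return the few header fields that matter for this check."""
    txid, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "txid": txid,
        "rcode": flags & 0x000F,
        "truncated": bool(flags & 0x0200),
        "answers": an,
    }


def probe(resolver_ip, timeout=5.0):
    """Send the query over UDP; raises socket.timeout if nothing comes back."""
    query = build_root_dnskey_query(0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (resolver_ip, 53))
        response, _ = sock.recvfrom(65535)
    return parse_header(response)
```

Calling probe() with the resolver address from /etc/resolv.conf should either time out or return the rcode, the truncation flag and the answer count. A timeout, a non-zero rcode or zero answers would at least be consistent with a resolver that cannot serve DNSSEC data.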

Comment 3 Wouter Wijngaards 2015-07-20 14:16:57 CEST
Hi Tomas,

Yes, I would think timeouts would be fast for that.  Indeed, somehow the network is down or non-responsive or something?

You could, you know, just not call unbound-anchor during installation.  In the initial Makefiles I called it because I did not know how else to fetch a root anchor for the user.

OTOH it should work.  I am interested in that packet trace.  Is there anything else out of the ordinary during installation?  Are there SELinux denials?

Best regards, Wouter
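
If the call does stay in the installation step, one mitigation is to bound how long it may run so that a stuck resolve cannot block the install for 50 minutes. Below is a minimal sketch in Python; the helper names and the 60-second limit are assumptions chosen for illustration, and the command line is the one quoted in the description.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; return its exit code, or None if it was killed on timeout."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None
    return completed.returncode


def update_root_anchor(timeout_seconds=60):
    """Hypothetical install-time hook: try the update, but never block for long."""
    cmd = [
        "unbound-anchor",
        "-a", "/var/lib/unbound/root.anchor",
        "-c", "/etc/unbound/icannbundle.pem",
    ]
    status = run_with_timeout(cmd, timeout_seconds)
    if status is None:
        print("unbound-anchor timed out; continuing without update", file=sys.stderr)
    return status
```

This does not explain the hang; it only keeps the installation from waiting on it.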