Bug 4227 - Unbound 1.9 crashes intermittently
Unbound 1.9 crashes intermittently
Status: RESOLVED FIXED
Product: unbound
Classification: Unclassified
Component: server
1.9.0
x86_64 Linux
: P5 normal
Assigned To: unbound team
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2019-02-25 16:25 CET by Isaac McDonald
Modified: 2019-02-28 16:12 CET (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Isaac McDonald 2019-02-25 16:25:58 CET
Hello,

Unbound 1.9 is crashing intermittently on two separate CentOS 7 servers. These servers were previously running Unbound 1.8.0 without issue. The error is:  vmap.c:367: Assertion nread >= 0 failed in evmap_io_del

If I can provide additional debugging information please let me know.

The Unbound configuration is as follows:


server:
        verbosity: 1
        statistics-interval: 0
        statistics-cumulative: yes
        extended-statistics: yes
        ratelimit: 1000
        ratelimit-size: 4m
        ratelimit-slabs: 8
        ratelimit-factor: 10
        num-threads: 8
        interface: 216.187.165.2
        interface-automatic: no
        msg-cache-size: 200m
        msg-cache-slabs: 8
        rrset-cache-size: 400m
        rrset-cache-slabs: 8
        infra-cache-slabs: 8
        access-control: 216.187.128.0/18 allow_snoop
        access-control: 204.130.255.0/24 allow_snoop
        access-control: 64.122.164.0/24 allow_snoop
        access-control: 192.168.0.0/16 allow_snoop
        access-control: 172.16.0.0/12 allow_snoop
        access-control: 10.0.0.0/8 allow_snoop
        chroot: ""
        username: "unbound"
        directory: "/etc/unbound"
        logfile: "/var/log/unbound/unbound.log"
        log-time-ascii: yes
        pidfile: "/var/run/unbound/unbound.pid"
        hide-version: yes
        harden-glue: yes
        harden-dnssec-stripped: yes
        harden-below-nxdomain: yes
        harden-referral-path: yes
        use-caps-for-id: no
        unwanted-reply-threshold: 10000000
        prefetch: yes
        prefetch-key: yes
        rrset-roundrobin: yes
        minimal-responses: yes
        serve-expired: yes
        trusted-keys-file: /etc/unbound/keys.d/*.key
        auto-trust-anchor-file: "/var/lib/unbound/root.key"
        val-clean-additional: yes
        val-permissive-mode: no
        val-log-level: 1
        key-cache-size: 85m
        key-cache-slabs: 8
        include: /etc/unbound/local.d/*.conf
        interface: 216.187.165.2@853
        tls-service-key: "/etc/pki/tls/private/example.com.key"
        tls-service-pem: "/etc/pki/tls/certs/example.com.crt"
        tls-port: 853
        aggressive-nsec: yes
        qname-minimisation: yes
        so-reuseport: yes

remote-control:
        control-enable: yes
        control-interface: 172.31.142.251
        control-port: 8953
        server-key-file: "/etc/unbound/unbound_server.key"
        server-cert-file: "/etc/unbound/unbound_server.pem"
        control-key-file: "/etc/unbound/unbound_control.key"
        control-cert-file: "/etc/unbound/unbound_control.pem"

include: /etc/unbound/conf.d/*.conf
Comment 1 Wouter Wijngaards 2019-02-25 16:50:22 CET
Hi Imcdona,

I have a fix in the code repository.  But, it is based on the assertion you list not on replicating the bug itself, so you can update from the code repository and then see if that fixes the bug?

I paired the event delete and add for libevent.  The new out of order processing in 1.9.0 had this imbalanced.

Best regards, Wouter

Index: services/listen_dnsport.c
===================================================================
--- services/listen_dnsport.c	(revision 5121)
+++ services/listen_dnsport.c	(working copy)
@@ -1636,10 +1636,12 @@
 	
 	if(wr) {
 		req->cp->tcp_is_reading = 0;
+		comm_point_stop_listening(req->cp);
 		comm_point_start_listening(req->cp, -1,
 			req->cp->tcp_timeout_msec);
 	} else if(rd) {
 		req->cp->tcp_is_reading = 1;
+		comm_point_stop_listening(req->cp);
 		comm_point_start_listening(req->cp, -1,
 			req->cp->tcp_timeout_msec);
 		/* and also read it (from SSL stack buffers), so
@@ -1647,6 +1649,7 @@
 		 * the TLS frame is sitting in the buffers. */
 		req->read_again = 1;
 	} else {
+		comm_point_stop_listening(req->cp);
 		comm_point_start_listening(req->cp, -1,
 			req->cp->tcp_timeout_msec);
 		comm_point_listen_for_rw(req->cp, 0, 0);
@@ -1759,6 +1762,7 @@
 		 * clear to write to */
 	send_it:
 		c->tcp_is_reading = 0;
+		comm_point_stop_listening(c);
 		comm_point_start_listening(c, -1, c->tcp_timeout_msec);
 		return;
 	}
Index: util/netevent.c
===================================================================
--- util/netevent.c	(revision 5121)
+++ util/netevent.c	(working copy)
@@ -989,10 +989,10 @@
 		c->tcp_is_reading = 1;
 	c->tcp_byte_count = 0;
 	/* switch from listening(write) to listening(read) */
-	comm_point_stop_listening(c);
 	if(c->tcp_req_info) {
 		tcp_req_info_handle_writedone(c->tcp_req_info);
 	} else {
+		comm_point_stop_listening(c);
 		comm_point_start_listening(c, -1, -1);
 	}
 }
@@ -1006,11 +1006,11 @@
 	if(c->tcp_do_toggle_rw)
 		c->tcp_is_reading = 0;
 	c->tcp_byte_count = 0;
-	if(c->type == comm_tcp)
-		comm_point_stop_listening(c);
 	if(c->tcp_req_info) {
 		tcp_req_info_handle_readdone(c->tcp_req_info);
 	} else {
+		if(c->type == comm_tcp)
+			comm_point_stop_listening(c);
 		fptr_ok(fptr_whitelist_comm_point(c->callback));
 		if( (*c->callback)(c, c->cb_arg, NETEVENT_NOERROR, &c->repinfo) ) {
 			comm_point_start_listening(c, -1, c->tcp_timeout_msec);
Comment 2 Isaac McDonald 2019-02-25 19:02:48 CET
Unbound was originally installed on these servers via a third party YUM repo. Compiling from the latest source isn't going to be easy. I'd like to avoid having to do that if at all possible.
Comment 3 Isaac McDonald 2019-02-25 19:40:21 CET
I spoke too soon. The maintainers of the third part repo are going to apply the patch to their latest build. I'll get back to you after I've had an opportunity to test.
Comment 4 Isaac McDonald 2019-02-28 16:08:41 CET
Unbound has been running solid since 2/25 without crashing. Your patch appears to have fixed the issue.

Thanks
Comment 5 Wouter Wijngaards 2019-02-28 16:12:15 CET
Hi Isaac,

Very good!  Thanks for the report.  I intend to move towards a production release with the fixes in there.  This makes the fixes also available for the other users.

Best regards, Wouter