Bug 1236

Summary: can't find a stable version of unbound
Product: unbound
Reporter: Chris Stradtman <chris>
Component: server
Assignee: unbound team <unbound-team>
Status: ASSIGNED
Severity: major
CC: cathya, wouter
Priority: P5
Version: unspecified
Hardware: x86_64
OS: Linux

Description Chris Stradtman 2017-03-12 19:37:31 CET
I've been trying to find a stable version of unbound. At the moment I'm in a barely workable situation, where the OS automatically restarts unbound every time it crashes.

I'm currently running version 1.6.2 from the master branch.

(Before this, unbound would lock up until I killed the process with kill -9.)

I've built unbound with:

 ./configure --with-pythonmodule  --enable-dnstap=no --enable-cachedb=no --with-libevent --enable-debug --with-pthreads

So far I've tried it both with and without each of:
  libevent
  pthreads
  cachedb
  dnstap

The only variation I haven't tried is dropping the python module, since it is critical to what we are doing with unbound.

Currently unbound is crashing (on both CentOS and Ubuntu) with a variety of assertion failures. A sample from my log file:

Mar 12 05:07:09 sfo unbound: [7253:2] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:07:16 sfo unbound: [7273:3] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:07:56 sfo unbound: [7278:0] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:11:03 sfo unbound: [7285:2] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:11:22 sfo unbound: [7330:3] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:12:31 sfo unbound: [7337:0] fatal error: services/mesh.c:572: mesh_new_prefetch: assertion n != NULL failed
Mar 12 05:13:17 sfo unbound: [7347:2] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:13:35 sfo unbound: [7365:3] fatal error: services/mesh.c:572: mesh_new_prefetch: assertion n != NULL failed
Mar 12 05:14:19 sfo unbound: [7370:3] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:14:37 sfo unbound: [7383:1] fatal error: util/rbtree.c:324: change_child_ptr: assertion child->parent == old || child->parent == new failed
Mar 12 05:16:30 sfo unbound: [7388:0] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:16:48 sfo unbound: [7411:1] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:17:16 sfo unbound: [7424:2] fatal error: services/mesh.c:572: mesh_new_prefetch: assertion n != NULL failed
Mar 12 05:17:58 sfo unbound: [7433:0] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:18:32 sfo unbound: [7443:0] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 05:31:54 sfo unbound: [7452:2] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 06:25:51 sfo unbound: [7571:1] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 06:25:52 sfo unbound: [8060:1] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 06:25:53 sfo unbound: [8065:3] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 06:25:54 sfo unbound: [8072:1] fatal error: util/rbtree.c:324: change_child_ptr: assertion child->parent == old || child->parent == new failed
Mar 12 06:25:56 sfo unbound: [8077:0] fatal error: util/rbtree.c:324: change_child_ptr: assertion child->parent == old || child->parent == new failed
Mar 12 06:27:12 sfo unbound: [8082:3] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 06:52:29 sfo unbound: [8099:1] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed
Mar 12 07:42:41 sfo unbound: [8294:0] fatal error: services/mesh.c:419: mesh_new_client: assertion n != NULL failed



Does anybody have any pointers on where to look, or what a workaround might be?

Thanks

Chris Stradtman
Comment 1 Wouter Wijngaards 2017-03-13 08:19:54 CET
Hi Chris,

Something bad must be happening in the python module. Can you disable it temporarily to see whether that is the problem?
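A minimal sketch of how such a test might be set up without rebuilding, assuming the python module is currently enabled through the module-config option in unbound.conf (the actual value in this setup is not shown in the report): remove python from the module list and restart unbound. If the validator is not in use, the list would be just "iterator".

```
server:
    # Assumed current setting with the python module enabled:
    # module-config: "validator python iterator"
    # Temporary test setting with the python module left out:
    module-config: "validator iterator"
```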

It looks like the mesh is being modified (lookup queries deleted?) without using the proper routine to fix up all the trees and other references.
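To illustrate the kind of breakage described here, the following is a simplified, hypothetical C sketch. It uses a plain binary search tree with parent pointers, not unbound's red-black tree in util/rbtree.c, and every type and function name is made up for illustration. The check in parents_consistent mirrors the logged condition child->parent == old || child->parent == new: removing a node through a proper delete routine keeps every child's parent pointer in sync, while unlinking a node by hand leaves a stale parent pointer behind. In this sketch, tree_insert also returns NULL when the key is already present, the kind of result an n != NULL assertion after an insert would catch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: a plain (unbalanced) BST with parent pointers. */
struct node {
    int key;
    struct node *parent, *left, *right;
};

/* Insert key; returns the new node, or NULL if the key already exists
 * (or allocation fails). */
static struct node *tree_insert(struct node **root, int key) {
    struct node *parent = NULL, **link = root;
    while (*link) {
        parent = *link;
        if (key == parent->key)
            return NULL;
        link = key < parent->key ? &parent->left : &parent->right;
    }
    struct node *n = calloc(1, sizeof *n);
    if (!n)
        return NULL;
    n->key = key;
    n->parent = parent;
    *link = n;
    return n;
}

/* Replace subtree u with subtree v, fixing the links in both directions. */
static void transplant(struct node **root, struct node *u, struct node *v) {
    if (!u->parent)
        *root = v;
    else if (u == u->parent->left)
        u->parent->left = v;
    else
        u->parent->right = v;
    if (v)
        v->parent = u->parent;
}

/* Proper delete: every pointer that referenced n is updated, then n is freed. */
static void tree_delete(struct node **root, struct node *n) {
    if (!n->left) {
        transplant(root, n, n->right);
    } else if (!n->right) {
        transplant(root, n, n->left);
    } else {
        struct node *s = n->right; /* in-order successor of n */
        while (s->left)
            s = s->left;
        if (s->parent != n) {
            transplant(root, s, s->right);
            s->right = n->right;
            s->right->parent = s;
        }
        transplant(root, n, s);
        s->left = n->left;
        s->left->parent = s;
    }
    free(n);
}

/* Invariant check: each child must point back at the node that holds it. */
static int parents_consistent(const struct node *n) {
    if (!n)
        return 1;
    if (n->left && n->left->parent != n)
        return 0;
    if (n->right && n->right->parent != n)
        return 0;
    return parents_consistent(n->left) && parents_consistent(n->right);
}

/* Hypothetical "wrong routine" (assumes n is not the root): splices n's left
 * child into n's place, dropping any right subtree, but never updates that
 * child's parent pointer, so it still points at the detached node n. */
static void bad_unlink(struct node *n) {
    if (n->parent->left == n)
        n->parent->left = n->left;
    else
        n->parent->right = n->left;
}
```

Building a small tree, deleting a node with tree_delete keeps parents_consistent true; unlinking a node with bad_unlink makes it report an inconsistency, which is the situation the rbtree.c assertion guards against.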

Best regards, Wouter