Bug 3817 - core dump happens in some condition
Status: RESOLVED FIXED
Product: unbound
Classification: Unclassified
Component: server
Version: 1.6.0
Hardware: x86_64 Linux
Importance: P2 critical
Assigned To: unbound team
Depends on:
Blocks:
Reported: 2018-03-15 13:54 CET by sunhubs
Modified: 2018-04-09 08:08 CEST
CC List: 2 users

See Also:


Description sunhubs 2018-03-15 13:54:25 CET
Hi, I use libunbound to do a timeout control test, but I sometimes get a core dump. It does not happen often, but it does happen. The backtrace from the core file looks like this:

#0  comm_point_start_listening (c=0x0, newfd=-1, msec=-1) at util/netevent.c:2098
#1  0x00007f45d5c095e0 in tube_queue_item (tube=0x7f4594062880, msg=0x7f44943faaf0 "", len=20) at util/tube.c:474
#2  0x00007f45d5bd617c in add_bg_result (w=0x7f4594344e40, q=0x7f45941f9b30, pkt=<value optimized out>, err=0,
    reason=<value optimized out>) at libunbound/libworker.c:740
#3  0x00007f45d5bea2f3 in mesh_state_cleanup (mstate=0x7f45940eba10) at services/mesh.c:643
#4  0x00007f45d5bea414 in mesh_state_delete (qstate=<value optimized out>) at services/mesh.c:694
#5  0x00007f45d5beb741 in mesh_delete_helper (mesh=0x7f45944b6fd0) at services/mesh.c:212
#6  mesh_delete (mesh=0x7f45944b6fd0) at services/mesh.c:224
#7  0x00007f45d5bd6725 in libworker_delete_env (w=0x7f4594344e40) at libunbound/libworker.c:85
#8  0x00007f45d5bd67ce in libworker_delete (w=0x7f4594344e40) at libunbound/libworker.c:106
#9  0x00007f45d5bd777e in libworker_dobg (arg=0x7f4594344e40) at libunbound/libworker.c:364
#10 0x00000031f5c07aa1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000031f58e893d in clone () from /lib64/libc.so.6

I use it in threaded mode on CentOS 6.3. My API call order is:
ub_ctx_create -> to create the ctx
ub_ctx_async -> to set threaded mode
ub_ctx_resolvconf -> to read resolv.conf
ub_resolve_async -> to do an asynchronous query
ub_fd, FD_ZERO, FD_SET, select -> to wait until the result fd is readable
ub_process -> to process the results

Would you help analyze this core file?
Comment 1 Wouter Wijngaards 2018-03-15 14:17:38 CET
Hi sunhubs,

The stack trace indicates that the context service thread (libworker_dobg) is exiting.

While it is exiting, it is deleting data. While this is happening, another result comes in and is added to the tube, but the tube's write comm point has already been removed, so it ends up using a NULL pointer (c=0x0 in frame #0).

I.e. during the delete, mesh_state_cleanup calls add_bg_result, but the tube comm point it needs has already been deleted.

Best regards, Wouter
Comment 2 Wouter Wijngaards 2018-03-15 14:24:10 CET
Hi Sunhubs,

This patch should solve the problem.  I'll add a similar patch to the development code repository.


Index: util/tube.c
===================================================================
--- util/tube.c	(revision 4220)
+++ util/tube.c	(working copy)
@@ -454,8 +454,9 @@
 
 int tube_queue_item(struct tube* tube, uint8_t* msg, size_t len)
 {
-	struct tube_res_list* item = 
-		(struct tube_res_list*)malloc(sizeof(*item));
+	struct tube_res_list* item;
+	if(!tube || !tube->res_com) return 0;
+	item = (struct tube_res_list*)malloc(sizeof(*item));
 	if(!item) {
 		free(msg);
 		log_err("out of memory for async answer");


Best regards, Wouter
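As a rough illustration of the crash and of the guard in the patch above, here is a standalone model. The types and functions (`struct tube_model`, `struct comm_point_model`, `model_queue_item`, `model_remove_bg_write`, `model_free_list`) are hypothetical simplifications, not unbound's real code; they keep only the `res_com` pointer that is NULL in frame #0 of the backtrace. One deliberate difference from the posted patch: the model frees `msg` on the early return, because the function otherwise takes ownership of the buffer (the out-of-memory path already frees it).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a comm point: only tracks whether the
 * writer has been told to start sending. Not unbound's struct. */
struct comm_point_model {
	int listening;
};

struct tube_item_model {
	struct tube_item_model* next;
	uint8_t* buf;
	size_t len;
};

/* Hypothetical stand-in for struct tube. */
struct tube_model {
	struct comm_point_model* res_com; /* NULL once the bg writer is removed */
	struct tube_item_model* res_list;
	struct tube_item_model* res_last;
};

/* Models tube_remove_bg_write(): the writer comm point is freed and
 * the pointer cleared, while the tube itself stays alive. */
static void model_remove_bg_write(struct tube_model* t)
{
	free(t->res_com);
	t->res_com = NULL;
}

/* Models the guarded queue function: returns 1 on success, 0 on
 * failure, and takes ownership of msg in every case. */
static int model_queue_item(struct tube_model* t, uint8_t* msg, size_t len)
{
	struct tube_item_model* item;
	if(!t || !t->res_com) {
		/* writer already gone (shutdown in progress): drop the
		 * result instead of dereferencing a NULL comm point */
		free(msg);
		return 0;
	}
	item = malloc(sizeof(*item));
	if(!item) {
		free(msg);
		return 0;
	}
	item->next = NULL;
	item->buf = msg;
	item->len = len;
	if(t->res_last) {
		t->res_last->next = item;
	} else {
		/* first item: wake the writer; without the guard above this
		 * is where a NULL res_com would be dereferenced */
		t->res_com->listening = 1;
		t->res_list = item;
	}
	t->res_last = item;
	return 1;
}

/* Frees all queued items and their buffers. */
static void model_free_list(struct tube_model* t)
{
	struct tube_item_model* it = t->res_list;
	while(it) {
		struct tube_item_model* next = it->next;
		free(it->buf);
		free(it);
		it = next;
	}
	t->res_list = NULL;
	t->res_last = NULL;
}
```

Calling `model_queue_item` after `model_remove_bg_write` returns 0 instead of touching `res_com`, which corresponds to the situation mesh_state_cleanup runs into during libworker_delete.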
Comment 3 Wouter Wijngaards 2018-03-15 14:30:36 CET
Hi,

That solved the crash; this patch solves the problem better, by not creating items when they should not be created.

Index: libunbound/libworker.c
===================================================================
--- libunbound/libworker.c	(revision 4584)
+++ libunbound/libworker.c	(working copy)
@@ -365,6 +365,7 @@
 
 	/* cleanup */
 	m = UB_LIBCMD_QUIT;
+	w->want_quit = 1;
 	tube_remove_bg_listen(w->ctx->qq_pipe);
 	tube_remove_bg_write(w->ctx->rr_pipe);
 	libworker_delete(w);
@@ -713,6 +714,10 @@
 	uint8_t* msg = NULL;
 	uint32_t len = 0;
 
+	if(w->want_quit) {
+		context_query_delete(q);
+		return;
+	}
 	/* serialize and delete unneeded q */
 	if(w->is_bg_thread) {
 		lock_basic_lock(&w->ctx->cfglock);
Index: libunbound/libworker.h
===================================================================
--- libunbound/libworker.h	(revision 4584)
+++ libunbound/libworker.h	(working copy)
@@ -75,6 +75,8 @@
 	int is_bg;
 	/** is this a bg worker that is threaded (not forked)? */
 	int is_bg_thread;
+	/** want to quit, stop handling new content */
+	int want_quit;
 
 	/** copy of the module environment with worker local entries. */
 	struct module_env* env;

Best regards, Wouter
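The idea of the second patch, setting a `want_quit` flag before teardown so that late results are discarded rather than queued, can be sketched with another standalone model. Again, `struct worker_model`, `struct query_model`, `model_add_bg_result` and `model_teardown` are hypothetical names, not unbound's code; `free(q)` stands in for `context_query_delete(q)`, and the counters exist only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a context query. */
struct query_model {
	int id; /* placeholder member; C does not allow empty structs */
};

/* Hypothetical stand-in for struct libworker. */
struct worker_model {
	int want_quit;       /* set at the start of teardown */
	int results_queued;  /* answers that would be handed to the result pipe */
	int queries_dropped; /* queries deleted without an answer */
};

/* Models add_bg_result(): once teardown has started, the query is
 * deleted instead of being serialized and queued. */
static void model_add_bg_result(struct worker_model* w, struct query_model* q)
{
	if(w->want_quit) {
		free(q); /* stands in for context_query_delete(q) */
		w->queries_dropped++;
		return;
	}
	/* normal path: the answer would be serialized and passed to the
	 * tube; here the query is released and the result counted */
	free(q);
	w->results_queued++;
}

/* Models the cleanup section of libworker_dobg(): the flag is set
 * before the writer is removed and the mesh is deleted, so results
 * produced during mesh cleanup take the early-return path. */
static void model_teardown(struct worker_model* w,
	struct query_model** pending, size_t n)
{
	size_t i;
	w->want_quit = 1;
	/* the bg listen/write handlers would be removed here */
	for(i = 0; i < n; i++) {
		/* mesh_delete -> mesh_state_cleanup -> add_bg_result */
		model_add_bg_result(w, pending[i]);
		pending[i] = NULL;
	}
}
```

Because the flag is set before the writer is removed, every result generated during mesh cleanup is dropped before it reaches the tube, so the NULL check from comment 2 only acts as a safety net.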
Comment 4 sunhubs 2018-03-16 02:34:40 CET
(In reply to Wouter Wijngaards from comment #3)

Hi Wouter,
   Thanks for your help, it looks like it works. I'd like to confirm: have the two patches been released, or in which version will they be released?

Best regards, Eddy
Comment 5 Wouter Wijngaards 2018-03-16 07:49:34 CET
Hi sunhubs,

Likely in the next release of unbound, 1.7.1. That will take some time; 1.7.0 was just released.

Best regards, Wouter
Comment 6 sunhubs 2018-04-09 08:08:47 CEST
(In reply to Wouter Wijngaards from comment #5)

Got it, thanks