Bugzilla – Bug 3736
0 TTL domains stuck on SERVFAIL unless manually flushed with serve-expired on
Last modified: 2018-03-13 17:04:41 CET
Config: serve-expired: yes 1) Query comes in for a foo.bar 2) Recursive lookup is performed. TTL is 0. Therefore, foo.bar is not cached ====== Upstream server goes down ====== 3) Query comes in for foo.bar. 4) Recursive lookup returns SERVFAIL. This is cached. ====== Upstream server is back up ====== 5) Query comes in for foo.bar. 6) serve_expired the SERVFAIL. Trigger a prefetch. 7) Upstream returns 0 TTL, therefore the cached SERVFAIL is not overwritten. Client always gets a SERVFAIL unless it is manually flushed from cache.
Hi Saksham, Yes, I saw your email. The fix is much more complicated than you suggest, and I am in the middle of the 1.7.0 release. Thanks for the report, it looks like a subtle problem with caching. Best regards, Wouter
Hi Saksham, This patch probably solves the problem, Best regards, Wouter Index: services/cache/dns.c =================================================================== --- services/cache/dns.c (revision 4579) +++ services/cache/dns.c (working copy) @@ -108,6 +108,48 @@ } } +/** delete message from message cache */ +static void +msg_cache_remove(struct module_env* env, uint8_t* qname, size_t qnamelen, + uint16_t qtype, uint16_t qclass, uint16_t flags) +{ + struct query_info k; + hashvalue_type h; + + k.qname = qname; + k.qname_len = qnamelen; + k.qtype = qtype; + k.qclass = qclass; + k.local_alias = NULL; + h = query_info_hash(&k, flags); + slabhash_remove(env->msg_cache, h, &k); +} + +/** remove servfail msg cache entry */ +static void +msg_del_servfail(struct module_env* env, struct query_info* qinfo, + uint32_t flags) +{ + struct msgreply_entry* e; + /* see if the entry is servfail, and then remove it, so that + * lookups move from the cacheresponse stage to the recursionresponse + * stage */ + e = msg_cache_lookup(env, qinfo->qname, qinfo->qname_len, + qinfo->qtype, qinfo->qclass, flags, *env->now, 0); + if(!e) return; + /* we don't check for the ttl here, also expired servfail entries + * are removed. If the user uses serve-expired, they would still be + * used to answer from cache */ + if(FLAGS_GET_RCODE(((struct reply_info*)e->entry.data)->flags) + != LDNS_RCODE_SERVFAIL) { + lock_rw_unlock(&e->entry.lock); + return; + } + lock_rw_unlock(&e->entry.lock); + msg_cache_remove(env, qinfo->qname, qinfo->qname_len, qinfo->qtype, + qinfo->qclass, flags); +} + void dns_cache_store_msg(struct module_env* env, struct query_info* qinfo, hashvalue_type hash, struct reply_info* rep, time_t leeway, int pside, @@ -132,6 +174,12 @@ * which could be useful for delegation information */ verbose(VERB_ALGO, "TTL 0: dropped msg from cache"); free(rep); + /* if the message is SERVFAIL in cache, remove that SERVFAIL, + * so that the TTL 0 response can be returned for future + * responses (i.e. don't get answered by the servfail from + * cache, but instead go to recursion to get this TTL0 + * response). */ + msg_del_servfail(env, qinfo, flags); return; }
Hi Saksham, The issue is solved, with the change below. Thanks for the report! What it does is that it removes SERVFAIL responses from cache when storing a 0TTL response. This will also cause 0TTL responses to be available when serve-expired is not in use when they are blocked by a SERVFAIL response in cache, although that normally would timeout in a couple seconds, with the fix the 0TTL response is available immediately after the 0TTL response has the cachestore routine called. Best regards, Wouter Index: services/cache/dns.c =================================================================== --- services/cache/dns.c (revision 4582) +++ services/cache/dns.c (working copy) @@ -135,7 +135,7 @@ * lookups move from the cacheresponse stage to the recursionresponse * stage */ e = msg_cache_lookup(env, qinfo->qname, qinfo->qname_len, - qinfo->qtype, qinfo->qclass, flags, *env->now, 0); + qinfo->qtype, qinfo->qclass, flags, 0, 0); if(!e) return; /* we don't check for the ttl here, also expired servfail entries * are removed. If the user uses serve-expired, they would still be
Thanks for the quick turnaround! Verified fix.