Grammar for DNS (RFC 1035) zone files

Up front: it is impossible to write a clean lexer/grammar for DNS zone files. At first it looks easy to make such a beast, but once you start implementing it the details make it messy.

For NSD the parser was rewritten using bison and flex. This was done to make it easier to add new resource records. There is a slight decrease in speed, but as the zone compiler is not critical to NSD's performance this is not too relevant. The overall effect should be in the order of 10-20%. The memory consumption of the new parser is equal to that of the parser used in NSD 1.2.

This parser is currently included in NSD 1.4-alpha1 and has been shown to work OK, but it was my (Miek) first attempt at using bison and flex. The lexer part is located in the file zlexer.lex, the grammar is in zparser.y.

It helped a lot that I had the following page at my disposal: dinosaur.compilertools.net

Zone file lexer

Finding a good grammar and lexer for BIND zonefiles is kinda hard. There are no real keywords, and the meaning of most strings depends on their position relative to the other strings. For example, the following is a valid SOA record:

$ORIGIN example.org.
    SOA    soa    soa    ( 1 2 3 4 5 6 )

This SOA record means the administrator has an email address of soa@example.org. and the first nameserver is named soa.example.org. Both are completely valid. The numbers are of course totally bogus.

Another example would be:

$ORIGIN example.org.
    SOA    soa    soa    ( 1 2 ) ( 3 4 ) ( 5 ) ( 6 )

The parsing of parentheses was also not trivial. Whitespace is also significant in zonefiles. The TAB before SOA has to be returned as a previous_domain token by the lexer. Newlines inside parentheses are returned as SPACE, which works but required some changes in the definitions of the resource records.
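The newline handling described above can be sketched in plain C. This is only an illustration, not the code in zlexer.lex: the token names, the paren_state struct and both functions are hypothetical, and rejecting nested parentheses is an assumption of this sketch. It also ignores comments and the special meaning of leading whitespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the real lexer's tokens may differ. */
enum tok { TOK_CHAR, TOK_SPACE, TOK_NEWLINE, TOK_ERROR };

/* Hypothetical lexer state: are we inside a "( ... )" group? */
struct paren_state { int open; };

/* Classify one character and update the parenthesis state. */
enum tok classify(struct paren_state *st, char c)
{
    assert(st != NULL);
    switch (c) {
    case '(':
        if (st->open)
            return TOK_ERROR;   /* assumption: no nesting allowed */
        st->open = 1;
        return TOK_SPACE;       /* the parenthesis itself acts as a separator */
    case ')':
        if (!st->open)
            return TOK_ERROR;   /* closing without opening */
        st->open = 0;
        return TOK_SPACE;
    case '\n':
        /* Inside parentheses a newline is just whitespace. */
        return st->open ? TOK_SPACE : TOK_NEWLINE;
    case ' ':
    case '\t':
        return TOK_SPACE;
    default:
        return TOK_CHAR;
    }
}

/* Count the logical lines the grammar would see; -1 on a paren error. */
int count_logical_lines(const char *text)
{
    struct paren_state st = { 0 };
    int lines = 0;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        switch (classify(&st, *text)) {
        case TOK_NEWLINE:
            lines++;
            break;
        case TOK_ERROR:
            return -1;
        default:
            break;
        }
    }
    return st.open ? -1 : lines;  /* an unclosed "(" is an error */
}
```

With this approach both SOA examples above reach the grammar as one logical line, even when the parenthesised part is spread over several physical lines.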

As shown above a simple grep -i for SOA does not do the trick. The lexer takes care of this tricky part by using an extra variable in_rr which is an enum containing: outside, expecting_dname, after_dname, reading_type. The semantics are as follows:

  • outside, not in an RR (start of a line or a $-directive);
  • expecting_dname, parse owner name of RR;
  • after_dname, parse ttl, class;
  • reading_type, we expect the RR type now;

With in_rr the lexer can say that in the first example above the first SOA is the actual record type, because it is located after a TAB. After we have found the TAB we set in_rr to after_dname, which means we are actually expecting an RR type.
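A minimal sketch of how such a state variable can pick out the RR type by position. The enum names are the ones listed above; everything else is hypothetical and not taken from zlexer.lex. In particular the transition rules, the digits-only TTL check and the class list (IN, CH, HS) are assumptions made for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum rr_state { outside, expecting_dname, after_dname, reading_type };

/* Case-insensitive match of word[0..len) against a keyword. */
static int word_is(const char *word, size_t len, const char *kw)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (kw[i] == '\0' ||
            tolower((unsigned char)word[i]) != tolower((unsigned char)kw[i]))
            return 0;
    }
    return kw[len] == '\0';
}

/* Assumption: only these three class mnemonics are recognised. */
static int is_class(const char *w, size_t n)
{
    return word_is(w, n, "IN") || word_is(w, n, "CH") || word_is(w, n, "HS");
}

/* Assumption: a TTL is a plain decimal number. */
static int is_ttl(const char *w, size_t n)
{
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        if (!isdigit((unsigned char)w[i]))
            return 0;
    return 1;
}

/* Zero-based index of the word treated as the RR type, or -1 if none. */
int type_word_index(const char *line)
{
    enum rr_state in_rr = outside;
    const char *p = line;
    int index = 0;

    assert(line != NULL);
    /* Leading whitespace means: reuse the previous owner name. */
    in_rr = (*p == ' ' || *p == '\t') ? after_dname : expecting_dname;

    for (;;) {
        const char *start;
        size_t len;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;              /* line ended before a type was seen */
        start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        len = (size_t)(p - start);

        if (in_rr == expecting_dname) {
            in_rr = after_dname;    /* first word is the owner, whatever it says */
            index++;
            continue;
        }
        if (in_rr == after_dname) {
            if (is_ttl(start, len) || is_class(start, len)) {
                index++;            /* optional TTL or class, keep looking */
                continue;
            }
            in_rr = reading_type;   /* anything else must be the type */
        }
        return index;
    }
}
```

For a line that starts with a TAB followed by SOA the function returns 0, while for "soa IN SOA soa soa" it returns 2: the same string is an owner name, a type or rdata depending only on where it stands.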

Again this is not trivial, because the class (IN) and TTL are also optional. If they are not specified we should substitute the current defaults from the zone we are parsing (this happens in the grammar). A DNS zonefile is further complicated by unknown RR types, which will require additional tweaking. This is on the TODO list for alpha-2.
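The substitution of defaults can be pictured roughly as follows. This is a sketch under assumptions, not the code in zparser.y: the structs, field names and apply_defaults() are hypothetical. The only fact used is that class IN has the numeric value 1 (RFC 1035).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_IN 1u   /* numeric value of class IN in RFC 1035 */

/* Hypothetical: the defaults of the zone being parsed. */
struct zone_defaults {
    uint32_t ttl;
    uint16_t klass;
};

/* Hypothetical: what was found between owner name and RR type. */
struct rr_header {
    int      has_ttl;
    uint32_t ttl;
    int      has_class;
    uint16_t klass;
};

/* Fill in whatever the zone file line left out. */
void apply_defaults(struct rr_header *rr, const struct zone_defaults *zd)
{
    assert(rr != NULL && zd != NULL);
    if (!rr->has_ttl) {
        rr->ttl = zd->ttl;
        rr->has_ttl = 1;
    }
    if (!rr->has_class) {
        rr->klass = zd->klass;
        rr->has_class = 1;
    }
}
```

Explicit values always win; only the missing fields are taken from the zone defaults.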

Zone file grammar

With the lexer written, the grammar itself is quite clean and nice. The basic idea is that every RR consists of a single line (the parentheses are handled in the lexer, so this really is the case). If a line is not an RR it is either a comment, empty or a $-directive. Some $-directives are handled inside the lexer ($INCLUDE) while others ($ORIGIN) must be dealt with inside the grammar.

An RR is defined as:

rr:     ORIGIN SP rrrest

and:

rrrest: classttl rtype

And then we have a whole list of:

rtype: TXT sp rdata_txt
       | DS sp rdata_ds
       | AAAA sp rdata_aaaa

which are then parsed by using the rdata_ rule. Shown here is the one for the SOA:

rdata_soa:  dname sp dname sp STR sp STR sp STR sp STR sp STR trail
    {
        /* convert the soa data */
        zadd_rdata_domain( current_parser, $1); /* prim. ns */
        zadd_rdata_domain( current_parser, $3); /* email */
        zadd_rdata_wireformat( current_parser,
                zparser_conv_rdata_period(zone_region, $5.str) ); /* serial */
        zadd_rdata_wireformat( current_parser,
                zparser_conv_rdata_period(zone_region, $7.str) ); /* refresh */
        zadd_rdata_wireformat( current_parser,
                zparser_conv_rdata_period(zone_region, $9.str) ); /* retry */
        zadd_rdata_wireformat( current_parser,
                zparser_conv_rdata_period(zone_region, $11.str) ); /* expire */
        zadd_rdata_wireformat( current_parser,
                zparser_conv_rdata_period(zone_region, $13.str) ); /* minimum */

        /* XXX also store the minimum in case of no TTL? */
        /* $13 is the minimum field, $11 is expire */
        if ( (current_parser->minimum = zparser_ttl2int($13.str) ) == -1 )
            current_parser->minimum = DEFAULT_TTL;
    };

The semantic actions in the grammar store the RR data for processing by the zone compiler. The resulting database is then used by NSD to serve the data.