Advanced 15 min DNS 2

SERVFAIL — Lame Delegation

Symptoms

- DNS queries return SERVFAIL intermittently — some queries succeed, others fail
- Failure depends on which nameserver the recursive resolver picks from the NS set
- `dig @ns2.example.com example.com A` returns SERVFAIL or REFUSED while `dig @ns1.example.com example.com A` succeeds
- DNS monitoring shows alternating SERVFAIL and NOERROR with no code changes
- Packet captures show the recursive resolver querying a lame NS and receiving no authoritative answer

Root Causes

- Registrar's NS records list a nameserver that no longer hosts the zone data
- Secondary nameserver not configured, or zone transfer (AXFR) from the primary is failing
- DNS provider deactivated the zone but the registrar's NS delegation still points to them
- Partial nameserver migration: some NS records updated to the new provider, others still stale
- Hosting provider changed nameserver hostnames without notifying the domain owner

Diagnosis

**Step 1 — List the registered NS records**

```bash
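# Query a parent (TLD) server for the delegation; the NS records
# appear in the AUTHORITY section of the referral:
dig example.com NS @a.gtld-servers.net +norec
# This is the NS set published through the registrar

# Compare with the NS set served by the zone itself (as seen through your resolver):
dig example.com NS +short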
```
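The comparison in Step 1 can be scripted. The following is a minimal sketch, not a definitive tool: `ns_mismatch` is a hypothetical helper name, and it assumes both NS sets are passed as whitespace-separated hostname lists. It prints every name that appears on only one side, after lowercasing and dropping the trailing dot.

```bash
# ns_mismatch PARENT_LIST CHILD_LIST   (hypothetical helper, not part of dig or BIND)
# Reports NS names that appear in only one of the two lists.
ns_mismatch() {
    {
        # Tag each name with its source (the unquoted $1/$2 split on whitespace on purpose)
        printf '%s\n' $1 | sed 's/^/parent /'
        printf '%s\n' $2 | sed 's/^/child /'
    } | tr 'A-Z' 'a-z' | awk '
        NF == 2 { sub(/\.$/, "", $2); seen[$2] = seen[$2] " " $1 }
        END {
            for (n in seen) {
                if (seen[n] !~ /child/)  print "parent-only: " n
                if (seen[n] !~ /parent/) print "child-only: " n
            }
        }' | sort
}

# Usage sketch (needs network access; the awk filter picks NS rdata from the referral):
# parent=$(dig example.com NS @a.gtld-servers.net +norec +noall +authority | awk '$4 == "NS" { print $5 }')
# child=$(dig @ns1.example.com example.com NS +norec +short)
# ns_mismatch "$parent" "$child"
```

Empty output means both sides agree. A `parent-only` name is a candidate lame nameserver and should be checked directly in the next step.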

**Step 2 — Query each NS directly for the zone**

```bash
# Test each nameserver listed in the NS RRset:
for NS in $(dig example.com NS +short); do
    printf '%s -> ' "$NS"
    dig @"$NS" example.com SOA +norec +noall +comments | grep -E 'status|ANSWER'
done
# A lame NS responds with REFUSED, SERVFAIL, or an empty ANSWER section (ANSWER: 0)
# A healthy NS responds with status: NOERROR and the SOA record (ANSWER: 1)
```

**Step 3 — Check zone transfer on the secondary NS**

```bash
# From the secondary nameserver (if you have access):
sudo rndc zonestatus example.com
# 'zone not loaded due to errors' or 'XFER failed' = lame secondary

# Check transfer logs (the log path depends on your named logging configuration):
grep -Ei 'AXFR|transfer' /var/log/named/default | tail -20
```

**Step 4 — Verify zone is loaded on each NS**

```bash
# Request SOA from each NS with +norec (no recursion):
dig @ns1.example.com example.com SOA +norec
dig @ns2.example.com example.com SOA +norec
# AA flag in flags section = authoritative answer = zone is loaded
# Missing AA flag = lame nameserver
```
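Reading the status and the AA flag by eye gets tedious with several nameservers. As a rough sketch, the hypothetical function `classify_ns` below reads the header comments that `dig +norec +noall +comments` prints and reduces them to one verdict. It assumes the usual dig header layout (a `->>HEADER<<-` line containing `status:` and a `;; flags:` line); other dig versions or output options may need a different parser.

```bash
# classify_ns: reads dig header comments on stdin, prints a one-line verdict.
# Hypothetical helper; assumes dig's standard "+comments" header format.
classify_ns() {
    awk '
        /^;; ->>HEADER<<-/ {
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            flags = $0
            sub(/^;; flags:/, "", flags)   # drop the label
            sub(/;.*/, "", flags)          # drop the counters after the flag list
            aa = (" " flags " ") ~ / aa /
        }
        END {
            if (status == "")                   print "no-response"
            else if (status == "NOERROR" && aa) print "authoritative"
            else print "lame (status=" status ", aa=" (aa ? "yes" : "no") ")"
        }'
}

# Usage sketch (needs network access):
# dig @ns2.example.com example.com SOA +norec +noall +comments | classify_ns
```

A NOERROR response without AA is also reported as lame, because a server that answers from cache or with a referral does not have the zone loaded.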

**Step 5 — Use online tools for lame delegation detection**

```bash
# IntoDNS, DNSViz, or Zonemaster detect lame nameservers:
# https://intodns.com/example.com
# https://dnsviz.net/?name=example.com
# https://zonemaster.net/run-test?nameserver=example.com
```

Resolution

**Fix 1 — Remove the lame nameserver from the NS delegation**

Log in to your registrar and remove the nameserver that is not serving the zone. Keep only the nameservers that have the zone loaded and are responding with AA:

```bash
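# After updating at the registrar, check the parent's referral
# (the NS records are in the AUTHORITY section, so +short would print nothing):
dig example.com NS @a.gtld-servers.net +norec +noall +authority
# The lame NS should no longer be listed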
```

**Fix 2 — Fix the zone transfer on the secondary NS**

```named
# /etc/named.conf on primary: allow AXFR to the secondary
zone "example.com" {
    type master;
    file "/etc/named/zones/db.example.com";
    allow-transfer { 198.51.100.2; };  # secondary NS IP
};
```

```bash
# On secondary — trigger immediate zone transfer:
sudo rndc retransfer example.com
sudo rndc zonestatus example.com
```

**Fix 3 — Add the zone to a new replacement NS**

```bash
# On the new secondary NS: append the zone definition (writing named.conf needs root):
sudo tee -a /etc/named.conf > /dev/null << 'EOF'
zone "example.com" {
    type slave;
    masters { 192.0.2.1; };  # primary NS IP
    file "/var/cache/named/example.com.db";
};
EOF
sudo named-checkconf && sudo systemctl reload named
```

**Fix 4 — Update NS records at the registrar to the correct nameservers**

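If the DNS provider changed hostnames, update the NS records at the registrar to match the provider's new nameserver hostnames. Resolvers keep the old delegation cached until its TTL expires, so allow 24–48 hours for the change to take full effect (delegations in .com, for example, carry a 48-hour TTL).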
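How long resolvers may keep serving an old delegation depends on the TTL of the NS records in the parent zone's referral. The sketch below assumes dig's standard record layout (`name ttl class type rdata`) and uses a hypothetical helper, `max_ns_ttl_hours`, to print the largest NS TTL in hours.

```bash
# max_ns_ttl_hours: reads dig record lines on stdin and prints the largest
# NS TTL in hours. Hypothetical helper; assumes "name ttl class type rdata" lines.
max_ns_ttl_hours() {
    awk '$4 == "NS" && $2 + 0 > max { max = $2 + 0 }
         END { if (max) printf "%.1f\n", max / 3600 }'
}

# Usage sketch (needs network access):
# dig example.com NS @a.gtld-servers.net +norec +noall +authority | max_ns_ttl_hours
```

The value is an upper bound on how long a resolver that cached the referral just before the change can keep using the old nameservers.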

Prevention

- After any DNS provider migration, verify all NS records are updated at the registrar and serving the zone
- Monitor each nameserver individually: check the SOA serial and the AA flag in its responses
- Test AXFR health on secondary nameservers weekly; alert if zone serial falls behind primary
- Use Zonemaster or IntoDNS in your deployment pipeline to catch lame delegation before going live
- Document all nameservers in use and audit against registrar records quarterly
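
The serial check from the list above can be sketched as two small shell functions. Both names (`soa_serial`, `secondary_is_behind`) are hypothetical, and the comparison is a plain integer test that deliberately ignores serial-number wraparound (RFC 1982), which is an assumption that holds for typical date-based serials.

```bash
# soa_serial: reads `dig +short SOA` output on stdin and prints the serial,
# which is the third field (mname rname serial refresh retry expire minimum).
soa_serial() {
    awk 'NF >= 7 { print $3; exit }'
}

# secondary_is_behind PRIMARY_SERIAL SECONDARY_SERIAL
# Exit status 0 when the secondary serial is lower than the primary serial.
# Assumption: plain integer comparison, no RFC 1982 wraparound handling.
secondary_is_behind() {
    [ "$2" -lt "$1" ]
}

# Usage sketch (needs network access):
# p=$(dig @ns1.example.com example.com SOA +norec +short | soa_serial)
# s=$(dig @ns2.example.com example.com SOA +norec +short | soa_serial)
# secondary_is_behind "$p" "$s" && echo "ns2 is behind: $s < $p"
```

A secondary that briefly lags right after a zone update is normal; alerting makes sense only when the lag persists beyond the zone's refresh interval.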
