SERVFAIL — Lame Delegation
Symptoms
- DNS queries return SERVFAIL intermittently — some queries succeed, others fail
- Failure depends on which nameserver the recursive resolver picks from the NS set
- `dig @ns2.example.com example.com A` returns SERVFAIL or REFUSED while `dig @ns1.example.com` works
- DNS monitoring shows alternating SERVFAIL and NOERROR with no code changes
- Packet captures show the recursive resolver querying a lame NS and receiving no authoritative answer
Root Causes
- Registrar's NS records list a nameserver that no longer hosts the zone data
- Secondary nameserver not configured or zone transfer (AXFR) from primary is failing
- DNS provider deactivated the zone but the registrar's NS delegation still points to them
- Partial nameserver migration — some NS records updated to new provider, others still stale
- Hosting provider changed nameserver hostnames without notifying the domain owner
Diagnosis
**Step 1 — List the registered NS records**
```bash
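# Query the parent TLD servers for the delegated NS records
# (they appear in the AUTHORITY section of the referral):
dig example.com NS @a.gtld-servers.net +norec
# This shows what the registrar has published.
# Compare with the NS RRset the zone itself serves, as seen through your resolver:
dig example.com NS +short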
```
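The comparison in Step 1 can be scripted. The following is a minimal sketch, not a definitive tool: the function names `normalize_ns` and `ns_set_diff` are hypothetical, and it assumes you have already captured the two NS lists as newline-separated text (the commented `dig` lines show one way to do that, with `ns1.example.com` standing in for one of your own nameservers).

```bash
# Normalize a list of NS names: lowercase, strip the trailing dot, sort, dedupe.
normalize_ns() {
  tr '[:upper:]' '[:lower:]' | sed 's/\.$//' | grep -v '^$' | sort -u
}

# Print names that appear on only one side of the delegation.
# $1 = NS names from the parent, $2 = NS names from the zone itself.
ns_set_diff() (
  parent=$(printf '%s\n' "$1" | normalize_ns)
  child=$(printf '%s\n' "$2" | normalize_ns)
  printf '%s\n' "$parent" | while read -r ns; do
    [ -n "$ns" ] || continue
    printf '%s\n' "$child" | grep -qxF "$ns" || echo "parent-only: $ns"
  done
  printf '%s\n' "$child" | while read -r ns; do
    [ -n "$ns" ] || continue
    printf '%s\n' "$parent" | grep -qxF "$ns" || echo "child-only: $ns"
  done
)

# Possible way to collect the inputs (requires network access):
# parent=$(dig example.com NS @a.gtld-servers.net +norec +noall +authority | awk '{print $5}')
# child=$(dig @ns1.example.com example.com NS +norec +short)
# ns_set_diff "$parent" "$child"
```

A `parent-only` name is a candidate lame nameserver: the registrar still delegates to it, but the zone no longer lists it.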
**Step 2 — Query each NS directly for the zone**
```bash
# Test each nameserver listed in the NS RRset:
for NS in $(dig example.com NS +short); do
  printf '%s -> ' "$NS"
  dig @"$NS" example.com SOA +norec +noall +comments | grep -E 'status|ANSWER'
done
# A lame NS will respond with REFUSED, SERVFAIL, or NOERROR with ANSWER: 0
# A healthy NS will respond with status: NOERROR and ANSWER: 1
```
**Step 3 — Check zone transfer on the secondary NS**
```bash
# From the secondary nameserver (if you have access):
sudo rndc zonestatus example.com
# 'zone not loaded due to errors' or 'XFER failed' = lame secondary
# Check AXFR logs:
grep 'AXFR\|transfer' /var/log/named/default | tail -20
```
**Step 4 — Verify zone is loaded on each NS**
```bash
# Request SOA from each NS with +norec (no recursion):
dig @ns1.example.com example.com SOA +norec
dig @ns2.example.com example.com SOA +norec
# AA flag in flags section = authoritative answer = zone is loaded
# Missing AA flag = lame nameserver
```
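To apply the status and AA-flag rules from Steps 2 and 4 without reading each response by eye, the header can be parsed with `awk`. This is a sketch under stated assumptions: `classify_dig_response` is a hypothetical helper name, and it expects the default `dig` header layout (a `status:` field on the `->>HEADER<<-` line and a `;; flags:` line).

```bash
# Read the output of `dig @<ns> <zone> SOA +norec` on stdin and print a verdict.
classify_dig_response() {
  awk '
    /status:/    { s = $0; sub(/.*status: /, "", s); sub(/,.*/, "", s); status = s }
    /^;; flags:/ { f = $0; sub(/^;; flags: */, "", f); sub(/;.*/, "", f); flags = " " f " " }
    END {
      if (status == "")                  print "no-response"
      else if (status != "NOERROR")      print "lame (" status ")"
      else if (index(flags, " aa ") == 0) print "lame (no AA flag)"
      else                               print "authoritative"
    }
  '
}

# Possible usage (requires network access):
# dig @ns2.example.com example.com SOA +norec | classify_dig_response
```

Anything other than `authoritative` means that nameserver should be investigated before it stays in the delegation.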
**Step 5 — Use online tools for lame delegation detection**
```bash
# IntoDNS, DNSViz, or Zonemaster detect lame nameservers:
# https://intodns.com/example.com
# https://dnsviz.net/?name=example.com
# https://zonemaster.net/run-test?nameserver=example.com
```
Resolution
**Fix 1 — Remove the lame nameserver from the NS delegation**
Log in to your registrar and remove the nameserver that is not serving the zone. Keep only the nameservers that have the zone loaded and are responding with AA:
```bash
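# After updating at the registrar, verify the change at the parent.
# The TLD servers return a referral, so the NS records are in the AUTHORITY
# section and +short would print nothing:
dig example.com NS @a.gtld-servers.net +norec +noall +authority
# Should no longer list the lame NS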
```
**Fix 2 — Fix the zone transfer on the secondary NS**
```named
# /etc/named.conf on primary — allow AXFR to secondary:
zone "example.com" {
type master;
file "/etc/named/zones/db.example.com";
allow-transfer { 198.51.100.2; }; # secondary NS IP
};
```
```bash
# On secondary — trigger immediate zone transfer:
sudo rndc retransfer example.com
sudo rndc zonestatus example.com
```
**Fix 3 — Add the zone to a new replacement NS**
```bash
# On the new secondary NS — configure the zone:
sudo tee -a /etc/named.conf > /dev/null << 'EOF'
zone "example.com" {
type slave;
masters { 192.0.2.1; }; # primary NS IP
file "/var/cache/named/example.com.db";
};
EOF
sudo systemctl reload named
```
**Fix 4 — Update NS records at the registrar to the correct nameservers**
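If the DNS provider changed hostnames, update the NS records at the registrar to match the provider's new nameserver hostnames. Resolvers keep using the cached delegation until the TTL on the parent's NS records expires, so allow 24–48 hours for the change to take effect everywhere.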
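How long the old delegation can linger depends on the TTL the parent zone publishes on the NS records. As a rough sketch, the helper below (the name `max_ns_ttl_hours` is hypothetical) reads NS records in `dig` presentation format and prints the largest TTL rounded up to whole hours. The sample records in the usage comment are illustrative input, not a measurement.

```bash
# Read records shaped like "name TTL class type rdata" on stdin and print
# the largest TTL among the NS records, rounded up to whole hours.
max_ns_ttl_hours() {
  awk '$4 == "NS" && $2 + 0 > max { max = $2 + 0 }
       END { print int((max + 3599) / 3600) }'
}

# Possible usage (requires network access):
# dig example.com NS @a.gtld-servers.net +norec +noall +authority | max_ns_ttl_hours
```

If the printed value is lower than 24, the wait can be shortened accordingly; if it is higher, keep the old nameservers answering for at least that long.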
Prevention
- After any DNS provider migration, verify all NS records are updated at the registrar and serving the zone
- Monitor each nameserver individually for SOA serial and AA flag response in your DNS monitoring
- Test AXFR health on secondary nameservers weekly; alert if zone serial falls behind primary
- Use Zonemaster or IntoDNS in your deployment pipeline to catch lame delegation before going live
- Document all nameservers in use and audit against registrar records quarterly
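The serial check from the list above could be automated along these lines. This is a minimal sketch: `find_stale_serials` is a hypothetical name, the input is assumed to be one `nameserver serial` pair per line, and serials are compared as plain numbers, so RFC 1982 serial wraparound is not handled.

```bash
# Read "nameserver serial" pairs on stdin and report every nameserver whose
# SOA serial is lower than the highest serial seen.
find_stale_serials() {
  awk '
    NF == 2 {
      n++; ns[n] = $1; serial[n] = $2
      if ($2 + 0 > max + 0) max = $2
    }
    END {
      for (i = 1; i <= n; i++)
        if (serial[i] + 0 < max + 0)
          print ns[i] " serial " serial[i] " is behind " max
    }
  '
}

# Possible way to build the input (requires network access):
# for NS in $(dig example.com NS +short); do
#   printf '%s %s\n' "$NS" "$(dig @"$NS" example.com SOA +norec +short | awk '{print $3}')"
# done | find_stale_serials
```

Empty output means every nameserver that answered reports the same serial. A nameserver that returns no SOA at all yields a one-field line and is skipped here, so pair this with a separate check for lame responses.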