SERVFAIL — DNS Server Failure
Symptoms
- `dig example.com` returns `status: SERVFAIL` with empty ANSWER and AUTHORITY sections
- Browser shows `DNS_PROBE_FINISHED_BAD_CONFIG` or generic connection error
- Intermittent resolution failures — domain resolves sometimes but not always
- `dig example.com +dnssec +cd` returns the records with `RRSIG`, but the same query without `+cd` returns SERVFAIL because validation fails
- Application logs: `dns.resolver.NoNameservers` or `dns.exception.Timeout` (dnspython) with SERVFAIL context
Root causes
- DNSSEC chain of trust broken — DS record in parent zone does not match DNSKEY in child zone
- Authoritative nameservers are unreachable — firewall blocks UDP/TCP port 53 or server is down
- Recursive resolver timeout — authoritative NS is slow to respond and resolver gives up
- Zone transfer failure — secondary nameserver has stale or missing zone data
- Misconfigured SOA record pointing to a non-existent primary nameserver
Diagnosis
**Step 1 — Confirm SERVFAIL and check DNSSEC**
```bash
dig example.com A
# status: SERVFAIL means resolver could not get an authoritative answer
# Try with DNSSEC disabled to isolate DNSSEC issue
dig example.com A +cd # +cd = checking disabled (bypass DNSSEC validation)
# If +cd succeeds but normal query fails → DNSSEC problem
```
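The comparison in Step 1 can be scripted. This is a minimal sketch, assuming `dig` is installed; `dig_status` and `classify_servfail` are hypothetical helper names, not part of any tool.

```bash
# Print the status field (NOERROR, SERVFAIL, ...) from dig's response header.
# All arguments are passed through to dig unchanged.
dig_status() {
  dig "$@" 2>/dev/null | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare a normal query with a +cd (checking disabled) query for one name.
classify_servfail() {
  local name=$1 normal unchecked
  normal=$(dig_status "$name" A)
  unchecked=$(dig_status "$name" A +cd)
  if [ "$normal" = "SERVFAIL" ] && [ "$unchecked" = "NOERROR" ]; then
    echo "likely DNSSEC validation failure"
  elif [ "$normal" = "SERVFAIL" ]; then
    echo "SERVFAIL even with +cd: check authoritative servers"
  else
    echo "no SERVFAIL (status: ${normal:-unknown})"
  fi
}
```

Call it as `classify_servfail example.com`. The result only reflects the resolver your system is configured to use, so it is a first hint rather than a diagnosis.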
**Step 2 — Check DNSSEC chain**
```bash
# Check DS record in parent zone
dig example.com DS +short
# Check DNSKEY in child zone
dig example.com DNSKEY +short
# Show the RRSIG covering the A record (dig displays signatures; it does not validate them)
dig example.com A +dnssec +short
# Use DNSViz for visual chain analysis
# https://dnsviz.net/?name=example.com
```
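The DS and DNSKEY output from Step 2 can be compared by key tag. The sketch below assumes the usual `dig` output formats (`+short` DS lines start with the key tag, and `+multi` DNSKEY output ends each key with a `key id = N` comment); the function names are hypothetical. A matching key tag is necessary but not sufficient, because the digest itself can still be wrong, so treat a clean result as a first pass only.

```bash
# Key tags referenced by the DS records in the parent zone.
ds_key_tags() {
  dig "$1" DS +short +cd | awk '{print $1}' | sort -u
}

# Key tags of the DNSKEY records published by the child zone.
dnskey_key_tags() {
  dig "$1" DNSKEY +multi +cd | sed -n 's/.*key id = \([0-9]*\).*/\1/p' | sort -u
}

# Report every DS key tag and whether a DNSKEY with that tag exists.
# Returns 1 if at least one DS record has no matching DNSKEY.
check_ds_matches() {
  local zone=$1 tag keys rc=0
  keys=$(dnskey_key_tags "$zone")
  for tag in $(ds_key_tags "$zone"); do
    if printf '%s\n' "$keys" | grep -qx "$tag"; then
      echo "DS $tag: matching DNSKEY found"
    else
      echo "DS $tag: no matching DNSKEY"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_ds_matches example.com`. Both queries use `+cd` so that a validating resolver still returns the records when the chain is broken.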
**Step 3 — Test authoritative NS reachability**
```bash
# Find authoritative nameservers
dig example.com NS +short
# Query each NS directly
dig @ns1.example.com example.com A
dig @ns2.example.com example.com A
# Test port 53 over UDP and TCP from your network
# (the UDP result is only a hint: nc reports success unless an ICMP error comes back)
nc -vuz ns1.example.com 53
nc -vz ns1.example.com 53
```
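Rather than typing one query per nameserver, Step 3 can be looped over every NS record and both transports. This is a sketch under the assumption that the NS lookup itself still succeeds through your resolver; `check_each_ns` is a hypothetical name.

```bash
# Query each authoritative nameserver directly over UDP and TCP and print
# one line per attempt: "<nameserver> <udp|tcp> <status or no-response>".
check_each_ns() {
  local zone=$1 ns proto flag status
  for ns in $(dig "$zone" NS +short); do
    for proto in udp tcp; do
      if [ "$proto" = "tcp" ]; then flag=+tcp; else flag=+notcp; fi
      status=$(dig @"$ns" "$zone" SOA +norecurse "$flag" +time=3 +tries=1 2>/dev/null |
        sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
      echo "$ns $proto ${status:-no-response}"
    done
  done
}
```

A server that answers over UDP but shows `no-response` over TCP usually points to a firewall rule that only allows one protocol.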
**Step 4 — Check from multiple resolvers**
```bash
# Google Public DNS
dig @8.8.8.8 example.com A
# Cloudflare
dig @1.1.1.1 example.com A
# Authoritative (bypasses resolver caching)
NS=$(dig example.com NS +short | head -n 1)
dig @"$NS" example.com A +norecurse
# REFUSED from authoritative = NS not serving this zone
```
**Step 5 — Inspect resolver logs**
```bash
# systemd-resolved
sudo journalctl -u systemd-resolved -n 50 --no-pager | grep 'SERVFAIL\|example.com'
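# BIND9 resolver: validate the configuration, then read recent log entries
sudo named-checkconf
# The log path depends on the logging configuration in named.conf
tail -n 50 /var/log/named/default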
```
Resolution
**Fix 1 — Re-sign or remove broken DNSSEC records**
```bash
# BIND9: re-sign the zone (the NSEC3 salt passed to -3 must be hex, not base64)
dnssec-signzone -A -3 "$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')" \
-N INCREMENT -o example.com -t db.example.com
sudo rndc reload example.com
# Or temporarily disable DNSSEC at the registrar:
# Remove DS records from the parent zone (registrar panel)
# Resolvers keep the old DS until its cached TTL expires; allow 24-48h
```
**Fix 2 — Restore unreachable authoritative nameserver**
```bash
# Restart BIND9
sudo systemctl restart named
# Open port 53 in firewall
sudo ufw allow 53/udp
sudo ufw allow 53/tcp
# Security Group (AWS) — add inbound rules:
# UDP 53 from 0.0.0.0/0, TCP 53 from 0.0.0.0/0
```
**Fix 3 — Force zone reload on secondary NS**
```bash
# BIND9 — trigger zone transfer
sudo rndc retransfer example.com
# Verify zone loaded
sudo rndc zonestatus example.com
```
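To confirm that the retransfer actually brought the secondary up to date, compare the SOA serial reported by each server. A minimal sketch, assuming `dig +short` prints the SOA fields in the standard order (MNAME, RNAME, serial, ...); the function names and the server arguments are placeholders.

```bash
# Print the SOA serial that one server reports for a zone.
soa_serial() {
  dig @"$1" "$2" SOA +short +norecurse | awk 'NR==1 {print $3}'
}

# Compare serials: returns 0 if equal, 1 if different, 2 if a server gave no SOA.
compare_serials() {
  local zone=$1 primary=$2 secondary=$3 p s
  p=$(soa_serial "$primary" "$zone")
  s=$(soa_serial "$secondary" "$zone")
  if [ -z "$p" ] || [ -z "$s" ]; then
    echo "missing SOA (primary: ${p:-none}, secondary: ${s:-none})"
    return 2
  elif [ "$p" = "$s" ]; then
    echo "in sync at serial $p"
  else
    echo "secondary is at $s, primary is at $p"
    return 1
  fi
}
```

For example: `compare_serials example.com ns1.example.com ns2.example.com`.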
**Fix 4 — Fix SOA record**
```dns
; Correct SOA format
example.com. 3600 IN SOA ns1.example.com. admin.example.com. (
2024010101 ; serial
3600 ; refresh
900 ; retry
604800 ; expire
300 ; minimum TTL / negative cache TTL
)
```
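After editing the SOA, it is worth checking that the primary nameserver it names actually resolves. The sketch below is one way to do that with `dig`; `soa_mname` and `check_soa_mname` are hypothetical names. Setups with an intentionally hidden primary may fail this check by design.

```bash
# Print the MNAME (primary nameserver) field of the zone's SOA record.
soa_mname() {
  dig "$1" SOA +short | awk 'NR==1 {print $1}'
}

# Succeed only if the MNAME host has at least one A or AAAA record.
check_soa_mname() {
  local zone=$1 mname addrs
  mname=$(soa_mname "$zone")
  if [ -z "$mname" ]; then
    echo "no SOA returned for $zone"
    return 2
  fi
  addrs=$(dig "$mname" A +short; dig "$mname" AAAA +short)
  if [ -n "$addrs" ]; then
    echo "$mname resolves"
  else
    echo "$mname does not resolve"
    return 1
  fi
}
```

Run `check_soa_mname example.com` once the zone has been reloaded.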
Prevention
- Monitor DNSSEC validity continuously with tools like Zonemaster or DNSViz and alert on chain breaks
- Use two geographically separate authoritative nameservers for redundancy
- Set up zone transfer monitoring — alert if secondary NS has not successfully pulled in X hours
- Rotate DNSSEC ZSK keys according to your signing policy and automate with `dnssec-keymgr` (newer BIND 9 releases replace that tool with a `dnssec-policy` statement)
- Test DNS resolution from multiple public resolvers after every DNS change before closing the change window
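
The post-change test from the last item can be turned into a small gate that fails when any public resolver does not return NOERROR. This is a sketch, not a monitoring product: `verify_resolution` and the `RESOLVERS` variable are hypothetical names, and the resolver list simply reuses the two addresses from Step 4.

```bash
# Public resolvers to test: Google Public DNS and Cloudflare.
RESOLVERS="8.8.8.8 1.1.1.1"

# Query each resolver for the A record and print "<resolver> <status>".
# Returns 1 if any resolver answers with something other than NOERROR.
verify_resolution() {
  local name=$1 resolver status failed=0
  for resolver in $RESOLVERS; do
    status=$(dig @"$resolver" "$name" A +time=3 +tries=1 2>/dev/null |
      sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    echo "$resolver ${status:-no-response}"
    [ "$status" = "NOERROR" ] || failed=1
  done
  return "$failed"
}
```

In a change window this could run as `verify_resolution example.com || echo "do not close the change yet"`.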