This is the field guide for diagnosing GRE tunnel failures on Cisco IOS XE. Every failure mode is paired with the symptoms you will see, the show commands that confirm the diagnosis, and the fix. The order is roughly "most common first," so if you are 30 minutes into an incident and need a starting point, work down the list and stop when something matches.
This is part of the PingLabz GRE Tunnels: The Complete Guide. For protocol theory and configuration, start there.
First Checks: The Five-Minute Triage
Before any deep debugging, run these five commands. They will tell you which sub-system is broken and where to focus.
R1# show interface Tunnel0 | include Tunnel|line proto|MTU|Keepalive|source|dest
Tunnel0 is up, line protocol is up
MTU 17916 bytes, ...
Keepalive set (10 sec), retries 3
Tunnel source 198.51.100.1, destination 203.0.113.1
Tunnel transport MTU 1476 bytes
R1# show ip route 203.0.113.1
Routing entry for 203.0.113.1/32
Known via "static", distance 1, metric 0
Routing Descriptor Blocks:
* 198.51.100.2
Route metric is 0, traffic share count is 1
R1# ping 203.0.113.1 source 198.51.100.1
!!!!!
Success rate is 100 percent (5/5)
R1# ping 10.0.0.2 source 10.0.0.1
!!!!!
Success rate is 100 percent (5/5)
R1# ping 192.168.2.1 source 192.168.1.1 size 1400 df-bit
!!!!!
Success rate is 100 percent (5/5)
If all five pass, the tunnel is working at the IP level. A failure narrows down which layer is broken:
| Step that fails | Layer that is broken |
|---|---|
| Tunnel0 line protocol down | Local config or underlay route |
| Underlay ping fails | Underlay connectivity |
| Overlay ping fails | GRE encapsulation / firewall blocking IP 47 |
| Overlay 1400 df-bit fails | MTU problem on the path |
| LAN-to-LAN ping fails | Routing / static routes / routing protocol |
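For scripted triage, the table lookup can be sketched as a small Python helper. This is a minimal sketch: the step labels and the `first_broken_layer` function are hypothetical names made up for illustration, not anything IOS XE emits.

```python
# Ordered triage steps (hypothetical labels) mapped to the layer each one
# implicates, mirroring the rows of the table above.
TRIAGE_STEPS = [
    ("tunnel_line_protocol", "Local config or underlay route"),
    ("underlay_ping", "Underlay connectivity"),
    ("overlay_ping", "GRE encapsulation / firewall blocking IP 47"),
    ("overlay_1400_df_ping", "MTU problem on the path"),
    ("lan_to_lan_ping", "Routing / static routes / routing protocol"),
]


def first_broken_layer(results):
    """Return the layer for the first failed step, or None if all passed.

    `results` maps a step label to True (passed) or False (failed).
    A step missing from `results` is treated as failed.
    """
    for step, layer in TRIAGE_STEPS:
        if not results.get(step, False):
            return layer
    return None
```

Checking the steps in order matters: a failed underlay ping makes every later result meaningless, so the helper reports only the first failure.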
Recursive Routing
Symptom: Tunnel flaps every 30 seconds to a few minutes. Log shows %TUN-5-RECURDOWN: Tunnel0 temporarily disabled due to recursive routing.
Cause: The route to the tunnel destination IP is itself learned through the tunnel (or could be learned through it). The router cannot encapsulate packets to the tunnel destination if the only way it knows to reach the destination is via the tunnel.
Diagnosis:
R1# show ip route 203.0.113.1
Routing entry for 203.0.113.1/32
Known via "ospf 1", distance 110, metric 11000
Routing Descriptor Blocks:
* 10.0.0.2, from 10.0.0.2, ..., via Tunnel0 <- tunnel as next-hop = recursive
Fix: Add a static route to the tunnel destination via the underlay. A static route's default administrative distance of 1 overrides anything the routing protocol learns.
R1(config)# ip route 203.0.113.1 255.255.255.255 198.51.100.2
Alternatively, use a distribute-list to filter the underlay subnet out of the tunnel-side routing protocol. The static-route fix is the more robust of the two and is standard practice.
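If you collect `show ip route` output with a script, the recursive-routing check can be automated. The sketch below assumes the output looks like the samples in this article; real output varies by release, and `route_uses_tunnel` is a hypothetical helper name.

```python
import re


def route_uses_tunnel(show_ip_route_output):
    """Return the Tunnel interface named as an exit interface, or None.

    Looks for 'via TunnelN' in the text of 'show ip route <destination>'.
    A match means the tunnel destination is reached through a tunnel,
    which is the recursive-routing condition.
    """
    match = re.search(r"via (Tunnel\d+)", show_ip_route_output)
    return match.group(1) if match else None
```

Run it against the route to the tunnel destination both before and after applying the static route. The result should change from a tunnel name to `None`.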
Firewall Blocking IP Protocol 47
The classic "tunnel works in lab, fails in production" failure.
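Symptom: Tunnel0 shows up/up (when GRE keepalives are not configured; with keepalives enabled, the line protocol goes down once the retries are exhausted). Underlay ping between tunnel sources works. Overlay ping (10.0.0.1 to 10.0.0.2) fails with no response.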
Cause: A stateful firewall in the underlay path drops IP protocol 47 (GRE) because it does not match TCP, UDP, or ICMP. The firewall sees the GRE packets as some weird non-standard protocol and silently drops them.
Diagnosis:
R1# show interface Tunnel0 | include packets
Input packets : 0
Output packets : 5234
Output packets growing while input packets stay at zero is the unmistakable sign that traffic is leaving in one direction (R1 -> R2) but nothing (or very little) is coming back. Confirm with packet captures on the underlay: SPAN both endpoints' WAN interfaces and look for IP protocol 47 in both directions.
Fix: Add an explicit allow rule for IP protocol 47 between the two tunnel-source IPs on every firewall in the path.
! On a Cisco ASA / FTD
access-list OUTSIDE-IN extended permit gre host 203.0.113.1 host 198.51.100.1
access-list INSIDE-OUT extended permit gre host 198.51.100.1 host 203.0.113.1
MTU and Fragmentation Problems
Symptom: Small packets work. Pings work. Some applications work fine. Others stall, time out, or load partially. HTTPS to specific sites hangs.
Cause: Inner packets are too large for the tunnel after GRE+IPsec encapsulation. Without MSS clamping, TCP endpoints negotiate a segment size based on a 1500-byte assumption, and the resulting full-size segments either get dropped (DF=1 + ICMP filtered = PMTUD black hole) or fragmented (CPU-expensive).
Diagnosis:
R1# ping 10.0.0.2 size 1400 df-bit
!!!!!
R1# ping 10.0.0.2 size 1500 df-bit
M.M.M
Success rate is 0 percent (0/5)
The "M" in the output means "could not fragment", which is exactly what a real DF=1 packet of that size would experience.
Fix: The two-line standard fix on both ends:
interface Tunnel0
ip mtu 1400
ip tcp adjust-mss 1360
The full deep-dive on the math, IPv6 considerations, and how to walk the size up to find your real path MTU is in GRE MTU and Fragmentation: Fixing Tunnel Packet Loss.
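The arithmetic behind those two numbers can be written out as a short Python sketch. It assumes IPv4 inside and out, no IP or TCP options, and the 4-byte base GRE header (no key, sequence, or checksum fields). IPsec overhead depends on the transform set and is not modeled. The function names are illustrative.

```python
OUTER_IPV4_HEADER = 20  # bytes, outer IP header without options
GRE_HEADER = 4          # bytes, base GRE header only
INNER_IPV4_HEADER = 20  # bytes, inner IP header without options
TCP_HEADER = 20         # bytes, TCP header without options


def gre_tunnel_ip_mtu(underlay_mtu=1500):
    """Largest inner IP packet that fits in one GRE packet on the underlay."""
    return underlay_mtu - OUTER_IPV4_HEADER - GRE_HEADER


def tcp_mss_for(ip_mtu):
    """TCP MSS that keeps a full-size segment within the given IP MTU."""
    return ip_mtu - INNER_IPV4_HEADER - TCP_HEADER


def headroom_for_ipsec(ip_mtu, underlay_mtu=1500):
    """Bytes left over for IPsec overhead once GRE is accounted for."""
    return gre_tunnel_ip_mtu(underlay_mtu) - ip_mtu
```

With a 1500-byte underlay, `gre_tunnel_ip_mtu()` gives the 1476-byte transport MTU shown in the triage output, `tcp_mss_for(1400)` gives the 1360 used in the fix, and `headroom_for_ipsec(1400)` shows that `ip mtu 1400` leaves 76 bytes for IPsec.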
Keepalive Flap with IPsec Rekey
Symptom: The tunnel is up most of the time but goes down briefly every hour or so, with logs showing Tunnel0 line protocol changed state to down followed quickly by up.
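Cause: GRE keepalives time out during the IPsec SA rekey window. The default IPsec SA lifetime on IOS XE is 3,600 seconds (1 hour), which lines up with the symptom timing. (The IKEv2 SA lifetime defaults to 86,400 seconds, so an IKEv2 SA shows a 3,600-second life only when it has been configured that way.)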
Diagnosis:
R1# show crypto ikev2 sa
Tunnel-id Local Remote fvrf/ivrf Status
1 198.51.100.1/500 203.0.113.1/500 none/none READY
Life/Active Time: 3600/3540 sec <- about to rekey
If "Active Time" is close to "Life" and the tunnel just flapped, you have correlated the events.
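When you are checking many routers, the "is Active Time close to Life" comparison is easy to script. This is a rough sketch that assumes the `Life/Active Time` line format shown above. The 120-second margin is an arbitrary illustrative threshold, and both function names are hypothetical.

```python
import re


def parse_life_active(line):
    """Extract (life, active) in seconds from a 'Life/Active Time' line.

    Returns None when the line does not match the expected format.
    """
    match = re.search(r"Life/Active Time:\s*(\d+)/(\d+)\s*sec", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def near_rekey(life_s, active_s, margin_s=120):
    """True when the SA is within margin_s seconds of its lifetime."""
    return life_s - active_s <= margin_s
```

Feed it the line from each router and compare the result with the timestamp of the last line-protocol flap in the log.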
Fix: Loosen the keepalive retry count or interval so it can ride through a brief rekey window:
interface Tunnel0
! 10-second interval x 5 retries = 50-second timeout instead of 30
keepalive 10 5
Alternatively, stagger the IPsec lifetimes on the two ends so they do not rekey at the same second, or migrate to BFD for faster, more reliable liveness detection.
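The timeout math is simple enough to check in a few lines of Python. This sketch treats the keepalive timeout as interval times retries, as the comment in the fix does. The 35-second outage used in the usage note is a made-up figure, not a measured rekey duration.

```python
def keepalive_timeout(interval_s, retries):
    """Approximate seconds of silence before the tunnel is declared down."""
    return interval_s * retries


def rides_through(interval_s, retries, outage_s):
    """True when an outage of outage_s seconds ends before the timeout."""
    return outage_s < keepalive_timeout(interval_s, retries)
```

For a hypothetical 35-second rekey gap, `rides_through(10, 3, 35)` is false and `rides_through(10, 5, 35)` is true, which is the difference between the flapping and the fixed configuration.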
OSPF Neighbor Stuck in INIT
Symptom: Tunnel up. Overlay ping works. OSPF neighbor relationship reaches INIT or 2-WAY but never goes to FULL.
Cause: OSPF hellos are getting through in only one direction, or are being rejected on receipt. Typical culprits: an ACL blocking 224.0.0.5 inbound on one end, mismatched OSPF authentication, mismatched area numbers, or mismatched network types.
Diagnosis:
R1# debug ip ospf hello
*Apr 30 14:35:12: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.1
*Apr 30 14:35:22: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.1
! No "Rcv hello" entries
If R1 only sends and never receives, R2's hellos are being dropped on the way to R1. Check on R2:
R2# debug ip ospf hello
*Apr 30 14:35:14: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.2
! R2 is also sending hellos, and it also logs no "Rcv hello" entries
Both ends sending and neither receiving means something in between is dropping the multicast. Check the firewall path: the standard "permit gre" rule should pass the GRE-encapsulated multicast, but custom ACLs can interfere.
Fix: Verify both ends have matching area, hello/dead timers, network type, and authentication. Run show ip ospf interface Tunnel0 on each side and compare.
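The side-by-side comparison can be sketched in Python once you have copied the relevant values out of `show ip ospf interface Tunnel0` on each router. The dictionary keys and the `ospf_mismatches` helper are hypothetical. This sketch does not parse the command output itself.

```python
# Parameters that must agree for the adjacency to form (hypothetical keys).
OSPF_FIELDS = (
    "area",
    "hello_interval",
    "dead_interval",
    "network_type",
    "authentication",
)


def ospf_mismatches(local, remote):
    """Return the names of the fields whose values differ between two routers.

    `local` and `remote` are dicts keyed by the names in OSPF_FIELDS.
    A field missing on one side counts as a mismatch unless it is
    missing on both.
    """
    return [f for f in OSPF_FIELDS if local.get(f) != remote.get(f)]
```

An empty list means the parameters agree and the problem lies elsewhere, most likely with hellos being dropped in transit.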
Tunnel0 Line Protocol Down
Symptom: Tunnel0 reports down/down or up/down. Configuration looks correct.
Causes (in order of likelihood):
- The tunnel source IP does not exist on the local router: either the configured source interface is down, or the configured source IP was mistyped.
- There is no route in the IP routing table to the tunnel destination.
- The tunnel destination is set to the same IP as the tunnel source. The two must differ.
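The three causes can be checked in the same order by a short Python sketch. It is only an illustration of the decision order. The inputs are values you would gather yourself from the router, and `tunnel_down_cause` is a hypothetical helper name.

```python
def tunnel_down_cause(source_ip, destination_ip, local_up_ips,
                      has_route_to_destination):
    """Return the first matching cause from the list above, or None.

    local_up_ips: set of IPs configured on local interfaces that are up.
    has_route_to_destination: True when the routing table has a route
    to destination_ip.
    """
    if source_ip not in local_up_ips:
        return "tunnel source IP is not on an up local interface"
    if not has_route_to_destination:
        return "no route to the tunnel destination"
    if source_ip == destination_ip:
        return "tunnel destination is the same as the tunnel source"
    return None
```

A `None` result means none of the three listed causes applies and the configuration needs a closer look.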
Diagnosis:
R1# show interface Tunnel0 | include source|destination|line
Tunnel0 is up, line protocol is down
Tunnel source 198.51.100.1, destination 203.0.113.1
R1# show ip route 203.0.113.1
% Network not in table
The "Network not in table" line is the smoking gun.
Fix: Add a route to the tunnel destination via whatever next-hop the underlay requires.
R1(config)# ip route 203.0.113.1 255.255.255.255 198.51.100.2
Useful Debugs
Each debug is paired with what it shows. Run them on a lab tunnel first; on a production tunnel they can produce a lot of output.
| Command | What it shows |
|---|---|
| debug tunnel keepalive | GRE keepalive packets sent and received |
| debug tunnel | All tunnel-state transitions and processing events |
| debug ip ospf hello | OSPF Hello packets sent and received |
| debug crypto ikev2 | IKEv2 SA negotiation and rekey events |
| debug crypto ipsec | IPsec SA install and tear-down events |
| debug ip packet detail (with ACL!) | Per-packet IP processing on the router. Always restrict to a small ACL or you will fill the console |
For debug ip packet detail, restrict scope tightly:
R1(config)# access-list 100 permit ip host 198.51.100.1 host 203.0.113.1
R1(config)# access-list 100 permit ip host 203.0.113.1 host 198.51.100.1
R1# debug ip packet detail 100
! 100 is the number of the extended ACL defined above
Always remember to run undebug all when you are done. A debug left running in production has consumed many a router CPU.
Packet Capture Strategy
For problems that resist the show-and-debug approach, capture the actual packets. Embedded Packet Capture (EPC) on IOS XE captures to a buffer or file:
R1(config)# ip access-list extended GRE-CAPTURE
R1(config-ext-nacl)# permit gre host 198.51.100.1 host 203.0.113.1
R1(config-ext-nacl)# permit gre host 203.0.113.1 host 198.51.100.1
R1# monitor capture CAP interface GigabitEthernet1 both
R1# monitor capture CAP access-list GRE-CAPTURE
R1# monitor capture CAP buffer size 5
R1# monitor capture CAP start
! ... wait for the issue to reproduce ...
R1# monitor capture CAP stop
R1# show monitor capture CAP buffer brief
R1# monitor capture CAP export bootflash:gre.pcap
Open the resulting pcap in Wireshark on your laptop. GRE packets show up as IP protocol 47, the GRE header includes the encapsulated protocol type, and Wireshark decodes the inner packet automatically. Looking at a real packet capture is the fastest way to confirm whether the GRE encapsulation is what you expected, whether keepalives are being reflected, and whether IPsec is wrapping the GRE correctly.
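For a quick check without opening Wireshark, the "is protocol 47 flowing in both directions" question can be answered with a few lines of standard-library Python. This is a sketch under stated assumptions: the export is a classic libpcap file (not pcapng) with Ethernet link type, frames are untagged (no VLAN header), and `count_gre_flows` is a hypothetical helper name.

```python
import socket
import struct
from collections import Counter

GRE_PROTOCOL = 47
ETHERTYPE_IPV4 = 0x0800
LINKTYPE_ETHERNET = 1


def count_gre_flows(pcap_bytes):
    """Count GRE packets per (source IP, destination IP) pair."""
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled")

    flows = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_frac, incl_len, orig_len.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 34:  # 14-byte Ethernet + 20-byte IPv4 header
            continue
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != ETHERTYPE_IPV4 or frame[23] != GRE_PROTOCOL:
            continue
        src = socket.inet_ntoa(frame[26:30])
        dst = socket.inet_ntoa(frame[30:34])
        flows[(src, dst)] += 1
    return flows
```

Read the exported file with `open("gre.pcap", "rb").read()` and pass the bytes in. A healthy tunnel shows non-zero counts for both the R1-to-R2 and the R2-to-R1 pair, while a one-way block shows only one of them.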
When to Escalate
If the tunnel works in lab but fails over a specific carrier path, the problem may be a middlebox you cannot reach. Common culprits:
- ISP CGNAT that does not handle IP protocol 47.
- Carrier MPLS L3VPN with restrictive ACLs that drop GRE.
- Customer-edge firewall with deep packet inspection that rewrites GRE keepalive payloads.
- 5G mobile-broadband links with 1380-byte underlay MTU instead of 1500 (drop your ip mtu to 1300 to test).
Open a ticket with the carrier or middlebox owner with packet captures from both ends and timestamps that line up. Most carrier "GRE does not work" issues turn out to be a default deny on a transit firewall that gets fixed quickly once they see the captures.
Summary
GRE is robust enough that production failures fall into a small number of recognizable patterns: recursive routing, IP protocol 47 blocked, MTU mismatch, keepalive flap during IPsec rekey, and routing-protocol issues that look like GRE issues but are not. Run the five-minute triage first. Match the symptom against the table above. Apply the corresponding fix.
If you bookmark one thing, bookmark the five-minute triage at the top of this article. It distinguishes between "the tunnel is broken" and "the routing on top of the tunnel is broken," and that distinction shapes the entire rest of the debugging session. The full GRE coverage is at the PingLabz GRE pillar.