GRE Tunnel Troubleshooting Guide

Step-by-step GRE tunnel troubleshooting on Cisco IOS XE. Five-minute triage, recursive routing, MTU drops, keepalive flap, debug commands, and EPC cap

This is the field guide for diagnosing GRE tunnel failures on Cisco IOS XE. Every failure mode is paired with the symptoms you will see, the show commands that confirm the diagnosis, and the fix. The order is roughly "most common first," so if you are 30 minutes into an incident and need a starting point, work down the list and stop when something matches.

This is part of the PingLabz GRE Tunnels: The Complete Guide. For protocol theory and configuration, start there.

First Checks: The Five-Minute Triage

Before any deep debugging, run these five commands. They will tell you which sub-system is broken and where to focus.

R1# show interface Tunnel0 | include Tunnel|line proto|MTU|Keepalive|source|dest
Tunnel0 is up, line protocol is up
  MTU 17916 bytes, ...
  Keepalive set (10 sec), retries 3
  Tunnel source 198.51.100.1, destination 203.0.113.1
  Tunnel transport MTU 1476 bytes

R1# show ip route 203.0.113.1
Routing entry for 203.0.113.1/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 198.51.100.2
      Route metric is 0, traffic share count is 1

R1# ping 203.0.113.1 source 198.51.100.1
!!!!!
Success rate is 100 percent (5/5)

R1# ping 10.0.0.2 source 10.0.0.1
!!!!!
Success rate is 100 percent (5/5)

R1# ping 192.168.2.1 source 192.168.1.1 size 1400 df-bit
!!!!!
Success rate is 100 percent (5/5)

If all five pass, the tunnel is working at the IP level. A failure narrows down which layer is broken:

Step that fails             | Layer that is broken
Tunnel0 line protocol down  | Local config or underlay route
Underlay ping fails         | Underlay connectivity
Overlay ping fails          | GRE encapsulation / firewall blocking IP 47
Overlay 1400 df-bit fails   | MTU problem on the path
LAN-to-LAN ping fails       | Routing / static routes / routing protocol
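
The table reads as a first-failure lookup: walk the checks in order and stop at the first one that fails. A minimal Python sketch of that logic follows, assuming you record each check's pass/fail result yourself. The step labels and the `first_broken_layer` function are illustrative names for this sketch, not part of any Cisco tooling.

```python
# Ordered triage checks and the layer each one implicates, copied from the table.
# The step keys are hypothetical labels chosen for this sketch.
TRIAGE_ORDER = [
    ("tunnel_line_protocol", "Local config or underlay route"),
    ("underlay_ping", "Underlay connectivity"),
    ("overlay_ping", "GRE encapsulation / firewall blocking IP 47"),
    ("overlay_1400_df", "MTU problem on the path"),
    ("lan_to_lan_ping", "Routing / static routes / routing protocol"),
]


def first_broken_layer(results):
    """Return the layer implicated by the first failed check, or None if all pass.

    `results` maps step label -> True (passed) or False (failed).
    A step that is missing from `results` is treated as failed.
    """
    for step, layer in TRIAGE_ORDER:
        if not results.get(step, False):
            return layer
    return None
```

Stopping at the first failure matters because later checks depend on earlier ones: an overlay ping cannot pass while the underlay is down, so only the earliest failure is informative.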

Recursive Routing

Symptom: Tunnel flaps every 30 seconds to a few minutes. Log shows %TUN-5-RECURDOWN: Tunnel0 temporarily disabled due to recursive routing.

Cause: The route to the tunnel destination IP is itself learned through the tunnel (or could be learned through it). The router cannot encapsulate packets to the tunnel destination if the only way it knows to reach the destination is via the tunnel.

Diagnosis:

R1# show ip route 203.0.113.1
Routing entry for 203.0.113.1/32
  Known via "ospf 1", distance 110, metric 11000
  Routing Descriptor Blocks:
  * 10.0.0.2, from 10.0.0.2, ..., via Tunnel0  <- tunnel as next-hop = recursive

Fix: Static route to the tunnel destination via the underlay, with administrative distance 1 to override anything the routing protocol learns.

R1(config)# ip route 203.0.113.1 255.255.255.255 198.51.100.2

Alternatively, use a distribute-list to filter the underlay subnet out of the tunnel-side routing protocol. The static-route fix is simpler, harder to break, and standard practice.
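
If you collect `show ip route` output with a script, the recursive-routing check can be automated. The sketch below is one possible approach, assuming the output has already been captured as text. The function name `route_uses_tunnel` is hypothetical, and the check only looks for the `via Tunnel0` marker shown in the diagnosis output above.

```python
import re


def route_uses_tunnel(show_ip_route_text, tunnel="Tunnel0"):
    """Return True if any line of the route entry points out the given tunnel.

    This is a plain text match on "via <tunnel>", so it assumes the route
    entry was captured for the tunnel destination address only.
    """
    pattern = re.compile(r"\bvia\s+" + re.escape(tunnel) + r"\b", re.IGNORECASE)
    return any(pattern.search(line) for line in show_ip_route_text.splitlines())
```

Run it against the route entry for the tunnel destination both before and after adding the static route; a healthy result is `False`.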

Firewall Blocking IP Protocol 47

The classic "tunnel works in lab, fails in production" failure.

Symptom: Tunnel0 shows up / up. Underlay ping between tunnel sources works. Overlay ping (10.0.0.1 to 10.0.0.2) fails with no response.

Cause: A stateful firewall in the underlay path drops IP protocol 47 (GRE) because it does not match TCP, UDP, or ICMP. The firewall sees the GRE packets as some weird non-standard protocol and silently drops them.

Diagnosis:

R1# show interface Tunnel0 | include packets
     Input  packets : 0
     Output packets : 5234

Output packets climbing while input packets stay at zero is the unmistakable sign that traffic is leaving in one direction (R1 -> R2) but nothing, or very little, is coming back. Confirm with packet captures on the underlay: SPAN the WAN interfaces at both endpoints and look for IP protocol 47 in both directions.

Fix: Add an explicit allow rule for IP protocol 47 between the two tunnel-source IPs on every firewall in the path.

! On a Cisco ASA / FTD
access-list OUTSIDE-IN extended permit gre host 203.0.113.1 host 198.51.100.1
access-list INSIDE-OUT extended permit gre host 198.51.100.1 host 203.0.113.1

MTU and Fragmentation Problems

Symptom: Small packets work. Pings work. Some applications work fine. Others stall, time out, or load partially. HTTPS to specific sites hangs.

Cause: Inner packets are too large for the tunnel after GRE+IPsec encapsulation. Without MSS clamping, TCP endpoints negotiate a segment size based on a 1500-byte assumption, and the resulting full-size segments either get dropped (DF=1 + ICMP filtered = PMTUD black hole) or fragmented (CPU-expensive).

Diagnosis:

R1# ping 10.0.0.2 size 1400 df-bit
!!!!!
R1# ping 10.0.0.2 size 1500 df-bit
M.M.M
Success rate is 0 percent (0/5)

The "M" output is "could not fragment" - exactly the symptom that a real DF=1 packet would experience.
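
The same df-bit ping can be walked up and down to find the largest size that still passes. A binary search gets there in a handful of probes. The sketch below assumes a `probe` callable that you supply yourself (for example, a wrapper that runs `ping 10.0.0.2 size <n> df-bit` and returns True on success). That wrapper is hypothetical and not implemented here, and the 1200 to 1500 search range is an arbitrary starting assumption.

```python
def largest_passing_size(probe, low=1200, high=1500):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes monotonic behaviour: if a size passes, every smaller size passes too.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid passes, so the answer is mid or larger
        else:
            high = mid - 1  # mid fails, so the answer is below mid
    return low
```

With a 300-byte range this needs about nine probes instead of stepping through every size by hand.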

Fix: The two-line standard fix on both ends:

interface Tunnel0
 ip mtu 1400
 ip tcp adjust-mss 1360
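
The arithmetic behind these numbers is short enough to check by hand, and the sketch below does the same thing in Python. It assumes plain GRE over IPv4 with no key, sequence, or checksum options (a 4-byte GRE header) and TCP without options. IPsec overhead is deliberately left out because it varies with cipher and mode, which is why the `ip mtu 1400` value leaves headroom below 1476. The function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, no IP options
GRE_HEADER = 4    # bytes, base GRE header without key/sequence/checksum
TCP_HEADER = 20   # bytes, no TCP options


def gre_ip_mtu(underlay_mtu=1500):
    """Largest inner IP packet that fits in one plain GRE-over-IPv4 packet."""
    return underlay_mtu - IPV4_HEADER - GRE_HEADER


def tcp_mss_for(ip_mtu):
    """TCP MSS that keeps a full-size segment within the given IP MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER
```

`gre_ip_mtu(1500)` gives 1476, matching the "Tunnel transport MTU 1476 bytes" line in the triage output, and `tcp_mss_for(1400)` gives the 1360 used in the fix.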

Full deep-dive on the math, IPv6 considerations, and how to walk the size up to find your real path MTU is at GRE MTU and Fragmentation: Fixing Tunnel Packet Loss.

Keepalive Flap with IPsec Rekey

Symptom: Tunnel up most of the time but goes down briefly, every hour or so, with logs showing Tunnel0 line protocol changed state to down followed quickly by up.

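Cause: GRE keepalives time out during the IPsec SA rekey window. On Cisco IOS XE the default IPsec SA lifetime is 3,600 seconds (1 hour), which lines up with the symptom timing. The IKEv2 SA default is 86,400 seconds, so an hourly IKEv2 rekey means the lifetime was lowered in the IKEv2 profile.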

Diagnosis:

R1# show crypto ikev2 sa
Tunnel-id Local      Remote     fvrf/ivrf  Status
1         198.51.100.1/500  203.0.113.1/500  none/none  READY
      Life/Active Time: 3600/3540 sec   <- about to rekey

If "Active Time" is close to "Life" and the tunnel just flapped, you have correlated the events.

Fix: Loosen the keepalive retry count or interval so it can ride through a brief rekey window:

interface Tunnel0
 ! 10-second interval x 5 retries = 50-second timeout instead of 30
 keepalive 10 5

Or stagger the IPsec lifetimes on the two ends so they do not rekey at the same exact second. Or migrate to BFD for faster, more reliable liveness detection.
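
The keepalive timeout is simply interval multiplied by retries, which makes it easy to sanity-check a proposed setting against the outage you expect during a rekey. The sketch below is a rough model only: the exact moment the tunnel drops depends on where in the interval the outage starts, and the 40-second outage in the usage note is a made-up figure for illustration.

```python
def keepalive_timeout(interval_s, retries):
    """Approximate seconds of silence before the tunnel is declared down."""
    return interval_s * retries


def rides_through(interval_s, retries, outage_s):
    """True if an outage of `outage_s` seconds is shorter than the timeout."""
    return outage_s < keepalive_timeout(interval_s, retries)
```

Under this model a hypothetical 40-second outage takes the tunnel down with `keepalive 10 3` (30-second timeout) but not with `keepalive 10 5` (50-second timeout).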

OSPF Neighbor Stuck in INIT

Symptom: Tunnel up. Overlay ping works. OSPF neighbor relationship reaches INIT or 2-WAY but never goes to FULL.

Cause: OSPF hellos are getting through in one direction only, or are being discarded on arrival. Either an ACL is blocking 224.0.0.5 inbound on one end, or the OSPF authentication, area numbers, or network types do not match.

Diagnosis:

R1# debug ip ospf hello
*Apr 30 14:35:12: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.1
*Apr 30 14:35:22: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.1
! No "Rcv hello" entries

If R1 only sends and never receives, R2's hellos are being dropped on the way to R1. Check on R2:

R2# debug ip ospf hello
*Apr 30 14:35:14: OSPF: Send hello to 224.0.0.5 area 0 on Tunnel0 from 10.0.0.2
! R2 is sending hellos as well, but it also sees no "Rcv hello" entries

Both ends sending, neither receiving = something in between is dropping the multicast. Check the firewall path; the standard "permit gre" rule should permit the GRE-encapsulated multicast but custom ACLs can interfere.

Fix: Verify both ends have matching area, hello/dead timers, network type, and authentication. Run show ip ospf interface Tunnel0 on each side and compare.
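
Comparing the two sides by eye is error-prone when you are tired. A small diff helper makes the mismatch obvious. The sketch below assumes you have transcribed the relevant values from `show ip ospf interface Tunnel0` on each router into dictionaries by hand. The key names and the `ospf_mismatches` function are hypothetical, and no parsing of the command output is attempted.

```python
# Parameters that must agree on both ends, per the fix described above.
OSPF_KEYS = ("area", "hello_interval", "dead_interval", "network_type", "authentication")


def ospf_mismatches(local, remote, keys=OSPF_KEYS):
    """Return {key: (local_value, remote_value)} for every key that differs."""
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }
```

An empty result means the listed parameters agree, which points the investigation back at the path between the routers rather than at the OSPF configuration.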

Tunnel0 Line Protocol Down

Symptom: Tunnel0 reports down/down or up/down. Configuration looks correct.

Causes (in order of likelihood):

  1. The tunnel source IP does not exist on the local router. Either the configured source interface is down, or the source IP was mistyped.
  2. There is no route in the IP routing table to the tunnel destination.
  3. The tunnel destination is set to the same address as the tunnel source, which is not a valid configuration.

Diagnosis:

R1# show interface Tunnel0 | include source|destination|line
Tunnel0 is up, line protocol is down
  Tunnel source 198.51.100.1, destination 203.0.113.1

R1# show ip route 203.0.113.1
% Network not in table

The "Network not in table" line is the smoking gun.

Fix: Add a route to the tunnel destination via whatever next-hop the underlay requires.

R1(config)# ip route 203.0.113.1 255.255.255.255 198.51.100.2

Useful Debugs

Each debug is paired with what it shows. Run them on a lab tunnel first; on a production tunnel they can produce a lot of output.

Command                             | What it shows
debug tunnel keepalive              | GRE keepalive packets sent and received
debug tunnel                        | All tunnel-state transitions and processing events
debug ip ospf hello                 | OSPF Hello packets sent and received
debug crypto ikev2                  | IKEv2 SA negotiation and rekey events
debug crypto ipsec                  | IPsec SA install and tear-down events
debug ip packet detail (with ACL!)  | Per-packet IP processing on the router. Always restrict to a small ACL or you will fill the console

For debug ip packet detail, restrict scope tightly:

R1(config)# ip access-list extended 100
R1(config-ext-nacl)# permit ip host 198.51.100.1 host 203.0.113.1
R1(config-ext-nacl)# permit ip host 203.0.113.1 host 198.51.100.1
R1(config-ext-nacl)# end
R1# debug ip packet detail 100
! 100 is the numbered extended ACL defined above; the debug only prints packets it permits

Always remember to undebug all when you are done. A debug left running in production has consumed many a router CPU.

Packet Capture Strategy

For problems that resist the show-and-debug approach, capture the actual packets. Embedded Packet Capture (EPC) on IOS XE captures to a buffer or file:

R1(config)# ip access-list extended GRE-CAPTURE
R1(config-ext-nacl)# permit gre host 198.51.100.1 host 203.0.113.1
R1(config-ext-nacl)# permit gre host 203.0.113.1 host 198.51.100.1

R1# monitor capture CAP interface GigabitEthernet1 both
R1# monitor capture CAP access-list GRE-CAPTURE
R1# monitor capture CAP buffer size 5
R1# monitor capture CAP start
! ... wait for the issue to reproduce ...
R1# monitor capture CAP stop
R1# show monitor capture CAP buffer brief
R1# monitor capture CAP export bootflash:gre.pcap

Open the resulting pcap in Wireshark on your laptop. GRE packets show up as IP protocol 47, the GRE header includes the encapsulated protocol type, and Wireshark decodes the inner packet automatically. Looking at a real packet capture is the fastest way to confirm whether the GRE encapsulation is what you expected, whether keepalives are being reflected, and whether IPsec is wrapping the GRE correctly.
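
For a quick first pass before opening Wireshark, the exported file can be checked for GRE in both directions with a few lines of Python. The sketch below is a minimal parser under stated assumptions: the export is a classic pcap file (not pcapng), the link type is Ethernet, and frames carry no VLAN tag. The function name `count_gre_by_direction` is illustrative. If IPsec wraps the GRE, the underlay carries ESP (IP protocol 50) instead, and this counter will find nothing.

```python
import struct
from collections import Counter

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16
ETHERNET_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IP_PROTO_GRE = 47


def count_gre_by_direction(pcap_bytes):
    """Count IPv4 GRE packets per (source, destination) pair in a classic pcap.

    Assumes untagged Ethernet frames. Anything that is not IPv4 GRE is skipped.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    counts = Counter()
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(pcap_bytes):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += PCAP_RECORD_HEADER_LEN
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < ETHERNET_HEADER_LEN + 20:
            continue  # too short to hold an IPv4 header
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != ETHERTYPE_IPV4:
            continue
        ip = frame[ETHERNET_HEADER_LEN:]
        if ip[9] != IP_PROTO_GRE:  # byte 9 of the IPv4 header is the protocol
            continue
        src = ".".join(str(b) for b in ip[12:16])
        dst = ".".join(str(b) for b in ip[16:20])
        counts[(src, dst)] += 1
    return counts
```

Pass it the file contents, for example `count_gre_by_direction(open("gre.pcap", "rb").read())` after copying the export off the router. A count for only one of the two (source, destination) pairs is the one-way pattern described in the firewall section.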

When to Escalate

If the tunnel works in lab but fails over a specific carrier path, the problem may be a middlebox you cannot reach. Common culprits:

  • ISP CGNAT that does not handle IP protocol 47.
  • Carrier MPLS L3VPN with restrictive ACLs that drop GRE.
  • Customer-edge firewall with deep packet inspection that rewrites GRE keepalive payloads.
  • 5G mobile-broadband links with 1380-byte underlay MTU instead of 1500 (drop your ip mtu to 1300 to test).

Open a ticket with the carrier or middlebox owner with packet captures from both ends and timestamps that line up. Most carrier "GRE does not work" issues turn out to be a default deny on a transit firewall that gets fixed quickly once they see the captures.

Summary

GRE is robust enough that production failures fall into a small number of recognizable patterns: recursive routing, IP protocol 47 blocked, MTU mismatch, keepalive flap during IPsec rekey, and routing-protocol issues that look like GRE issues but are not. Run the five-minute triage first. Match the symptom against the table above. Apply the corresponding fix.

If you bookmark one thing, bookmark the five-minute triage at the top of this article. It distinguishes between "the tunnel is broken" and "the routing on top of the tunnel is broken," and that distinction shapes the entire rest of the debugging session. The full GRE coverage is at the PingLabz GRE pillar.

© 2025 Ping Labz. All rights reserved.