A GRE tunnel without keepalives is a tunnel that lies to you. It will report up / up as long as the local underlay route to the destination IP exists, even if the remote router has crashed and is not processing packets. Traffic black-holes, the routing protocols still believe they have a neighbor, and you are debugging blind. Cisco GRE keepalives fix this with a small, clever mechanism that adds liveness detection without requiring any cooperation from the remote end. This article explains how they work, how to configure and tune them, and the subtle ways they interact with IPsec.
The article is part of the PingLabz GRE Tunnels: The Complete Guide cluster.
Why GRE Needs Keepalives
GRE is stateless. Each packet stands on its own; there is no session, no sequence-number tracking by default, and no acknowledgment. The local Tunnel0 interface decides "up" versus "down" based on a single check: is there an active route in the IP routing table to the configured tunnel destination IP? If yes, the tunnel is up. The line protocol stays up indefinitely as long as the route exists, regardless of whether the far end is actually reachable, alive, or willing.
That gap matters in three real scenarios:
- Remote router crash or reboot. The route to the underlay IP is still valid (the path through the carrier is fine), but R2 is not actually answering anything. R1's tunnel stays up. Routing protocol neighbors hold for as long as their dead-timer permits, then tear down, but the tunnel itself reports up. If you are not running a routing protocol over the tunnel, traffic black-holes silently.
- Remote IPsec or interface failure. The remote Tunnel0 interface is configured but the IPsec profile or underlay interface is broken. Packets you send arrive at the underlay router but are never decapsulated. R1's view of "up / up" is wrong.
- Asymmetric path failure. R1 can reach R2 but not vice versa. R1's tunnel is up. R2's tunnel may be up too. Traffic flows one way and is dropped the other. Without bidirectional liveness, neither end notices.
Keepalives turn the tunnel from "up if the underlay route exists" into "up if my keepalive is being acknowledged by the remote end."
How GRE Keepalives Work
The mechanism is elegant. The local router builds a small packet whose inner content is itself a GRE-encapsulated reply addressed to the local router's underlay IP. It sends that packet through the tunnel as a normal GRE packet. The remote router receives it, strips the outer headers, sees an inner GRE packet bound for the local router's underlay IP, and forwards it back across the underlay like any other IP packet. The local router receives its own keepalive back and counts that as a successful round-trip.
The clever part: the remote router does not need any keepalive configuration. It just needs to be alive and processing GRE. As long as the remote box is functional enough to do basic IP routing, it will reflect the keepalive back automatically. There is no protocol negotiation, no version mismatch, no need for both ends to enable keepalives.
Each end runs its keepalive independently. R1's keepalives prove R2 is alive. R2's keepalives prove R1 is alive. They are unrelated. You can turn on keepalives on one end without changing the other end's config, and that one end will detect failures of the other.
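To make the packet nesting concrete, here is a minimal Python sketch of the layout described above. It follows Cisco's publicly documented keepalive format (an inner GRE header with protocol type 0), but the helper names, the TTL of 255, and the zeroed IP ID are assumptions for illustration, not a byte-exact copy of what IOS XE emits. The addresses are documentation-range examples.

```python
import socket
import struct

GRE_PROTO = 47            # IP protocol number for GRE
ETHERTYPE_IPV4 = 0x0800   # GRE protocol type for an IPv4 payload


def ip_checksum(header: bytes) -> int:
    """Standard 16-bit one's-complement checksum over an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a 20-byte IPv4 header carrying GRE (TTL and ID are assumed values)."""
    def pack(csum: int) -> bytes:
        return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                           255, GRE_PROTO, csum,
                           socket.inet_aton(src), socket.inet_aton(dst))
    return pack(ip_checksum(pack(0)))


def gre_header(protocol_type: int) -> bytes:
    """Basic 4-byte GRE header: no flags, version 0."""
    return struct.pack("!HH", 0, protocol_type)


def build_keepalive(local: str, remote: str) -> bytes:
    # Inner packet: the pre-built "reply", sourced from the remote end and
    # addressed to the local underlay IP, with GRE protocol type 0.
    inner = ipv4_header(remote, local, 4) + gre_header(0)
    # Outer packet: ordinary GRE encapsulation toward the remote end.
    return ipv4_header(local, remote, 4 + len(inner)) + gre_header(ETHERTYPE_IPV4) + inner


def reflect(packet: bytes) -> bytes:
    """What the remote end does: strip outer IP (20 bytes) and GRE (4 bytes)."""
    return packet[24:]


def addresses(packet: bytes) -> tuple:
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)


keepalive = build_keepalive("198.51.100.1", "203.0.113.1")
print(addresses(keepalive))           # outer header: local -> remote
print(addresses(reflect(keepalive)))  # inner header: remote -> local
```

The point of the sketch is that the remote router never has to understand keepalives. After ordinary decapsulation it is left holding a valid IP packet addressed to the sender, and forwarding that packet is all it takes to complete the round trip.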
Configuration
R1(config)# interface Tunnel0
R1(config-if)# keepalive 10 3
The two numbers are the interval (seconds) and the retry count. keepalive 10 3 means send a keepalive every 10 seconds and declare the tunnel down after 3 consecutive missed responses, so detection takes roughly 30 seconds (interval multiplied by retries). The IOS XE default if you type just keepalive with no arguments is also 10 seconds and 3 retries.
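The interval and retry logic can be modeled as a simple miss counter. The following is a sketch of the behavior described here, not IOS XE source code; the class and function names are made up for illustration, and each tick stands for one keepalive interval.

```python
class KeepaliveMonitor:
    """Toy model of the keepalive miss counter (illustrative, not IOS XE code)."""

    def __init__(self, retries: int = 3):
        self.retries = retries
        self.missed = 0
        self.line_protocol_up = True

    def tick(self, reply_received: bool) -> bool:
        """Process one keepalive interval and return the resulting tunnel state."""
        if reply_received:
            # Any reflected keepalive resets the counter and restores the tunnel.
            self.missed = 0
            self.line_protocol_up = True
        else:
            self.missed += 1
            if self.missed >= self.retries:
                self.line_protocol_up = False
        return self.line_protocol_up


def seconds_until_down(interval_s: int, retries: int) -> int:
    """Time for the tunnel to go down if the remote end stops reflecting entirely."""
    monitor = KeepaliveMonitor(retries)
    elapsed = 0
    while monitor.line_protocol_up:
        elapsed += interval_s
        monitor.tick(reply_received=False)
    return elapsed


print(seconds_until_down(10, 3))  # 30
```

A single reflected keepalive resets the counter, so occasional loss does not accumulate toward a tunnel-down event. Only consecutive misses do.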
For faster failover, you can tighten the interval:
R1(config-if)# keepalive 3 3
That gives 9-second detection, at the cost of roughly 3.3 times as much keepalive traffic on the tunnel (one packet every 3 seconds instead of every 10). Going as low as keepalive 1 3 is supported, but the marginal benefit drops off and the chance of false positives during normal jitter rises. For sub-second failover requirements, BFD over the tunnel (where the platform supports it) is a better tool than aggressive keepalives.
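The trade-off between detection time and keepalive volume is plain arithmetic. The helper names below are illustrative, and the comparison baseline of 10 seconds is simply the default interval.

```python
def detection_seconds(interval_s: float, retries: int) -> float:
    """Approximate time to declare the tunnel down: interval times retries."""
    return interval_s * retries


def packets_per_hour(interval_s: float) -> float:
    """Keepalives sent per hour by one tunnel endpoint."""
    return 3600 / interval_s


def relative_traffic(new_interval_s: float, old_interval_s: float = 10) -> float:
    """How many times more keepalives the new interval sends than the old one."""
    return old_interval_s / new_interval_s


for interval in (10, 3, 1):
    print(f"keepalive {interval} 3: "
          f"{detection_seconds(interval, 3):.0f} s detection, "
          f"{packets_per_hour(interval):.0f} packets/hour, "
          f"{relative_traffic(interval):.1f}x the default rate")
```

Even at a 1-second interval the absolute volume is small. The practical cost of tight timers is false positives, not bandwidth.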
Verifying Keepalives Are Working
R1# show interface Tunnel0 | include Keepalive
Keepalive set (10 sec), retries 3
If keepalives are not configured, this shows Keepalive not set. Fix it.
To watch keepalives in action:
R1# debug tunnel keepalive
Tunnel keepalive debugging is on
R1#
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:13.220: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
Sent, received, sent, received. If you see only "sent" entries with no matching "recv" within the configured interval, the remote end is not reflecting them. That points at a remote router problem, an underlay path problem, or IPsec issues if the tunnel is wrapped.
Turn debug off when you are done: no debug tunnel keepalive or undebug all.
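If you capture the debug output to a file, pairing "sent" and "recv" lines by hand gets tedious. Here is a small Python sketch that does the pairing. The regular expression assumes the exact line format shown in the sample above, which may differ between releases, and the function name is hypothetical.

```python
import re

# Matches the time of day, the tunnel name, and the direction of each debug line.
LINE = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}): (\S+): GRE keepalive (sent|recv)")

SAMPLE = """\
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:13.220: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
"""


def summarize(log_text: str):
    """Return (round-trip times in ms, count of keepalives with no matching recv)."""
    rtts_ms = []
    unanswered = 0
    pending = None  # send time (ms since midnight) still awaiting its reflection
    for match in LINE.finditer(log_text):
        h, m, s, ms, _tunnel, kind = match.groups()
        t = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
        if kind == "sent":
            if pending is not None:
                unanswered += 1  # previous keepalive was never reflected
            pending = t
        elif pending is not None:
            rtts_ms.append(t - pending)
            pending = None
    if pending is not None:
        # The last keepalive has no reply yet; it may simply be cut off by the capture.
        unanswered += 1
    return rtts_ms, unanswered


print(summarize(SAMPLE))  # ([4, 5], 0)
```

The sketch treats the log as a single tunnel and does not handle captures that cross midnight. For multiple tunnels, group the lines by the tunnel name captured in the regular expression first.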
Keepalives and IPsec
This is where the subtle gotchas live. When you wrap GRE in IPsec, keepalives are still GRE packets, but they now have to traverse the IPsec encryption. Three interactions matter:
- IPsec rekey timing. When the IPsec SA rekeys (default lifetime is typically 3,600 seconds for IKEv2), there is a brief window where outbound traffic uses the new SA but the remote end has not switched. A keepalive sent in that window may be dropped. With keepalive 10 3 you can ride through a rekey because the next keepalive 10 seconds later will use the established new SA. With keepalive 1 3 you are vulnerable to false-positive tunnel-down events at every rekey.
- DPD (Dead Peer Detection) overlap. IPsec has its own liveness check called DPD. If you have both DPD on the IPsec session and GRE keepalives, you have two redundant liveness mechanisms. They do not conflict, but they should be tuned consistently. A common pattern is GRE keepalive 10 3 plus IPsec DPD interval 30. The GRE side detects faster; DPD provides a safety net for the IPsec layer specifically.
- NAT-T keepalives. If either tunnel endpoint sits behind NAT, IOS XE sends NAT-T keepalives every 20 seconds (default) to maintain the UDP 4500 NAT pinhole. Those are unrelated to GRE keepalives but show up as additional traffic. Do not confuse the two during traffic-analysis.
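The rekey interaction comes down to how many consecutive keepalives can land inside one drop window. The sketch below works through that arithmetic. The window lengths are hypothetical inputs, not measured IOS XE values, and the function names are illustrative.

```python
import math


def worst_case_losses(interval_s: float, drop_window_s: float) -> int:
    """Most consecutive keepalives that can fall inside a drop window of this length."""
    return math.ceil(drop_window_s / interval_s)


def survives_drop_window(interval_s: float, retries: int, drop_window_s: float) -> bool:
    """True if the tunnel stays up even when the window is aligned as badly as possible."""
    return worst_case_losses(interval_s, drop_window_s) < retries


# Assume, purely for illustration, that a rekey drops traffic for up to 5 seconds.
print(survives_drop_window(10, 3, 5))  # True: at most 1 of 3 keepalives is lost
print(survives_drop_window(1, 3, 5))   # False: up to 5 consecutive losses
```

Under this model, keepalive 10 3 tolerates any disruption of 20 seconds or less, while keepalive 1 3 tolerates only 2 seconds. Measure your own rekey behavior before picking timers near that edge.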
Vendor Interoperability
GRE keepalives are a Cisco extension to GRE. They are not in RFC 2784 or RFC 2890. Other vendors have their own approaches:
| Vendor / OS | GRE keepalive support | Notes |
|---|---|---|
| Cisco IOS / IOS XE / IOS XR | Yes (this article) | Reference implementation |
| Juniper Junos | Limited; OAM via Service PIC | Not the same packet format; do not assume interop |
| Arista EOS | Yes, Cisco-style | Generally interoperates with Cisco |
| Linux (kernel GRE) | No native | Use BFD over the tunnel or a userland keepalive |
| FortiGate | Limited | Most FortiGate GRE deployments rely on routing-protocol dead-timers |
| MikroTik RouterOS | Yes | Cisco-compatible keepalive packet format |
For Cisco-to-non-Cisco GRE tunnels, do not assume keepalives work bidirectionally. Test both directions. If keepalives do not interoperate, fall back to running a routing protocol with aggressive timers across the tunnel, or use BFD if both platforms support it.
Alternatives to GRE Keepalives
- BFD over the tunnel. Bidirectional Forwarding Detection. Sub-second failure detection, much more efficient than tightly-tuned keepalives, and standardized. Supported on most modern Cisco platforms. Use it when you need millisecond-grade failover and your platform / IOS XE version supports BFD on tunnel interfaces.
- Routing-protocol dead-timers. If you are running OSPF or EIGRP across the tunnel, the routing protocol's hello / dead-timer is itself a liveness check. Tune the dead-timer aggressively (for example, down to 3 seconds for OSPF) and you get detection within a few seconds without GRE keepalives. The downside: if the routing protocol breaks for a different reason (LSA flap, neighbor mismatch), the tunnel reports "up" but is unusable.
- NHRP for DMVPN spokes. In a DMVPN deployment, NHRP messages between spokes and hubs serve as liveness checks for the dynamic spoke-to-spoke tunnels. GRE keepalives on the static spoke-to-hub tunnels are still common; the spoke-to-spoke side relies on NHRP holdtimes.
Troubleshooting Keepalive Failures
| Symptom | Likely cause | Where to check |
|---|---|---|
| Tunnel flaps every 30 seconds with no other obvious change | Keepalives configured but not returning | debug tunnel keepalive on both ends |
| Tunnel up but traffic black-holes | Keepalives never enabled | show interface Tunnel0 \| inc Keepalive |
| Keepalives returning but routing protocol dies | Routing-protocol issue, not GRE | show ip ospf neighbor / show ip eigrp neighbors |
| Keepalives flap during IPsec rekey only | Aggressive timers + IPsec rekey window | Loosen keepalive to 10 3 or stagger rekey lifetime |
| Cisco-to-non-Cisco tunnel: keepalives one-way | Vendor-specific format mismatch | Disable on the non-Cisco end and rely on routing-protocol liveness |
Summary
GRE keepalives are the bare minimum hygiene for any production GRE tunnel. They cost almost nothing in bandwidth or CPU, they require no remote-end cooperation, and they catch the failure modes that vanilla GRE tunnel-state cannot. The default keepalive 10 3 is a sensible starting point: 30-second detection, tolerant of normal IPsec rekey jitter, low overhead. Tighten only if you have a measured need and you are sure the path can support it.
If you take one thing away from this article, take this: every time you build a GRE tunnel, set keepalives in the same config block. There is no production scenario where leaving them off improves anything. Combined with ip mtu 1400 from the MTU article and the recursive-routing static from the config lab, you have eliminated the three most common GRE production failures before the first user complains. The full cluster is at PingLabz GRE.