GRE Tunnel Keepalives Explained

Cisco GRE keepalives explained: mechanism, configuration, IPsec rekey interactions, vendor interop, and the 30-second failover default that just works

A GRE tunnel without keepalives is a tunnel that lies to you. It will report up / up as long as the local underlay route to the destination IP exists, even if the remote router has crashed and is not processing packets. Traffic black-holes, the routing protocols still believe they have a neighbor, and you are debugging blind. Cisco GRE keepalives fix this with a small, clever mechanism that adds liveness detection without requiring any cooperation from the remote end. This article explains how they work, how to configure and tune them, and the subtle ways they interact with IPsec.

This article is part of the PingLabz GRE Tunnels: The Complete Guide cluster.

Why GRE Needs Keepalives

GRE is stateless. Each packet stands on its own; there is no session, no sequence-number tracking by default, and no acknowledgment. The local Tunnel0 interface decides "up" versus "down" based on a single check: is there an active route in the IP routing table to the configured tunnel destination IP? If yes, the tunnel is up. The line protocol stays up indefinitely as long as the route exists, regardless of whether the far end is actually reachable, alive, or willing.

That gap matters in three real scenarios:

  • Remote router crash or reboot. The route to the underlay IP is still valid (the path through the carrier is fine), but R2 is not actually answering anything. R1's tunnel stays up. Routing protocol neighbors hold for as long as their dead-timer permits, then tear down, but the tunnel itself reports up. If you are not running a routing protocol over the tunnel, traffic black-holes silently.
  • Remote IPsec or interface failure. The remote Tunnel0 interface is configured but the IPsec profile or underlay interface is broken. Packets you send arrive at the remote router but are never decapsulated. R1's view of "up / up" is wrong.
  • Asymmetric path failure. R1 can reach R2 but not vice versa. R1's tunnel is up. R2's tunnel may be up too. Traffic flows one way and is dropped the other. Without bidirectional liveness, neither end notices.

Keepalives turn the tunnel from "up if the underlay route exists" into "up if my keepalive is being acknowledged by the remote end."

How GRE Keepalives Work

The mechanism is elegant. The local router builds a small inner packet that is already addressed as the reply: an IP packet from the remote tunnel endpoint to the local router's underlay IP, carrying an empty GRE header. It then encapsulates that packet like any other tunnel traffic and sends it to the remote end. The remote router strips the outer header, sees an ordinary IP packet bound for the local router's underlay IP, and forwards it back across the underlay. The local router receives its own keepalive back and counts that as a successful round trip.

The clever part: the remote router does not need any keepalive configuration. It just needs to be alive and processing GRE. As long as the remote box is functional enough to do basic IP routing, it will reflect the keepalive back automatically. There is no protocol negotiation, no version mismatch, no need for both ends to enable keepalives.

Each end runs its keepalive independently. R1's keepalives prove R2 is alive. R2's keepalives prove R1 is alive. They are unrelated. You can turn on keepalives on one end without changing the other end's config, and that one end will detect failures of the other.
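
To make the packet-inside-a-packet idea concrete, here is a minimal Python sketch that builds the bytes of a keepalive and shows what the remote end hands back. It is an illustration under stated assumptions, not IOS XE's implementation: the function names are invented for this example, the addresses are the ones used in the debug output later in this article, and the TTL and the zero protocol type in the inner GRE header are assumed values.

```python
import ipaddress
import struct

GRE_PROTO = 47            # IP protocol number for GRE
ETHERTYPE_IPV4 = 0x0800   # GRE protocol type for an IPv4 payload


def ip_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header (no options) carrying GRE."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 255, GRE_PROTO, 0,
              ipaddress.IPv4Address(src).packed,
              ipaddress.IPv4Address(dst).packed]
    fields[7] = ip_checksum(struct.pack("!BBHHHBBH4s4s", *fields))
    return struct.pack("!BBHHHBBH4s4s", *fields)


def gre_header(protocol_type: int) -> bytes:
    """Basic 4-byte GRE header: no flags, version 0."""
    return struct.pack("!HH", 0, protocol_type)


def build_keepalive(local: str, remote: str) -> bytes:
    # Inner packet: pre-addressed from the remote end back to us, with an
    # empty GRE header (protocol type 0 is an assumption) and no payload.
    inner = ipv4_header(remote, local, 4) + gre_header(0)
    # Outer packet: ordinary tunnel encapsulation from us to the remote end.
    outer = ipv4_header(local, remote, 4 + len(inner))
    return outer + gre_header(ETHERTYPE_IPV4) + inner


def reflect(packet: bytes) -> bytes:
    """What the remote end does: strip outer IP (20) + GRE (4), forward the rest."""
    return packet[24:]


keepalive = build_keepalive("198.51.100.1", "203.0.113.1")
returned = reflect(keepalive)
inner_dst = ipaddress.IPv4Address(returned[16:20])
print(len(keepalive), len(returned), inner_dst)  # 48 24 198.51.100.1
```

The point to notice is that reflect() contains no keepalive logic at all. Plain decapsulation and forwarding is enough to send the inner packet home, which is why the remote end needs no configuration.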

Configuration

R1(config)# interface Tunnel0
R1(config-if)# keepalive 10 3

The two numbers are the interval (in seconds) and the retry count. keepalive 10 3 means send a keepalive every 10 seconds and declare the tunnel down after 3 consecutive missed responses, so detection takes roughly 30 seconds (interval × retries) after the remote end stops responding. If you type just keepalive with no arguments, IOS XE applies the same defaults: 10 seconds and 3 retries.
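
The interval-and-retries logic can be sketched as a simple miss counter. This is a simplified model written for this article, not the actual IOS XE state machine: the function name is hypothetical, and it assumes each keepalive is judged as missed one interval after it was sent.

```python
from typing import Optional, Sequence


def tunnel_down_at(interval_s: int, retries: int,
                   replies: Sequence[bool]) -> Optional[int]:
    """Return the second at which the tunnel is declared down, or None.

    replies[n] says whether the (n+1)-th keepalive came back. Any reply
    resets the miss counter; `retries` misses in a row take the tunnel down.
    """
    missed = 0
    for n, came_back in enumerate(replies, start=1):
        missed = 0 if came_back else missed + 1
        if missed >= retries:
            return n * interval_s
    return None


print(tunnel_down_at(10, 3, [False, False, False]))        # 30
print(tunnel_down_at(3, 3, [False, False, False]))         # 9
print(tunnel_down_at(10, 3, [False, False, True, False]))  # None
```

The last call shows why an occasional lost keepalive does not flap the tunnel: a single reply resets the count.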

For faster failover, you can tighten the interval:

R1(config-if)# keepalive 3 3

That gives roughly 9-second detection, at the cost of about 3.3 times as many keepalive packets on the tunnel (one every 3 seconds instead of one every 10). Going as low as keepalive 1 3 is supported, but the marginal benefit drops off and the chance of false positives during normal jitter rises. For sub-second failover requirements, BFD over the tunnel (where the platform supports it) is a better tool than aggressive keepalives.
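
For a feel of the trade-off, the arithmetic can be laid out in a few lines of Python. The helper names are made up for this sketch, and the numbers count keepalive packets only, ignoring packet size and any IPsec overhead.

```python
def detection_time_s(interval_s: int, retries: int) -> int:
    """Approximate time to declare the tunnel down: interval × retries."""
    return interval_s * retries


def keepalives_per_hour(interval_s: int) -> float:
    """Keepalive packets sent per hour at a given interval."""
    return 3600 / interval_s


baseline = keepalives_per_hour(10)
for interval, retries in [(10, 3), (3, 3), (1, 3)]:
    rate = keepalives_per_hour(interval)
    print(f"keepalive {interval} {retries}: "
          f"~{detection_time_s(interval, retries)} s detection, "
          f"{rate:.0f} packets/hour ({rate / baseline:.1f}x the default)")
```

Moving from a 10-second to a 3-second interval takes the send rate from 360 to 1,200 packets per hour. That is still tiny in absolute terms, so false positives are usually the bigger concern with aggressive timers, not bandwidth.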

Verifying Keepalives Are Working

R1# show interface Tunnel0 | include Keepalive
  Keepalive set (10 sec), retries 3

If keepalives are not configured, this shows Keepalive not set. Fix it.

To watch keepalives in action:

R1# debug tunnel keepalive
Tunnel keepalive debugging is on

R1#
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:13.220: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1

Sent, received, sent, received. If you see only "sent" entries with no matching "recv" within the configured interval, the remote end is not reflecting them. That points at a remote router problem, an underlay path problem, or IPsec issues if the tunnel is wrapped.
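
If the debug log is long, pairing sent and recv lines by eye gets tedious. The sketch below does the pairing in Python. It assumes the log lines look exactly like the sample above (the real format can vary by release), it ignores captures that span midnight, and the function name is invented for this example.

```python
import re
from typing import List

# Matches the time of day and the sent/recv keyword in lines such as:
# *Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=..., d=...
LINE = re.compile(r"(\d{2}):(\d{2}):(\d{2}\.\d+): \S+: GRE keepalive (sent|recv)")


def unanswered(log: str, interval_s: float) -> List[str]:
    """Return timestamps of keepalives with no recv within interval_s seconds."""
    sent, recv = [], []
    for h, m, s, kind in LINE.findall(log):
        t = int(h) * 3600 + int(m) * 60 + float(s)
        if kind == "sent":
            sent.append((t, f"{h}:{m}:{s}"))
        else:
            recv.append(t)
    return [stamp for t, stamp in sent
            if not any(t <= r < t + interval_s for r in recv)]


SAMPLE = """\
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
"""
print(unanswered(SAMPLE, 10))  # ['14:12:13.215']
```

Treat the final sent line in a capture with caution: its reply may simply not have arrived before the capture ended.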

Turn debug off when you are done: no debug tunnel keepalive or undebug all.

Keepalives and IPsec

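This is where the subtle gotchas live. When you wrap GRE in IPsec, keepalives are still GRE packets, but they now have to pass through IPsec encryption and decryption. Check platform support first: Cisco documents GRE keepalives as unsupported in combination with tunnel protection (IPsec profiles), so the notes below apply mainly to crypto-map designs. Three interactions matter: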

  • IPsec rekey timing. When the IPsec SA rekeys (the default IPsec SA lifetime on IOS XE is 3,600 seconds), there is a brief window where outbound traffic uses the new SA but the remote end has not switched. A keepalive sent in that window may be dropped. With keepalive 10 3 you can ride through a rekey because the next keepalive 10 seconds later will use the established new SA. With keepalive 1 3 you are vulnerable to false-positive tunnel-down events at every rekey.
  • DPD (Dead Peer Detection) overlap. IPsec has its own liveness check called DPD. If you have both DPD on the IPsec session and GRE keepalives, you have two redundant liveness mechanisms. They do not conflict, but they should be tuned consistently. A common pattern is GRE keepalive 10 3 plus IPsec DPD interval 30. The GRE side detects faster; DPD provides a safety net for the IPsec layer specifically.
  • NAT-T keepalives. If either tunnel endpoint sits behind NAT, IOS XE sends NAT-T keepalives every 20 seconds (default) to maintain the UDP 4500 NAT pinhole. Those are unrelated to GRE keepalives but show up as additional traffic. Do not confuse the two during traffic analysis.
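
The rekey argument comes down to how many keepalives can fall inside the disruption window. Here is a rough Python sketch of that reasoning. The 2.5-second window is a purely hypothetical figure chosen for illustration (real rekey disruptions depend on platform and configuration), and the function names are invented for this example.

```python
import math


def max_keepalives_lost(window_s: float, interval_s: float) -> int:
    """Most keepalives that can be sent inside a drop window of window_s seconds."""
    return math.ceil(window_s / interval_s)


def false_down_possible(window_s: float, interval_s: float, retries: int) -> bool:
    """True if one drop window can swallow `retries` keepalives in a row."""
    return max_keepalives_lost(window_s, interval_s) >= retries


REKEY_WINDOW_S = 2.5  # hypothetical disruption length, not a measured value

for interval, retries in [(10, 3), (3, 3), (1, 3)]:
    lost = max_keepalives_lost(REKEY_WINDOW_S, interval)
    risk = false_down_possible(REKEY_WINDOW_S, interval, retries)
    print(f"keepalive {interval} {retries}: up to {lost} lost, false down possible: {risk}")
```

Under that assumption, keepalive 10 3 and keepalive 3 3 lose at most one keepalive per rekey, while keepalive 1 3 can lose all three.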

Vendor Interoperability

GRE keepalives are a Cisco extension to GRE. They are not in RFC 2784 or RFC 2890. Other vendors have their own approaches:

| Vendor / OS | GRE keepalive support | Notes |
|---|---|---|
| Cisco IOS / IOS XE / IOS XR | Yes (this article) | Reference implementation |
| Juniper Junos | Limited; OAM via Service PIC | Not the same packet format; do not assume interop |
| Arista EOS | Yes, Cisco-style | Generally interoperates with Cisco |
| Linux (kernel GRE) | No native | Use BFD over the tunnel or a userland keepalive |
| FortiGate | Limited | Most FortiGate GRE deployments rely on routing-protocol dead-timers |
| MikroTik RouterOS | Yes | Cisco-compatible keepalive packet format |

For Cisco-to-non-Cisco GRE tunnels, do not assume keepalives work bidirectionally. Test both directions. If keepalives do not interoperate, fall back to running a routing protocol with aggressive timers across the tunnel, or use BFD if both platforms support it.

Alternatives to GRE Keepalives

  • BFD over the tunnel. Bidirectional Forwarding Detection. Sub-second failure detection, much more efficient than tightly-tuned keepalives, and standardized. Supported on most modern Cisco platforms. Use it when you need millisecond-grade failover and your platform / IOS XE version supports BFD on tunnel interfaces.
  • Routing-protocol dead-timers. If you are running OSPF or EIGRP across the tunnel, the routing protocol's hello / dead-timer is itself a liveness check. Tune the timers aggressively (for example, a 3-second OSPF dead interval) and you get detection within a few seconds without GRE keepalives. The downside: if the routing protocol breaks for a different reason (LSA flap, neighbor mismatch), the tunnel interface still reports "up" even though it is unusable.
  • NHRP for DMVPN spokes. In a DMVPN deployment, NHRP messages between spokes and hubs serve as liveness checks for the dynamic spoke-to-spoke tunnels. GRE keepalives on the static spoke-to-hub tunnels are still common; the spoke-to-spoke side relies on NHRP holdtimes.
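
To compare these options side by side, note that keepalives, routing-protocol hellos, and BFD all follow the same rule of thumb: detection time is roughly the send interval multiplied by the number of misses tolerated. The Python sketch below applies that rule to example timer values. The BFD and OSPF figures are illustrative choices, not defaults or recommendations.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    name: str
    interval_ms: int   # how often a probe or hello is sent
    multiplier: int    # consecutive misses tolerated before declaring failure

    @property
    def detection_ms(self) -> int:
        return self.interval_ms * self.multiplier


detectors = [
    Detector("GRE keepalive 10 3", 10_000, 3),
    Detector("OSPF hello 1 s, dead 3 s (example)", 1_000, 3),
    Detector("BFD 300 ms x 3 (example)", 300, 3),
]

for d in sorted(detectors, key=lambda d: d.detection_ms):
    print(f"{d.name:<36} ~{d.detection_ms / 1000:g} s")
```

With these example values the spread is about 0.9 seconds for BFD, 3 seconds for OSPF, and 30 seconds for default keepalives, which is why BFD is the usual choice when sub-second detection matters.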

Troubleshooting Keepalive Failures

| Symptom | Likely cause | Where to check |
|---|---|---|
| Tunnel flaps every 30 seconds with no other obvious change | Keepalives configured but not returning | debug tunnel keepalive on both ends |
| Tunnel up but traffic black-holes | Keepalives never enabled | show interface Tunnel0 \| inc Keepalive |
| Keepalives returning but routing protocol dies | Routing-protocol issue, not GRE | show ip ospf neighbor / show ip eigrp neighbors |
| Keepalives flap during IPsec rekey only | Aggressive timers + IPsec rekey window | Loosen keepalive to 10 3 or stagger rekey lifetime |
| Cisco-to-non-Cisco tunnel: keepalives one-way | Vendor-specific format mismatch | Disable on the non-Cisco end and rely on routing-protocol liveness |

Summary

GRE keepalives are the bare minimum hygiene for any production GRE tunnel. They cost almost nothing in bandwidth or CPU, they require no remote-end cooperation, and they catch the failure modes that vanilla GRE tunnel-state cannot. The default keepalive 10 3 is a sensible starting point: 30-second detection, immune to normal IPsec rekey jitter, low overhead. Tighten only if you have a measured need and you are sure the path can support it.

If you take one thing away from this article, take this: every time you build a GRE tunnel, set keepalives in the same config block. There is no production scenario where leaving them off improves anything. Combined with ip mtu 1400 from the MTU article and the recursive-routing static from the config lab, you have eliminated the three most common GRE production failures before the first user complains. The full cluster is at PingLabz GRE.


© 2025 Ping Labz. All rights reserved.