GRE Tunnel Keepalives: How They Work and How to Configure

A GRE tunnel without keepalives is a tunnel that lies to you. It will report up / up as long as the local underlay route to the destination IP exists, even if the remote router has crashed and is not processing packets. Traffic black-holes, the routing protocols still believe they have a neighbor, and you are debugging blind. Cisco GRE keepalives fix this with a small, clever mechanism that adds liveness detection without requiring any cooperation from the remote end. This article explains how they work, how to configure and tune them, and the subtle ways they interact with IPsec.

This article is part of the PingLabz GRE Tunnels: The Complete Guide cluster.

Why GRE Needs Keepalives

GRE is stateless. Each packet stands on its own; there is no session, no sequence-number tracking by default, and no acknowledgment. The local Tunnel0 interface decides "up" versus "down" based on a single check: is there an active route in the IP routing table to the configured tunnel destination IP? If yes, the tunnel is up. The line protocol stays up indefinitely as long as the route exists, regardless of whether the far end is actually reachable, alive, or willing.

That gap matters in three real scenarios:

Remote router crash or reboot. The route to the underlay IP is still valid (the path through the carrier is fine), but R2 is not actually answering anything. R1's tunnel stays up. Routing protocol neighbors hold for as long as their dead-timer permits, then tear down, but the tunnel itself reports up. If you are not running a routing protocol over the tunnel, traffic black-holes silently.
Remote IPsec or interface failure. The remote Tunnel0 interface is configured but the IPsec profile or underlay interface is broken. Packets you send arrive at the underlay router but are never decapsulated. R1's view of "up / up" is wrong.
Asymmetric path failure. R1 can reach R2 but not vice versa. R1's tunnel is up. R2's tunnel may be up too. Traffic flows one way and is dropped the other. Without bidirectional liveness, neither end notices.

Keepalives turn the tunnel from "up if the underlay route exists" into "up if my keepalive is being acknowledged by the remote end."
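
The difference between the two rules can be written as a tiny decision function. This is a simplified model for illustration only, not IOS code; the function and parameter names are made up for this sketch.

```python
def tunnel_line_protocol_up(route_to_destination: bool,
                            keepalive_enabled: bool = False,
                            consecutive_misses: int = 0,
                            retries: int = 3) -> bool:
    """Simplified model of how a point-to-point GRE tunnel decides up/down.

    All names here are hypothetical; they only mirror the rules in the text.
    """
    # Without a route to the tunnel destination, the tunnel is down either way.
    if not route_to_destination:
        return False
    # Without keepalives, a route alone is enough: this is the "lying" case.
    if not keepalive_enabled:
        return True
    # With keepalives, the far end must also have answered recently enough.
    return consecutive_misses < retries


# Remote router has crashed, underlay route still present:
print(tunnel_line_protocol_up(True))                      # no keepalives
print(tunnel_line_protocol_up(True, True, consecutive_misses=3))
```

The first call models the black-hole case (route present, no keepalives), the second the same failure with keepalive 10 3 after three missed replies.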

How GRE Keepalives Work

The mechanism is elegant. The local router builds a small packet whose inner content is itself an IP/GRE packet addressed back to the local router's underlay IP, with the remote tunnel endpoint as its source and a GRE protocol type of 0. It wraps that in the normal tunnel encapsulation and sends it like any other tunneled packet. The remote router receives it, strips the outer headers, sees an inner packet bound for the local router's underlay IP, and forwards it back across the underlay like any other IP packet. The local router receives its own keepalive back and counts that as a successful round-trip.

The clever part: the remote router does not need any keepalive configuration. It just needs to be alive and processing GRE. As long as the remote box is functional enough to do basic IP routing, it will reflect the keepalive back automatically. There is no protocol negotiation, no version mismatch, no need for both ends to enable keepalives.

Each end runs its keepalive independently. R1's keepalives prove R2 is alive. R2's keepalives prove R1 is alive. They are unrelated. You can turn on keepalives on one end without changing the other end's config, and that one end will detect failures of the other.
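
To make the packet-inside-a-packet idea concrete, here is a minimal sketch that assembles the bytes of a keepalive and then does what the remote router does to it. It assumes the layout Cisco describes publicly (inner GRE protocol type 0, inner source and destination swapped relative to the outer header). The helper names are invented for this example, the IP checksum is left at zero for brevity, and nothing here puts a packet on the wire.

```python
import socket
import struct

GRE_PROTO_IPV4 = 0x0800   # outer GRE header: payload is an IPv4 packet
GRE_PROTO_KEEPALIVE = 0   # inner GRE header: protocol type 0 (assumed keepalive marker)
IP_PROTO_GRE = 47         # IP protocol number for GRE


def ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header. Checksum is left at 0 in this sketch."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,   # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        255, IP_PROTO_GRE, 0,        # TTL, protocol, checksum (not computed)
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def gre_header(protocol_type: int) -> bytes:
    """Basic 4-byte GRE header with no optional fields."""
    return struct.pack("!HH", 0, protocol_type)


def build_keepalive(local: str, remote: str) -> bytes:
    # Inner packet: looks like it was sent BY the remote end TO the local end.
    inner = ipv4_header(remote, local, 4) + gre_header(GRE_PROTO_KEEPALIVE)
    # Outer packet: ordinary tunnel encapsulation from local to remote.
    outer_payload = gre_header(GRE_PROTO_IPV4) + inner
    return ipv4_header(local, remote, len(outer_payload)) + outer_payload


def remote_decapsulate(packet: bytes) -> bytes:
    """Strip the outer IPv4 (20 bytes) and GRE (4 bytes) headers."""
    return packet[24:]


pkt = build_keepalive("198.51.100.1", "203.0.113.1")
reply = remote_decapsulate(pkt)
print(socket.inet_ntoa(reply[16:20]))   # destination of what the remote forwards
```

After decapsulation, the remote router is left holding a plain IP packet whose destination is the originator, which is why no keepalive configuration is needed on that side.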

Configuration

R1(config)# interface Tunnel0
R1(config-if)# keepalive 10 3

The two numbers are the interval (seconds) and the retry count. keepalive 10 3 means send a keepalive every 10 seconds and declare the tunnel down after 3 consecutive missed responses. So the detection time is roughly 30 seconds (interval times retries), give or take one interval depending on where in the cycle the failure occurs. The IOS XE default if you type just keepalive with no arguments is also 10 seconds and 3 retries.
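
The retry logic can be sketched as a small counter. This is a simplified model of the behavior described above, not a reproduction of the IOS implementation; the class and method names are hypothetical.

```python
class KeepaliveMonitor:
    """Toy model: the tunnel goes down after `retries` consecutive misses."""

    def __init__(self, retries: int = 3):
        self.retries = retries
        self.misses = 0
        self.up = True

    def tick(self, reply_received: bool) -> bool:
        """Call once per keepalive interval; returns the resulting state."""
        if reply_received:
            # Any reply resets the counter and brings the tunnel (back) up.
            self.misses = 0
            self.up = True
        else:
            self.misses += 1
            if self.misses >= self.retries:
                self.up = False
        return self.up


monitor = KeepaliveMonitor(retries=3)
for reply in (True, False, False, True, False, False, False):
    monitor.tick(reply)
print(monitor.up)
```

Note that two misses followed by a reply do not take the tunnel down: only an unbroken run of misses equal to the retry count does.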

For faster failover, you can tighten the interval:

R1(config-if)# keepalive 3 3

That gives roughly 9-second detection, at the cost of about 3.3 times the keepalive traffic on the tunnel (one packet every 3 seconds instead of every 10). Going as low as keepalive 1 3 is supported, but the marginal benefit drops off and the chance of false positives during normal jitter rises. For sub-second failover requirements, BFD over the tunnel (where the platform supports it) is the better tool than aggressive keepalives.
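
The trade-off is simple arithmetic, sketched here with two hypothetical helper functions. Detection time is approximated as interval times retries, and the traffic cost as keepalives sent per hour.

```python
def detection_time_s(interval_s: int, retries: int) -> int:
    """Approximate time to declare the tunnel down."""
    return interval_s * retries


def keepalives_per_hour(interval_s: int) -> float:
    """How many keepalives one end sends in an hour."""
    return 3600 / interval_s


for interval, retries in ((10, 3), (3, 3), (1, 3)):
    print(f"keepalive {interval} {retries}: "
          f"~{detection_time_s(interval, retries)} s detection, "
          f"{keepalives_per_hour(interval):.0f} keepalives/hour")
```

Moving from a 10-second to a 3-second interval multiplies the packet count by 10/3, which is small in absolute terms but worth knowing when many tunnels terminate on one box.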

Verifying Keepalives Are Working

R1# show interface Tunnel0 | include Keepalive
  Keepalive set (10 sec), retries 3

If keepalives are not configured, this shows Keepalive not set. Fix it.

To watch keepalives in action:

R1# debug tunnel keepalive
Tunnel keepalive debugging is on

R1#
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:13.220: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1

Sent, received, sent, received. If you see only "sent" entries with no matching "recv" within the configured interval, the remote end is not reflecting them. That points at a remote router problem, an underlay path problem, or IPsec issues if the tunnel is wrapped.
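
If you capture a long debug session, pairing sent and recv lines by eye gets tedious. The sketch below counts unanswered keepalives in saved output. It assumes the exact line format shown in the sample above, which may differ between software releases; the function name is made up for this example.

```python
import re

# Matches the "GRE keepalive sent/recv s=..., d=..." lines from the sample.
LINE_RE = re.compile(r"GRE keepalive (sent|recv) s=(\S+), d=(\S+)")


def unanswered_keepalives(log_text: str) -> int:
    """Count 'sent' entries with no matching 'recv' before the next 'sent'.

    A trailing 'sent' with no reply is counted too, although in a live
    capture it may simply still be in flight.
    """
    unanswered = 0
    pending = None  # (src, dst) of the last keepalive still awaiting a reply
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        kind, src, dst = match.groups()
        if kind == "sent":
            if pending is not None:
                unanswered += 1
            pending = (src, dst)
        elif pending == (dst, src):
            # The reply arrives with source and destination swapped.
            pending = None
    if pending is not None:
        unanswered += 1
    return unanswered


sample = """\
*Apr 30 14:12:03.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:03.219: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
*Apr 30 14:12:13.215: Tunnel0: GRE keepalive sent s=198.51.100.1, d=203.0.113.1
*Apr 30 14:12:13.220: Tunnel0: GRE keepalive recv s=203.0.113.1, d=198.51.100.1
"""
print(unanswered_keepalives(sample))  # 0
```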

Turn debug off when you are done: no debug tunnel keepalive or undebug all.

Keepalives and IPsec

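This is where the subtle gotchas live. When you wrap GRE in IPsec, keepalives are still GRE packets, but they now have to traverse the IPsec encryption. Check platform support before relying on the combination: Cisco's documentation lists GRE keepalives as unsupported together with tunnel protection ipsec profile, so the notes below apply mainly to crypto-map-based GRE over IPsec. Three interactions matter: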

IPsec rekey timing. When the IPsec SA rekeys (the default IPsec SA lifetime is typically 3,600 seconds), there is a brief window where outbound traffic uses the new SA but the remote end has not switched. A keepalive sent in that window may be dropped. With keepalive 10 3 you can ride through a rekey because the next keepalive 10 seconds later will use the established new SA. With keepalive 1 3 you are vulnerable to false-positive tunnel-down events at every rekey.
DPD (Dead Peer Detection) overlap. IPsec has its own liveness check called DPD. If you have both DPD on the IPsec session and GRE keepalives, you have two redundant liveness mechanisms. They do not conflict, but they should be tuned consistently. A common pattern is GRE keepalive 10 3 plus IPsec DPD interval 30. The GRE side detects faster; DPD provides a safety net for the IPsec layer specifically.
NAT-T keepalives. If either tunnel endpoint sits behind NAT, IOS XE sends NAT-T keepalives every 20 seconds (default) to maintain the UDP 4500 NAT pinhole. Those are unrelated to GRE keepalives but show up as additional traffic. Do not confuse the two during traffic-analysis.
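
The rekey argument can be checked with a rough worst-case model. The 2-second drop window below is an assumption chosen purely for illustration, not a measured IOS XE figure, and the function names are hypothetical.

```python
import math


def keepalives_lost(drop_window_s: float, interval_s: float) -> int:
    """Worst-case count of consecutive keepalives sent inside a drop window."""
    return math.floor(drop_window_s / interval_s) + 1


def survives_window(drop_window_s: float, interval_s: float, retries: int) -> bool:
    """True if the tunnel stays up through the window in the worst case."""
    return keepalives_lost(drop_window_s, interval_s) < retries


ASSUMED_REKEY_WINDOW_S = 2.0  # illustrative assumption only

print(survives_window(ASSUMED_REKEY_WINDOW_S, interval_s=10, retries=3))
print(survives_window(ASSUMED_REKEY_WINDOW_S, interval_s=1, retries=3))
```

Under that assumption, keepalive 10 3 loses at most one keepalive and stays up, while keepalive 1 3 can lose three in a row and flap. Measure the real window on your own platform before drawing conclusions.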

Vendor Interoperability

GRE keepalives are a Cisco extension to GRE. They are not in RFC 2784 or RFC 2890. Other vendors have their own approaches:

Platform                    | GRE keepalive support        | Notes
Cisco IOS / IOS XE / IOS XR | Yes (this article)           | Reference implementation
Juniper Junos               | Limited; OAM via Service PIC | Not the same packet format; do not assume interop
Arista EOS                  | Yes, Cisco-style             | Generally interoperates with Cisco
Linux (kernel GRE)          | No native                    | Use BFD over the tunnel or a userland keepalive
FortiGate                   | Limited                      | Most FortiGate GRE deployments rely on routing-protocol dead-timers
MikroTik RouterOS           | Yes                          | Cisco-compatible keepalive packet format

For Cisco-to-non-Cisco GRE tunnels, do not assume keepalives work bidirectionally. Test both directions. If keepalives do not interoperate, fall back to running a routing protocol with aggressive timers across the tunnel, or use BFD if both platforms support it.

Alternatives to GRE Keepalives

BFD over the tunnel. Bidirectional Forwarding Detection. Sub-second failure detection, much more efficient than tightly-tuned keepalives, and standardized. Supported on most modern Cisco platforms. Use it when you need millisecond-grade failover and your platform / IOS XE version supports BFD on tunnel interfaces.
Routing-protocol dead-timers. If you are running OSPF or EIGRP across the tunnel, the routing protocol's hello / dead-timer is itself a liveness check. Tune the dead-timer aggressively (down to 3 seconds for OSPF) and you get detection within a few seconds without GRE keepalives. The downside: if the routing protocol breaks for a different reason (LSA flap, neighbor mismatch), the tunnel still reports "up" even though it is unusable.
NHRP for DMVPN spokes. In a DMVPN deployment, NHRP messages between spokes and hubs serve as liveness checks for the dynamic spoke-to-spoke tunnels. GRE keepalives on the static spoke-to-hub tunnels are still common; the spoke-to-spoke side relies on NHRP holdtimes.
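
To compare these options on one axis, the sketch below computes approximate detection times. The timer values are example settings picked for illustration (the BFD 300 ms interval and multiplier of 3 in particular are assumptions, not recommendations), and the function names are hypothetical.

```python
def gre_keepalive_detection_s(interval_s: float, retries: int) -> float:
    """GRE keepalive: interval times retry count."""
    return interval_s * retries


def bfd_detection_s(interval_ms: float, multiplier: int) -> float:
    """BFD: negotiated interval times detect multiplier, in seconds."""
    return interval_ms * multiplier / 1000


detection = {
    "GRE keepalive 10 3": gre_keepalive_detection_s(10, 3),
    "GRE keepalive 3 3": gre_keepalive_detection_s(3, 3),
    "OSPF dead-interval 3 s": 3.0,            # the dead-interval is the detection time
    "BFD 300 ms x 3": bfd_detection_s(300, 3),
}

# Print fastest mechanism first.
for name, seconds in sorted(detection.items(), key=lambda item: item[1]):
    print(f"{name:<24} ~{seconds:g} s")
```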

Troubleshooting Keepalive Failures

Symptom                                                    | Likely cause                            | Where to check
Tunnel flaps every 30 seconds with no other obvious change | Keepalives configured but not returning | debug tunnel keepalive on both ends
Tunnel up but traffic black-holes                          | Keepalives never enabled                | show interface Tunnel0 | inc Keepalive
Keepalives returning but routing protocol dies             | Routing-protocol issue, not GRE         | show ip ospf neighbor / show ip eigrp neighbors
Keepalives flap during IPsec rekey only                    | Aggressive timers + IPsec rekey window  | Loosen keepalive to 10 3 or stagger rekey lifetime
Cisco-to-non-Cisco tunnel: keepalives one-way              | Vendor-specific format mismatch         | Disable on the non-Cisco end and rely on routing-protocol liveness

Summary

GRE keepalives are the bare minimum hygiene for any production GRE tunnel. They cost almost nothing in bandwidth or CPU, they require no remote-end cooperation, and they catch the failure modes that vanilla GRE tunnel-state cannot. The default keepalive 10 3 is a sensible starting point: 30-second detection, immune to normal IPsec rekey jitter, low overhead. Tighten only if you have a measured need and you are sure the path can support it.

If you take one thing away from this article, take this: every time you build a GRE tunnel, set keepalives in the same config block. There is no production scenario where leaving them off improves anything. Combined with ip mtu 1400 from the MTU article and the recursive-routing static from the config lab, you have eliminated the three most common GRE production failures before the first user complains. The full cluster is at PingLabz GRE.
