BGP Convergence: Timers, BFD, and Fast Failover

Default BGP convergence is slow by design. With a hold time of 180 seconds and a keepalive interval of 60 seconds, a failed peer can go undetected for up to 3 minutes when the failure doesn't take the local link down. For internet-facing links, that might be acceptable. For enterprise or data center BGP, where BGP is the routing protocol and not just a peering protocol, 3 minutes of downtime is catastrophic. This article covers the main knobs IOS XE offers to reduce BGP failover time.

Default Timers and Their Impact

Keepalive: default 60 seconds. Sent periodically to confirm liveness.
Hold time: default 180 seconds. The session is declared dead if no KEEPALIVE or UPDATE arrives within the hold time, which at the default keepalive interval means 3 missed keepalives.
ConnectRetry: default 60 seconds. Retry interval after a failed TCP connection attempt.
MRAI (Minimum Route Advertisement Interval): default 30s eBGP, 5s iBGP. Minimum time between UPDATE messages to the same peer.

Worst-case detection time is the hold time: 180 seconds. Add UPDATE processing, best-path recalculation, and MRAI delay, and total convergence can exceed 3-4 minutes.

Option 1: Aggressive Timers

R1-HQ(config-router)# neighbor 172.16.0.2 timers 3 9

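Keepalive 3 seconds, hold time 9 seconds, so a failure is detected within at most 9 seconds. Each peer advertises its hold time in its OPEN message and both use the lower of the two values, so the remote peer's configuration affects the result too.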
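The negotiation rule can be sketched in a few lines of Python. This follows RFC 4271 rather than any IOS XE internals: both sides use the smaller of the two advertised hold times, and a nonzero hold time below 3 seconds is invalid. The function names and the peer's 180-second setting are hypothetical, and the keepalive helper only reflects the RFC's suggestion of sending keepalives at most one third of the hold time apart.

```python
# Hold-time negotiation as described in RFC 4271 (not IOS XE internals).
from typing import Optional


def negotiate_hold_time(local_hold_s: int, remote_hold_s: int) -> int:
    """Return the hold time both peers use: the smaller of the two OPEN values."""
    for value in (local_hold_s, remote_hold_s):
        # RFC 4271: a hold time must be either 0 (disabled) or at least 3 seconds.
        if value < 0 or value in (1, 2):
            raise ValueError(f"invalid hold time: {value}")
    return min(local_hold_s, remote_hold_s)


def suggested_keepalive_s(hold_s: int) -> Optional[float]:
    """RFC 4271 suggests keepalives at most hold/3 apart; hold 0 disables them."""
    if hold_s == 0:
        return None
    return hold_s / 3


# R1-HQ runs "timers 3 9"; the peer is hypothetically left at the default 180 s.
hold = negotiate_hold_time(9, 180)
print(hold, suggested_keepalive_s(hold))  # 9 3.0
```

Because the minimum wins, lowering the timers on one side is enough to shorten detection for both peers.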

Tradeoffs:

More control-plane traffic (keepalive every 3s per peer)
Higher CPU usage - significant with many peers
Risk of false positives during CPU spikes or brief congestion

For a handful of critical eBGP sessions, aggressive timers are fine. For hundreds of iBGP peers, the CPU overhead is significant.
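A back-of-the-envelope sketch of the keepalive volume, assuming a hypothetical router with 300 peers. A BGP KEEPALIVE is just the 19-byte BGP header (RFC 4271); TCP/IP overhead and the keepalives received from peers are not counted.

```python
BGP_KEEPALIVE_BYTES = 19  # a KEEPALIVE is only the 19-byte BGP header (RFC 4271)


def keepalive_load(peers: int, keepalive_s: float) -> tuple[float, float]:
    """Return (messages/s, BGP bytes/s) of KEEPALIVEs sent to all peers.

    Ignores TCP/IP headers and the keepalives received from each peer.
    """
    msgs_per_s = peers / keepalive_s
    return msgs_per_s, msgs_per_s * BGP_KEEPALIVE_BYTES


for keepalive in (60, 3):
    msgs, nbytes = keepalive_load(300, keepalive)
    print(f"300 peers, keepalive {keepalive}s: {msgs:.0f} msg/s, {nbytes:.0f} B/s")
```

The bandwidth stays negligible either way (100 messages and 1,900 bytes per second at 3 seconds); the concern is the twentyfold increase in per-message timer and TCP work on the route processor, which this arithmetic does not measure.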

Option 2: BFD (Bidirectional Forwarding Detection)

BFD is the recommended approach for sub-second BGP failover. It is a lightweight hello protocol that runs independently of BGP, often offloaded to line cards or forwarding hardware, and detects link or path failures in milliseconds.

! Enable BFD on the interface
R1-HQ(config)# interface GigabitEthernet0/0
R1-HQ(config-if)# bfd interval 100 min_rx 100 multiplier 3

! Tie BGP to BFD
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# neighbor 172.16.0.2 fall-over bfd

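This configures BFD with 100ms transmit and receive intervals and a multiplier of 3, so a failure is detected after 300ms without BFD packets. BFD must be enabled on the peer as well, since the session only comes up when both ends run it. When BFD detects a failure, BGP tears down the session immediately instead of waiting for the hold timer.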
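The 300ms figure comes from the detection-time rule in RFC 5880 (section 6.8.4). The sketch below applies that rule; it shows why the peer's configuration matters, since the agreed interval is the slower of the local receive interval and the remote transmit interval. The function name is hypothetical, and a given IOS XE release may clamp values to platform minimums that are not modeled here.

```python
def bfd_detection_time_ms(local_min_rx_ms: int, remote_min_tx_ms: int,
                          remote_multiplier: int) -> int:
    """Asynchronous-mode detection time, following RFC 5880 section 6.8.4.

    The remote end may not transmit faster than the local end is willing to
    receive, so the agreed interval is the larger of the two values. The local
    end declares the session down after remote_multiplier agreed intervals
    pass without a BFD control packet.
    """
    agreed_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return remote_multiplier * agreed_interval_ms


# Both ends configured with "bfd interval 100 min_rx 100 multiplier 3".
print(bfd_detection_time_ms(100, 100, 3))  # 300
# Hypothetical peer configured with "bfd interval 300 min_rx 300 multiplier 3".
print(bfd_detection_time_ms(100, 300, 3))  # 900
```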

BFD Advantages over Aggressive Timers

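Sub-second detection: 50-300ms typical, vs 9 seconds with the 3/9 timers above and never less than 3 seconds with BGP timers alone (RFC 4271 requires a nonzero hold time of at least 3 seconds)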
Hardware-assisted: On supported platforms, BFD runs in the forwarding ASIC, not the CPU
Protocol-independent: One BFD session can serve BGP, OSPF, and static routes simultaneously
Fewer false positives from CPU spikes: when BFD is offloaded to the data plane, its packets are sent and checked independently of route processor load

BFD for iBGP (Multihop)

For iBGP sessions over loopbacks (multihop), use multihop BFD:

R1-HQ(config-router)# neighbor 2.2.2.2 fall-over bfd multi-hop

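Multihop BFD runs over the IP path between loopbacks, so it detects failures along the entire path, not just a single link. On IOS XE, multihop sessions don't use the interface-level bfd interval; their timers come from a multi-hop BFD template that is bound to the loopback addresses with a bfd map entry.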

Option 3: BGP Fast External Fallover

Enabled by default on IOS XE. When the interface toward an eBGP peer goes down (link-down event), BGP immediately tears down the session without waiting for timers:

! Verify it's enabled (default)
R1-HQ# show ip bgp neighbors 172.16.0.2 | include fast
  External BGP neighbor may be up to 1 hop away, connected check is enabled
  Fast external fallover enabled

This works for directly connected eBGP peers when the local interface goes down. It does NOT help if the remote end fails (your interface stays up) or if there's an intermediate device.
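The reasoning above amounts to a small decision table. This simplified Python sketch encodes it; the enum and function names are hypothetical, and it deliberately ignores other triggers IOS XE may use, such as routing-table changes toward a multihop peer.

```python
from enum import Enum


class Failure(Enum):
    LOCAL_LINK_DOWN = "local interface goes down"
    REMOTE_FAILS_LINK_UP = "remote end fails, local interface stays up"


def first_detector(failure: Failure, directly_connected: bool,
                   bfd_enabled: bool) -> str:
    """Return which mechanism tears down an eBGP session first (simplified)."""
    if failure is Failure.LOCAL_LINK_DOWN and directly_connected:
        # Fast external fallover reacts to the local interface-down event.
        return "fast external fallover"
    if bfd_enabled:
        # BFD notices the missing control packets within its detection time.
        return "BFD"
    # Nothing else notices, so the negotiated hold time has to expire.
    return "hold timer"


print(first_detector(Failure.REMOTE_FAILS_LINK_UP, True, False))  # hold timer
print(first_detector(Failure.REMOTE_FAILS_LINK_UP, True, True))   # BFD
```

The second case is the gap BFD closes: a remote failure behind an intermediate switch leaves the local interface up, so fast external fallover never fires.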

Option 4: Prefix-Independent Convergence (PIC)

In a large BGP table (950K+ prefixes), even after detecting a failure, the router must re-evaluate best path for every affected prefix and update the FIB. PIC pre-calculates backup paths so the FIB can switch immediately when the primary fails:

R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# address-family ipv4 unicast
R1-HQ(config-router-af)# bgp additional-paths select best 2
R1-HQ(config-router-af)# bgp additional-paths install

With PIC, the best path and a precomputed backup (repair) path are both installed in the FIB. When the primary fails, the FIB switches to the backup in constant time regardless of table size: the switchover becomes O(1) instead of O(n) in the number of prefixes.
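The constant-time switchover depends on a hierarchical FIB, in which prefixes point at a shared path object instead of each storing its own next hop. Below is a toy Python model of that indirection, not how IOS XE's CEF is actually implemented. The next hops reuse this article's peers as hypothetical primary and backup, and failure detection (by BFD or the IGP) is assumed to have already happened.

```python
from dataclasses import dataclass, field


@dataclass
class PathList:
    """Shared next-hop object holding a primary path and a precomputed backup."""
    primary: str
    backup: str
    primary_up: bool = True

    @property
    def active(self) -> str:
        return self.primary if self.primary_up else self.backup

    def fail_primary(self) -> None:
        # One write here re-points every prefix that shares this object.
        self.primary_up = False


@dataclass
class Fib:
    # Prefixes reference a shared PathList instead of storing a next hop each.
    entries: dict[str, PathList] = field(default_factory=dict)

    def lookup(self, prefix: str) -> str:
        return self.entries[prefix].active


# Hypothetical next hops: the eBGP peer as primary, the iBGP peer as backup.
shared = PathList(primary="172.16.0.2", backup="2.2.2.2")
fib = Fib({f"10.{i // 256}.{i % 256}.0/24": shared for i in range(10_000)})

print(fib.lookup("10.0.0.0/24"))  # 172.16.0.2
shared.fail_primary()             # constant work, independent of 10,000 prefixes
print(fib.lookup("10.0.0.0/24"))  # 2.2.2.2
```

Without the shared object, failover would mean rewriting each of the 10,000 entries, which is the O(n) case PIC avoids.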

Option 5: Tuning MRAI

The Minimum Route Advertisement Interval limits how frequently BGP sends UPDATEs to a peer. The default is 30 seconds for eBGP, meaning that after a failure the UPDATE carrying the change may be held back for up to 30 seconds:

R1-HQ(config-router)# neighbor 172.16.0.2 advertisement-interval 0

Setting to 0 sends UPDATEs immediately. This reduces convergence time but increases UPDATE volume during instability. Use on critical sessions where fast propagation matters.
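To see where the extra delay comes from, here is a simplified per-peer model of MRAI in Python. RFC 4271 describes the timer per destination, and implementations differ in details such as how withdrawals are treated, so treat this as an illustration of the rate limiting, not of IOS XE behavior. The function name and event times are hypothetical.

```python
def mrai_send_times(change_times_s: list[float], mrai_s: float) -> list[float]:
    """Return when UPDATEs leave for one peer, given route-change times.

    Simplified per-peer model: consecutive UPDATEs must be at least mrai_s apart,
    and changes that arrive while an UPDATE is already scheduled join it.
    """
    sends: list[float] = []
    for t in sorted(change_times_s):
        if sends and sends[-1] >= t:
            continue  # batched into the UPDATE already scheduled at sends[-1]
        sends.append(t if not sends else max(t, sends[-1] + mrai_s))
    return sends


# A routine UPDATE at t=0, then a failure is detected at t=0.5 s.
print(mrai_send_times([0.0, 0.5], 30))  # [0.0, 30.0]
print(mrai_send_times([0.0, 0.5], 0))   # [0.0, 0.5]
```

With the 30-second default, the failure-driven UPDATE waits 29.5 seconds only because an unrelated UPDATE went out just before it; with advertisement-interval 0 it leaves as soon as the change is known.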

Complete Fast Convergence Configuration

! Interface-level BFD
interface GigabitEthernet0/0
 bfd interval 100 min_rx 100 multiplier 3
!
router bgp 65001
 ! BFD-backed eBGP session
 neighbor 172.16.0.2 fall-over bfd
 neighbor 172.16.0.2 advertisement-interval 0
 neighbor 172.16.0.2 timers 10 30
 !
 ! BFD-backed iBGP session
 neighbor 2.2.2.2 fall-over bfd multi-hop
 !
 address-family ipv4 unicast
  bgp additional-paths select best 2
  bgp additional-paths install

This gives ~300ms failure detection (BFD), immediate UPDATE propagation (MRAI 0), and fast FIB switchover (PIC). In many scenarios total convergence drops below 1 second, although route processing and hardware programming time on each router still add to that.
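The gain is easiest to see by adding up the two terms this article quantifies, for a failure that leaves the local link up. This Python sketch deliberately omits best-path computation, UPDATE processing on the peer, and FIB programming, so real convergence is always somewhat longer; the Profile type is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    detection_s: float  # worst-case time to notice the failure
    mrai_s: float       # worst-case wait before the UPDATE may be sent


def quantified_worst_case_s(p: Profile) -> float:
    """Sum only the two terms this article quantifies.

    Best-path computation, UPDATE processing on the peer, and FIB programming
    are left out, so real convergence is always somewhat longer.
    """
    return p.detection_s + p.mrai_s


baseline = Profile("default timers (hold 180 s, eBGP MRAI 30 s)", 180.0, 30.0)
tuned = Profile("BFD 100 ms x 3, MRAI 0", 0.3, 0.0)
for profile in (baseline, tuned):
    print(f"{profile.name}: {quantified_worst_case_s(profile):.1f} s")
```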

Verification

R1-HQ# show bfd neighbors
NeighAddr     LD/RD    RH/RS    State    Int
172.16.0.2    1/2      Up       Up       Gi0/0

R1-HQ# show bfd neighbors detail
NeighAddr: 172.16.0.2
  LD/RD: 1/2
  RH/RS: Up
  Session state: UP
  Holddown (hits): 0(0)
  Interval: 100ms, Multiplier: 3
  Registered protocols: BGP

R1-HQ# show ip bgp neighbors 172.16.0.2 | include BFD
  Using BFD to detect fast fallover
  BFD session state: UP
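If you check BFD health from a script, a minimal Python parser for the show bfd neighbors table is enough. It assumes the column headers shown in the sample output above (NeighAddr, LD/RD, RH/RS, State, Int); other releases may format the table differently. Collecting the output from the router (over SSH or an API) is out of scope, and the function name is hypothetical.

```python
def bfd_sessions_up(show_output: str) -> dict[str, bool]:
    """Map each neighbor in 'show bfd neighbors' output to True if State is Up.

    Assumes the column headers shown in this article's sample
    (NeighAddr, LD/RD, RH/RS, State, Int); other releases may differ.
    """
    header = None
    rows = []
    for line in show_output.splitlines():
        cols = line.split()
        if not cols:
            continue
        if header is None:
            # Skip the CLI prompt and banners until the header row appears.
            if cols[0] == "NeighAddr":
                header = cols
            continue
        if len(cols) == len(header):
            rows.append(cols)
    if header is None:
        return {}
    state_col = header.index("State")
    return {cols[0]: cols[state_col] == "Up" for cols in rows}


SAMPLE = """R1-HQ# show bfd neighbors
NeighAddr     LD/RD    RH/RS    State    Int
172.16.0.2    1/2      Up       Up       Gi0/0
"""

down = [n for n, up in bfd_sessions_up(SAMPLE).items() if not up]
print("all BFD sessions up" if not down else f"BFD down: {down}")
```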

Troubleshooting

BFD configured but not detecting failures

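Cause: BFD not enabled on the interface or on the peer (the session only comes up when both ends run BFD), fall-over bfd missing under the BGP neighbor, or the platform doesn't support hardware BFD.
Fix: Verify bfd interval on the interface on both routers and fall-over bfd under the neighbor. Check show bfd neighbors for the session state.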

BGP session flapping with aggressive timers

Cause: Timers too low for the platform's CPU capacity, so keepalives are delayed during heavy route processing.
Fix: Increase the timers or switch to BFD, which is far less sensitive to CPU spikes when it is offloaded to hardware.

Fast failover not working for iBGP

Cause: Loopback-based iBGP sessions need the multi-hop keyword on fall-over bfd.
Fix: Add fall-over bfd multi-hop for iBGP peers using loopback addresses.

Key Takeaways

Default BGP failure detection can take 3+ minutes when the failure doesn't bring the local link down. For most production environments, this is too slow.
BFD is the recommended solution: sub-second detection and, on platforms that offload it to hardware, little CPU impact.
PIC (Prefix-Independent Convergence) eliminates FIB update time by pre-installing backup paths.
Set advertisement-interval to 0 on critical sessions for immediate UPDATE propagation.
Layer your approach: BFD for detection + PIC for FIB convergence + low MRAI for propagation = sub-second total convergence.
