BGP Convergence: Timers, BFD, and Reducing Failover Time

Default BGP convergence is slow by design. With a hold time of 180 seconds and keepalive of 60, a failed peer isn't detected for up to 3 minutes. For internet-facing links, that might be acceptable. For enterprise or data center BGP — where BGP is the routing protocol, not just a peering protocol — 3 minutes of downtime is catastrophic. This article covers every knob available on IOS XE to reduce BGP failover time.

Default Timers and Their Impact

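| Timer | Default | Impact |
|---|---|---|
| Keepalive | 60 seconds | Sent to confirm liveness |
| Hold time | 180 seconds | Session declared dead if nothing is heard from the peer for this long (3 missed keepalives at the default ratio) |
| ConnectRetry | 60 seconds | Retry interval after TCP connection failure |
| MRAI (Min Route Advertisement Interval) | 30 s eBGP; 5 s iBGP on older IOS (current IOS XE releases default iBGP to 0 s, so verify on your release) | Minimum time between UPDATE messages to the same peer |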

Worst-case detection time is the hold timer: 180 seconds. Add UPDATE processing, best path recalculation, and MRAI delay, and total convergence can run to 3-4 minutes.
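
As a rough budget, the worst case can be written as the sum of the stages just listed. The symbols are labels introduced here for illustration. Only the hold-time and eBGP MRAI figures come from the table above; best path and FIB update times depend on table size and platform, so they are left symbolic:

```latex
\begin{align*}
T_{\text{conv}}^{\max} &= T_{\text{hold}} + T_{\text{bestpath}} + T_{\text{MRAI}} + T_{\text{FIB}} \\
                       &= 180\,\text{s} + T_{\text{bestpath}} + 30\,\text{s} + T_{\text{FIB}} \\
                       &= 210\,\text{s} + T_{\text{bestpath}} + T_{\text{FIB}}
\end{align*}
```

With eBGP defaults, detection and MRAI alone can account for up to 3.5 minutes before any route processing is counted.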

Option 1: Aggressive Timers

R1-HQ(config-router)# neighbor 172.16.0.2 timers 3 9

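This sets the keepalive to 3 seconds and the hold time to 9 seconds, so a dead peer is detected in about 9 seconds. The two peers negotiate the lower of their configured hold times, so the session runs at 9 seconds even if the neighbor is left at the default of 180.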
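
When every session on the router should use the same values, the timers can also be set once under the BGP process. A minimal sketch, assuming the AS number 65001 used later in this article. Timers are negotiated when the session is established, so new values only take effect after a reset, which is disruptive on a production session:

```
! Process-wide keepalive 3 s and hold time 9 s for all neighbors
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# timers bgp 3 9
R1-HQ(config-router)# end

! Timers are negotiated at session setup, so reset the session to apply them
R1-HQ# clear ip bgp 172.16.0.2

! Confirm the negotiated hold time and keepalive interval
R1-HQ# show ip bgp neighbors 172.16.0.2 | include hold time
```

A per-neighbor timers statement still overrides the process-wide values for that peer.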

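Tradeoffs: keepalives are generated and processed by the control-plane CPU, so a router that is busy with route processing can delay them long enough to expire a short hold timer and flap the session. For a handful of critical eBGP sessions, aggressive timers are fine. For hundreds of iBGP peers, the CPU overhead is significant.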

Option 2: BFD (Bidirectional Forwarding Detection)

BFD is the recommended approach for sub-second BGP failover. It runs independently of BGP at the data plane level, detecting link or path failures in milliseconds.

! Enable BFD on the interface
R1-HQ(config)# interface GigabitEthernet0/0
R1-HQ(config-if)# bfd interval 100 min_rx 100 multiplier 3

! Tie BGP to BFD
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# neighbor 172.16.0.2 fall-over bfd

This configures BFD with 100ms transmit/receive intervals and a multiplier of 3 — detection in 300ms. When BFD detects a failure, BGP immediately tears down the session without waiting for the hold timer.
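
The 300 ms figure assumes both ends use the same values. In BFD asynchronous mode (RFC 5880), the local detection time is the multiplier received from the remote peer times the larger of the local required receive interval and the remote desired transmit interval. A worked example follows; the symbols are labels introduced here, and the 300 ms remote transmit interval in the last line is a hypothetical mismatch:

```latex
\begin{align*}
T_{\text{detect}} &= M_{\text{remote}} \times \max\left(RX_{\text{local}},\ TX_{\text{remote}}\right) \\
\text{symmetric:}\quad T_{\text{detect}} &= 3 \times \max(100\,\text{ms},\ 100\,\text{ms}) = 300\,\text{ms} \\
\text{mismatched:}\quad T_{\text{detect}} &= 3 \times \max(100\,\text{ms},\ 300\,\text{ms}) = 900\,\text{ms}
\end{align*}
```

A peer configured with slower intervals therefore lengthens detection on this side as well, so it is worth checking the settings on both ends.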

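BFD Advantages over Aggressive Timers

BFD detects a failure in hundreds of milliseconds instead of seconds, and it runs independently of the BGP process, so on platforms that handle BFD in hardware it is not delayed by the CPU spikes that make aggressive keepalives flap.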

BFD for iBGP (Multihop)

For iBGP sessions over loopbacks (multihop), use multihop BFD:

R1-HQ(config-router)# neighbor 2.2.2.2 fall-over bfd multi-hop

Multihop BFD runs over the IP path between loopbacks. It detects failures along the entire path, not just a single link.
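
The interface-level bfd interval command does not apply to multihop sessions. On IOS XE these typically take their timers from a multihop BFD template that is bound to the session's address pair with bfd map. A sketch under assumptions: the template name MH-IBGP, the local loopback 1.1.1.1, and the 300 ms intervals are all hypothetical, and the exact syntax should be checked against your release:

```
! Timers for multihop sessions live in a template, not on an interface
bfd-template multi-hop MH-IBGP
 interval min-tx 300 min-rx 300 multiplier 3
!
! Bind the template to the address pair: destination prefix, then source prefix
bfd map ipv4 2.2.2.2/32 1.1.1.1/32 MH-IBGP
!
router bgp 65001
 neighbor 2.2.2.2 fall-over bfd multi-hop
```

The peer needs the mirror-image map (its own loopback as source, this router's as destination) for the session to come up.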

Option 3: BGP Fast External Fallover

Enabled by default on IOS XE. When the interface toward an eBGP peer goes down (link-down event), BGP immediately tears down the session without waiting for timers:

! Verify it's enabled (default)
R1-HQ# show ip bgp neighbors 172.16.0.2 | include [Ff]ast|External
  External BGP neighbor may be up to 1 hop away, connected check is enabled
  Fast external fallover enabled

This works for directly connected eBGP peers when the local interface goes down. It does NOT help if the remote end fails (your interface stays up) or if there's an intermediate device.
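
Because the feature is on by default, it normally appears in the running configuration only when it has been turned off. A short sketch of the two places it can be controlled, using the AS number and interface from this article:

```
router bgp 65001
 ! Default behavior; use the "no" form to disable it for the whole process
 bgp fast-external-fallover
!
interface GigabitEthernet0/0
 ! Per-interface override: "deny" disables fast fallover for sessions on this
 ! interface, so they wait for the hold timer or BFD instead
 ip bgp fast-external-fallover deny
```

Disabling it on a single interface is sometimes done for links that flap briefly, at the cost of slower detection there.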

Option 4: Prefix-Independent Convergence (PIC)

In a large BGP table (950K+ prefixes), even after detecting a failure, the router must re-evaluate best path for every affected prefix and update the FIB. PIC pre-calculates backup paths so the FIB can switch immediately when the primary fails:

R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# address-family ipv4 unicast
R1-HQ(config-router-af)# bgp additional-paths select best 2
R1-HQ(config-router-af)# bgp additional-paths install

With PIC, the first (primary) and second-best paths are both installed in the FIB. When the primary fails, the FIB switches to the backup in constant time regardless of table size — convergence goes from O(n) to O(1).
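
To check that a backup path was actually computed and installed, look at both the BGP table and the CEF entry for a prefix. The prefix 203.0.113.0/24 below is a placeholder. The BGP entry is expected to flag the second path as a backup/repair path and the CEF entry to list a repair next hop, though the exact wording can vary by release:

```
! BGP view: the backup path should be flagged alongside the best path
R1-HQ# show ip bgp 203.0.113.0/24

! FIB view: a repair next hop should be listed under the primary
R1-HQ# show ip cef 203.0.113.0 255.255.255.0 detail
```

If only one path exists for the prefix, there is nothing to pre-install, and failover falls back to normal best path recalculation.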

Option 5: Tuning MRAI

The Minimum Route Advertisement Interval limits how frequently BGP sends UPDATEs to a peer. Default is 30 seconds for eBGP — meaning after detecting a failure, you might wait up to 30 seconds before the withdrawal is sent:

R1-HQ(config-router)# neighbor 172.16.0.2 advertisement-interval 0

Setting to 0 sends UPDATEs immediately. This reduces convergence time but increases UPDATE volume during instability. Use on critical sessions where fast propagation matters.
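
The interval in effect for a session can be checked in the neighbor output. A sketch; the filter string assumes the relevant line contains the words "advertisement runs", which is how IOS has typically worded it:

```
! Show the minimum advertisement interval currently applied to this peer
R1-HQ# show ip bgp neighbors 172.16.0.2 | include advertisement runs
```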

Complete Fast Convergence Configuration

! Interface-level BFD
interface GigabitEthernet0/0
 bfd interval 100 min_rx 100 multiplier 3
!
router bgp 65001
 ! BFD-backed eBGP session
 neighbor 172.16.0.2 fall-over bfd
 neighbor 172.16.0.2 advertisement-interval 0
 neighbor 172.16.0.2 timers 10 30
 !
 ! BFD-backed iBGP session
 neighbor 2.2.2.2 fall-over bfd multi-hop
 !
 address-family ipv4 unicast
  bgp additional-paths select best 2
  bgp additional-paths install

This gives: ~300ms failure detection (BFD), immediate UPDATE propagation (MRAI 0), and instant FIB switchover (PIC). Total convergence under 1 second for most scenarios.

Verification

R1-HQ# show bfd neighbors
NeighAddr     LD/RD    RH/RS    State    Int
172.16.0.2    1/2      Up       Up       Gi0/0

R1-HQ# show bfd neighbors detail
NeighAddr: 172.16.0.2
  LD/RD: 1/2
  RH/RS: Up
  Session state: UP
  Holddown (hits): 0(0)
  Interval: 100ms, Multiplier: 3
  Registered protocols: BGP

R1-HQ# show ip bgp neighbors 172.16.0.2 | include BFD
  Using BFD to detect fast fallover
  BFD session state: UP

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| BFD configured but not detecting failures | BFD not enabled on the interface, or platform doesn't support hardware BFD | Verify bfd interval on the interface. Check show bfd neighbors for session state. |
| BGP session flapping with aggressive timers | Timers too low for the platform's CPU capacity; keepalives delayed during route processing | Increase timers or switch to BFD (hardware-assisted, immune to CPU spikes). |
| Fast failover not working for iBGP | fall-over bfd requires multi-hop keyword for loopback-based iBGP sessions | Add fall-over bfd multi-hop for iBGP peers using loopback addresses. |

© 2025 Ping Labz. All rights reserved.