BGP Convergence: Timers, BFD, and Reducing Failover Time

By default, BGP can take 3 minutes to notice a dead peer. That is fine on the public internet and unacceptable in enterprise or data center BGP. Below: every knob for reducing BGP failover time on Cisco IOS XE, including BFD.

Default BGP convergence is slow by design. With a hold time of 180 seconds and keepalive of 60, a failed peer isn't detected for up to 3 minutes. For internet-facing links, that might be acceptable. For enterprise or data center BGP - where BGP is the routing protocol, not just a peering protocol - 3 minutes of downtime is catastrophic. This article covers every knob available on IOS XE to reduce BGP failover time.

Default Timers and Their Impact

Timer                                     Default              Impact
Keepalive                                 60 seconds           Sent to confirm liveness
Hold time                                 180 seconds          Session declared dead after 3 missed keepalives
ConnectRetry                              60 seconds           Retry interval after TCP connection failure
MRAI (Min Route Advertisement Interval)   30s eBGP, 5s iBGP    Minimum time between UPDATE messages to same peer

Worst-case detection time is the hold timer: 180 seconds. Add UPDATE processing, best path recalculation, and MRAI delay, and total convergence can reach 3 to 4 minutes.
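The worst case above is plain addition, and a few lines of Python make it concrete. This is a rough sketch only: the function name and the `processing_s` placeholder (UPDATE processing plus best path recalculation, which the article does not quantify) are assumptions for illustration.

```python
def worst_case_convergence_s(hold_time_s=180, mrai_s=30, processing_s=0.0):
    """Upper bound on convergence: detection + processing + advertisement delay."""
    # Detection: a silent peer is only declared dead when the hold timer expires.
    # Propagation: the resulting UPDATE may wait a full MRAI before it is sent.
    return hold_time_s + processing_s + mrai_s


default_total = worst_case_convergence_s()
print(f"{default_total} s = {default_total / 60:.1f} min")
```

With the default eBGP values this comes to 210 seconds, or 3.5 minutes, before any processing time is added.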

Option 1: Aggressive Timers

R1-HQ(config-router)# neighbor 172.16.0.2 timers 3 9

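This sets the keepalive to 3 seconds and the hold time to 9 seconds, so a dead peer is detected within about 9 seconds. The peers negotiate the lower of their two hold times, and new timer values take effect only after the session is reset.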
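The negotiation rule can be sketched in Python. The hold-time selection (smaller of the two advertised values, zero or at least 3 seconds) follows RFC 4271. Capping the keepalive at one third of the negotiated hold time is an assumption based on the RFC's suggested ratio, and the function name is hypothetical; check your platform's actual behavior.

```python
def negotiate_timers(local_keepalive_s, local_hold_s, remote_hold_s):
    """Return (keepalive, hold) in seconds as seen by the local speaker."""
    # RFC 4271: each speaker uses the smaller of the two advertised hold times.
    hold = min(local_hold_s, remote_hold_s)
    if hold == 0:
        return 0, 0  # a hold time of zero disables keepalives entirely
    if hold < 3:
        raise ValueError("a non-zero hold time must be at least 3 seconds")
    # Assumption: keepalive is capped at one third of the negotiated hold time.
    keepalive = min(local_keepalive_s, hold // 3)
    return keepalive, hold


print(negotiate_timers(3, 9, 180))   # this router: "timers 3 9", peer on defaults
print(negotiate_timers(60, 180, 9))  # the peer's view of the same session
```

In this model both sides end up with the aggressive values, which is why configuring one end of the session is enough to shorten detection.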

Tradeoffs:

  • More control-plane traffic (keepalive every 3s per peer)
  • Higher CPU usage - significant with many peers
  • Risk of false positives during CPU spikes or brief congestion

For a handful of critical eBGP sessions, aggressive timers are fine. For hundreds of iBGP peers, the CPU overhead is significant.
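To put rough numbers on the tradeoff, the sketch below estimates how many keepalives a router sends per second for a given peer count. The peer counts (4 and 300) are hypothetical, and the model ignores received keepalives and per-message processing cost, so treat it as an order-of-magnitude illustration.

```python
def keepalives_per_second(peer_count, keepalive_interval_s):
    """Keepalives this router sends each second, summed across all peers."""
    return peer_count / keepalive_interval_s


for peers in (4, 300):  # hypothetical: a few eBGP sessions vs a large iBGP mesh
    default_rate = keepalives_per_second(peers, 60)
    aggressive_rate = keepalives_per_second(peers, 3)
    print(f"{peers} peers: {default_rate:.2f}/s at 60s, {aggressive_rate:.2f}/s at 3s")
```

Moving from a 60-second to a 3-second keepalive multiplies the rate by 20 regardless of peer count; it is the absolute number at scale that becomes a problem.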

Option 2: BFD (Bidirectional Forwarding Detection)

BFD is the recommended approach for sub-second BGP failover. It runs independently of BGP at the data plane level, detecting link or path failures in milliseconds.

! Enable BFD on the interface
R1-HQ(config)# interface GigabitEthernet0/0
R1-HQ(config-if)# bfd interval 100 min_rx 100 multiplier 3

! Tie BGP to BFD
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# neighbor 172.16.0.2 fall-over bfd

This configures BFD with 100ms transmit/receive intervals and a multiplier of 3 - detection in 300ms. When BFD detects a failure, BGP immediately tears down the session without waiting for the hold timer.
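The 300ms figure assumes both ends use the same intervals. A minimal sketch of the asynchronous-mode detection time from RFC 5880 shows what happens when they differ; the function and parameter names are illustrative, and the second call uses a hypothetical slower peer.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Local detection time: remote multiplier x agreed remote transmit interval."""
    # The remote peer may not send faster than the local side is willing to
    # receive, so the agreed interval is the larger of the two values.
    agreed_interval_ms = max(local_min_rx_ms, remote_tx_ms)
    return remote_multiplier * agreed_interval_ms


print(bfd_detection_time_ms(100, 100, 3))  # both ends configured as above
print(bfd_detection_time_ms(100, 300, 3))  # hypothetical peer sending every 300ms
```

The slower side governs the result, so matching the BFD configuration on both peers matters if the detection target is to hold.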

BFD Advantages over Aggressive Timers

  • Sub-second detection: 50-300ms typical, vs ~9 seconds with the timers in Option 1 (BGP's hold time cannot go below 3 seconds)
  • Hardware-assisted: On supported platforms, BFD runs in the forwarding ASIC, not the CPU
  • Protocol-independent: One BFD session can serve BGP, OSPF, and static routes simultaneously
  • Far fewer false positives from CPU spikes: where BFD is offloaded to hardware, it is unaffected by route processor load

BFD for iBGP (Multihop)

For iBGP sessions over loopbacks (multihop), use multihop BFD:

R1-HQ(config-router)# neighbor 2.2.2.2 fall-over bfd multi-hop

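Multihop BFD runs over the IP path between loopbacks, so it detects failures anywhere along that path, not just on a single link. Multihop sessions do not use the interface-level bfd interval command; on IOS XE they take their timers from a multi-hop BFD template that is bound to the peer addresses with a BFD map.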

Option 3: BGP Fast External Fallover

Enabled by default on IOS XE. When the interface toward an eBGP peer goes down (link-down event), BGP immediately tears down the session without waiting for timers:

! Verify it's enabled (default)
R1-HQ# show ip bgp neighbors 172.16.0.2 | include fast
  External BGP neighbor may be up to 1 hop away, connected check is enabled
  Fast external fallover enabled

This works for directly connected eBGP peers when the local interface goes down. It does NOT help if the remote end fails (your interface stays up) or if there's an intermediate device.

Option 4: Prefix-Independent Convergence (PIC)

In a large BGP table (950K+ prefixes), even after detecting a failure, the router must re-evaluate best path for every affected prefix and update the FIB. PIC pre-calculates backup paths so the FIB can switch immediately when the primary fails:

R1-HQ(config-router)# address-family ipv4 unicast
R1-HQ(config-router-af)# bgp additional-paths select best 2
R1-HQ(config-router-af)# bgp additional-paths install

With PIC, the first (primary) and second-best paths are both installed in the FIB. When the primary fails, the FIB switches to the backup in constant time regardless of table size - convergence goes from O(n) to O(1).
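The O(1) claim comes from indirection: many prefixes point at one shared primary/backup object rather than each holding its own next hop. The toy model below illustrates the idea only. `PathList`, `build_fib`, and `next_hop` are hypothetical names, not IOS XE internals, and the addresses are reused from the examples above.

```python
class PathList:
    """Primary/backup next-hop pair shared by many FIB entries (toy model)."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary

    def fail_over(self):
        # One pointer flip, independent of how many prefixes share this object.
        self.active = self.backup


def build_fib(prefixes, path_list):
    # Every prefix stores a reference to the same PathList, not its own copy.
    return {prefix: path_list for prefix in prefixes}


def next_hop(fib, prefix):
    return fib[prefix].active


shared = PathList(primary="172.16.0.2", backup="2.2.2.2")
fib = build_fib([f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)], shared)
shared.fail_over()  # single operation once the primary is declared dead
print(next_hop(fib, "10.0.0.0/24"))
```

Without the shared object, failover would mean rewriting all 1,000 entries one by one, which is the O(n) behavior PIC avoids.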

Option 5: Tuning MRAI

The Minimum Route Advertisement Interval limits how frequently BGP sends UPDATEs to a peer. Default is 30 seconds for eBGP - meaning after detecting a failure, you might wait up to 30 seconds before the withdrawal is sent:

R1-HQ(config-router)# neighbor 172.16.0.2 advertisement-interval 0

Setting to 0 sends UPDATEs immediately. This reduces convergence time but increases UPDATE volume during instability. Use on critical sessions where fast propagation matters.
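The effect of the interval can be sketched as a per-peer timer. This is a simplified model, assuming a single timer per peer with no jitter or batching, and the function name and timestamps are hypothetical.

```python
def earliest_send_time_s(change_time_s, last_update_sent_s, mrai_s):
    """Earliest time an UPDATE for a route change can go out to this peer."""
    # The UPDATE must wait until a full MRAI has elapsed since the last one.
    return max(change_time_s, last_update_sent_s + mrai_s)


# A route changes at t=12s, 2 seconds after the previous UPDATE to the peer.
print(earliest_send_time_s(12, 10, 30) - 12)  # delay with the eBGP default
print(earliest_send_time_s(12, 10, 0) - 12)   # delay with advertisement-interval 0
```

In this example the default adds 28 seconds of waiting, while an interval of 0 adds none.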

Complete Fast Convergence Configuration

! Interface-level BFD
interface GigabitEthernet0/0
 bfd interval 100 min_rx 100 multiplier 3
!
router bgp 65001
 ! BFD-backed eBGP session
 neighbor 172.16.0.2 fall-over bfd
 neighbor 172.16.0.2 advertisement-interval 0
 neighbor 172.16.0.2 timers 10 30
 !
 ! BFD-backed iBGP session
 neighbor 2.2.2.2 fall-over bfd multi-hop
 !
 address-family ipv4 unicast
  bgp additional-paths select best 2
  bgp additional-paths install

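This gives ~300ms failure detection (BFD), immediate UPDATE propagation (MRAI 0), and near-instant FIB switchover (PIC), for total convergence under 1 second in most scenarios. The moderate 10/30 keepalive and hold timers act as a backstop if the BFD session itself is not running.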

Verification

R1-HQ# show bfd neighbors
NeighAddr     LD/RD    RH/RS    State    Int
172.16.0.2    1/2      Up       Up       Gi0/0

R1-HQ# show bfd neighbors detail
NeighAddr: 172.16.0.2
  LD/RD: 1/2
  RH/RS: Up
  Session state: UP
  Holddown (hits): 0(0)
  Interval: 100ms, Multiplier: 3
  Registered protocols: BGP

R1-HQ# show ip bgp neighbors 172.16.0.2 | include BFD
  Using BFD to detect fast fallover
  BFD session state: UP

Troubleshooting

BFD configured but not detecting failures
  Cause: BFD not enabled on the interface, or the platform doesn't support hardware BFD
  Fix: Verify bfd interval on the interface. Check show bfd neighbors for session state.

BGP session flapping with aggressive timers
  Cause: Timers too low for the platform's CPU capacity - keepalives delayed during route processing
  Fix: Increase timers or switch to BFD (hardware-assisted on supported platforms, so far less sensitive to CPU spikes).

Fast failover not working for iBGP
  Cause: fall-over bfd requires the multi-hop keyword for loopback-based iBGP sessions
  Fix: Add fall-over bfd multi-hop for iBGP peers using loopback addresses.

Key Takeaways

  • Default BGP failure detection takes up to 3 minutes, and full convergence can take longer. For most production environments, this is too slow.
  • BFD is the recommended solution - sub-second detection, hardware-assisted on supported platforms, minimal CPU impact.
  • PIC (Prefix-Independent Convergence) eliminates FIB update time by pre-installing backup paths.
  • Set advertisement-interval to 0 on critical sessions for immediate UPDATE propagation.
  • Layer your approach: BFD for detection + PIC for FIB convergence + low MRAI for propagation = sub-second total convergence.