Default BGP convergence is slow by design. With a hold time of 180 seconds and keepalive of 60, a failed peer isn't detected for up to 3 minutes. For internet-facing links, that might be acceptable. For enterprise or data center BGP — where BGP is the routing protocol, not just a peering protocol — 3 minutes of downtime is catastrophic. This article covers every knob available on IOS XE to reduce BGP failover time.
Default Timers and Their Impact
| Timer | Default | Impact |
|---|---|---|
| Keepalive | 60 seconds | Sent to confirm liveness |
| Hold time | 180 seconds | Session declared dead after 3 missed keepalives |
| ConnectRetry | 60 seconds | Retry interval after TCP connection failure |
| MRAI (Min Route Advertisement Interval) | 30s eBGP, 0s iBGP (5s on older IOS releases) | Minimum time between UPDATE messages to the same peer |
Worst-case detection time equals the hold timer: 180 seconds. Add UPDATE processing, best path recalculation, and MRAI delay, and total convergence can reach 3 to 4 minutes.
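To make the arithmetic concrete, here is a minimal Python sketch that adds up the worst-case delays. The function name and the `processing_s` parameter are illustrative assumptions, and the processing value used below is a placeholder rather than a measurement.

```python
def worst_case_convergence(hold_time_s, mrai_s, processing_s=0):
    """Upper bound in seconds: detection + local processing + advertisement delay."""
    return hold_time_s + processing_s + mrai_s


# Defaults from the table above: 180 s hold time, 30 s eBGP MRAI.
baseline = worst_case_convergence(hold_time_s=180, mrai_s=30)
print(baseline)  # 210

# processing_s=20 is a hypothetical placeholder for UPDATE processing
# and best path recalculation, not a measured figure.
with_processing = worst_case_convergence(180, 30, processing_s=20)
print(with_processing / 60)  # total in minutes
```

Even before any processing time is counted, the hold timer and MRAI alone account for three and a half minutes.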
Option 1: Aggressive Timers
R1-HQ(config-router)# neighbor 172.16.0.2 timers 3 9
Keepalive 3 seconds, hold time 9 seconds. Detection in ~9 seconds. The lower of both peers' hold times is negotiated.
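The negotiation rule can be sketched in a few lines of Python. The hold time logic follows the BGP specification (the smaller value wins, and it must be 0 or at least 3 seconds). Capping the keepalive at one third of the negotiated hold time is an assumption about how implementations typically behave, and the function name is hypothetical.

```python
def negotiate_timers(local_keepalive, local_hold, remote_hold):
    """Return (keepalive, hold) in seconds for one BGP session."""
    # The session uses the smaller of the two advertised hold times.
    hold = min(local_hold, remote_hold)
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    # Assumption: keepalive never exceeds one third of the negotiated hold time.
    keepalive = min(local_keepalive, hold // 3)
    return keepalive, hold


# Local router uses "timers 3 9", peer is at the 180 s default.
print(negotiate_timers(3, 9, 180))   # (3, 9)

# Local router at defaults (60/180), peer advertises a 9 s hold time.
print(negotiate_timers(60, 180, 9))  # (3, 9)
```

The second call shows why configuring aggressive timers on one side is enough to shorten detection for both peers.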
Tradeoffs:
- More control-plane traffic (keepalive every 3s per peer)
- Higher CPU usage — significant with many peers
- Risk of false positives during CPU spikes or brief congestion
For a handful of critical eBGP sessions, aggressive timers are fine. For hundreds of iBGP peers, the CPU overhead is significant.
Option 2: BFD (Bidirectional Forwarding Detection)
BFD is the recommended approach for sub-second BGP failover. It runs independently of BGP at the data plane level, detecting link or path failures in milliseconds.
! Enable BFD on the interface
R1-HQ(config)# interface GigabitEthernet0/0
R1-HQ(config-if)# bfd interval 100 min_rx 100 multiplier 3
! Tie BGP to BFD
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# neighbor 172.16.0.2 fall-over bfd
This configures BFD with 100ms transmit/receive intervals and a multiplier of 3, giving detection in 300ms. When BFD detects a failure, BGP immediately tears down the session without waiting for the hold timer.
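The 300ms figure assumes both peers use the same values. As a rough sketch of the asynchronous-mode calculation described in RFC 5880, the local detection time is the remote multiplier times the larger of the local receive interval and the remote transmit interval. The function and parameter names below are illustrative.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Local detection time for a BFD session in asynchronous mode."""
    # The remote peer transmits no faster than the local side is willing to receive.
    agreed_remote_tx_ms = max(local_min_rx_ms, remote_tx_ms)
    return remote_multiplier * agreed_remote_tx_ms


# Both sides configured with: bfd interval 100 min_rx 100 multiplier 3
print(bfd_detection_time_ms(100, 100, 3))  # 300

# Remote peer transmits every 250 ms: detection is slower than the local config suggests.
print(bfd_detection_time_ms(100, 250, 3))  # 750
```

In practice this means the slower side of the pair sets the detection time, so both ends should be configured consistently.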
BFD Advantages over Aggressive Timers
- Sub-second detection: 50-300ms typical, vs minimum ~9 seconds with timers
- Hardware-assisted: On supported platforms, BFD runs in the forwarding ASIC, not the CPU
- Protocol-independent: One BFD session can serve BGP, OSPF, and static routes simultaneously
- Fewer false positives from CPU spikes: where BFD is offloaded to the data plane, it is largely unaffected by route processor load
BFD for iBGP (Multihop)
For iBGP sessions over loopbacks (multihop), use multihop BFD:
R1-HQ(config-router)# neighbor 2.2.2.2 fall-over bfd multi-hop
Multihop BFD runs over the IP path between loopbacks. It detects failures along the entire path, not just a single link.
Option 3: BGP Fast External Fallover
Enabled by default on IOS XE. When the interface toward an eBGP peer goes down (link-down event), BGP immediately tears down the session without waiting for timers:
! Verify it's enabled (default)
R1-HQ# show ip bgp neighbors 172.16.0.2 | include fast
External BGP neighbor may be up to 1 hop away, connected check is enabled
Fast external fallover enabled
This works for directly connected eBGP peers when the local interface goes down. It does NOT help if the remote end fails (your interface stays up) or if there's an intermediate device.
Option 4: Prefix-Independent Convergence (PIC)
In a large BGP table (950K+ prefixes), even after detecting a failure, the router must re-evaluate best path for every affected prefix and update the FIB. PIC pre-calculates backup paths so the FIB can switch immediately when the primary fails:
R1-HQ(config-router)# address-family ipv4 unicast
R1-HQ(config-router-af)# bgp additional-paths install
R1-HQ(config-router-af)# bgp additional-paths select best 2
With PIC, the first (primary) and second-best paths are both installed in the FIB. When the primary fails, the FIB switches to the backup in constant time regardless of table size, so convergence goes from O(n) to O(1).
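The O(1) behavior comes from indirection: prefixes point at a shared forwarding object instead of each holding its own next hop. The following Python sketch is a conceptual model only, not how any particular FIB is implemented. The `PathList` class, the prefix range, and the choice of next-hop addresses are all illustrative assumptions.

```python
class PathList:
    """Shared forwarding object holding a primary and a pre-installed backup next hop."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active_next_hop(self):
        return self.primary if self.primary_up else self.backup


# Hypothetical table: 1,000 prefixes that all share one path list.
shared = PathList(primary="172.16.0.2", backup="2.2.2.2")
fib = {f"10.{i // 256}.{i % 256}.0/24": shared for i in range(1000)}


def lookup(prefix):
    return fib[prefix].active_next_hop()


print(lookup("10.0.5.0/24"))  # 172.16.0.2

# Failure of the primary: one write, independent of how many prefixes exist.
shared.primary_up = False
print(lookup("10.0.5.0/24"))  # 2.2.2.2
```

Without the shared object, the failover step would be a loop over every affected prefix, which is the O(n) cost that PIC avoids.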
Option 5: Tuning MRAI
The Minimum Route Advertisement Interval limits how frequently BGP sends UPDATEs to a peer. Default is 30 seconds for eBGP — meaning after detecting a failure, you might wait up to 30 seconds before the withdrawal is sent:
R1-HQ(config-router)# neighbor 172.16.0.2 advertisement-interval 0
Setting to 0 sends UPDATEs immediately. This reduces convergence time but increases UPDATE volume during instability. Use on critical sessions where fast propagation matters.
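The effect of the interval can be illustrated with a simplified per-peer pacing model in Python. This is a sketch under stated assumptions: real implementations differ in detail (for example in how withdrawals are treated), and the function name and event times are hypothetical.

```python
def update_send_times(change_times, mrai_s):
    """Return the time at which each route change is actually advertised to one peer."""
    sent = []
    last_send = None
    for t in sorted(change_times):
        if last_send is not None and t <= last_send:
            # An UPDATE is already scheduled: this change is batched into it.
            send_at = last_send
        elif last_send is None or t >= last_send + mrai_s:
            # Timer is idle: advertise immediately.
            send_at = t
        else:
            # Timer is running: wait until it expires.
            send_at = last_send + mrai_s
        last_send = send_at
        sent.append(send_at)
    return sent


# An UPDATE goes out at t=0, then a failure is detected at t=1 and t=2.
print(update_send_times([0, 1, 2], mrai_s=30))  # [0, 30, 30]
print(update_send_times([0, 1, 2], mrai_s=0))   # [0, 1, 2]
```

With a 30 second interval the withdrawal waits almost the full interval but is batched with the later change. With 0 every change goes out at once, which is the extra UPDATE volume the tradeoff refers to.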
Complete Fast Convergence Configuration
! Interface-level BFD
interface GigabitEthernet0/0
 bfd interval 100 min_rx 100 multiplier 3
!
router bgp 65001
 ! BFD-backed eBGP session
 neighbor 172.16.0.2 fall-over bfd
 neighbor 172.16.0.2 advertisement-interval 0
 neighbor 172.16.0.2 timers 10 30
 !
 ! BFD-backed iBGP session
 neighbor 2.2.2.2 fall-over bfd multi-hop
 !
 address-family ipv4 unicast
  bgp additional-paths select best 2
  bgp additional-paths install
This gives: ~300ms failure detection (BFD), immediate UPDATE propagation (MRAI 0), and instant FIB switchover (PIC). Total convergence under 1 second for most scenarios.
Verification
R1-HQ# show bfd neighbors
NeighAddr LD/RD RH/RS State Int
172.16.0.2 1/2 Up Up Gi0/0
R1-HQ# show bfd neighbors detail
NeighAddr: 172.16.0.2
LD/RD: 1/2
RH/RS: Up
Session state: UP
Holddown (hits): 0(0)
Interval: 100ms, Multiplier: 3
Registered protocols: BGP
R1-HQ# show ip bgp neighbors 172.16.0.2 | include BFD
Using BFD to detect fast fallover
BFD session state: UP
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| BFD configured but not detecting failures | BFD not enabled on the interface, or platform doesn't support hardware BFD | Verify bfd interval on the interface. Check show bfd neighbors for session state. |
| BGP session flapping with aggressive timers | Timers too low for the platform's CPU capacity — keepalives delayed during route processing | Increase timers or switch to BFD (hardware-assisted, immune to CPU spikes). |
| Fast failover not working for iBGP | fall-over bfd requires multi-hop keyword for loopback-based iBGP sessions | Add fall-over bfd multi-hop for iBGP peers using loopback addresses. |
Key Takeaways
- Default BGP convergence is 3+ minutes. For most production environments, this is too slow.
- BFD is the recommended solution: sub-second detection and, on platforms with hardware offload, little route processor load.
- PIC (Prefix-Independent Convergence) eliminates FIB update time by pre-installing backup paths.
- Set advertisement-interval to 0 on critical sessions for immediate UPDATE propagation.
- Layer your approach: BFD for detection + PIC for FIB convergence + low MRAI for propagation = sub-second total convergence.