BGP Convergence: Timers, BFD, and Reducing Failover Time

With default timers, BGP can take up to 3 minutes to notice a dead peer. That is fine on the public internet and unacceptable in enterprise or data center BGP. Here is every knob for reducing BGP failover time on Cisco IOS XE, including BFD.

Default BGP convergence is slow by design. With a hold time of 180 seconds and keepalive of 60, a failed peer isn't detected for up to 3 minutes. For internet-facing links, that might be acceptable. For enterprise or data center BGP — where BGP is the routing protocol, not just a peering protocol — 3 minutes of downtime is catastrophic. This article covers every knob available on IOS XE to reduce BGP failover time.

Default Timers and Their Impact

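| Timer | Default | Impact |
| --- | --- | --- |
| Keepalive | 60 seconds | Sent to confirm liveness |
| Hold time | 180 seconds | Session declared dead if no KEEPALIVE or UPDATE arrives within this time (3 missed keepalives) |
| ConnectRetry | 60 seconds | Retry interval after TCP connection failure |
| MRAI (Min Route Advertisement Interval) | 30 s eBGP; iBGP 5 s per RFC 4271, 0 s on current IOS XE | Minimum time between UPDATE messages to same peer |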

Worst-case detection time is the hold timer: 180 seconds. Add UPDATE processing, best path recalculation, and MRAI delay, and total convergence can run to 3-4 minutes.
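The budget above can be written out as simple arithmetic. This is a rough sketch, not a measurement: the function name and the 5-second figure for UPDATE processing and best path recalculation are assumptions chosen for illustration.

```python
def worst_case_convergence(hold_time_s, mrai_s, processing_s=0.0):
    """Upper bound on failover time: detection + processing + one full MRAI wait."""
    return hold_time_s + processing_s + mrai_s


# Default timers: 180 s hold time, 30 s eBGP MRAI.
# processing_s=5 is an assumed placeholder, not a measured value.
default = worst_case_convergence(hold_time_s=180, mrai_s=30, processing_s=5)
print(f"default timers: up to {default:.0f} s ({default / 60:.1f} min)")
```

Detection dominates the total, which is why most of the options below attack the hold timer first.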

Option 1: Aggressive Timers

R1-HQ(config-router)# neighbor 172.16.0.2 timers 3 9

Keepalive 3 seconds, hold time 9 seconds. Detection in ~9 seconds. The lower of both peers' hold times is negotiated.
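The negotiation rule can be sketched as follows. The function names are hypothetical. The "0 or at least 3 seconds" check follows RFC 4271, and the one-third keepalive is the RFC's suggested ratio, so an actual IOS XE session may settle on a different keepalive value.

```python
def negotiate_hold_time(local_hold_s, peer_hold_s):
    """Return the hold time both peers use: the smaller of the two offers."""
    hold = min(local_hold_s, peer_hold_s)
    # RFC 4271: a hold time must be 0 (no keepalives) or at least 3 seconds.
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold


def suggested_keepalive(hold_s):
    """Keepalive at one third of the hold time (RFC 4271 suggestion)."""
    return hold_s // 3


# R1-HQ offers 9 s, the peer still offers the 180 s default.
hold = negotiate_hold_time(9, 180)
print(hold, suggested_keepalive(hold))
```

Because the lower value wins, a peer that you do not control can still shorten the hold time on your side of the session.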

Tradeoffs:

  • More control-plane traffic (keepalive every 3s per peer)
  • Higher CPU usage — significant with many peers
  • Risk of false positives during CPU spikes or brief congestion

For a handful of critical eBGP sessions, aggressive timers are fine. For hundreds of iBGP peers, the CPU overhead is significant.

Option 2: BFD (Bidirectional Forwarding Detection)

BFD is the recommended approach for sub-second BGP failover. It runs independently of BGP at the data plane level, detecting link or path failures in milliseconds.

! Enable BFD on the interface
R1-HQ(config)# interface GigabitEthernet0/0
R1-HQ(config-if)# bfd interval 100 min_rx 100 multiplier 3

! Tie BGP to BFD
R1-HQ(config)# router bgp 65001
R1-HQ(config-router)# neighbor 172.16.0.2 fall-over bfd

This configures BFD with 100ms transmit/receive intervals and a multiplier of 3 — detection in 300ms. When BFD detects a failure, BGP immediately tears down the session without waiting for the hold timer.
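The 300ms figure comes from multiplying the interval by the multiplier. A minimal sketch of that calculation, assuming asynchronous mode as described in RFC 5880, where the local detection time is the remote multiplier times the larger of the local min_rx and the remote transmit interval (the function and parameter names are illustrative):

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Time without BFD packets before the local side declares the session down."""
    # The remote peer may not send faster than we are willing to receive.
    agreed_interval_ms = max(local_min_rx_ms, remote_tx_ms)
    return remote_multiplier * agreed_interval_ms


# Both ends configured with: bfd interval 100 min_rx 100 multiplier 3
print(bfd_detection_time_ms(100, 100, 3))

# If the remote end transmits every 300 ms, detection slows down accordingly.
print(bfd_detection_time_ms(100, 300, 3))
```

The slower side sets the pace, so mismatched intervals on the two ends give a longer detection time than the local configuration alone suggests.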

BFD Advantages over Aggressive Timers

  • Sub-second detection: 50-300ms typical, vs. a 3-second floor on the BGP hold time (9 seconds in the example above)
  • Hardware-assisted: On supported platforms, BFD runs in the forwarding ASIC, not the CPU
  • Protocol-independent: One BFD session can serve BGP, OSPF, and static routes simultaneously
  • Fewer false positives from CPU spikes: when BFD is offloaded to the data plane, route processor load does not delay its packets

BFD for iBGP (Multihop)

For iBGP sessions over loopbacks (multihop), use multihop BFD:

R1-HQ(config-router)# neighbor 2.2.2.2 fall-over bfd multi-hop

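Multihop BFD runs over the IP path between loopbacks, so it detects failures along the entire path, not just a single link. The interface-level bfd interval command applies only to single-hop sessions; on IOS XE a multihop session takes its timers from a multi-hop bfd-template that is bound to the loopback pair with a bfd map entry.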

Option 3: BGP Fast External Fallover

Enabled by default on IOS XE. When the interface toward an eBGP peer goes down (link-down event), BGP immediately tears down the session without waiting for timers:

! Verify it's enabled (default)
R1-HQ# show ip bgp neighbors 172.16.0.2 | include fast
  External BGP neighbor may be up to 1 hop away, connected check is enabled
  Fast external fallover enabled

This works for directly connected eBGP peers when the local interface goes down. It does NOT help if the remote end fails (your interface stays up) or if there's an intermediate device.

Option 4: Prefix-Independent Convergence (PIC)

In a large BGP table (950K+ prefixes), even after detecting a failure, the router must re-evaluate best path for every affected prefix and update the FIB. PIC pre-calculates backup paths so the FIB can switch immediately when the primary fails:

R1-HQ(config-router)# address-family ipv4 unicast
R1-HQ(config-router-af)# bgp additional-paths select best 2
R1-HQ(config-router-af)# bgp additional-paths install

With PIC, the first (primary) and second-best paths are both installed in the FIB. When the primary fails, the FIB switches to the backup in constant time regardless of table size — convergence goes from O(n) to O(1).
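The O(n) to O(1) claim is easier to see with a toy model. The sketch below is only an illustration of the idea, not how IOS XE implements its FIB: the `PathList` class, the backup next hop 172.16.1.2, and the prefix range are all made up for the example.

```python
class PathList:
    """Shared forwarding entry: one primary and one pre-installed backup next hop."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active_next_hop(self):
        return self.primary if self.primary_up else self.backup


# Every prefix learned over the same pair of paths points at one shared object.
shared = PathList(primary="172.16.0.2", backup="172.16.1.2")
fib = {f"10.{i // 256}.{i % 256}.0/24": shared for i in range(50_000)}

# Primary fails: a single write, with no walk over the 50,000 prefixes.
shared.primary_up = False

print(fib["10.0.0.0/24"].active_next_hop())
```

Without the shared entry, the same failure would mean rewriting each prefix individually, which is the per-prefix cost that grows with table size.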

Option 5: Tuning MRAI

The Minimum Route Advertisement Interval limits how frequently BGP sends UPDATEs to a peer. Default is 30 seconds for eBGP — meaning after detecting a failure, you might wait up to 30 seconds before the withdrawal is sent:

R1-HQ(config-router)# neighbor 172.16.0.2 advertisement-interval 0

Setting to 0 sends UPDATEs immediately. This reduces convergence time but increases UPDATE volume during instability. Use on critical sessions where fast propagation matters.
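The effect of the interval on a single withdrawal can be sketched as below. This is a simplified model with a hypothetical function name; it ignores the jitter and batching that real implementations apply.

```python
def next_send_time(event_time_s, last_update_s, mrai_s):
    """Earliest time an UPDATE for this event may be sent to the peer."""
    return max(event_time_s, last_update_s + mrai_s)


last_update = 100.0   # an UPDATE went to the peer at t = 100 s
failure_seen = 105.0  # the failure is detected at t = 105 s

for mrai in (30, 0):
    delay = next_send_time(failure_seen, last_update, mrai) - failure_seen
    print(f"MRAI {mrai:>2} s: withdrawal waits {delay:.0f} s")
```

The wait depends on how recently the last UPDATE went out, so the 30-second default is a worst case rather than a fixed penalty.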

Complete Fast Convergence Configuration

! Interface-level BFD
interface GigabitEthernet0/0
 bfd interval 100 min_rx 100 multiplier 3
!
router bgp 65001
 ! BFD-backed eBGP session
 neighbor 172.16.0.2 fall-over bfd
 neighbor 172.16.0.2 advertisement-interval 0
 neighbor 172.16.0.2 timers 10 30
 !
 ! BFD-backed iBGP session
 neighbor 2.2.2.2 fall-over bfd multi-hop
 !
 address-family ipv4 unicast
  bgp additional-paths select best 2
  bgp additional-paths install

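This gives: ~300ms failure detection (BFD), immediate UPDATE propagation (MRAI 0), and instant FIB switchover (PIC). The 10/30 keepalive and hold timers act as a backstop in case the BFD session itself never comes up. Total convergence is under 1 second for most scenarios.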

Verification

R1-HQ# show bfd neighbors
NeighAddr     LD/RD    RH/RS    State    Int
172.16.0.2    1/2      Up       Up       Gi0/0

R1-HQ# show bfd neighbors detail
NeighAddr: 172.16.0.2
  LD/RD: 1/2
  RH/RS: Up
  Session state: UP
  Holddown (hits): 0(0)
  Interval: 100ms, Multiplier: 3
  Registered protocols: BGP

R1-HQ# show ip bgp neighbors 172.16.0.2 | include BFD
  Using BFD to detect fast fallover
  BFD session state: UP

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| BFD configured but not detecting failures | BFD not enabled on the interface, or platform doesn't support hardware BFD | Verify bfd interval on the interface. Check show bfd neighbors for session state. |
| BGP session flapping with aggressive timers | Timers too low for the platform's CPU capacity: keepalives delayed during route processing | Increase timers or switch to BFD (hardware-assisted, immune to CPU spikes). |
| Fast failover not working for iBGP | fall-over bfd requires multi-hop keyword for loopback-based iBGP sessions | Add fall-over bfd multi-hop for iBGP peers using loopback addresses. |

Key Takeaways

  • Default BGP convergence is 3+ minutes. For most production environments, this is too slow.
  • BFD is the recommended solution: sub-second detection, hardware-assisted on supported platforms, and little route processor load when offloaded.
  • PIC (Prefix-Independent Convergence) eliminates FIB update time by pre-installing backup paths.
  • Set advertisement-interval to 0 on critical sessions for immediate UPDATE propagation.
  • Layer your approach: BFD for detection + PIC for FIB convergence + low MRAI for propagation = sub-second total convergence.