C9800 High Availability (SSO) Configuration Guide

Introduction: Why High Availability Matters for Wireless Controllers

When you deploy a Cisco Catalyst 9800 series wireless controller, controller availability directly impacts your entire wireless network. A controller failure means APs lose their central management point, clients disconnect, and your network goes dark. Cisco addresses this with two redundancy models: Stateful Switchover (SSO) and N+1 redundancy. This guide walks you through understanding and configuring SSO, the preferred approach for most enterprises because it preserves active client sessions during failover events.

Whether you're protecting a single site controller, a distributed architecture, or an embedded controller on a Catalyst switch, the principles remain consistent. You'll learn the topology requirements, physical connections, CLI configuration, and verification commands needed to build a reliable HA pair.

High Availability Options: SSO vs. N+1

Before diving into SSO configuration, you need to understand how it differs from N+1 redundancy. Both protect your network, but with different trade-offs.

Feature | Stateful Switchover (SSO) | N+1 Redundancy
Topology | Active-Standby pair (2 controllers) | N Active controllers + 1 Spare
State Preservation | Active controller state syncs to standby; failover preserves client sessions | Stateless; clients reconnect after failover
Failover Time | Sub-second (typically 100-500 ms) | Clients wait for reconnection window (can be seconds)
Capacity Utilization | Standby sits idle; uses only 50% of hardware capacity | All controllers share load; spare only activates on failure
Physical Links Required | Redundancy Port (L2) + Redundancy Management Interface (L3) | Network connectivity only
Best For | Enterprise sites, mission-critical deployments | Stateless deployments, branch offices with limited budgets
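
The capacity trade-off in the table can be put into numbers. The following is a minimal sketch, assuming every controller in a design has the same capacity; the function name and the 4+1 example are illustrative and not taken from any Cisco tool.

```python
def usable_capacity_fraction(active_controllers: int, idle_controllers: int) -> float:
    """Fraction of deployed controller hardware that carries load in steady state."""
    if active_controllers < 1 or idle_controllers < 0:
        raise ValueError("need at least one active controller and a non-negative idle count")
    return active_controllers / (active_controllers + idle_controllers)


# SSO pair: one active, one idle standby.
print(f"SSO pair: {usable_capacity_fraction(1, 1):.0%}")
# Hypothetical N+1 design with N = 4 active controllers and one spare.
print(f"4+1 design: {usable_capacity_fraction(4, 1):.0%}")
```

As N grows, the N+1 figure approaches 100%, while an SSO pair always stays at 50%. That is the price paid for stateful failover.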

SSO is the more common choice for production networks because it minimizes user disruption. Your active controller continuously synchronizes its state—session tables, certificates, licenses, AP configurations—to the standby. When the active fails, the standby assumes the role immediately without clients needing to re-authenticate.

SSO Architecture and Component Overview

An SSO pair consists of two 9800 controllers (or embedded controllers on Catalyst switches) configured in active-standby mode. The two controllers communicate over dedicated and shared links.

The Active Controller

The active controller processes all client connections, WLAN traffic, and management requests. It continuously replicates its state to the standby peer. This synchronization happens in near real-time for critical data (sessions, certificates) and periodically for larger data sets (AP configurations, statistics).

The Standby Controller

The standby controller receives all state information from the active but does not process client traffic. It monitors the active's health via keepalive messages. When it detects that the active has failed (based on missed keepalives or an explicit trigger), it takes control: it becomes the new active and starts accepting client connections. Once the old active recovers and rejoins the pair as the new standby, the new active synchronizes state to it.

The Redundancy Port (RP)

The Redundancy Port is a dedicated Layer 2 link between the two controllers. It carries high-frequency state synchronization data. Think of it as the heartbeat and data artery between the pair. The RP uses a direct cable (typically a copper GigabitEthernet or SFP link) between the controllers. Because it's Layer 2, no IP configuration is required on the RP itself. The RP is exclusively for controller-to-controller communication; no other devices connect to it.

The Redundancy Management Interface (RMI)

The RMI is a Layer 3 reachable interface used as a secondary communication path and for management traffic. It's typically a routed VLAN interface on each controller. The RMI allows the active and standby to reach each other over your network infrastructure, not just over a direct cable. This is critical for deployments where the controllers sit in different buildings or data centers.

The RMI also serves as the management IP for out-of-band access. When you SSH to the controller's management IP, you're connecting to the RMI. During failover, the management IP migrates from the old active to the new active automatically.

Prerequisites and Planning

Before you configure SSO, ensure you have the right hardware and design in place.

Hardware Requirements

  • Two matching 9800 controllers: Both must be the same model (e.g., both C9800-L-F-K9 or both C9800-80-K9) and running the same software version. Hardware compatibility is stricter than you might expect—mismatched models will not join an HA pair.
  • Redundancy Port (RP) link: A dedicated Layer 2 cable directly between the two controllers. This cannot be shared with other traffic or switch infrastructure. Typical configuration: GigabitEthernet port directly to GigabitEthernet port, or SFP to SFP for longer distances. Some controllers support multi-gigabit RP links for higher throughput.
  • RMI connectivity: Network-routed path between the two controllers' RMI interfaces. This could be a dedicated VLAN or shared infrastructure, but latency should be low (sub-20ms ideally) and jitter minimal.
  • Uplink ports on both controllers: Each controller needs WAN/LAN uplinks for client traffic and management (these are separate from the RP and RMI).

Software and Licensing

  • Both controllers must run matching IOS XE software versions. SSO is supported on recent IOS XE releases (17.6+), but verify your specific version in release notes.
  • Both controllers must have active licenses. The standby controller's licenses are synchronized from the active, but both need valid entitlements.
  • FIPS mode and non-FIPS controllers cannot be paired in SSO.
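
The hardware, software, and licensing rules above lend themselves to a pre-flight check before you touch either controller. This is a rough sketch only: the `ControllerFacts` fields and the sample values are hypothetical and would need to be filled in from your own inventory data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ControllerFacts:
    """Hypothetical inventory record for one controller."""
    name: str
    model: str
    ios_xe_version: str
    fips_enabled: bool
    licensed: bool


def sso_pairing_blockers(a: ControllerFacts, b: ControllerFacts) -> List[str]:
    """Return the reasons this pair would violate the prerequisites listed above."""
    problems = []
    if a.model != b.model:
        problems.append(f"model mismatch: {a.model} vs {b.model}")
    if a.ios_xe_version != b.ios_xe_version:
        problems.append(f"IOS XE mismatch: {a.ios_xe_version} vs {b.ios_xe_version}")
    if a.fips_enabled != b.fips_enabled:
        problems.append("FIPS and non-FIPS controllers cannot be paired")
    for ctrl in (a, b):
        if not ctrl.licensed:
            problems.append(f"{ctrl.name} has no valid license")
    return problems


# Sample values are placeholders, not recommendations.
wlc_a = ControllerFacts("wlc-a", "C9800-80-K9", "17.9.4", False, True)
wlc_b = ControllerFacts("wlc-b", "C9800-80-K9", "17.9.3", False, True)
for problem in sso_pairing_blockers(wlc_a, wlc_b):
    print(problem)
```

An empty list means none of the listed blockers apply. It does not guarantee that pairing will succeed.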

Network Design

  • RP cable runs: Plan for a direct Layer 2 cable. If your controllers are in different buildings, use fiber optic runs to avoid electrical noise.
  • RMI VLAN: Decide which VLAN will carry RMI traffic. It should have low latency and high reliability. Some designs put RMI on a dedicated VLAN; others share it with management traffic.
  • Gateway monitoring: Plan which uplink gateway(s) the controllers will monitor for liveliness. If the active loses connectivity to all gateways, it may trigger a failover or go into a failed state.

Configuring SSO: Step-by-Step CLI Setup

Now let's build an SSO pair. The configuration follows a specific order: first, prepare the redundancy port and RMI; second, enable SSO; third, elect and synchronize roles.

Step 1: Configure the Redundancy Port on Both Controllers

On both the active-to-be and standby-to-be controllers, you must bring up the physical RP link. First, identify which interface will serve as the RP (GigabitEthernet0/0 on some models; otherwise, choose an unused port).

! Configuration on BOTH controllers
controller-mode enable
redundancy
 port local Gi0/0
 port peer Gi0/0

The port local command tells this controller which of its interfaces is the RP. The port peer command tells this controller which interface on the peer is the RP. Once you commit this configuration on both controllers, the RP link comes up and you should see a direct Layer 2 connection established between them.

Verify the RP is up:

show redundancy
show platform software redundancy interface summary

Step 2: Configure the Redundancy Management Interface (RMI)

The RMI needs IP addresses on both controllers so they can reach each other over the network. Create a VLAN interface (or use an existing one) and assign addresses:

! On Controller A (Active-to-be)
interface Vlan200
 ip address 10.200.1.10 255.255.255.0
 no shutdown
exit

redundancy
 management-interface Vlan200 10.200.1.10 10.200.1.11

! On Controller B (Standby-to-be)
interface Vlan200
 ip address 10.200.1.11 255.255.255.0
 no shutdown
exit

redundancy
 management-interface Vlan200 10.200.1.11 10.200.1.10

Notice that each controller's management-interface command lists its own IP first, then the peer's IP. The controllers use these IPs to verify connectivity and synchronize state.
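
A swapped or mistyped address is easy to miss when the two configurations are entered by hand. The sketch below uses Python's standard `ipaddress` module to sanity-check the values before you paste them. It assumes the same-subnet layout used in this step, and the function names are illustrative.

```python
import ipaddress
from typing import List, Tuple


def check_rmi_pair(local_ip: str, peer_ip: str, netmask: str) -> List[str]:
    """Check one controller's (own IP, peer IP) pair against its RMI subnet."""
    local_if = ipaddress.ip_interface(f"{local_ip}/{netmask}")
    peer_addr = ipaddress.ip_address(peer_ip)
    problems = []
    if local_if.ip == peer_addr:
        problems.append("local and peer RMI addresses are identical")
    elif peer_addr not in local_if.network:
        problems.append(f"peer {peer_addr} is outside {local_if.network}")
    return problems


def configs_are_mirrored(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Controller A's (own, peer) must equal controller B's (peer, own)."""
    return a == (b[1], b[0])


controller_a = ("10.200.1.10", "10.200.1.11")
controller_b = ("10.200.1.11", "10.200.1.10")
print(check_rmi_pair(*controller_a, "255.255.255.0"))
print(configs_are_mirrored(controller_a, controller_b))
```

For the addresses used in this step, the first call returns an empty list and the second returns `True`.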

Step 3: Enable SSO and Set the Role

Once the RP is up and RMI is reachable, enable SSO on both controllers and designate roles:

! On the controller that will be ACTIVE
redundancy
 mode sso
 role active

! On the controller that will be STANDBY
redundancy
 mode sso
 role standby

After you commit these commands, the controllers begin the SSO pairing process. They synchronize licenses, certificates, AP configurations, and client session data. This can take several minutes depending on the size of your network.

Step 4: Verify SSO is Operational

Run these commands to confirm the pair is healthy:

show redundancy
show platform software redundancy state summary
show platform software redundancy interface status
show redundancy state peer summary

You should see output like:

Redundancy operational.
This is the Active unit.
Redundancy state: Synchronized.

If you see "Not Synchronized," wait a few more minutes. If the status remains not synchronized after 10 minutes, check the RMI connectivity and RP link status.
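
If you script this check, the wait-then-escalate logic might look like the sketch below. It assumes the output format shown above, in particular the "Redundancy state:" line, and takes a caller-supplied `fetch_output` function (hypothetical) that returns the command text. How you collect that text, for example over SSH, is left open.

```python
import time
from typing import Callable


def is_synchronized(show_redundancy_output: str) -> bool:
    """True only if the 'Redundancy state:' line reads 'Synchronized'."""
    for line in show_redundancy_output.splitlines():
        if line.strip().lower().startswith("redundancy state:"):
            state = line.split(":", 1)[1].strip().rstrip(".").lower()
            return state == "synchronized"
    return False  # state line missing: treat as not synchronized


def wait_for_sync(fetch_output: Callable[[], str],
                  timeout_s: float = 600,
                  poll_s: float = 30,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until synchronized or until the 10-minute budget is used up."""
    waited = 0.0
    while True:
        if is_synchronized(fetch_output()):
            return True
        if waited >= timeout_s:
            return False  # time to check RMI connectivity and RP link status
        sleep(poll_s)
        waited += poll_s


sample = "Redundancy operational.\nThis is the Active unit.\nRedundancy state: Synchronized.\n"
print(is_synchronized(sample))
```

The `sleep` parameter exists only so the loop can be exercised without real delays.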

Deep Dive: The Redundancy Management Interface (RMI)

The RMI deserves special attention because it's where many SSO deployments run into trouble.

RMI Topology Considerations

You have two main RMI deployment models:

Model 1: Direct IP Connectivity (Same Subnet)

Both controllers' RMI interfaces sit on the same VLAN and subnet. This is the simplest and most common approach. The controllers reach each other with single-hop ARP resolution. Configuration is straightforward—just assign IPs on the same subnet and ensure the VLAN is trunked to both controllers. Latency is typically very low because the frames traverse at most a few switch hops.

Model 2: Routed Connectivity (Different Subnets)

The RMI interfaces sit on different VLANs or subnets, and the network routes traffic between them. This works when controllers are in different buildings or data centers. Latency increases slightly, but as long as latency is under 100ms and jitter is stable, SSO functions normally. Ensure the routing path is symmetric (same latency and loss in both directions) and redundant if possible.

Most enterprises use Model 1 for controllers in the same facility and Model 2 for geographically distributed HA pairs.
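
To judge whether a given RMI path fits either model, you can summarize a handful of ping round-trip samples against the thresholds quoted in this guide (under 20 ms ideal, under 100 ms for routed RMI). The sketch below is illustrative: jitter is taken here as the mean absolute difference between consecutive samples, which is one common convention and an assumption of this example.

```python
from statistics import mean
from typing import Sequence

IDEAL_MS = 20.0        # "sub-20ms ideally" from the prerequisites
ROUTED_MAX_MS = 100.0  # upper bound quoted for routed RMI


def summarize_rmi(samples_ms: Sequence[float]) -> dict:
    """Average latency, jitter, and a verdict for a list of ping samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate jitter")
    avg = mean(samples_ms)
    jitter = mean(abs(b - a) for a, b in zip(samples_ms, samples_ms[1:]))
    if avg < IDEAL_MS:
        verdict = "ideal"
    elif avg < ROUTED_MAX_MS:
        verdict = "acceptable for routed RMI"
    else:
        verdict = "too high"
    return {"avg_ms": avg, "jitter_ms": jitter, "verdict": verdict}


# Sample values are made up for illustration.
print(summarize_rmi([2.0, 3.0, 2.0, 3.0]))
```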

RMI Bandwidth and Traffic

The RMI carries state synchronization, keepalive messages, and management traffic. A typical RMI link sees 1-5 Mbps of traffic during normal operation, with peaks during large-scale client events (mass reconnection, new AP joins). A 1 Gbps link is more than sufficient; you don't need a dedicated high-speed interface for RMI.

However, ensure the RMI VLAN is never congested. If your network is overprovisioned for other VLANs but the RMI is bottlenecked, SSO synchronization will lag and failover may be delayed or incomplete.

Keepalive and Peer Timeout Configuration

The controllers exchange keepalive messages over the RP and RMI. If the active doesn't send keepalives for a configured period, the standby detects failure and takes over.

redundancy
 keepalive 5 3
 peer-timeout 30

The keepalive 5 3 command sends a keepalive every 5 seconds and declares the peer dead after 3 consecutive missed keepalives, so detection takes about 15 seconds (5 × 3). The peer-timeout 30 command adds a secondary timeout: if the standby sees no RMI traffic at all from the active for 30 seconds, failover is triggered even if keepalives are still arriving over the RP.

Tune these values based on your network stability. For WAN-based RMI links with occasional packet loss, increase the timeout. For LAN-based RMI with stable latency, the defaults are fine.
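
When tuning, it helps to see how the interval and retry count turn into detection time. The sketch below is a simplified model of the "N consecutive misses" rule described above, not the controller's actual implementation, and the function names are illustrative.

```python
from typing import Iterable, Optional


def keepalive_detection_seconds(interval_s: int, retries: int) -> int:
    """Time to declare the peer dead once keepalives stop entirely."""
    return interval_s * retries


def first_dead_tick(received: Iterable[bool], retries: int) -> Optional[int]:
    """Index of the keepalive slot at which the peer is declared dead, or None.

    `received` holds one boolean per keepalive interval: True if the
    keepalive arrived, False if it was missed.
    """
    misses = 0
    for tick, arrived in enumerate(received):
        misses = 0 if arrived else misses + 1  # any arrival resets the counter
        if misses >= retries:
            return tick
    return None


print(keepalive_detection_seconds(5, 3))
print(first_dead_tick([True, False, False, True, False, False, False], retries=3))
```

In the second call, the two isolated misses early in the sequence do not trigger anything, because the counter resets when a keepalive arrives. This is why raising the retry count makes a lossy WAN link more tolerant, at the cost of slower detection.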

Gateway Monitoring and Failover Triggers

SSO can monitor external gateways (firewalls, core routers) and trigger failover if the active controller loses connectivity to all configured gateways. This prevents scenarios where the active remains "up" but is isolated from the network.

Configuring Gateway Monitoring

redundancy
 gateway-monitor
  ! Your primary uplink gateway
  primary 10.1.1.1
  ! Backup gateway
  secondary 10.1.1.2
  icmp-count 3
  icmp-timeout 2

The icmp-count parameter sets how many consecutive ICMP echo requests must fail before the gateway is considered unreachable. The icmp-timeout is the wait time for each probe response.

With the above configuration, if the active can reach neither 10.1.1.1 nor 10.1.1.2 for 3 consecutive probes, it considers itself isolated. It then transitions to a failed state or triggers failover, allowing the standby to become active.
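
The isolation rule is easy to get backwards: one unreachable gateway is not enough, and every configured gateway must fail. The following sketch models that rule under the assumptions stated in this section. The probe histories are invented, and the function names are illustrative.

```python
from typing import Mapping, Sequence


def gateway_unreachable(history: Sequence[bool], icmp_count: int) -> bool:
    """True if the most recent `icmp_count` probes to one gateway all failed."""
    if len(history) < icmp_count:
        return False  # not enough probes yet to decide
    return not any(history[-icmp_count:])


def controller_isolated(probe_history: Mapping[str, Sequence[bool]],
                        icmp_count: int = 3) -> bool:
    """Isolated only when every configured gateway is unreachable."""
    return bool(probe_history) and all(
        gateway_unreachable(history, icmp_count)
        for history in probe_history.values()
    )


# True = probe answered, False = probe timed out (oldest first).
one_down = {"10.1.1.1": [True, False, False, False],
            "10.1.1.2": [True, True, True, True]}
both_down = {"10.1.1.1": [True, False, False, False],
             "10.1.1.2": [False, False, False, False]}
print(controller_isolated(one_down))
print(controller_isolated(both_down))
```

Only the second case reports isolation, because in the first the secondary gateway still answers.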

Understanding Failover Triggers

An SSO failover occurs when any of these conditions are met:

  • Active controller process crashes or is restarted.
  • Standby detects 3 consecutive missed keepalives from the active (RMI-based).
  • Active loses connectivity to all configured gateway monitors.
  • Administrator manually forces failover using redundancy force-switchover.
  • Active detects a critical hardware failure (PSU, thermal, fan) and gracefully hands over.

Once failover is triggered, the standby becomes active within sub-second timeframes (typically 100-500ms). During this window, clients may see a brief traffic interruption, but their sessions are preserved because state was synchronized continuously.

ISSU and Rolling Upgrades with SSO

One of the major benefits of SSO is the ability to upgrade software without a complete network outage. In-Service Software Upgrade (ISSU) allows you to upgrade the standby controller while the active continues serving clients, then perform a graceful failover to the upgraded standby.

The process looks like this:

  1. Verify the standby is fully synchronized with the active.
  2. Access the standby controller's console or management port.
  3. Perform a software upgrade on the standby using the standard IOS XE upgrade procedure.
  4. The standby reboots with new software and rejoins the pair.
  5. Once synchronized, perform a controlled failover: redundancy force-switchover on the active. This makes the active become standby and vice versa.
  6. Now the "new" active (formerly standby, now running new code) serves clients while the "new" standby (formerly active, running old code) is ready for upgrade.
  7. Repeat steps 2-5 on the other controller.

Throughout the upgrade window, clients stay connected because an active controller is always present and synchronized. Expect only the brief traffic interruption that accompanies each controlled switchover.
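
The ordering of the steps above is what keeps an active controller in service at every point. The toy model below walks through the sequence: upgrade the standby, switch over, then repeat on the other unit. It is a sketch of the procedure as described here, with hypothetical names and version labels, and it does not drive real devices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    name: str
    role: str      # "active" or "standby"
    version: str


def switchover(pair: List[Controller]) -> None:
    """Swap roles, as a controlled failover would."""
    for ctrl in pair:
        ctrl.role = "standby" if ctrl.role == "active" else "active"


def rolling_upgrade(pair: List[Controller], target_version: str) -> List[str]:
    """Upgrade the standby, switch over, then do the same for the other unit."""
    log = []
    for _ in range(2):
        standby = next(c for c in pair if c.role == "standby")
        standby.version = target_version
        log.append(f"upgraded standby {standby.name} to {target_version}")
        switchover(pair)
        # Invariant: exactly one controller is active after every step.
        assert sum(c.role == "active" for c in pair) == 1
        log.append(f"switchover: {standby.name} is now active")
    return log


pair = [Controller("wlc-a", "active", "old"), Controller("wlc-b", "standby", "old")]
for entry in rolling_upgrade(pair, "new"):
    print(entry)
```

Because the second pass ends with another switchover, the pair finishes with its original roles and both units on the new version.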

Feature Interactions: Mobility MAC, LAG, and Multi-Chassis Configurations

When you deploy SSO, some controller features interact with redundancy in non-obvious ways.

Mobility MAC Address

Each SSO pair shares a single Mobility MAC address. This is a virtual MAC used for client roaming. When a client roams between APs associated with the same SSO pair, it sees the same MAC address on the network, enabling seamless Layer 2 roaming without new DHCP assignments.

The mobility MAC is configured on both controllers and automatically synchronized during SSO pairing. After failover, the mobility MAC remains the same; clients experience transparent roaming.

redundancy
 mobility-mac 0a:1b:2c:3d:4e:5f

Link Aggregation (LAG)

You can bond multiple uplink ports into a single LAG for increased throughput and redundancy. With SSO, LAG is supported, but the configuration must be identical on both controllers. Each controller's share of the LAG is controlled independently, and the LAG automatically rebalances across the controllers during failover.

Multi-Chassis LAG (MC-LAG) with Catalyst switches is also supported, allowing you to spread APs across multiple switches while maintaining a single controller pair. This is advanced topology, but the SSO pairing itself works the same way.

Embedded Wireless Controller (EWC) Deployments

Some enterprises embed the wireless controller on a Catalyst 9600 or 9400 series switch (called EWC-SW). You can configure SSO on embedded controllers just as you would on standalone 9800s. The RP and RMI configuration remains the same. However, because the embedded controller shares hardware with the switch, any switch hardware issues impact controller availability.

Similarly, you can run controllers on Catalyst access points (EWC-AP), though this is less common. SSO principles apply, but latency and bandwidth constraints are tighter.

Verification Commands and Health Checks

Regular health checks ensure your SSO pair remains ready for failover. Use these commands in your monitoring and troubleshooting workflow.

Basic Status

show redundancy
show redundancy state peer summary
show platform software redundancy state summary

RP and RMI Health

show platform software redundancy interface status
show redundancy events
show platform software redundancy interface timestamp

Synchronization Status

show platform software redundancy sync-status
show platform software redundancy sync-database <table-name>

Peer Connectivity and Latency

! Ping the RMI IP of the standby
ping 10.200.1.11
traceroute 10.200.1.11

AP and Client Counts

show ap count all
show client count all
show wlan

After a failover event, run these commands on the new active to verify state was preserved:

! AP counts should match pre-failover values
show ap count all
! Should show "Synchronized"
show redundancy
! Review the failover log
show redundancy events
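
Comparing counts by eye gets error-prone on large networks. The sketch below compares a pre-failover snapshot with a post-failover one and reports anything that dropped. It assumes you have already extracted the AP and client totals from the count commands into plain dictionaries; the key names and numbers are made up.

```python
from typing import Dict, Tuple


def count_regressions(before: Dict[str, int],
                      after: Dict[str, int],
                      tolerance: int = 0) -> Dict[str, Tuple[int, int]]:
    """Return {metric: (before, after)} for every count that dropped by more than `tolerance`."""
    return {
        metric: (value, after.get(metric, 0))
        for metric, value in before.items()
        if value - after.get(metric, 0) > tolerance
    }


pre = {"aps": 120, "clients": 950}
post = {"aps": 120, "clients": 948}
print(count_regressions(pre, post))
print(count_regressions(pre, post, tolerance=5))
```

A small tolerance on the client count is a reasonable choice, since clients come and go on their own between snapshots. AP counts would normally be expected to match exactly.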

Troubleshooting Common SSO Issues

Problem: SSO pair is "Not Synchronized"

Causes: RMI link down, RP link down, or large configuration mismatch between controllers. Fix: Verify RMI is reachable with ping; check RP link status with show platform software redundancy interface status; ensure both controllers run identical software versions and have the same licenses.

Problem: Keepalive timeouts, frequent failovers

Causes: High latency or jitter on RMI link, RP cable flapping, or network congestion. Fix: Check RMI latency with ping; verify RP cable is seated and undamaged; run packet capture on RMI link to detect loss or reordering; increase keepalive timeout if RMI is on a WAN link.

Problem: Standby will not synchronize AP list

Causes: AP database is very large, or the RP link is congested or running at reduced speed. Fix: Monitor show platform software redundancy sync-status and let synchronization complete (can take 10+ minutes for large networks); because the RP is a dedicated point-to-point link, confirm that it carries no other traffic, and check its cabling and interface counters for errors.

Problem: After failover, clients experience long reconnection time

Causes: AP state was not fully synchronized, or new active is discovering APs slowly. Fix: Verify sync-status is "Synchronized" before failover; ensure APs have layer 3 reachability to the new active; check if DHCP/DNS is causing delays.

Key Takeaways and Next Steps

Building an SSO pair protects your wireless network from controller failure. The architecture—Redundancy Port, RMI, active-standby roles, and keepalive mechanisms—ensures fast, state-preserving failover with minimal user impact. Key points to remember:

  • Plan your topology first: Decide on RP cable runs, RMI VLAN, and gateway monitoring before purchasing hardware.
  • Match hardware and software exactly: Model and IOS XE version must be identical on both controllers.
  • Bring up the RP before enabling SSO: The Redundancy Port is the primary sync path; without it, state synchronization will stall.
  • Monitor RMI latency and loss: Sub-20ms latency and zero loss are the targets for LAN-based RMI; adjust keepalive timeouts for WAN links.
  • Verify synchronization regularly: Use show platform software redundancy sync-status in your monitoring workflow to catch desynchronization early.
  • Test failover in a maintenance window: Force a failover using redundancy force-switchover to confirm client session preservation and recovery time.
  • Plan for ISSU upgrades: SSO enables zero-downtime software upgrades when you follow the rolling upgrade procedure.

With SSO configured and tested, your network can tolerate controller failures without interrupting client connectivity. This is the HA posture that most enterprise wireless networks require.

© 2025 Ping Labz. All rights reserved.