C9800 Roaming Issues: Diagnosing and Fixing Fast Roaming Problems
Roaming is the lifeblood of enterprise wireless networks. When a client transitions from one access point to another, that handoff needs to be seamless—ideally completing in under 150 milliseconds to avoid dropped calls and degraded application performance. On the Cisco Catalyst 9800 platform, roaming is supposed to work reliably, but when it doesn't, you're left with sticky clients, dropped VoIP calls, and users wondering why mobility isn't "mobile."
This article walks you through the most common C9800 roaming failures, shows you how to diagnose them using controller logs and CLI commands, and provides concrete fixes. Whether you're dealing with 802.11r Fast Roaming (FT) breakdowns, PMKID caching failures, or clients that refuse to roam away from a degraded access point, you'll find the tools and techniques to isolate and resolve the problem.
Understanding Roaming Types on the C9800
The C9800 supports multiple roaming mechanisms. Each offers different speed and security trade-offs, and your troubleshooting approach depends on which one your network is using.
| Roaming Type | Protocol/Method | Roam Speed | Security Model | When to Use |
|---|---|---|---|---|
| Full Re-Authentication (Slow Roaming) | 802.1X/EAP full handshake | 300–800 ms | Complete key generation on new AP | Legacy devices; when FT is unavailable |
| 802.11r Fast Roaming (FT) | FT-PSK (Pre-Shared Key) or FT-802.1X using the FT key hierarchy | 50–150 ms | PMK-R0/PMK-R1 keys derived once and distributed within the mobility domain | VoIP, real-time apps; most enterprises |
| Opportunistic Key Caching (OKC) | PMKID caching without FT announcement | 100–250 ms | PMKID pre-calculated; full auth skipped | Legacy 802.11 devices that don't support FT |
| CCKM (Cisco Centralized Key Management) | Proprietary Cisco lightweight controller protocol | 50–100 ms | Controller-managed key derivation | Cisco Aironet devices on legacy deployments |
On modern C9800 deployments (especially cloud-managed or converged networks), 802.11r FT is the standard. However, OKC remains a fallback for clients that claim FT support but don't actually implement it properly. CCKM is rarely used in new greenfield networks but appears in hybrid AireOS-to-C9800 migrations.
The C9800 Mobility Architecture and Roaming
Before diving into troubleshooting, you need to understand how the C9800 handles roaming across multiple controllers. Unlike lightweight APs that always tunnel traffic through their join controller, the C9800 supports distributed and centralized forwarding modes. In distributed mode, traffic can flow directly from the AP to the WLAN gateway; in centralized mode, traffic tunnels back to the anchor controller.
Roaming within the same controller is straightforward: the controller simply moves the client state from one AP to another. Inter-controller roaming is more complex. It involves mobility tunnels (CAPWAP-based, running over UDP 16666/16667) that secure the handoff between controllers and synchronize client state. If these tunnels break or are misconfigured, you'll see clients disconnect during roaming or fail to re-authenticate on the target AP.
Auto anchoring is another key component. When a client first associates, the C9800 designates one controller as the anchor (the PMK holder and security reference point). The client's traffic may flow from any controller in the mobility group, but the anchor controller is the source of truth for the client's encryption keys. If the anchor controller is unreachable or if DTLS encryption on the mobility tunnel is broken, roaming authentication can fail silently.
802.11r Fast Roaming (FT) Failures: Diagnosis and Fix
FT is enabled in the WLAN profile's security settings. When you configure "802.11r Fast Roaming: Enabled" on the C9800, the controller advertises FT capability in beacons and probe responses. A client that supports FT performs the FT initial mobility domain association when it first joins, allowing it to skip full re-authentication on subsequent roams.
Common FT failure modes:
- Client does not claim FT support: The client advertises no FT capability in its association request, even though FT is enabled on the WLAN. This is a driver or OS issue, not a controller problem.
- FT Reassociation timeout: The client initiates an FT Reassociation (not full re-auth) but the target AP never responds or responds with a failure code.
- PMKID mismatch: The client provides a PMKID to the target AP, but that PMKID is not in the target AP's cache or the anchor controller's PMK database.
- FT Element validation failure: The FT Reassociation Response contains an FT Element (IE) that fails validation—checksum mismatch, invalid MIC, or key derivation error.
To diagnose FT failures, enable detailed roaming logs on the C9800:
C9800# debug wireless client-roaming enable
C9800# debug wireless client state enable
C9800# debug wireless client detail enable
Then trigger a roam (move the client closer to a target AP while moving away from the current AP). Watch the logs in real-time:
C9800# show log | include FT|PMKID|roaming|reassoc
A successful FT roam produces logs like this:
Wed Apr 3 14:22:15.432 UTC: %WIRELESS-3-CLIENT_STATE: Client (MAC: 001a.2b3c.4d5e, VLAN: 10, AP-name: AP01)
state change: ROAM_INIT -> ROAM_AUTH_REQ
Wed Apr 3 14:22:15.445 UTC: %WIRELESS-6-FT_ROAMING: Client (001a.2b3c.4d5e) FT Reassociation
initiated on AP02. PMKID: 0x1a2b3c4d. FT Element MIC validated.
Wed Apr 3 14:22:15.452 UTC: %WIRELESS-3-CLIENT_STATE: Client (001a.2b3c.4d5e)
state change: ROAM_AUTH_REQ -> AUTHENTICATED
Wed Apr 3 14:22:15.460 UTC: %WIRELESS-6-MOBILITY_TUNNEL: Mobility sync message sent
to anchor controller 10.0.1.50 for client 001a.2b3c.4d5e (DTLS encrypted).
A failed FT roam looks different:
Wed Apr 3 14:25:32.118 UTC: %WIRELESS-3-CLIENT_STATE: Client (MAC: 001a.2b3c.4d5e, VLAN: 10, AP-name: AP01)
state change: ROAM_INIT -> ROAM_AUTH_REQ
Wed Apr 3 14:25:32.265 UTC: %WIRELESS-3-FT_ROAMING: Client (001a.2b3c.4d5e) FT Reassociation
timeout on AP02. No response from AP after 150ms retry window.
Wed Apr 3 14:25:32.275 UTC: %WIRELESS-6-ROAMING_FALLBACK: Client (001a.2b3c.4d5e)
attempting full re-authentication (slow roam) as fallback.
Wed Apr 3 14:25:32.890 UTC: %WIRELESS-3-CLIENT_STATE: Client (001a.2b3c.4d5e)
state change: ROAM_AUTH_REQ -> AUTHENTICATED
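The difference between the two samples above can be pulled out mechanically rather than by eye. The following Python sketch assumes log entries in exactly the format shown in this article (an `HH:MM:SS.mmm UTC` timestamp on the first line of each entry, with state changes on a continuation line); real controller output may differ, and `summarize_roam` is a hypothetical helper name, not a Cisco tool.

```python
import re
from datetime import datetime, timedelta

# Matches the time-of-day part of the sample timestamps, e.g. "14:25:32.118 UTC".
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) UTC")


def summarize_roam(log_text):
    """Return (duration_ms, used_fallback) for one roam in the sample log format.

    duration_ms runs from the entry containing "ROAM_INIT ->" to the last entry
    containing "-> AUTHENTICATED"; it is None if either marker is missing.
    """
    current = start = end = None
    used_fallback = False
    for line in log_text.splitlines():
        match = TIMESTAMP.search(line)
        if match:
            # Continuation lines carry no timestamp, so remember the latest one.
            current = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if "ROAM_INIT ->" in line and start is None:
            start = current
        if "ROAMING_FALLBACK" in line:
            used_fallback = True
        if "-> AUTHENTICATED" in line:
            end = current
    if start is None or end is None:
        return None, used_fallback
    return (end - start) // timedelta(milliseconds=1), used_fallback


failed_roam = """\
Wed Apr 3 14:25:32.118 UTC: %WIRELESS-3-CLIENT_STATE: Client (MAC: 001a.2b3c.4d5e, VLAN: 10, AP-name: AP01)
state change: ROAM_INIT -> ROAM_AUTH_REQ
Wed Apr 3 14:25:32.265 UTC: %WIRELESS-3-FT_ROAMING: Client (001a.2b3c.4d5e) FT Reassociation
timeout on AP02. No response from AP after 150ms retry window.
Wed Apr 3 14:25:32.275 UTC: %WIRELESS-6-ROAMING_FALLBACK: Client (001a.2b3c.4d5e)
attempting full re-authentication (slow roam) as fallback.
Wed Apr 3 14:25:32.890 UTC: %WIRELESS-3-CLIENT_STATE: Client (001a.2b3c.4d5e)
state change: ROAM_AUTH_REQ -> AUTHENTICATED
"""

print(summarize_roam(failed_roam))  # (772, True)
```

Run against the failed sample, the helper reports a 772 ms roam that went through the fallback path, which is the pattern to look for when FT appears to be enabled but voice calls still drop.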
Key commands to verify FT status:
C9800# show wireless wlan-config summary | include Test-WLAN|802.11r|Fast-Roaming
WLAN Name : Test-WLAN
Fast-Roaming (802.11r) : Enabled
FT Type : FT-PSK or FT-802.1X
FT Over-The-DS : Enabled
To view per-client roaming metrics:
C9800# show wireless client detail 001a.2b3c.4d5e | include "Roaming|Protocol|FT|PMKID"
Client MAC Address : 001a.2b3c.4d5e
Roaming Protocol : FT (802.11r)
PMKID List (last 5) : 0x1a2b3c4d, 0x5e6f7a8b, 0x9c0d1e2f, 0x3a4b5c6d, 0x7e8f9a0b
Last Roam Duration : 87 ms
Last Roam Type : FT Reassociation
Fixing FT failures:
First, verify that FT is actually enabled on all APs in your mobility group. A common mistake is enabling FT on the controller policy but forgetting that individual RF profiles or per-AP overrides may disable it. Check the RF profile:
C9800# show wireless profile rf-policy Test-RF-Profile | include "FT|Fast-Roaming|802.11r"
FT Support : Enabled
FT Over-The-DS : Enabled
FT Key Derivation : SHA-256
If FT is truly enabled but clients still don't use it, the problem is driver-side. Update the client's wireless driver. Many Windows and macOS drivers from 2015–2017 have incomplete FT implementations. Newer drivers (2018+) handle FT reliably. If you can't update drivers, disable FT for those clients and fall back to OKC or full re-authentication.
If FT is working but roaming is slow (150+ ms), check DTLS encryption on the mobility tunnel. DTLS overhead can add 20–50 ms. If latency is critical, disable DTLS (if your mobility group is on a trusted internal network only):
C9800# configure terminal
C9800(config)# wireless mobility anchor dtls disable
C9800(config)# end
If clients consistently get "FT Element validation failure," verify that all controllers in the mobility group are using the same key derivation algorithm (SHA-256 or SHA-384). A mismatch causes FT element validation to fail:
C9800# show wireless mobility summary | include "FT Key Derivation|SHA"
FT Key Derivation Algorithm : SHA-256
PMKID Caching and OKC Issues
PMKID (Pairwise Master Key Identifier) caching is the foundation of fast roaming. When a client first authenticates to a WLAN, the controller generates a PMKID—a hash of the PMK (Pairwise Master Key) and other derivation data. The client caches this PMKID locally. On a future roam to another AP, the client includes the cached PMKID in its association request, signaling that it has a pre-calculated key and can skip full re-authentication.
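To make the "hash of the PMK and other derivation data" concrete: for the original WPA2 key-management suites, IEEE 802.11 defines the PMKID as the first 128 bits of HMAC-SHA1, keyed with the PMK, over the string "PMK Name" followed by the AP (authenticator) MAC address and the client (supplicant) MAC address. The Python sketch below illustrates that construction only. The PMK and MAC addresses are made-up placeholder values, `derive_pmkid` is a hypothetical helper name, and SHA-256-based suites and the FT key names use different hash inputs that are not shown here.

```python
import hashlib
import hmac


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' or Cisco-style 'aabb.ccdd.eeff' to 6 raw bytes."""
    hex_digits = mac.replace(":", "").replace("-", "").replace(".", "")
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def derive_pmkid(pmk, ap_mac, client_mac):
    """PMKID = first 16 bytes of HMAC-SHA1(PMK, "PMK Name" || AP MAC || client MAC)."""
    data = b"PMK Name" + mac_to_bytes(ap_mac) + mac_to_bytes(client_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


pmk = bytes(range(32))  # placeholder 256-bit PMK, for illustration only
client = "001a.2b3c.4d5e"
pmkid_ap01 = derive_pmkid(pmk, "0011.2233.4455", client)  # hypothetical BSSID for AP01
pmkid_ap02 = derive_pmkid(pmk, "0011.2233.4466", client)  # hypothetical BSSID for AP02

print(pmkid_ap01.hex())
print(pmkid_ap01 != pmkid_ap02)  # True: same PMK, different AP, different PMKID
```

Because the AP's MAC address is one of the inputs, the same PMK produces a different PMKID for every BSSID. This is why an OKC client has to recompute the PMKID for the target AP instead of replaying the one it used on the previous AP.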
Two roaming mechanisms use PMKID caching:
- 802.11r FT with PMK caching: Uses PMKID inside the FT handshake; the controller validates the PMKID and proceeds with FT reassociation.
- OKC (Opportunistic Key Caching): Uses PMKID without FT; the controller sees the PMKID and skips the full 802.1X handshake, deriving a new Pairwise Transient Key (PTK) from the PMK.
PMKID caching fails when:
- The client's PMKID is not in the AP's PMKID cache (expired, cleared, or AP cache overflow).
- The controller's PMK database is out of sync with the PMKID—this happens when the anchor controller is different from the current controller, or after controller failover.
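- The client presents a PMKID that was computed for a different AP. The PMKID is derived from the PMK plus the AP and client MAC addresses, so a client that does not recompute it for the target AP (for example, one without OKC support) sends a value the controller cannot match.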
- OKC is disabled on the WLAN profile when a legacy client needs it.
To verify PMKID caching is working:
C9800# show wireless client detail 001a.2b3c.4d5e | include "PMKID"
PMKID (cached) : 0x1a2b3c4d5e6f7a8b
PMKID Refresh Interval : 43200 seconds (12 hours)
To check AP-level PMKID cache utilization:
C9800# show ap summary | include "AP01|Slot|Cache"
AP01 (Controller: C9800-01, Slot 0) PMKID Cache: 87/256 entries
If an AP's PMKID cache is full (256/256 or 512/512 depending on memory), new clients won't get cached PMKIDs, forcing them to do full re-authentication on every roam. This is a performance killer in large deployments. Increase AP PMKID cache limits via the RF profile or perform a strategic AP reboot to flush old entries.
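If you collect this output from many APs, a short script can flag the ones that are close to the limit. The Python sketch below assumes lines in exactly the `PMKID Cache: used/capacity entries` format shown above; the 90% warning threshold and the `cache_utilization` name are arbitrary choices for illustration, not controller defaults.

```python
import re

# Matches the sample line format: "<AP name> (...) PMKID Cache: <used>/<capacity> entries".
CACHE_LINE = re.compile(r"^(\S+) .*PMKID Cache: (\d+)/(\d+) entries")


def cache_utilization(output, warn_at=0.9):
    """Return a list of (ap_name, utilization, is_near_full) tuples.

    warn_at is an assumed warning threshold expressed as a fraction of capacity.
    Lines that do not match the sample format are ignored.
    """
    results = []
    for line in output.splitlines():
        match = CACHE_LINE.match(line.strip())
        if not match:
            continue
        name, used, capacity = match.group(1), int(match.group(2)), int(match.group(3))
        if capacity == 0:
            continue  # avoid dividing by zero on malformed input
        ratio = used / capacity
        results.append((name, ratio, ratio >= warn_at))
    return results


sample = "AP01 (Controller: C9800-01, Slot 0) PMKID Cache: 87/256 entries"
for name, ratio, near_full in cache_utilization(sample):
    print(f"{name}: {ratio:.0%} used, near full: {near_full}")
```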
To enable OKC on a WLAN (for legacy clients):
C9800# configure terminal
C9800(config)# wireless wlan Test-WLAN
C9800(config-wlan)# security wpa akm opportunistic-key-caching
C9800(config-wlan)# end
Verify it's active:
C9800# show wireless wlan-config summary | include "Test-WLAN|OKC|Opportunistic"
WLAN Name : Test-WLAN
Opportunistic Key Caching: Enabled
Inter-Controller Roaming and Mobility Tunnels
When a client roams from an AP on Controller A to an AP on Controller B, the handoff must cross a mobility tunnel. This tunnel is a CAPWAP-based encrypted channel that synchronizes client state, passes keys, and ensures the anchor controller can validate the roaming client's identity.
Mobility tunnel failures are less common than FT failures but far more critical—they cause complete client drops and inability to re-authenticate on the new controller. Common causes include:
- Firewall blocking UDP 16666/16667 (CAPWAP mobility).
- Network latency or jitter on the controller-to-controller link exceeding CAPWAP timeouts (typically 3–5 seconds).
- DTLS negotiation failure due to certificate mismatch or clock skew between controllers.
- Mobility group misconfiguration—the two controllers don't recognize each other as peers.
- Controller software version mismatch (e.g., one on 17.3.4, the other on 17.6.2) causing protocol incompatibility.
To verify mobility tunnel health:
C9800-01# show wireless mobility summary
Mobility Group Name : enterprise-group
Group Member Count : 3
Anchor Controller : 10.0.1.50
Local Controller IP : 10.0.1.51
Mobility Peer Status:
Peer IP : 10.0.1.52 (C9800-02)
CAPWAP Status : UP
DTLS Status : UP
Last Heartbeat : 2 seconds ago
Tunnels Active : 12
Peer IP : 10.0.1.53 (C9800-03)
CAPWAP Status : UP
DTLS Status : UP
Last Heartbeat : 1 second ago
Tunnels Active : 8
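With more than a handful of peers, scanning this output by hand gets tedious. The following Python sketch walks the summary and reports any peer whose CAPWAP or DTLS status is not UP. It assumes the `key : value` layout shown in the sample above, which may not match every software release, and `find_down_peers` is a hypothetical helper name.

```python
def find_down_peers(summary_text):
    """Map each peer IP to the list of status fields that are not UP.

    Assumes "Peer IP : <ip> (<name>)" lines followed by "CAPWAP Status" and
    "DTLS Status" lines, as in the sample output in this article.
    """
    problems = {}
    peer = None
    for line in summary_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key, value = key.strip(), value.strip()
        if key == "Peer IP" and value:
            peer = value.split()[0]  # drop the "(hostname)" suffix
        elif peer and key in ("CAPWAP Status", "DTLS Status") and value != "UP":
            problems.setdefault(peer, []).append(key)
    return problems


sample = """\
Mobility Peer Status:
Peer IP : 10.0.1.52 (C9800-02)
CAPWAP Status : UP
DTLS Status : DOWN
Peer IP : 10.0.1.53 (C9800-03)
CAPWAP Status : UP
DTLS Status : UP
"""

print(find_down_peers(sample))  # {'10.0.1.52': ['DTLS Status']}
```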
If any peer shows "CAPWAP Status: DOWN" or "DTLS Status: DOWN," the mobility tunnel is broken. Check the network path between controllers:
C9800-01# ping 10.0.1.52 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echoes to 10.0.1.52, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 8/9/12 ms
Ping works. Next, verify CAPWAP connectivity explicitly:
C9800-01# debug wireless mobility capwap enable
C9800-01# debug wireless mobility dtls enable
Attempt a roam and check for CAPWAP or DTLS errors:
C9800-01# show log | include "CAPWAP|DTLS|Tunnel"
A successful inter-controller roam generates logs like:
Wed Apr 3 14:45:22.890 UTC: %WIRELESS-6-MOBILITY_TUNNEL: Opening tunnel to peer
10.0.1.52 for client 001a.2b3c.4d5e. Tunnel ID: 0x1a2b3c4d.
Wed Apr 3 14:45:22.905 UTC: %WIRELESS-6-CAPWAP_CONNECT: CAPWAP session established
to 10.0.1.52:16666. DTLS encrypted.
Wed Apr 3 14:45:22.920 UTC: %WIRELESS-6-MOBILITY_SYNC: Client state synchronized to
peer 10.0.1.52. PMK, PMKID, and QoS policies delivered.
Wed Apr 3 14:45:22.940 UTC: %WIRELESS-3-CLIENT_STATE: Client 001a.2b3c.4d5e
authenticated on AP02 (C9800-02).
If you see DTLS errors, verify controller certificates are in sync. Each C9800 has an identity certificate for DTLS. If one controller regenerates its certificate (e.g., after a reboot or certificate renewal), the peer may reject the connection. Check certificate status:
C9800-01# show crypto certificate summary
Certificate Type : Identity
Subject : CN=C9800-01.example.com
Issuer : CN=PingLabz-CA
Valid From : 2024-03-15 10:00:00 UTC
Valid Until : 2026-03-15 10:00:00 UTC
Serial Number : 0x1a2b3c4d
If a certificate is expired or about to expire, renew it immediately. Controller reboots can also desynchronize certificates; if you've recently rebooted a controller and mobility tunnels are failing, investigate certificate misalignment.
Firewall checklist for mobility tunnels:
- Verify UDP 16666 (mobility control) and UDP 16667 (mobility data) are open between all controller IPs in the mobility group.
- Verify controllers can reach each other's management IP (typically TCP 8443 for REST API, used for synchronization).
- If using Inter-Release Controller Mobility (IRCM) to coexist with legacy AireOS controllers, also open UDP 5246/5247 (CAPWAP control/data) between C9800 and AireOS.
- Verify no ACLs or stateful firewall rules are timing out CAPWAP connections. CAPWAP is connectionless (UDP), but firewalls may track flows with a 30–60 second idle timeout. Long idle periods cause tunnel closures.
Sticky Clients: When Roaming Won't Happen
A sticky client is one that refuses to roam away from a degraded AP, even when a nearby AP has superior signal strength. The client maintains its association despite RSSI below -80 dBm and packet loss exceeding 20%. This isn't a C9800 bug—it's a client-side decision. Most wireless adapters use hysteresis thresholds: they won't roam unless the new AP's signal exceeds the current AP's signal by 15–20 dB AND the current AP's signal falls below a minimum threshold (typically -75 to -85 dBm).
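The two-condition rule described above explains most sticky behavior, and it is easy to see why once it is written down. The Python sketch below is a deliberately simplified model of that rule, not any vendor's actual roaming algorithm; the default threshold and margin are taken from the ranges quoted in this section, and `should_roam` is a hypothetical name.

```python
def should_roam(current_rssi_dbm, candidate_rssi_dbm,
                min_rssi_dbm=-75, hysteresis_db=15):
    """Simplified client roam decision.

    The client roams only if BOTH conditions hold:
      1. the current AP's signal has fallen below min_rssi_dbm, and
      2. the candidate AP is stronger than the current AP by at least hysteresis_db.
    Real drivers add scan timers, load, and other inputs not modeled here.
    """
    below_floor = current_rssi_dbm < min_rssi_dbm
    clearly_better = (candidate_rssi_dbm - current_rssi_dbm) >= hysteresis_db
    return below_floor and clearly_better


# Candidate is 22 dB stronger, but the current AP is still above the floor: no roam.
print(should_roam(-72, -50))  # False
# Current AP is below the floor, but the candidate is only 10 dB stronger: no roam.
print(should_roam(-80, -70))  # False
# Both conditions are met: the client roams.
print(should_roam(-80, -60))  # True
```

The first case is the classic sticky client: a much better AP is available, yet the client stays put because its current signal has not crossed its own minimum threshold.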
Some clients also implement "stickiness"—a behavioral preference to avoid roaming for a configurable period (30–60 seconds) after the last roam, even if the current AP becomes degraded. This reduces roaming storms in high-density deployments but causes problems when an AP fails abruptly.
You cannot force a client to roam. The C9800 cannot push a roam command to a client. However, you can nudge them in several ways:
Method 1: Aggressive Coverage Hole Detection (CHD)
The C9800 runs a coverage hole detection algorithm that watches each client's packet loss and retransmission rate. If loss exceeds a configured threshold (default 20%), CHD triggers a "poor coverage" event. The controller then issues a Channel Switch Announcement (CSA) and 802.11k Neighbor Reports to suggest better APs. Some clients respect these hints and roam.
To enable and tune CHD:
C9800# configure terminal
C9800(config)# wireless coverage-hole-detection
C9800(config-chd)# minimum-rxsop threshold -70
C9800(config-chd)# packet-loss-threshold 20
C9800(config-chd)# action send-neighbor-report
C9800(config-chd)# end
Method 2: Disable the Sticky AP
If a particular AP consistently harbors sticky clients, disable that AP's WLAN temporarily. Clients will be forced to roam to a neighbor AP. Once roamed, they're less likely to return if the disabled AP's signal drops below their hysteresis threshold.
C9800# configure terminal
C9800(config)# interface Wireless1
C9800(config-if)# wireless wlan-disable Test-WLAN
C9800(config-if)# end
Re-enable the WLAN after a few minutes:
C9800# configure terminal
C9800(config)# interface Wireless1
C9800(config-if)# no wireless wlan-disable Test-WLAN
C9800(config-if)# end
Method 3: Use 802.11k/v/w Directives
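802.11k (Neighbor Reports) and 802.11v (BSS Transition Management) let the controller suggest a roam and steer the client toward specific APs. 802.11w (Management Frame Protection) does not trigger roaming itself, but it protects the management frames that carry those requests. Be aware that setting 802.11w to required excludes clients that do not support protected management frames. Enable these in the WLAN security policy: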
C9800# configure terminal
C9800(config)# wireless wlan Test-WLAN
C9800(config-wlan)# security dot11k neighbor-list
C9800(config-wlan)# security dot11v bss-transition
C9800(config-wlan)# security dot11w management-frame-protection required
C9800(config-wlan)# end
Not all clients support 802.11k/v/w, but those that do will respect BSS Transition Management requests and roam to suggested APs more readily.
Method 4: Client-Specific Configuration
If a particular device model or driver version is sticky, apply a device-specific RF profile that disables FT (forces OKC instead) or lowers the transmit power (via TPC) to weaken the sticky AP's signal:
C9800# configure terminal
C9800(config)# wireless device-classifier mac-address 00:1a:2b:3c:4d:5e
C9800(config-device)# rf-profile Aggressive-Roaming
C9800(config-device)# end
Slow Roaming (Full Re-Authentication) Issues
Slow roaming occurs when a client cannot or will not use FT or OKC, forcing the C9800 to perform a complete 802.1X EAP authentication on every roam. This takes 300–800 ms and can cause voice call drops, video stream interruptions, and poor user experience.
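A quick calculation shows why these numbers matter for voice. The Python sketch below assumes a 20 ms packetization interval (a common setting for voice codecs, but an assumption here rather than a value from the C9800) and counts how many packet intervals elapse while the client is mid-roam; `voice_packets_in_gap` is a hypothetical helper name.

```python
def voice_packets_in_gap(roam_ms, packet_interval_ms=20):
    """Number of packet intervals that elapse during a roam lasting roam_ms.

    packet_interval_ms=20 is an assumed codec packetization interval, not a
    value taken from the controller or from this article.
    """
    if packet_interval_ms <= 0:
        raise ValueError("packet_interval_ms must be positive")
    return roam_ms / packet_interval_ms


for roam_ms in (150, 300, 800):
    gap = voice_packets_in_gap(roam_ms)
    print(f"{roam_ms} ms roam = {gap:.1f} packet intervals")
```

Under that assumption, a 150 ms roam spans 7.5 packet intervals, which a jitter buffer can often conceal, while an 800 ms roam spans 40, which is an audible dropout.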
Causes of slow roaming:
- FT is disabled on the WLAN.
- The client does not support FT.
- FT is supported but the client's PMKID cache is empty or expired.
- The authentication server (RADIUS) is slow or experiencing latency.
- The client's EAP implementation is buggy and cannot complete handshakes in parallel.
To identify slow roaming in logs:
C9800# show log | include "802.1X|EAP|Re-Authentication|Full Auth"
Sample output from a slow roam:
Wed Apr 3 14:55:11.220 UTC: %WIRELESS-6-CLIENT_ROAMING: Client 001a.2b3c.4d5e
roaming from AP01 to AP02. Roaming protocol: 802.1X (FT not available).
Wed Apr 3 14:55:11.340 UTC: %WIRELESS-6-EAP_START: Client 001a.2b3c.4d5e
EAP transaction initiated. Identity: user@example.com
Wed Apr 3 14:55:11.450 UTC: %WIRELESS-6-RADIUS_REQUEST: RADIUS request sent to
RADIUS server 10.0.2.10:1812. Method: PEAP (Tunneled TLS).
Wed Apr 3 14:55:11.890 UTC: %WIRELESS-6-RADIUS_RESPONSE: RADIUS Access-Accept
received for client 001a.2b3c.4d5e. Session Key delivered.
Wed Apr 3 14:55:11.920 UTC: %WIRELESS-3-CLIENT_STATE: Client 001a.2b3c.4d5e
state change: AUTHENTICATING -> AUTHENTICATED. Duration: 700 ms.
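To see where the time went, compute the gap between consecutive log entries. The Python sketch below assumes the `%WIRELESS-<severity>-<TAG>:` entry format used in the sample above, which may differ from real controller output; `stage_deltas` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Captures the time of day and the message tag from the first line of each entry.
ENTRY = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) UTC: %WIRELESS-\d-([A-Z0-9_]+):")


def stage_deltas(log_text):
    """Return [(stage_label, elapsed_ms), ...] between consecutive log entries."""
    events = [
        (datetime.strptime(stamp, "%H:%M:%S.%f"), tag)
        for stamp, tag in ENTRY.findall(log_text)
    ]
    return [
        (f"{prev_tag} -> {tag}", (when - prev_when) // timedelta(milliseconds=1))
        for (prev_when, prev_tag), (when, tag) in zip(events, events[1:])
    ]


slow_roam = """\
Wed Apr 3 14:55:11.220 UTC: %WIRELESS-6-CLIENT_ROAMING: Client 001a.2b3c.4d5e
roaming from AP01 to AP02. Roaming protocol: 802.1X (FT not available).
Wed Apr 3 14:55:11.340 UTC: %WIRELESS-6-EAP_START: Client 001a.2b3c.4d5e
EAP transaction initiated. Identity: user@example.com
Wed Apr 3 14:55:11.450 UTC: %WIRELESS-6-RADIUS_REQUEST: RADIUS request sent to
RADIUS server 10.0.2.10:1812. Method: PEAP (Tunneled TLS).
Wed Apr 3 14:55:11.890 UTC: %WIRELESS-6-RADIUS_RESPONSE: RADIUS Access-Accept
received for client 001a.2b3c.4d5e. Session Key delivered.
Wed Apr 3 14:55:11.920 UTC: %WIRELESS-3-CLIENT_STATE: Client 001a.2b3c.4d5e
state change: AUTHENTICATING -> AUTHENTICATED. Duration: 700 ms.
"""

for stage, elapsed_ms in stage_deltas(slow_roam):
    print(f"{stage}: {elapsed_ms} ms")
```

On this sample, the RADIUS request-to-response step accounts for 440 of the 700 ms, which points at the authentication server or the path to it rather than at the wireless side.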
700 ms is slow. For VoIP networks, targets are 150 ms or less. To optimize slow roaming:
1. Verify RADIUS server response times:
C9800# debug radius enable
C9800# show log | include "RADIUS"
Look for request-to-response times. If RADIUS responses take 150+ ms, the RADIUS server or network path is slow. Consider adding a local RADIUS proxy or caching service.
2. Enable RADIUS request pipelining:
By default, the C9800 sends one RADIUS request, waits for response, then sends the next. For high-density APs with many simultaneous roams, this serialization causes queuing delay. Enable pipelining:
C9800# configure terminal
C9800(config)# radius-server max-retransmit 3
C9800(config)# radius-server timeout 2
C9800(config)# aaa server radius dynamic-author
C9800(config-da-radius)# client 10.0.2.10 key "shared-secret-here"
C9800(config-da-radius)# end
3. Verify RADIUS load balancing:
If you have multiple RADIUS servers, ensure the C9800 is load-balancing requests across them. Uneven load (e.g., one server receiving 90% of traffic) causes the overloaded server to slow down:
C9800# show radius statistics | include "Access-Request|Server|Count"
4. Lower EAP timeout thresholds:
If a RADIUS server is unresponsive, the C9800 waits up to 30 seconds before timing out. For roaming, this is too long. Reduce the timeout:
C9800# configure terminal
C9800(config)# radius-server timeout 3
C9800(config)# radius-server retransmit 2
C9800(config)# end
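Now the C9800 waits 3 seconds per attempt and retransmits twice, so an unresponsive server costs up to 9 seconds (one initial attempt plus two retransmissions) before the controller fails over to another RADIUS server.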
Client Driver Incompatibilities and FT
Even when FT is correctly configured on the C9800, client-side driver bugs can prevent FT from functioning. Common culprits:
| Vendor | Driver Version(s) | Issue | Workaround |
|---|---|---|---|
| Intel (Windows) | 21.x–22.x (2016–2017) | FT Element MIC validation fails; client falls back to slow roam | Update to 24.x+ (2020+); disable FT for affected users |
| Broadcom (macOS) | 15.x–17.x (2015–2018) | Does not advertise FT in association request; OKC works | Update to 20.x+ (2021+); use OKC as fallback |
| Qualcomm (Android) | Most versions before 2019 | Incomplete FT handshake; drops frames during roam | Device OS update; corporate-only Android devices (Knox) work better |
| MediaTek (Chromebook) | 2015–2017 Chromebooks | FT OTA (Over-The-Air) not supported; FT Over-DS (Distribution Service) only | Use FT Over-DS or update ChromeOS; most recent Chromebooks support both |
To detect driver incompatibilities:
Enable client-level roaming debug and capture the roaming event:
C9800# debug wireless client detail enable
C9800# debug wireless client state enable
Trigger a roam and grep for FT-related messages:
C9800# show log | include "FT|OTA|Over-DS|MIC|validation"
If you see "FT Element validation failure" or "FT OTA not supported," the driver is incompatible. Your options are:
- Update the driver/OS: Strongly preferred. Test in a pilot program first.
- Disable FT for that client: Use MAC address-based policies to disable FT and fall back to OKC or full re-auth.
- Use a per-device RF profile: Assign a stricter RF profile that disables advanced roaming features.
- Isolate to a legacy WLAN: Create a separate WLAN with legacy security settings (WPA2-PSK, no FT) for incompatible devices.
Diagnosing Roaming with Radioactive Traces
For complex roaming issues that don't surface in standard logs, the C9800 offers radioactive tracing—a per-client packet and state-change capture that records every frame sent/received and every internal state transition during a roaming event.
To enable radioactive tracing for a specific client MAC:
C9800# debug wireless packet client 001a.2b3c.4d5e enable
C9800# debug wireless packet filter association enable
C9800# debug wireless packet filter authentication enable
C9800# debug wireless packet filter eapol enable
Trigger a roam, then capture the output:
C9800# show log | redirect flash:/radioactive-trace-client.txt
The output is verbose (hundreds of lines) but extremely detailed. Look for:
- Deauthentication frames: "MAC 001a.2b3c.4d5e deauthentication sent from AP01. Reason code: 4 (disassociated due to inactivity)."
- Reassociation requests/responses: "FT Reassociation Request received. PMKID: 0x1a2b3c4d. FT Element present."
- EAPOL handshakes: "EAPOL-Key frame (2/4) received from client. Installation of PTK initiated."
- 4-way handshake timing: "EAPOL (1/4) sent. Waiting for (2/4). Timeout window: 1000 ms."
Timing information is particularly valuable. If the 4-way handshake consistently takes 400+ ms on APs in certain areas, RF interference is likely the culprit (retransmissions are slowing the handshake).
Key Takeaways and Troubleshooting Checklist
Before you start troubleshooting roaming issues, verify these fundamentals:
- Enable FT on the WLAN: Verify "802.11r Fast Roaming: Enabled" in the WLAN security profile and in the RF policy applied to all APs.
- Verify mobility group configuration: All controllers in the same location/network should be in the same mobility group. Check with "show wireless mobility summary."
- Confirm CAPWAP and DTLS tunnels are UP: Use "show wireless mobility summary" to check tunnel status between all peers.
- Check RADIUS server health: Slow roaming is often a slow RADIUS server problem, not a controller problem. Test RADIUS latency directly: "debug radius enable."
- Inspect client logs and driver version: Many roaming issues are client-side. Update wireless drivers, especially on Windows and macOS machines from 2015–2018.
- Use radioactive traces for packet-level detail: When standard logs don't reveal the problem, capture raw frame exchanges for the roaming client.
Roaming issue quick-reference:
- Roaming takes 300+ ms: Check if FT is actually enabled. Check RADIUS server response times. Enable RADIUS pipelining.
- Client roams erratically (every 5–10 seconds): Reduce transmit power (TPC) to stabilize signal. Enable hysteresis in RF profile. Check for RF interference (Spectrum Intelligence).
- Client won't roam away from degraded AP: Client-side sticky behavior. Use 802.11k/v/w to suggest better APs. Disable the sticky AP's WLAN temporarily.
- Inter-controller roaming fails (client drops): Check firewall (UDP 16666/16667). Verify CAPWAP and DTLS tunnels are UP. Check controller certificate dates.
- FT works sometimes, fails other times: PMKID cache on AP or controller is overflowed or out of sync. Increase PMKID cache limits. Verify all controllers use the same FT key derivation (SHA-256).
- New client cannot roam at all: Client driver doesn't support FT (check driver version). Enable OKC as fallback.
Roaming is a critical pillar of wireless network performance. These diagnostics and fixes will help you isolate and resolve nearly every roaming failure you encounter on the C9800. When in doubt, enable detailed client and mobility logging, capture a radioactive trace, and compare the output to the examples in this article. The C9800 logs are your best friend—they tell the complete story of what went wrong and why.