A working Catalyst 9800 deployment and a good Catalyst 9800 deployment are two very different things. The platform is forgiving — you can mis-tag APs, leave RRM on defaults, skip SSO, and the WLAN will still serve clients. The problem shows up later: a failover that reloads APs, a site survey that drifts out of alignment with DCA, a tag model that makes every change a regression test, or an upgrade window that runs long because the design never planned for it.
This guide pulls together the design decisions that matter most on the C9800 — the ones that are expensive to unwind once you've committed. It's organised the way you'll actually make the decisions: pick a platform, plan the HA model, design the tag and profile hierarchy, set up RF, lock down security, and plan the upgrade path. Every section includes the CLI you should be running to validate your choices on a live controller.
Start with the Right Platform
Platform choice is the first design decision, and it constrains everything after it. The C9800 family gives you six form factors: three appliances (C9800-L, C9800-40, C9800-80), a virtual form (C9800-CL), and two embedded variants (EWC-AP and EWC-SW). They are not interchangeable when it comes to scale, HA behaviour, or supported features.
When you size, don't just look at AP count. Look at expected client count, tunnels to terminate (FlexConnect central switching vs. local), throughput at the wireless management interface, and — critically — whether you intend to pair two units for SSO. A C9800-80 can only pair with another C9800-80, a C9800-L-C can only pair with another C9800-L-C, and on the C9800-CL both peers must be the same OVA template size. If you provision a Small C9800-CL today because that's all the APs you have, and the environment doubles next year, you cannot simply redeploy one side as Medium and keep SSO.
| Form factor | Typical role | Max APs (reference) | HA notes |
|---|---|---|---|
| C9800-L (Copper / Fiber) | Branch, small campus | Up to 250 APs | Must pair with identical sub-model (L-C with L-C, L-F with L-F) |
| C9800-40 | Mid-size campus | Up to 2,000 APs | Pairs only with another C9800-40; EPAs must match |
| C9800-80 | Large campus, data centre | Up to 6,000 APs | Pairs only with another C9800-80; EPAs must match |
| C9800-CL (virtual) | Private cloud, lab, DR | Small/Medium/Large OVAs | Both peers must use the same OVA size and same hypervisor family |
| EWC-AP / EWC-SW | Micro-site, retail | Up to 100 APs (EWC-AP); up to 200 APs (EWC-SW) | EWC-AP does not support SSO; it uses VRRP |
Two things to verify before you order hardware:
Confirm the form factor with show inventory and, on the CL, the resource envelope with show platform software system all. If the two peers disagree on either, the HA pairing will fail at the version-mismatch stage:
C9800#show inventory
NAME: "Chassis", DESCR: "Cisco Catalyst 9800-80 Wireless Controller"
PID: C9800-80-K9 , VID: V02, SN: FXS2345Q0XX
C9800#show redundancy config-sync failures bem
If you're paying for virtual infrastructure, verify that both C9800-CL VMs carry identical vCPU, memory, disk, and vNIC counts. Cisco enforces parity: a mismatched VM will refuse to peer.
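The form-factor comparison can also be scripted when you collect show inventory from both peers. The following is a minimal sketch, not a Cisco tool: the function names are hypothetical, and it assumes the first PID in the output belongs to the chassis entry, as in the sample above.

```python
import re
from typing import Optional


def chassis_pid(show_inventory: str) -> Optional[str]:
    """Return the first PID found in `show inventory` output.

    Assumption: the chassis entry is listed first, as in the sample output.
    """
    match = re.search(r"PID:\s*(\S+)", show_inventory)
    return match.group(1) if match else None


def same_form_factor(inventory_a: str, inventory_b: str) -> bool:
    """True only when both outputs yield a PID and the PIDs are identical."""
    pid_a = chassis_pid(inventory_a)
    pid_b = chassis_pid(inventory_b)
    return pid_a is not None and pid_a == pid_b
```

Run it against the captured output of each intended peer; a False result means the pair does not meet the same-model rule described above.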
Design High Availability Up Front, Not Later
HA on the C9800 is not a runtime toggle you add after go-live. The RP link, the RMI subnet, the chassis numbering, and even the active-chassis priority all need to be decided before you paste in a single WLAN profile. Retrofitting HA after a site is in production typically means scheduled downtime for a chassis renumber and a reload.
You have two HA models and you should know which one you're building:
| Model | What it gives you | When to choose it |
|---|---|---|
| SSO (RP + RMI) | Sub-second failover, zero AP reload, zero client drop for RUN-state clients, single virtual WLC IP | Any campus or branch where a controller outage means measurable business impact — the default choice |
| N+1 | Stateless failover, many-to-one backup, APs reload and rejoin on primary loss | Geographically distributed sites, or when you want a single DR controller to back up several production ones |
For SSO, the best practice since 17.1 is RP+RMI, not RP-only. RP-only SSO cannot detect gateway loss on the active, which leads to split-brain scenarios where both chassis declare themselves active. RMI adds a gateway reachability check and a dual-active detection channel over the uplinks. If you inherit an RP-only deployment on 17.x, migrating to RP+RMI is one of the highest-value changes you can make.
Design rules for an SSO pair that you should not compromise on:
- Both chassis must be the same form factor, same EPAs, and the same software version (including maintenance rebuilds). Starting with 17.5, the active can auto-upgrade the standby, but only if both are in Install mode.
- The RP link must have ≤ 80 ms RTT, ≥ 60 Mbps bandwidth, and standard 1500-byte MTU (jumbo frames are not supported on the RP).
- The RMI must sit in the same subnet as the WMI, with a unique IP per chassis. You do not configure a separate gateway — it borrows from the WMI.
- The RP link's dedicated VLAN must be unroutable and not filtered by any port ACL. The RP IPs are auto-derived as 169.254.x.y, where x.y mirrors the last two octets of the RMI.
- Chassis numbering matters. The C9800 defaults to chassis 1 on both sides — you must renumber one side to 2 before pairing, and the chassis you want to be active should carry the higher priority (2 vs. default 1).
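The rules above lend themselves to a pre-deployment check of the design document. The sketch below is illustrative only: the SsoDesign fields and function names are hypothetical, the thresholds are the ones quoted in the list, and the RP address derivation simply mirrors the last two octets of the RMI as described.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SsoDesign:
    """Hypothetical container for the values you would record in a design doc."""
    wmi_network: str                  # subnet of the wireless management interface
    rmi_chassis1: str                 # RMI address planned for chassis 1
    rmi_chassis2: str                 # RMI address planned for chassis 2
    rp_rtt_ms: float                  # measured round-trip time on the RP link
    rp_bandwidth_mbps: float          # available bandwidth on the RP link
    chassis_numbers: Tuple[int, int]  # chassis number assigned to each unit
    rp_mtu_bytes: int = 1500


def derived_rp_ip(rmi_ip: str) -> str:
    """Mirror the last two octets of the RMI into 169.254.x.y."""
    octets = rmi_ip.split(".")
    return f"169.254.{octets[2]}.{octets[3]}"


def check_sso_design(design: SsoDesign) -> List[str]:
    """Return a list of rule violations; an empty list means no issue was found."""
    problems = []
    wmi = ipaddress.ip_network(design.wmi_network)
    for label, ip in (("chassis 1", design.rmi_chassis1), ("chassis 2", design.rmi_chassis2)):
        if ipaddress.ip_address(ip) not in wmi:
            problems.append(f"RMI for {label} ({ip}) is outside the WMI subnet {wmi}")
    if design.rmi_chassis1 == design.rmi_chassis2:
        problems.append("RMI addresses must be unique per chassis")
    if design.rp_rtt_ms > 80:
        problems.append("RP link RTT exceeds 80 ms")
    if design.rp_bandwidth_mbps < 60:
        problems.append("RP link bandwidth is below 60 Mbps")
    if design.rp_mtu_bytes != 1500:
        problems.append("RP link MTU is not the standard 1500 bytes")
    if len(set(design.chassis_numbers)) != 2:
        problems.append("Both units carry the same chassis number")
    return problems
```

For example, an RMI of 10.10.1.15 yields an expected RP address of 169.254.1.15, which you can compare against the IP column of show chassis once the pair is up.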
Verification you should have in your HA commissioning runbook:
C9800#show chassis
Chassis MAC address : xxxx.xxxx.xxxx
Local Redundancy Port Type : Twisted Pair
Chassis# Role Mac Address Priority Version State IP
------------------------------------------------------------------------
*1 Active xxxx.xxxx.aaaa 2 V02 Ready 169.254.1.15
2 Standby xxxx.xxxx.bbbb 1 V02 Ready 169.254.1.17
C9800#show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit = Primary
Unit ID = 1
Redundancy Mode (Operational) = sso
Redundancy Mode (Configured) = sso
Redundancy State = sso
If you see "STANDBY COLD" or "STANDBY BULK" persist longer than a few minutes after a reload, stop and investigate before putting the pair into production. A pair that never reaches STANDBY HOT will double-fault on the next planned maintenance.
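If the runbook is automated, the health condition described above can be evaluated from captured show redundancy states output. This is a rough sketch with hypothetical function names; it assumes the "key = value" layout shown in the sample and that the command is run on the active chassis.

```python
def redundancy_fields(output: str) -> dict:
    """Split each 'key = value' line into a lower-cased key and its value."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def pair_is_healthy(output: str) -> bool:
    """True when the local unit is ACTIVE, the peer is STANDBY HOT, and SSO is operational."""
    fields = redundancy_fields(output)
    return (
        fields.get("my state", "").endswith("ACTIVE")
        and fields.get("peer state", "").endswith("STANDBY HOT")
        and fields.get("redundancy mode (operational)") == "sso"
    )
```

Polling this a few times after a reload, and alerting if it stays False beyond your tolerance, matches the "stop and investigate" rule in prose form.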
The Tag and Profile Hierarchy Is Your Real Design
The configuration model is where most C9800 deployments go wrong, usually quietly. The C9800 abandoned the AireOS "WLAN + AP group" style in favour of a three-tag model: each AP carries a Site Tag, a Policy Tag, and an RF Tag. Profiles hang off those tags, and policies hang off profiles. That extra layer of indirection is deliberate — it lets you change the behaviour of a single room, a building, or a region without touching every AP.
The design trap is over-tagging. A team that creates one Policy Tag per WLAN, one RF Tag per AP model, and one Site Tag per building quickly ends up with hundreds of tags that all differ in tiny ways. Future changes require editing every one of them, and no one on the team knows which ones are still in use.
A practical tag design discipline:
| Tag type | Design rule | Typical count in a well-designed deployment |
|---|---|---|
| Policy Tag | One per unique combination of SSIDs that share a site. If two buildings need the same SSIDs, they share a Policy Tag. | 2–5 for most campuses |
| Site Tag | One per operational boundary — not per building. A boundary is where you want a distinct AP Join Profile, Flex Profile, or local switching behaviour. | 1 for central mode; 1 per WAN-isolated branch for Flex |
| RF Tag | One per RF environment, not per AP model. Open office and warehouse are different RF environments; two floors of the same building are usually not. | 2–4 is normal; more than 6 is a smell |
Use naming that encodes intent rather than location. PT-Corp-Guest tells you what it does; PT-Bldg12-Floor3 tells you where it happens to be applied today, which is information you can get from the AP list. Good naming pays off every time you run show ap tag summary:
C9800#show ap tag summary
Number of APs: 412
AP Name AP Mac Site Tag Name Policy Tag Name RF Tag Name Misconfigured
-------------------------------------------------------------------------------------------
AP-HQ-001 aaaa.bbbb.0001 ST-HQ-Central PT-Corp-Guest RF-Office No
AP-HQ-002 aaaa.bbbb.0002 ST-HQ-Central PT-Corp-Guest RF-Office No
AP-WH-101 aaaa.bbbb.0101 ST-WH-Flex PT-Warehouse RF-HighCeil No
Pay attention to that last column. Misconfigured = Yes means the AP is running on a locally-created tag that doesn't match its assignment, a sign that someone configured tags on the AP directly rather than via the controller. Fix those before they drift further.
One more design rule: avoid editing the default-site-tag, default-policy-tag, and default-rf-tag. Leave them as safety nets for unclaimed APs. Create named tags for everything else. When an AP joins with no tag assignment, it falls back to defaults — and if you've kept defaults clean, you can spot the unassigned AP immediately.
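A periodic audit of show ap tag summary makes tag sprawl and unassigned APs visible. The sketch below is an assumption-laden parser, not an official tool: it expects six whitespace-separated columns per AP row, tag names without spaces, and the default tag names mentioned above; all function names are hypothetical.

```python
import re

MAC_PATTERN = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$", re.IGNORECASE)
DEFAULT_TAGS = {"default-site-tag", "default-policy-tag", "default-rf-tag"}
COLUMNS = ("ap", "mac", "site", "policy", "rf", "misconfigured")


def parse_tag_summary(output: str) -> list:
    """Keep only rows whose second column looks like a MAC address."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == len(COLUMNS) and MAC_PATTERN.match(parts[1]):
            rows.append(dict(zip(COLUMNS, parts)))
    return rows


def audit_tags(rows: list) -> dict:
    """Summarise misconfigured APs, APs still on default tags, and distinct tag counts."""
    return {
        "misconfigured": [r["ap"] for r in rows if r["misconfigured"].lower() == "yes"],
        "on_defaults": [
            r["ap"] for r in rows if DEFAULT_TAGS & {r["site"], r["policy"], r["rf"]}
        ],
        "distinct_tags": {kind: len({r[kind] for r in rows}) for kind in ("site", "policy", "rf")},
    }
```

Comparing distinct_tags against the typical counts in the table gives a quick signal when the tag model is starting to grow with the AP count.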
RF Design: Let RRM Work, Don't Fight It
The most common RF design mistake on a C9800 is treating RRM like AireOS — cranking the default TX power limits, manually pinning channels, or setting DCA to run once a day at 3 a.m. because "that's what we did before." On modern IOS-XE and with current RRM algorithms, the controller is better at this than you are, provided you give it accurate inputs.
The inputs that matter:
- AP placement from a site survey, not from a floor plan guess. RRM optimises the cells you give it; it cannot fix a deployment with too few APs or APs in the wrong places.
- Correct country code and regulatory domain on every AP. An AP stuck in an incorrect domain will refuse to use channels your design depends on, particularly in 5 GHz UNII-2 and UNII-2-extended.
- Honest TPC minimum and maximum values. The default range (−10 to 30 dBm) is almost always too wide. Set TPC min to the lowest power that still gives edge clients a usable signal (typically 8–11 dBm for office) and let TPC decide the rest.
- DCA channel list pruned to what you actually want used. If DFS channels are operationally painful in your building (for example, radar events during business hours), exclude them — don't let DCA select them and then override manually.
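Pruning the DCA list is easier to review when the candidate list is generated rather than typed. As a small sketch, the helper below removes DFS channels from a candidate list. It assumes the common 20 MHz channel numbering in which UNII-2 (52 to 64) and UNII-2-extended (100 to 144) are the DFS ranges; the exact set depends on your regulatory domain, so treat the constant as an assumption to verify.

```python
# Assumed DFS channels: UNII-2 (52-64) and UNII-2-extended (100-144), 20 MHz numbering.
DFS_CHANNELS = set(range(52, 65, 4)) | set(range(100, 145, 4))


def prune_dfs(candidates: list) -> list:
    """Return the candidate channels with assumed DFS channels removed, order preserved."""
    return [channel for channel in candidates if channel not in DFS_CHANNELS]
```

The resulting list is what you would then configure as the DCA channel list on the controller, so that DCA never selects a channel you would later override by hand.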
| Setting | Default | Recommended starting point | Why |
|---|---|---|---|
| DCA interval | 10 minutes (anchor time) | Keep default; enable "anchor time" during off-hours | Frequent evaluation, disruption only when clients are few |
| TPC min (5 GHz) | −10 dBm | 8–11 dBm | Prevents APs from collapsing to near-zero power during co-channel storms |
| TPC max (5 GHz) | 30 dBm | 17–20 dBm | Keeps cells sized to the survey, not to the AP's maximum EIRP |
| 2.4 GHz radios | All enabled | Disable half via FRA or admin-down | Reduces co-channel interference; FRA converts unused 2.4 radios to monitor or 5 GHz |
| Coverage Hole Detection | Enabled | Keep enabled, tune thresholds to deployment | Surfaces real coverage holes rather than client misbehaviour |
Validate that RRM is actually operating the way you think:
C9800#show ap dot11 5ghz summary
AP Name Subband Radio Mac Admin State Oper State Channel Width TxPwr
------------------------------------------------------------------------------------
AP-HQ-001 All aaaa.bbbb.0010 Enabled Up 44* 40 4/8 (17 dBm)
AP-HQ-002 All aaaa.bbbb.0020 Enabled Up 149* 40 3/8 (20 dBm)
C9800#show ap dot11 5ghz channel
Leader Automatic Channel Assignment
Channel Assignment Mode : AUTO
Channel Update Interval : 600 seconds
Anchor time (Hour of the day) : 3
DCA Sensitivity Level : MEDIUM : 15 dB
The asterisk next to the channel means DCA has assigned it. The TxPwr column shows current level out of the TPC range. If every AP is sitting at 1/8 (maximum), your TPC range is too wide or your AP density is too low.
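The "every AP at 1/8" check is easy to miss by eye on a few hundred rows. A minimal sketch of automating it follows; the regular expression assumes the TxPwr column is rendered as level/levels (N dBm) as in the sample, and the function names are hypothetical.

```python
import re

# AP name first, then anything, then "level/levels (N dBm)" as shown in the sample output.
ROW = re.compile(r"^(\S+)\s.*?\s(\d+)/(\d+)\s*\(\s*(-?\d+)\s*dBm\)")


def parse_tx_power(output: str) -> list:
    """Return (ap_name, power_level, level_count, dbm) for each radio row."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, level, levels, dbm = match.groups()
            rows.append((name, int(level), int(levels), int(dbm)))
    return rows


def all_at_max_power(rows: list) -> bool:
    """Power level 1 is the maximum; True when every parsed radio sits there."""
    return bool(rows) and all(level == 1 for _, level, _, _ in rows)


def outside_tpc_range(rows: list, tpc_min_dbm: int, tpc_max_dbm: int) -> list:
    """List AP names whose current power falls outside the configured TPC range."""
    return [name for name, _, _, dbm in rows if not tpc_min_dbm <= dbm <= tpc_max_dbm]
```

With the sample rows above and a TPC range of 8 to 20 dBm, neither check fires, which is the outcome you want from a correctly sized design.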
Security: Defaults Are Not a Design
C9800 security defaults are sensible but they are not a posture. A design checklist for security that you should run against every new deployment:
- Disable management over wireless unless you have an explicit reason to allow it. This is off by default, and it should stay off — a client on the corporate SSID has no business SSHing to the WLC.
- Use a locally significant certificate (LSC) for AP DTLS rather than the manufacturer-installed certificate, particularly in regulated environments. The MIC works forever but gives you no control over trust.
- Put RADIUS servers behind dead-server detection and load-balancing. On a busy controller, a single hung RADIUS server can back up authentications across every SSID; the aaa-dead-criteria and radius-server load-balance configurations let the WLC fail away from a wedged server quickly.
- Standardise on WPA3-Enterprise where clients support it, with WPA2/WPA3 transition mode for mixed fleets. Do not leave legacy WPA (TKIP) enabled on new SSIDs, because it disables high-throughput rates on the whole BSS.
- Enable rogue detection and auto-containment policies carefully. Auto-containment is a blunt instrument that can attack a neighbour's network in a shared building. Use it only where legal and physical isolation make it safe.
- Use ACLs on WMI and service ports. The controller exposes management on the WMI by default and there is no reason anything except your management VLAN should be able to reach TCP/22, 443, or 830.
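The last rule in the list can be spot-checked against flow or firewall logs. The sketch below is purely illustrative: the flow tuples and the management subnet are hypothetical inputs you would supply, and the port set is the one named in the rule.

```python
import ipaddress

MGMT_PORTS = {22, 443, 830}  # SSH, HTTPS, NETCONF, as listed in the rule above


def unexpected_mgmt_flows(flows: list, mgmt_subnet: str) -> list:
    """Return (source_ip, port) pairs that reach a management port from outside the subnet."""
    allowed = ipaddress.ip_network(mgmt_subnet)
    return [
        (src, port)
        for src, port in flows
        if port in MGMT_PORTS and ipaddress.ip_address(src) not in allowed
    ]
```

Any entry in the result is a source that the WMI ACL should have blocked, so an empty list is the expected outcome once the ACL is in place.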
A quick sanity check for the most common misconfiguration — an AP join profile left on the default certificate:
C9800#show ap join stats summary
Number of APs: 412
Base MAC Phy MAC AP Name IP Address Status Last Failure Phase Last Disconnect Reason
---------------------------------------------------------------------------------------------------------
aaaa.bbbb.0001 aaaa.bbbb.0002 AP-HQ-001 10.10.10.51 Joined Join Tag modified
C9800#show wireless certification config
LSC Provision State : Enabled
Trustpoint : lsc-trust
Subject Country : US
...
If LSC shows Disabled and your environment requires certificate control, that's design debt worth paying off.
Plan the Upgrade Path Before the First Upgrade
The C9800 supports Install mode, Bundle mode, ISSU, SMU, and N+1 hitless rolling AP upgrades. Every one of these has implications for how you designed HA, tags, and AP join profiles, and you should pick your upgrade strategy on Day 0 rather than on the first maintenance night.
| Upgrade method | What it's for | Design prerequisite |
|---|---|---|
| Install mode (standard) | Full image upgrade via install add file … activate commit | Must already be booting packages.conf; not Bundle mode |
| ISSU (In-Service Software Upgrade) | Upgrade between compatible releases with no data-plane outage | SSO HA pair, both in Install mode, same EPAs, same software major |
| SMU (Software Maintenance Upgrade) | Hot patch for a specific defect | Install mode; SMUs are release-specific |
| N+1 Hitless Rolling AP Upgrade | Upgrade APs in waves across an N+1 pair with no site-wide outage | N+1 deployment, matching target image pre-downloaded to the secondary |
| AP image pre-download | Stages the new AP image over CAPWAP before the cutover | Enough bootflash on APs; scheduled ahead of the maintenance window |
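The prerequisites in the table can be written down as a small decision helper for a Day 0 design review. This is a deliberately simplified sketch: the function name and the string labels are hypothetical, and it encodes only the Install-mode and HA-model prerequisites from the table, not EPA or release compatibility.

```python
def eligible_upgrade_methods(install_mode: bool, ha_model: str) -> list:
    """Return the upgrade methods whose table prerequisites are met.

    ha_model is expected to be "sso", "n+1", or "none" (hypothetical labels).
    """
    methods = []
    if install_mode:
        methods.append("install")
        methods.append("smu")
        if ha_model == "sso":
            methods.append("issu")
    if ha_model == "n+1":
        methods.append("n+1 rolling ap upgrade")
    return methods
```

An empty result for a controller you expected to upgrade with ISSU is a prompt to revisit either the boot mode or the HA model before the first maintenance window.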
Two non-obvious rules:
First, always run the controller in Install mode, not Bundle mode. Bundle mode will work but it locks you out of ISSU, SMU, and reliable rolling upgrades. If you've inherited a Bundle-mode controller, the conversion is a one-time procedure using install add file … activate commit — do it during a maintenance window and never look back.
Second, use AP image pre-download for every upgrade where AP reboots will happen. Pre-download pushes the new AP image to every AP over CAPWAP ahead of the window, so the only action during the window is the reboot itself. Without pre-download, every AP downloads the image during the cutover and your window stretches to match the slowest WAN link.
C9800#show version | include Installation
Installation mode is INSTALL
C9800#show ap image
Total number of APs : 412
Number of APs
Initiated : 0
Downloading : 0
Predownloading : 412
Completed predownloading : 398
Not Supported : 0
Failed to Predownload : 0
C9800#ap image predownload
Initiating predownload on all APs
Finally, decide your rollback plan before the upgrade, not after. In Install mode, install abort within the commit timer restores the previous image cleanly. With ISSU, an abort during the activation phase rolls back automatically. Know which one applies to your chosen method, and rehearse it in a lab before you need it on a production weekend.
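Before opening the maintenance window, it helps to gate the cutover on the pre-download counters. The following sketch parses show ap image output in the "label : count" layout shown above; the function names are hypothetical and the readiness rule (every AP completed, none failed) is one reasonable assumption, not a Cisco-defined threshold.

```python
import re

COUNTER = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s*$")


def parse_ap_image(output: str) -> dict:
    """Map each lower-cased counter label to its integer value."""
    counts = {}
    for line in output.splitlines():
        match = COUNTER.match(line)
        if match:
            counts[match.group(1).lower()] = int(match.group(2))
    return counts


def predownload_ready(counts: dict) -> bool:
    """Ready when every AP has completed pre-download and none has failed."""
    total = counts.get("total number of aps", 0)
    return (
        total > 0
        and counts.get("completed predownloading", 0) == total
        and counts.get("failed to predownload", 0) == 0
    )
```

With the sample output above, 398 of 412 APs have completed, so the check returns False and the window should not start yet.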
Day-2 Observability Is Part of the Design
A design is only as good as your ability to notice it drifting. Put at least these four telemetry sources in place on Day 0:
- Model-driven telemetry streaming RRM, client, and AP operational data to a collector. The WLC publishes rich YANG models; there is no reason to still be polling SNMP for this on a new 9800.
- Syslog to a central server, with informational and above captured. The HA-related messages (gateway reachability, RMI state, RP link flaps) are your early warning for a pair drifting out of sync.
- NetFlow / AVC if you care about who is using bandwidth and for what. Auto-QoS without AVC visibility is a guess.
- show tech wireless baselined once per quarter. It's large, but a diff between two quarterly captures will surface tag sprawl, unexpected profile additions, and stale AAA servers faster than any dashboard.
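The quarterly diff in the last item needs nothing more than a standard text diff. A minimal sketch using Python's difflib is shown here; the function name and the file labels are hypothetical, and in practice you would likely filter out timestamps and counters before comparing.

```python
import difflib


def baseline_diff(previous: str, current: str) -> list:
    """Return only the added (+) and removed (-) lines between two captures."""
    diff = list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous-quarter",
            tofile="current-quarter",
            lineterm="",
            n=0,  # no context lines; only the changes themselves
        )
    )
    # The first two entries are the ---/+++ file headers; hunk markers start with "@@".
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

A new tag or AAA server that appears in the result without a matching change record is exactly the kind of drift the quarterly baseline is meant to surface.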
Key Takeaways
Good C9800 design is less about clever configuration and more about deciding early and documenting why. Pick the platform with HA pairing and five-year scale in mind (not today's AP count). Commit to RP+RMI SSO for any deployment where an outage costs money, and put the RP, RMI, chassis number, and chassis priority into your design doc before touching the hardware. Keep your tag hierarchy small and named by intent — if your tag count grows linearly with your AP count, the model is wrong. Trust RRM but feed it a tight TPC range and a pruned DCA channel list. Treat security defaults as a starting point, not a finish line. Run everything in Install mode from Day 0 and pre-download AP images before every window. And put telemetry in place before you need it, not after the first incident.
The C9800 is a powerful platform, but the decisions that determine whether a deployment ages well are almost all made in the first two weeks. Spend that time on design, and the next five years get much easier.