RADIUS Server Unreachable in 802.1X: Causes and Fixes

Impact and Urgency

When the RADIUS server is unreachable, the impact on a production access layer switch is immediate and broad. Every port configured with authentication port-control auto is affected. New authentication attempts fail. Re-authentication timers that fire during the outage fail. Depending on how the switch is configured to handle server-dead conditions, endpoints may lose network access entirely, remain in their current VLAN, or be placed in a Critical VLAN.

In a campus building switch with 48 access ports, a RADIUS outage affecting all ports simultaneously is a P1 incident. The diagnostic sequence needs to be fast and methodical.


Confirming the Switch Has Marked ISE Dead

SW9300# show aaa servers

Sample output — ISE marked dead:

RADIUS: id 1, priority 1, host 10.0.0.10, auth-port 1812, acct-port 1813
     State: current DEAD, duration 00:04:22, previous duration 3d14h
     Dead: total time 262s, count 3
     Platform State from SMD: current DEAD, duration 00:04:22, previous duration 3d14h
     Platform Dead: total time 262s, count 3
     UP/DOWN: #times 4, #failed transitions 0
     Authentication: #sent 1876, #received 1614
     Retransmission: #sent 262, #late responses 0, #bad responses 5
     Estimated Outstanding Access Transactions: 0

Key fields:

State: current DEAD — ISE is currently unreachable from this switch's perspective.

Dead: total time 262s, count 3 — the server has been declared dead three times total, for a combined 262 seconds. count 3 in a single shift suggests an intermittent connectivity issue rather than a clean outage.

Retransmission: #sent 262: the switch retransmitted 262 RADIUS packets, which matches the gap between requests sent (1876) and responses received (1614). Retransmissions amount to about 14% of requests sent (262/1876), which indicates significant packet loss or latency.

#bad responses 5 — ISE sent responses that failed authentication validation (likely shared secret mismatch, since a wrong shared secret causes HMAC verification to fail).
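The retransmission ratio can be computed directly from captured output. The following is a minimal Python sketch, assuming the show aaa servers text has been saved as a string. The function name retransmit_ratio and the regular expressions are illustrative and only cover the field layout shown in the sample above.

```python
import re

# Two counter lines copied from the sample output above.
SAMPLE = """\
     Authentication: #sent 1876, #received 1614
     Retransmission: #sent 262, #late responses 0, #bad responses 5
"""

def retransmit_ratio(output: str) -> float:
    """Return retransmissions divided by authentication requests sent."""
    sent = re.search(r"Authentication: #sent (\d+)", output)
    retx = re.search(r"Retransmission: #sent (\d+)", output)
    if not sent or not retx:
        raise ValueError("expected counters not found in output")
    total = int(sent.group(1))
    # Avoid dividing by zero on a switch that has sent nothing yet.
    return int(retx.group(1)) / total if total else 0.0

if __name__ == "__main__":
    print(f"{retransmit_ratio(SAMPLE):.1%}")
```

Running the same calculation on output captured a few minutes apart shows whether the ratio is still climbing or has levelled off.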

SW9300# show aaa dead-criteria radius 10.0.0.10 auth-port 1812

Sample output:

RADIUS: id 1
Dead Criteria Details for Server 10.0.0.10/1812:
     Configured Retransmits:        3
     Configured Timeout:            5
     Estimated Outstanding Transactions:        0
     Dead Detect Interval:        7
     Computed Dead Detect Moves:        4
     Dead Detect Interval:        30
     Current State:        DEAD
     Time Server has been Dead:        4 min 22 sec

Configured Retransmits: 3 and Configured Timeout: 5 mean that the switch retransmits up to 3 times and waits 5 seconds for each reply before it gives up on the server. That puts the worst-case wait per server at 15 seconds of retransmission timeouts (20 seconds if the timeout on the initial attempt is counted as well). With two RADIUS servers configured, a request sent to a failed primary can stall for that long before the switch tries the secondary.
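The timing arithmetic can be sketched in a few lines of Python. The function name worst_case_wait and the include_initial flag are assumptions made for illustration. Whether the first transmission counts toward the total depends on how the platform counts attempts, so both readings are shown.

```python
def worst_case_wait(timeout_s: int, retransmits: int, include_initial: bool = False) -> int:
    """Seconds spent waiting on one unresponsive server.

    include_initial=True also counts the timeout of the first transmission,
    on top of the configured retransmissions.
    """
    attempts = retransmits + (1 if include_initial else 0)
    return timeout_s * attempts

if __name__ == "__main__":
    # Values from the sample output: timeout 5, retransmit 3.
    print(worst_case_wait(5, 3))                        # retransmissions only
    print(worst_case_wait(5, 3, include_initial=True))  # plus the first attempt
```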


Step 1: Basic Connectivity Test

The first question is whether the switch can reach ISE at all.

SW9300# ping 10.0.0.10 source Vlan99 repeat 10

Sample output — reachable:

Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 10.0.0.10, timeout is 2 seconds:
Packet sent with a source address of 10.0.99.1
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 2/3/5 ms

Sample output — unreachable:

Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 10.0.0.10, timeout is 2 seconds:
Packet sent with a source address of 10.0.99.1
..........
Success rate is 0 percent (0/10)

Source Vlan99 is critical. The switch sends RADIUS packets from whatever interface is specified in ip radius source-interface. If you ping without specifying source, the ping uses the routing table's best source, which may succeed even though RADIUS-sourced packets from Vlan99 cannot reach ISE. Always source the ping from the same interface as RADIUS traffic.

If ping fails:

Check routing:

SW9300# show ip route 10.0.0.10

Verify there is a route to ISE. In a campus topology, ISE at 10.0.0.10 should be reachable via the default route or a specific route through the core switch.

SW9300# show ip route 0.0.0.0

If there is no default route and no specific route to the ISE subnet, RADIUS packets are black-holed. Add the route or verify the routing protocol is distributing the ISE subnet correctly.

Check the source interface:

SW9300# show running-config | include source-interface
ip radius source-interface Vlan99

Verify Vlan99 is up:

SW9300# show interface Vlan99
Vlan99 is up, line protocol is up
  Hardware is Ethernet SVI, address is 1c6a.7ae0.1234 (bia 1c6a.7ae0.1234)
  Internet address is 10.0.99.1/24

If Vlan99 is down, the switch has no source IP for RADIUS packets. Fix the VLAN 99 interface first.


Step 2: Test UDP Reachability on RADIUS Ports

ICMP success does not guarantee UDP 1812/1813 reachability. A firewall may pass ICMP but block RADIUS ports.

SW9300# test aaa group ISE-SERVERS test-user badpassword new-code

Sample output when RADIUS is working:

Attempting authentication test to server-group ISE-SERVERS using radius
User authentication request was rejected by server.

"Rejected by server" is a success in this context — it means ISE received the request and responded. Even though the test credentials are wrong, the RADIUS exchange completed.

Sample output when RADIUS port is blocked:

Attempting authentication test to server-group ISE-SERVERS using radius
User was not authenticated.

No response from ISE — the test times out. This means the RADIUS packet left the switch but ISE never responded. Either the firewall blocked UDP 1812, ISE's RADIUS service is down, or the packet never reached ISE.
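When running this test across many switches, the two outcomes can be told apart automatically. This is a minimal Python sketch that matches only the two message strings shown above. The function name classify_test_aaa and the returned labels are hypothetical, and any other output is reported as unknown rather than guessed at.

```python
def classify_test_aaa(output: str) -> str:
    """Map `test aaa` output to a coarse reachability verdict."""
    text = output.lower()
    if "rejected by server" in text:
        # ISE answered, so the RADIUS exchange completed end to end.
        return "server-responded"
    if "was not authenticated" in text:
        # No reply arrived before the timeout expired.
        return "no-response"
    return "unknown"

if __name__ == "__main__":
    print(classify_test_aaa("User authentication request was rejected by server."))
    print(classify_test_aaa("User was not authenticated."))
```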

To differentiate between a firewall block and an ISE service failure, check ISE directly.


Step 3: Check ISE Health

Navigation: Administration > System > Deployment

Check the status of all ISE nodes. Each PSN (Policy Service Node) should show "In Service." A node showing "Out of Service" or with a red indicator has a service failure.

Navigation: Administration > System > Health Summary

This dashboard shows CPU, memory, and disk utilization for each ISE node. An ISE node can be reachable on the network and still fail RADIUS requests when it is overloaded:

  • CPU > 90% sustained: ISE cannot process RADIUS requests fast enough, causing switch timeouts
  • Memory > 90%: ISE services may crash or become unresponsive
  • Disk > 80% on /opt/CSCOcpm: ISE logs fill disk, causing service instability
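The thresholds above can be expressed as a simple check. The sketch below assumes the utilization figures have already been read from the dashboard as percentages. The metric names and the health_warnings function are illustrative and not part of any ISE API.

```python
# Percent thresholds taken from the list above.
THRESHOLDS = {"cpu": 90.0, "memory": 90.0, "disk": 80.0}

def health_warnings(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]

if __name__ == "__main__":
    # Hypothetical readings for one PSN.
    print(health_warnings({"cpu": 95.0, "memory": 40.0, "disk": 85.0}))
```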

Navigation: Operations > Troubleshoot > Diagnostic Tools > Network Device RADIUS Diagnostic

Enter the switch's IP address (10.0.99.1) and click "Run." This tool checks whether ISE has a matching Network Device entry for that IP, whether the shared secret is configured, and whether the RADIUS service is able to receive requests.

Check the ISE RADIUS service status via CLI on ISE (if you have CLI access):

show application status ise

Look for radius service status. If the RADIUS service is stopped, restart it:

application start ise

Step 4: Shared Secret Mismatch

A shared secret mismatch is one of the most common causes of RADIUS failures that present as "unreachable" — ISE receives the packets but cannot validate them, so it silently drops them. The switch gets no response, retransmits, and eventually marks ISE dead.

The telltale sign is a non-zero #bad responses counter in show aaa servers output. A bad response is counted when ISE sends a reply but the switch cannot verify its Response Authenticator (or the Message-Authenticator attribute, when present) because the shared secret differs between the two sides.
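To see why a wrong secret produces a bad response, it helps to look at how a reply is validated. RFC 2865 defines the Response Authenticator as an MD5 hash over the reply header, the Request Authenticator, the attributes, and the shared secret. The Python sketch below illustrates that check under simplified assumptions (an Access-Reject with no attributes, a fixed Request Authenticator). The function names are hypothetical, and this is not the switch's actual implementation.

```python
import hashlib
import hmac
import struct

def response_authenticator(code: int, ident: int, request_auth: bytes,
                           attributes: bytes, secret: bytes) -> bytes:
    """MD5(Code + ID + Length + RequestAuth + Attributes + Secret), per RFC 2865."""
    length = 20 + len(attributes)  # 20-byte header plus attributes
    header = struct.pack("!BBH", code, ident, length)
    return hashlib.md5(header + request_auth + attributes + secret).digest()

def build_response(code: int, ident: int, request_auth: bytes,
                   attributes: bytes, secret: bytes) -> bytes:
    """Assemble a reply the way a server holding `secret` would."""
    auth = response_authenticator(code, ident, request_auth, attributes, secret)
    header = struct.pack("!BBH", code, ident, 20 + len(attributes))
    return header + auth + attributes

def response_is_valid(packet: bytes, request_auth: bytes, secret: bytes) -> bool:
    """Recompute the authenticator with the client's own secret and compare."""
    code, ident, length = struct.unpack("!BBH", packet[:4])
    expected = response_authenticator(code, ident, request_auth,
                                      packet[20:length], secret)
    return hmac.compare_digest(packet[4:20], expected)

REQUEST_AUTH = bytes(range(16))  # fixed value, for illustration only
reply = build_response(3, 42, REQUEST_AUTH, b"", b"ISEsecret123")  # code 3 = Access-Reject

if __name__ == "__main__":
    print(response_is_valid(reply, REQUEST_AUTH, b"ISEsecret123"))  # matching secret
    print(response_is_valid(reply, REQUEST_AUTH, b"WrongSecret"))   # mismatched secret
```

The Message-Authenticator attribute fails in the same way, except that it is an HMAC-MD5 keyed with the shared secret rather than a plain MD5 hash.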

Also check ISE: if ISE drops packets due to shared secret mismatch, it logs:

Navigation: Operations > RADIUS > Live Logs

Look for entries with Failure Reason "11036 - The Message-Authenticator RADIUS attribute is invalid", which points to a shared secret mismatch, or entries from the switch IP that show no matching Network Device (Failure Reason: "11007 - Could not locate Network Device or AAA Client").

On the switch, re-enter the shared secret:

SW9300(config)# radius server ISE-PRIMARY
SW9300(config-radius-server)# key ISEsecret123

On ISE: Administration > Network Resources > Network Devices > [Device] > Authentication Settings > Shared Secret — retype the shared secret. Save.

After changing on both ends, test immediately:

SW9300# test aaa group ISE-SERVERS test-user badpassword new-code

Step 5: Verify Network Device Definition in ISE

ISE only responds to RADIUS requests from devices explicitly listed in its Network Device database. If the switch's source IP is not in ISE's Network Device list, ISE drops all RADIUS requests from that switch.

Navigation: Administration > Network Resources > Network Devices

Verify:

  1. A Network Device entry exists for this switch
  2. The IP address in the entry matches the switch's ip radius source-interface address (10.0.99.1 in this lab)
  3. The RADIUS Authentication Settings are enabled with the correct shared secret
  4. The entry is not associated with a Network Device Group that has been excluded from policy

If the switch's management IP recently changed (common after re-IP projects), the old IP in ISE's Network Device entry will cause all RADIUS requests to be silently dropped.
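The IP comparison in item 2 is easy to get wrong by eye after a re-IP project. The following Python sketch checks a source address against a list of configured entries, each written as a single address or a subnet. The function name and the example entries are hypothetical. The real values would come from the switch configuration and from the Network Devices page in ISE.

```python
import ipaddress

def matches_network_device(source_ip: str, entries: list[str]) -> bool:
    """True if source_ip equals, or falls inside, any configured entry."""
    addr = ipaddress.ip_address(source_ip)
    # A bare address such as "10.0.99.1" is treated as a /32 network.
    return any(addr in ipaddress.ip_network(entry, strict=False)
               for entry in entries)

if __name__ == "__main__":
    print(matches_network_device("10.0.99.1", ["10.0.99.1"]))     # current address
    print(matches_network_device("10.0.98.1", ["10.0.99.1"]))     # address after a re-IP
    print(matches_network_device("10.0.99.7", ["10.0.99.0/24"]))  # subnet entry
```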


Step 6: Critical VLAN Behavior During Outage

While troubleshooting the RADIUS outage, confirm that connected endpoints are in an appropriate state. If the switch is configured for Critical VLAN fallback:

SW9300(config-if)# authentication event server dead action reinitialize vlan 50
SW9300(config-if)# authentication event server alive action reinitialize

Endpoints on ports with this configuration move to VLAN 50 (Critical VLAN) when ISE is unreachable. This is intentional — it gives endpoints limited connectivity (typically DNS and DHCP only) while the RADIUS issue is resolved.

Check current port states:

SW9300# show authentication sessions

Sample output during RADIUS outage with Critical VLAN configured:

Interface  MAC Address     Method  Domain  Status        Fg  Session ID
Gi1/0/1    a4b1.c2d3.e4f5  N/A     UNKNOWN Auth-Critical     0A0063010000002C
Gi1/0/2    b5c2.d4e5.f6a7  N/A     UNKNOWN Auth-Critical     0A0063010000002D
Gi1/0/3    c6d3.e5f6.a7b8  dot1x   DATA    Authorized         0A0063010000002E

Auth-Critical status means the endpoint is in Critical VLAN mode (the switch-initiated fallback). The Authorized entry on Gi1/0/3 is a session that was already authenticated before the outage — the existing session remains active.
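On a 48-port switch it is quicker to filter this table than to read it. Here is a minimal Python sketch that lists the interfaces in Auth-Critical state, assuming the column layout of the sample above (status as the fifth whitespace-separated field). The function name critical_ports is illustrative.

```python
# Sample table copied from the output above.
SESSIONS = """\
Interface  MAC Address     Method  Domain  Status        Fg  Session ID
Gi1/0/1    a4b1.c2d3.e4f5  N/A     UNKNOWN Auth-Critical     0A0063010000002C
Gi1/0/2    b5c2.d4e5.f6a7  N/A     UNKNOWN Auth-Critical     0A0063010000002D
Gi1/0/3    c6d3.e5f6.a7b8  dot1x   DATA    Authorized         0A0063010000002E
"""

def critical_ports(output: str) -> list[str]:
    """Return interfaces whose session status is Auth-Critical."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: interface, MAC, method, domain, status, session ID.
        if len(fields) > 4 and fields[4] == "Auth-Critical":
            ports.append(fields[0])
    return ports

if __name__ == "__main__":
    print(critical_ports(SESSIONS))
```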

When ISE comes back up, ports with authentication event server alive action reinitialize trigger a re-authentication automatically. Monitor this with:

SW9300# debug authentication all

Step 7: Deadtime and Recovery Tuning

After marking a RADIUS server dead, the switch enters a deadtime period during which it does not attempt to contact that server. This prevents the switch from flooding a recovering ISE with pent-up requests.

SW9300# show running-config | include deadtime
radius-server deadtime 15

Default deadtime is 0 (no deadtime — the switch retries the dead server immediately). This can cause thundering herd problems when ISE recovers after a long outage. A deadtime of 15 minutes gives ISE time to stabilize before the switch resumes sending requests.

To configure:

SW9300(config)# radius-server deadtime 15

During the deadtime period, the switch does not send authentication requests to the dead server. If a secondary RADIUS server is configured, all traffic goes to the secondary. If there is no secondary, new authentication attempts fail until the deadtime expires.

For redundant ISE deployments, see Article 28 — RADIUS Redundancy and Failover in 802.1X Deployments for the full configuration with multiple PSNs and tuned failover parameters.


Troubleshooting

Symptom: Switch shows ISE as DEAD but ISE reports normal operation and other switches are authenticating fine

Cause: The specific switch cannot reach ISE due to a port ACL, VLAN ACL (VACL), or route issue affecting only this switch's management VLAN. Other switches on different management subnets are unaffected.

Fix: Narrow the scope. Confirm the management VLAN and source interface for this switch:

SW9300# show running-config | include source-interface
ip radius source-interface Vlan99
SW9300# show interface Vlan99 | include Internet
  Internet address is 10.0.99.1/24

Check for any ACL applied to Vlan99:

SW9300# show ip interface Vlan99 | include access list
  Inbound  access list is MGMT-ACL-IN
  Outbound access list is not set

Review the ACL for rules blocking UDP 1812/1813 outbound or UDP responses inbound:

SW9300# show ip access-lists MGMT-ACL-IN

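If the management VLAN ACL blocks RADIUS traffic, add permit statements ahead of the deny-all. Replies from ISE carry 1812/1813 as the source port, so match on the source side. In a named ACL, give the new entries sequence numbers lower than the deny entry, because entries added without one are appended to the end:

SW9300(config)# ip access-list extended MGMT-ACL-IN
SW9300(config-ext-nacl)# permit udp host 10.0.0.10 eq 1812 any
SW9300(config-ext-nacl)# permit udp host 10.0.0.10 eq 1813 any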

Symptom: RADIUS works initially after a switch reload but fails after several hours — ISE shows the switch making authentication requests from an unexpected IP

Cause: The ip radius source-interface Vlan99 is configured correctly, but a DHCP-assigned IP on another interface (perhaps a routed port or Out-of-Band management port) becomes the preferred source for RADIUS packets after routing table changes during uptime. ISE's Network Device entry does not include this alternate IP, so ISE drops the requests.

Fix: Verify that the source interface is used consistently. In ISE Live Logs, note the IP address that this switch's requests arrive from, and compare it to the address of the interface named in ip radius source-interface. If routing changes are causing the source IP to shift, make sure ip radius source-interface is set and that the Vlan99 interface is always up. Also check for a no ip radius source-interface line, or a different source interface configured under the RADIUS server group, that might be overriding the global setting.

As a defensive measure, add the alternate IP to ISE's Network Device entry as a secondary address, or use a network device group with a subnet range:

Navigation: Administration > Network Resources > Network Devices > [Device] > IP Address

Change from a single IP to an IP range or add the alternate IP as an additional entry.


Symptom: ISE is reachable and shared secret is correct, but show aaa servers shows high retransmission count (>5%) and intermittent DEAD declarations

Cause: ISE is overloaded or the network path between the switch and ISE has packet loss. The switch sends RADIUS requests, ISE processes them but responds slowly, and the switch times out and retransmits before the response arrives.

Fix: First check ISE health at Administration > System > Health Summary. If CPU or memory is high, investigate which ISE service is consuming resources. Profiling, posture, and pxGrid can all spike CPU during large endpoint onboarding events.

Second, check the network path for packet loss:

SW9300# ping 10.0.0.10 source Vlan99 repeat 500 size 512

A 500-ping test with larger payload (512 bytes, closer to RADIUS packet size) reveals intermittent packet loss that a 10-ping test misses.
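The loss figure can be read from the summary line of the ping output. The Python sketch below parses that line. The sample string with 497 of 500 replies is a made-up illustration rather than captured output, and the function name is hypothetical.

```python
import re

def packet_loss_percent(ping_output: str) -> float:
    """Compute loss from the '(received/sent)' pair on the summary line."""
    match = re.search(r"Success rate is \d+ percent \((\d+)/(\d+)\)", ping_output)
    if not match:
        raise ValueError("ping summary line not found")
    received, sent = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent

if __name__ == "__main__":
    # Hypothetical summary line for a 500-packet test.
    sample = "Success rate is 99 percent (497/500), round-trip min/avg/max = 2/3/41 ms"
    print(f"{packet_loss_percent(sample):.1f}% loss")
```

Using the received/sent pair instead of the rounded percentage keeps small loss rates visible.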

Third, tune the RADIUS timeout and retransmit on the switch to be more tolerant of slower ISE responses:

SW9300(config)# radius server ISE-PRIMARY
SW9300(config-radius-server)# timeout 10
SW9300(config-radius-server)# retransmit 3

Increasing timeout from 5 to 10 seconds gives ISE more time to respond before the switch marks it as non-responsive.


Verifying Recovery

After fixing the underlying issue, confirm ISE transitions from DEAD back to UP:

SW9300# show aaa servers

The server state should show current UP. The UP/DOWN: #times counter increments with each state transition, so it counts changes in both directions. For the number of times ISE has been declared dead, read the count value on the Dead: line instead.

Force re-authentication on affected ports to clear Critical VLAN state:

SW9300# clear authentication sessions

This clears all sessions and forces re-authentication. Use with caution in production — it briefly disrupts all authenticated endpoints on the switch.

To clear only the Critical VLAN sessions without disrupting already-authenticated sessions:

SW9300# clear authentication sessions interface GigabitEthernet1/0/1

Repeat for each port showing Auth-Critical status. Ports will re-authenticate through ISE and receive their correct VLAN assignment.
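Typing the per-port command for dozens of interfaces is error-prone, so the command list can be generated. This is a small Python sketch, assuming a list of short interface names such as those in the show authentication sessions output. The prefix table and function names are illustrative, and the generated lines should be reviewed before being pasted into a production switch.

```python
# Short-name prefixes and their long forms (assumed mapping for illustration).
PREFIXES = {"Gi": "GigabitEthernet", "Te": "TenGigabitEthernet", "Fa": "FastEthernet"}

def expand_interface(name: str) -> str:
    """Expand a short interface name; leave unknown or long names unchanged."""
    for short, full in PREFIXES.items():
        if name.startswith(short) and not name.startswith(full):
            return full + name[len(short):]
    return name

def clear_commands(ports: list[str]) -> list[str]:
    """Build one per-interface clear command for each port."""
    return [f"clear authentication sessions interface {expand_interface(p)}"
            for p in ports]

if __name__ == "__main__":
    for command in clear_commands(["Gi1/0/1", "Gi1/0/2"]):
        print(command)
```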


What's Next: Article 23 — Dynamic VLAN Assignment Not Working in 802.1X: Troubleshooting Guide — covers the specific failure scenarios where RADIUS and ISE are reachable and authentication succeeds, but the endpoint lands in the wrong VLAN or the native VLAN instead of the assigned one.

© 2025 Ping Labz. All rights reserved.