Voice and Video QoS on Cisco IOS XE: LLQ, CBWFQ, IP Phone Trust

Voice and video are the use cases QoS exists for. A 1 percent packet loss rate on a TCP file transfer is invisible to the user, because TCP quietly retransmits; on a VoIP call, it is a robotic-sounding artifact that ends the meeting. Real-time media has tight requirements that the rest of your traffic does not, and those requirements are what drive every QoS design decision.

This article walks through the latency, jitter, and loss budgets for voice and video, the standard DSCP markings, the LLQ configuration that protects them, the IP phone trust pattern, and the bandwidth math you use to size the priority queue. If you are deploying VoIP for the first time or troubleshooting a video conferencing complaint, this is the reference.

The Latency, Jitter, and Loss Budgets

Application	One-way latency	Jitter	Packet loss
VoIP (toll quality)	< 150 ms	< 30 ms	< 1 percent (preferably < 0.1 percent)
VoIP (acceptable)	150-300 ms	< 50 ms	< 3 percent
Video conferencing	< 200 ms	< 50 ms	< 1 percent
Streaming video (one-way)	< 5 s buffer	tolerated by jitter buffer	< 5 percent

The numbers come from ITU-T Recommendation G.114 (one-way transmission time) and various enterprise media studies. The 150 ms VoIP latency target is one-way - so the round trip should be under 300 ms. Most domestic networks comfortably meet this; international circuits can be tight.
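
The budget table can be turned into a quick pass/fail check for measured values. The following is a minimal sketch, assuming you already have one-way latency, jitter, and loss figures from a probe or a monitoring tool; the dictionary keys and the function name are illustrative, not part of any Cisco tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_latency_ms: float  # one-way latency ceiling
    max_jitter_ms: float
    max_loss_pct: float


# Thresholds copied from the table above; the keys are hypothetical labels.
BUDGETS = {
    "voip_toll": Budget(150, 30, 1.0),
    "voip_acceptable": Budget(300, 50, 3.0),
    "video_conf": Budget(200, 50, 1.0),
}


def violations(app, latency_ms, jitter_ms, loss_pct):
    """Return the metrics that meet or exceed the budget for the given app."""
    budget = BUDGETS[app]
    checks = {
        "latency": latency_ms >= budget.max_latency_ms,
        "jitter": jitter_ms >= budget.max_jitter_ms,
        "loss": loss_pct >= budget.max_loss_pct,
    }
    return [name for name, failed in checks.items() if failed]


print(violations("voip_toll", latency_ms=120, jitter_ms=35, loss_pct=0.2))
```

An empty list means the path fits the budget; any entries name the metric to chase first.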

Jitter is variation in latency between consecutive packets. A jitter buffer at the receiver smooths small jitter (typically 30-50 ms of buffer); jitter beyond the buffer causes audible artifacts. The QoS goal is to keep jitter below the receiver's buffer.
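
To make "variation in latency between consecutive packets" concrete, here is a small sketch of the running jitter estimator defined in RFC 3550 (the RTP specification). It assumes send and receive timestamps in milliseconds on comparable clocks; real RTP stacks work in RTP timestamp units, and the function name is illustrative.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16.

    D is the change in transit time between two consecutive packets.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_cur = recv_times_ms[i] - send_times_ms[i]
        delta = abs(transit_cur - transit_prev)
        jitter += (delta - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the third one arrives 5 ms late.
sent = [0, 20, 40, 60]
received = [50, 70, 95, 110]
print(interarrival_jitter(sent, received))
```

The 1/16 gain smooths out single late packets, so a sustained rise in this value is a better congestion signal than one spike.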

Packet loss matters more for voice than for streaming because voice has no retransmission - a lost packet is gone. Codecs do packet loss concealment up to a point but quality degrades fast above 1 percent.

Voice Marking and Bandwidth

The standard DSCP markings for voice and video:

Traffic	DSCP	Decimal	Treatment
VoIP RTP audio (voice payload)	EF	46	LLQ priority queue
Voice signaling (SIP, H.323)	CS3	24	CBWFQ guaranteed bandwidth
Video conferencing (Zoom, Teams, Webex)	AF41	34	CBWFQ with WRED
Streaming video (one-way)	CS5 or AF31	40 or 26	CBWFQ
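
The decimal values in the table follow directly from how DSCP code points are built. The sketch below derives them and also shows the ToS byte value you would see in a packet capture; the helper names are illustrative.

```python
def cs(n):
    """Class Selector CSn: decimal value is 8 * n (n = 0..7)."""
    return 8 * n


def af(af_class, drop_precedence):
    """Assured Forwarding AFxy: decimal value is 8 * x + 2 * y."""
    return 8 * af_class + 2 * drop_precedence


EF = 46  # Expedited Forwarding, binary 101110


def tos_byte(dscp):
    """DSCP occupies the upper six bits of the IP ToS / Traffic Class byte."""
    return dscp << 2


markings = [("EF", EF), ("CS3", cs(3)), ("AF41", af(4, 1)),
            ("CS5", cs(5)), ("AF31", af(3, 1))]
for name, value in markings:
    print(f"{name}: dscp={value} tos=0x{tos_byte(value):02X}")
```

Capture tools often display the whole ToS byte rather than the DSCP, so EF shows up as 0xB8 rather than 46.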

VoIP bandwidth math. A single VoIP call with the G.711 codec, at the common 20 ms packetization interval (50 packets per second), consumes:

G.711 payload: 64 kbps
RTP/UDP/IP overhead: ~16 kbps
Layer 2 overhead (Ethernet/MPLS): ~16 kbps
Total per call: ~96 kbps

For the G.729 codec, the payload drops to 8 kbps but the per-packet overhead is fixed: the total is ~24 kbps per call at Layer 3, or roughly 40 kbps with the same Layer 2 allowance used above. For Opus (modern), the payload varies (6-510 kbps); typical voice usage is 24-48 kbps.
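
The per-call figures can be reproduced from packet sizes. This is a sketch under stated assumptions: 20 ms packetization, 40 bytes of IP/UDP/RTP headers (20 + 8 + 12), and a Layer 2 allowance of either 18 bytes (Ethernet header plus FCS) or 38 bytes (also counting preamble and inter-frame gap). The function and parameter names are illustrative.

```python
def per_call_kbps(codec_kbps, ptime_ms=20, l3_header_bytes=40, l2_overhead_bytes=0):
    """Bandwidth of one call direction, in kbps, for a constant-bitrate codec."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    frame_bytes = payload_bytes + l3_header_bytes + l2_overhead_bytes
    return frame_bytes * 8 * packets_per_second / 1000


print("G.711 at Layer 3:       ", per_call_kbps(64))
print("G.711 with 38-byte L2:  ", per_call_kbps(64, l2_overhead_bytes=38))
print("G.729 at Layer 3:       ", per_call_kbps(8))
print("G.729 with 38-byte L2:  ", per_call_kbps(8, l2_overhead_bytes=38))
```

With the 38-byte allowance, G.711 lands close to the ~96 kbps planning figure; with only the 18-byte Ethernet header and FCS it comes out nearer 87 kbps, so the round number errs on the safe side.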

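Sizing the priority queue: 100 simultaneous G.711 calls x 96 kbps = 9.6 Mbps, which is just under 10 percent of a 100 Mbps WAN link. As a starting point, reserve 10-15 percent of WAN bandwidth for the voice priority queue, then check that figure against your actual peak concurrent call count and link speed.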
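
The sizing rule is simple enough to script. A minimal sketch, assuming the ~96 kbps per-call planning figure from above and a link speed you supply; the function name is illustrative.

```python
import math


def priority_percent_needed(calls, per_call_kbps, link_kbps):
    """Smallest whole-number priority percent that covers the given call count."""
    needed_kbps = calls * per_call_kbps
    return math.ceil(needed_kbps * 100 / link_kbps)


# 100 concurrent G.711 calls on links of different sizes (link sizes are assumptions).
for link_mbps in (50, 100, 1000):
    pct = priority_percent_needed(100, 96, link_mbps * 1000)
    print(f"{link_mbps} Mbps link: priority percent {pct}")
```

If the result comes out well above 15 percent, that usually points to a link that is too small for the call volume rather than a reason to keep growing the priority queue.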

IP Phone Trust Pattern

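Cisco IP phones mark their own voice traffic with DSCP EF and CoS 5 by default. The access port should trust this marking only while a phone is attached, via Cisco's conditional trust mechanism. The mls qos syntax shown here is that of classic IOS Catalyst switches; on IOS XE Catalyst platforms, where ports trust DSCP by default, the equivalent interface command is trust device cisco-phone: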

interface GigabitEthernet1/0/5
 switchport mode access
 switchport access vlan 10                ! Data VLAN (PC behind phone)
 switchport voice vlan 20                 ! Voice VLAN (phone)
 mls qos trust dscp                       ! Trust state to apply while the phone is present
 mls qos trust device cisco-phone         ! Conditional: trust if Cisco phone detected
 spanning-tree portfast
 spanning-tree bpduguard enable

The conditional trust uses CDP. If the switch sees a Cisco phone via CDP, trust applies. If the phone is unplugged, trust drops automatically and the port re-marks all incoming DSCP to 0.

For non-Cisco IP phones (which do not speak CDP), fall back to unconditional trust on the port:

interface GigabitEthernet1/0/5
 switchport mode access
 switchport access vlan 10
 switchport voice vlan 20
 mls qos trust dscp                       ! Absolute trust; relies on voice VLAN segregation

This is less secure: trust applies to the whole port, so the PC behind the phone can mark its own DSCP and have it honored. It does work for non-CDP phones, though. Some platforms support voice-VLAN-only trust mechanisms; check your switch documentation.

Voice Signaling: CS3

SIP (including the SDP it carries), H.323, and other voice control-plane traffic gets DSCP CS3 (24). Why a separate class? Voice signaling is bursty (call setup and teardown) and CPU-intensive on the call manager, but it is not latency-critical the way RTP audio is. It belongs in a guaranteed-bandwidth CBWFQ class, not the priority queue.

If signaling were placed in the EF priority queue, a flood of misconfigured signaling (a call manager hiccup, an attack, a SIP loop) could starve the voice payload itself. Keep them separate.

Full Cisco IOS XE LLQ Configuration

! Classification
class-map match-any VOICE-RTP
 match dscp ef                            ! Trust the marking
class-map match-any VOICE-SIGNALING
 match dscp cs3
class-map match-any VIDEO-CONF
 match dscp af41
class-map match-any STREAMING-VIDEO
 match dscp cs5 af31
class-map match-any TRANSACTIONAL
 match dscp af21
class-map match-any SCAVENGER
 match dscp cs1

! Egress queueing policy
policy-map WAN-EGRESS
 class VOICE-RTP
  priority percent 10                     ! 10% strict priority
 class VOICE-SIGNALING
  bandwidth percent 5
 class VIDEO-CONF
  bandwidth percent 25
  random-detect dscp-based
 class STREAMING-VIDEO
  bandwidth percent 15
  random-detect
 class TRANSACTIONAL
  bandwidth percent 20
  random-detect
 class SCAVENGER
  bandwidth percent 1
 class class-default
  bandwidth percent 24
  fair-queue
  random-detect

! Apply on WAN egress
interface GigabitEthernet0/0/0
 description WAN to ISP
 service-policy output WAN-EGRESS
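
Before pushing a policy like this, it helps to sanity-check that the percentages add up and to see what each class gets in absolute terms. The sketch below copies the allocations from the policy-map above and assumes a 100 Mbps link; the helper names are illustrative.

```python
# Percent allocations copied from the WAN-EGRESS policy-map above.
WAN_EGRESS = {
    "VOICE-RTP": 10,        # priority percent (strict priority, policed)
    "VOICE-SIGNALING": 5,
    "VIDEO-CONF": 25,
    "STREAMING-VIDEO": 15,
    "TRANSACTIONAL": 20,
    "SCAVENGER": 1,
    "class-default": 24,
}


def check_policy(policy):
    """Return the total allocation, raising if it exceeds 100 percent."""
    total = sum(policy.values())
    if total > 100:
        raise ValueError(f"allocations total {total} percent, over 100")
    return total


def class_rates_kbps(policy, link_kbps):
    """Translate each class percentage into kbps on the given link."""
    return {name: link_kbps * pct // 100 for name, pct in policy.items()}


LINK_KBPS = 100_000  # assumed 100 Mbps WAN link
print("total percent:", check_policy(WAN_EGRESS))
for name, kbps in class_rates_kbps(WAN_EGRESS, LINK_KBPS).items():
    print(f"{name:16} {kbps:>6} kbps")
```

On the assumed 100 Mbps link, the voice class works out to 10000 kbps, which is about 104 G.711 calls at the ~96 kbps planning figure.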

Verify with:

Router# show policy-map interface GigabitEthernet0/0/0
 GigabitEthernet0/0/0
  Service-policy output: WAN-EGRESS
    Class-map: VOICE-RTP (match-any)
      876543 packets, 124589376 bytes
      30 second offered rate 96000 bps, drop rate 0 bps
      Match: dscp ef (46)
      Priority: 10% (10000 kbps), burst bytes 250000, b/w exceed drops: 0
        Conform: 876543 packets / 124589376 bytes
        Exceed:  0 packets / 0 bytes

The "b/w exceed drops: 0" and "Exceed: 0 packets" counters are what you want for the voice class. Anything above zero means the priority queue's built-in policer, which enforces the configured rate when the interface is congested, is dropping packets: your voice budget is too small or you have a runaway sender.
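
If you collect this output on a schedule, the drop counter can be extracted with a few lines of script. This is a sketch that parses text shaped like the sample above; the exact output format varies by platform and release, so treat the regular expressions as assumptions to adjust against your own devices.

```python
import re

SAMPLE = """\
    Class-map: VOICE-RTP (match-any)
      876543 packets, 124589376 bytes
      30 second offered rate 96000 bps, drop rate 0 bps
      Match: dscp ef (46)
      Priority: 10% (10000 kbps), burst bytes 250000, b/w exceed drops: 0
        Conform: 876543 packets / 124589376 bytes
        Exceed:  0 packets / 0 bytes
"""


def priority_drops(output):
    """Map each class name to its 'b/w exceed drops' counter, where present."""
    drops = {}
    current_class = None
    for line in output.splitlines():
        match = re.search(r"Class-map:\s+(\S+)", line)
        if match:
            current_class = match.group(1)
            continue
        match = re.search(r"b/w exceed drops:\s*(\d+)", line)
        if match and current_class is not None:
            drops[current_class] = int(match.group(1))
    return drops


for class_name, count in priority_drops(SAMPLE).items():
    status = "OK" if count == 0 else "DROPPING"
    print(f"{class_name}: {count} exceed drops ({status})")
```

Alerting on any increase between two polls is more useful than alerting on the absolute value, since the counter only resets when it is cleared.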

Codec Choice and Bandwidth Implications

Codec	Payload bandwidth	Total per call at Layer 3	Total per call with Layer 2	Quality
G.711 (PCMU/PCMA)	64 kbps	~80 kbps	~96 kbps	Toll quality; uncompressed, minimal DSP load
G.722	64 kbps	~80 kbps	~96 kbps	Wideband (HD voice); same bandwidth as G.711
G.729	8 kbps	~24 kbps	~40 kbps	Acceptable; significant compression artifacts
Opus (narrowband)	6-24 kbps	~22-40 kbps	~38-56 kbps	Configurable; modern default
Opus (wideband)	20-48 kbps	~36-64 kbps	~52-80 kbps	HD voice; default for WebRTC

The bandwidth-quality trade-off: G.711 is the safest for compatibility (every endpoint speaks it) but expensive on bandwidth. G.729 saves bandwidth significantly but requires DSP licenses on transcoders and quality is noticeably worse. Modern WebRTC apps default to Opus, which auto-tunes between narrowband and wideband based on link conditions.

Practical implication for QoS sizing: don't assume your priority queue size based on G.711 if your endpoints negotiate G.729 or Opus most of the time. Monitor actual usage; tune the percent allocation.
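
One way to fold observed codec usage into the sizing is a weighted average. The sketch below assumes a hypothetical mix of 20 percent G.711 calls and 80 percent Opus calls; both the mix and the per-call figures are placeholders to replace with numbers from your own call records.

```python
def blended_per_call_kbps(mix):
    """mix: list of (percent_of_calls, per_call_kbps); percents must total 100."""
    if sum(percent for percent, _ in mix) != 100:
        raise ValueError("codec shares must total 100 percent")
    return sum(percent * kbps for percent, kbps in mix) / 100


# Hypothetical mix: 20% G.711 at ~96 kbps, 80% Opus at ~40 kbps.
observed_mix = [(20, 96), (80, 40)]
per_call = blended_per_call_kbps(observed_mix)
print(f"blended per-call rate: {per_call:.1f} kbps")
print(f"100 concurrent calls:  {100 * per_call / 1000:.2f} Mbps")
```

Sizing for the blended rate leaves less headroom if endpoints fall back to G.711, so keep some margin above the computed figure.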

Video Conferencing Specifics

Modern video conferencing (Zoom, Teams, Webex) uses adaptive video that scales bandwidth based on link conditions. A typical Zoom 1080p call:

Audio: 16-64 kbps (Opus)
Video: 1.2-2.4 Mbps for HD, drops to 600-1200 kbps under congestion
Screen share: variable, can spike to 4-6 Mbps for high-resolution screens

Mark video conferencing as AF41 (DSCP 34) and place it in a guaranteed-bandwidth CBWFQ class with WRED. The class should have enough headroom for typical concurrent video calls but not so much that other traffic starves.

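Sizing example: 50 simultaneous HD Teams calls at 2 Mbps each = 100 Mbps. On a 200 Mbps WAN that is 50 percent of capacity, so the video conferencing class would need a 50 percent allocation to carry every call at full HD; with a smaller class, adaptive video falls back to lower bitrates under congestion. On a 1 Gbps WAN the same load is 10 percent.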
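
The same arithmetic can be run in reverse: given a class allocation, how many calls fit at each bitrate the client might adapt to? This sketch assumes the 25 percent video class from the policy above on a 200 Mbps WAN, with per-call rates taken from the ranges quoted earlier; the function name is illustrative.

```python
def calls_supported(link_kbps, class_percent, per_call_kbps):
    """Whole number of calls that fit inside the class's guaranteed bandwidth."""
    class_kbps = link_kbps * class_percent // 100
    return class_kbps // per_call_kbps


LINK_KBPS = 200_000   # assumed 200 Mbps WAN
VIDEO_PERCENT = 25    # allocation from the WAN-EGRESS policy

for label, rate_kbps in (("HD at 2 Mbps", 2000),
                         ("congested at 1.2 Mbps", 1200),
                         ("congested at 600 kbps", 600)):
    count = calls_supported(LINK_KBPS, VIDEO_PERCENT, rate_kbps)
    print(f"{label}: {count} calls")
```

Because CBWFQ guarantees a minimum rather than enforcing a cap, calls beyond these counts can still borrow idle bandwidth from other classes when it is available.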

Voice in the Cloud Era: SaaS UC

Modern UC platforms (Microsoft Teams, Zoom Phone, Webex Calling) place all the call control in the cloud. The branch network carries only the media (RTP), either directly between endpoints or out to the provider's cloud. Implications for QoS:

Mark RTP at the access port (or the application client marks itself)
Internet break-out matters: the SD-WAN should steer voice over the lowest-jitter path to the cloud
Cloud-direct paths (Microsoft Direct Routing, Cisco Webex Calling) bypass much of the WAN; QoS focus shifts to the local edge
The cloud provider does not honor your DSCP markings; QoS ends at the SD-WAN cloud on-ramp

SD-WAN voice QoS is covered in a forthcoming article on QoS in modern networks and in the SD-WAN cluster pillar.

Wireless Voice: WMM and 802.11 Access Categories

Voice on Wi-Fi uses 802.11e WMM (Wi-Fi Multimedia) with four access categories:

WMM Access Category	Maps to DSCP	Used for
AC_VO (Voice)	EF (46)	VoIP RTP
AC_VI (Video)	AF41 (34), CS5 (40)	Video conferencing, streaming
AC_BE (Best Effort)	BE (0)	Default
AC_BK (Background)	CS1 (8)	Scavenger

The Catalyst 9800 maps DSCP from incoming wired traffic into WMM categories on the air, and CoS-marked frames from wireless clients into DSCP for upstream forwarding. Auto QoS handles most of this automatically. See C9800 QoS Configuration: Auto QoS, DSCP Mapping, and Wireless Profiles for the wireless-specific configuration.
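
The table above amounts to a lookup from DSCP to access category. The sketch below encodes only the values listed in the table and, as a simplifying assumption, sends every other DSCP to best effort; a real controller applies a complete mapping for all 64 code points.

```python
# Mapping taken from the WMM table above; unlisted values fall back to AC_BE
# purely as a simplification for this sketch.
DSCP_TO_AC = {
    46: "AC_VO",  # EF, VoIP RTP
    34: "AC_VI",  # AF41, video conferencing
    40: "AC_VI",  # CS5, streaming video
    0: "AC_BE",   # best effort
    8: "AC_BK",   # CS1, scavenger
}


def access_category(dscp):
    """Return the WMM access category for a DSCP value (0-63)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return DSCP_TO_AC.get(dscp, "AC_BE")


for value in (46, 34, 40, 8, 0, 24):
    print(f"DSCP {value:>2} -> {access_category(value)}")
```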

Troubleshooting Voice Quality

The diagnostic chain for "voice sounds bad":

Verify markings. Capture at the source. Is RTP actually marked DSCP EF? If not, classification is broken.
Verify trust. show mls qos interface on the access port (on platforms that use the mls qos syntax). Is conditional trust working? If the port is untrusted, the marking is being stripped.
Verify queueing. show policy-map interface on each WAN egress. Is the priority class hitting? Is the class dropping (Exceed counter)?
Verify bandwidth math. How many concurrent calls? How much WAN does that consume? Is the priority percent allocation enough?
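Verify path latency. traceroute and ping. Is the round-trip under 300 ms? Is jitter (variance) reasonable? ping only approximates jitter from round-trip times; an IP SLA UDP jitter probe measures it directly.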

The most common cause of voice quality complaints: a misconfigured trust boundary letting non-voice traffic into the priority queue, starving real voice. Audit the access ports.

Summary

Voice and video QoS comes down to LLQ for voice (strict priority + built-in policer), CBWFQ for video and signaling (guaranteed bandwidth + WRED), and rigorous trust boundary discipline at the access edge. Standard markings (DSCP EF for RTP, CS3 for signaling, AF41 for video conferencing, CS5 for streaming) and standard bandwidth allocations (~10 percent for voice, ~25 percent for video conferencing) cover most enterprise scenarios.

The configuration is well-understood; the trust discipline is what fails in production. Audit access ports periodically, monitor priority queue drops, and verify codec choice matches your bandwidth math. Bookmark this article alongside the QoS cluster pillar and lab any change before pushing to production.
