Voice and video are the use cases QoS exists for. Without QoS, a 1-percent packet loss rate on a TCP file transfer is invisible to the user; on a VoIP call, it is a robotic-sounding artifact that ends the meeting. Real-time media has tight requirements that the rest of your traffic does not, and those requirements are what drive every QoS design decision.
This article walks through the latency, jitter, and loss budgets for voice and video, the standard DSCP markings, the LLQ configuration that protects them, the IP phone trust pattern, and the bandwidth math you use to size the priority queue. If you are deploying VoIP for the first time or troubleshooting a video conferencing complaint, this is the reference.
The Latency, Jitter, and Loss Budgets
| Application | One-way latency | Jitter | Packet loss |
|---|---|---|---|
| VoIP (toll quality) | < 150 ms | < 30 ms | < 1 percent (preferably < 0.1 percent) |
| VoIP (acceptable) | 150-300 ms | < 50 ms | < 3 percent |
| Video conferencing | < 200 ms | < 50 ms | < 1 percent |
| Streaming video (one-way) | < 5 s buffer | tolerated by jitter buffer | < 5 percent |
The numbers come from ITU-T Recommendation G.114 (one-way transmission time) and various enterprise media studies. The 150 ms VoIP latency target is one-way - so the round trip should be under 300 ms. Most domestic networks comfortably meet this; international circuits can be tight.
Jitter is variation in latency between consecutive packets. A jitter buffer at the receiver smooths small jitter (typically 30-50 ms of buffer); jitter beyond the buffer causes audible artifacts. The QoS goal is to keep jitter below the receiver's buffer.
Packet loss matters more for voice than for streaming because real-time voice cannot wait for a retransmission - a lost packet is gone. Codecs perform packet loss concealment up to a point, but quality degrades quickly above 1 percent.
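As a rough illustration of how the budgets can be applied to a measurement, the following Python sketch checks one-way latency, jitter, and loss against the thresholds in the table above. The dictionary keys and the function name are hypothetical labels chosen for this example, not part of any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_latency_ms: float  # one-way latency ceiling
    max_jitter_ms: float
    max_loss_pct: float


# Thresholds copied from the table above; the keys are hypothetical labels.
BUDGETS = {
    "voip_toll": Budget(150, 30, 1.0),
    "voip_acceptable": Budget(300, 50, 3.0),
    "video_conf": Budget(200, 50, 1.0),
}


def violations(app: str, latency_ms: float, jitter_ms: float, loss_pct: float) -> list[str]:
    """Return the names of the budgets that the measurement fails to meet."""
    budget = BUDGETS[app]
    failed = []
    # The table uses strict "<" limits, so a value equal to the limit counts as a miss.
    if latency_ms >= budget.max_latency_ms:
        failed.append("latency")
    if jitter_ms >= budget.max_jitter_ms:
        failed.append("jitter")
    if loss_pct >= budget.max_loss_pct:
        failed.append("loss")
    return failed


print(violations("voip_toll", 180, 20, 2.0))        # ['latency', 'loss']
print(violations("voip_acceptable", 180, 20, 2.0))  # []
```

The same measurement can fail the toll-quality budget and still pass the acceptable one, which is often the first question to settle when triaging a complaint.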
Voice Marking and Bandwidth
The standard DSCP markings for voice and video:
| Traffic | DSCP | Decimal | Treatment |
|---|---|---|---|
| VoIP RTP audio (voice payload) | EF | 46 | LLQ priority queue |
| Voice signaling (SIP, H.323) | CS3 | 24 | CBWFQ guaranteed bandwidth |
| Video conferencing (Zoom, Teams, Webex) | AF41 | 34 | CBWFQ with WRED |
| Streaming video (one-way) | CS5 or AF31 | 40 or 26 | CBWFQ |
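The decimal values in the table follow directly from the DSCP naming scheme: class selector CSn is 8n, and assured forwarding AFxy is 8x + 2y. A small Python sketch of that arithmetic, with hypothetical function names, also shows how the 6-bit DSCP sits in the upper bits of the IP ToS byte (useful when reading packet captures that display the whole byte):

```python
def dscp_value(name: str) -> int:
    """Convert a DSCP name such as EF, CS3, or AF41 to its decimal value."""
    name = name.strip().upper()
    if name == "EF":
        return 46
    if name in ("BE", "DF"):
        return 0
    if len(name) == 3 and name.startswith("CS") and name[2] in "01234567":
        return 8 * int(name[2])  # class selector: CSn = 8n
    if len(name) == 4 and name.startswith("AF") and name[2] in "1234" and name[3] in "123":
        return 8 * int(name[2]) + 2 * int(name[3])  # AFxy = 8x + 2y
    raise ValueError(f"unknown DSCP name: {name!r}")


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IP ToS / Traffic Class byte."""
    return dscp << 2


for label in ("EF", "CS3", "AF41", "CS5", "AF31"):
    print(label, dscp_value(label), hex(tos_byte(dscp_value(label))))
```

For EF this gives decimal 46 and a ToS byte of 0xb8, which is the value a capture tool shows when it does not decode DSCP separately.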
VoIP bandwidth math. A single G.711 call at the usual 20 ms packetization (50 packets per second) consumes:
- G.711 payload: 64 kbps
- RTP/UDP/IP overhead: ~16 kbps
- Layer 2 overhead (Ethernet/MPLS): ~16 kbps
- Total per call: ~96 kbps
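For the G.729 codec, the payload drops to 8 kbps but the per-packet overhead stays the same: about 24 kbps per call at Layer 3 (8 kbps of payload plus ~16 kbps of RTP/UDP/IP headers), or about 40 kbps with the same ~16 kbps Layer 2 allowance. For Opus, the payload bitrate is configurable (6-510 kbps); typical voice usage is 24-48 kbps.
Sizing the priority queue: 100 simultaneous G.711 calls x 96 kbps = 9.6 Mbps, which is just under 10 percent of a 100 Mbps WAN link. As a starting point, reserve 10-15 percent of WAN bandwidth for the voice priority queue, then adjust to your measured concurrent call count.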
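The per-call arithmetic above can be sketched in Python as follows. The 40-byte Layer 3 header is IP (20) + UDP (8) + RTP (12). The 40-byte Layer 2 figure is a round-number assumption chosen to reproduce the article's ~16 kbps allowance; real Layer 2 overhead depends on the encapsulation. The function names are hypothetical.

```python
def per_call_kbps(payload_kbps: float, packet_ms: float = 20,
                  l3_header_bytes: int = 40, l2_overhead_bytes: int = 40) -> float:
    """Payload plus header overhead for one direction of one call, in kbps."""
    packets_per_second = 1000 / packet_ms
    # Every packet carries the same headers regardless of codec.
    overhead_kbps = (l3_header_bytes + l2_overhead_bytes) * 8 * packets_per_second / 1000
    return payload_kbps + overhead_kbps


def priority_percent(calls: int, call_kbps: float, wan_kbps: float) -> float:
    """Share of the WAN link that the given number of concurrent calls needs."""
    return calls * call_kbps * 100 / wan_kbps


g711 = per_call_kbps(64)                        # 96.0 kbps
print(g711, priority_percent(100, g711, 100_000))
```

Because the overhead is per packet, lengthening the packetization interval lowers it, at the cost of added delay; the default of 20 ms is an assumption that matches the figures above.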
IP Phone Trust Pattern
Cisco IP phones mark their own voice traffic with DSCP EF and CoS 5 by default. The access port should trust this marking via Cisco's conditional trust mechanism:
interface GigabitEthernet1/0/5
switchport mode access
switchport access vlan 10 ! Data VLAN (PC behind phone)
switchport voice vlan 20 ! Voice VLAN (phone)
mls qos trust device cisco-phone ! Conditional: trust if Cisco phone detected
spanning-tree portfast
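spanning-tree bpduguard enable
The conditional trust uses CDP. If the switch sees a Cisco phone via CDP, trust applies. If the phone is unplugged, trust drops automatically and the port re-marks all incoming DSCP to 0. On platforms that use the mls qos command set, conditional trust is normally paired with a trust state (mls qos trust cos or mls qos trust dscp) and requires mls qos to be enabled globally; check your platform's command reference.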
For non-Cisco IP phones (which do not speak CDP), use absolute trust within the voice VLAN only:
interface GigabitEthernet1/0/5
switchport mode access
switchport access vlan 10
switchport voice vlan 20
mls qos trust dscp ! Absolute trust; relies on voice VLAN segregation
This is less secure - the PC behind the phone can theoretically mark its own DSCP - but works for non-CDP phones. Some platforms support voice-VLAN-only trust mechanisms; check your switch documentation.
Voice Signaling: CS3
SIP, SDP, H.323, and other voice control-plane traffic gets DSCP CS3 (24). Why a separate class? Voice signaling is bursty (call setup and teardown) and CPU-intensive on the call manager but not latency-critical the way RTP audio is. It belongs in a guaranteed-bandwidth CBWFQ class, not the priority queue.
If signaling were placed in the EF priority queue, a flood of misconfigured signaling (a call manager hiccup, an attack, a SIP loop) could starve the voice payload itself. Keep them separate.
Full Cisco IOS XE LLQ Configuration
! Classification
class-map match-any VOICE-RTP
match dscp ef ! Trust the marking
class-map match-any VOICE-SIGNALING
match dscp cs3
class-map match-any VIDEO-CONF
match dscp af41
class-map match-any STREAMING-VIDEO
match dscp cs5 af31
class-map match-any TRANSACTIONAL
match dscp af21
class-map match-any SCAVENGER
match dscp cs1
! Egress queueing policy
policy-map WAN-EGRESS
class VOICE-RTP
priority percent 10 ! 10% strict priority
class VOICE-SIGNALING
bandwidth percent 5
class VIDEO-CONF
bandwidth percent 25
random-detect dscp-based
class STREAMING-VIDEO
bandwidth percent 15
random-detect
class TRANSACTIONAL
bandwidth percent 20
random-detect
class SCAVENGER
bandwidth percent 1
class class-default
bandwidth percent 24
fair-queue
random-detect
! Apply on WAN egress
interface GigabitEthernet0/0/0
description WAN to ISP
service-policy output WAN-EGRESS
Verify with:
Router# show policy-map interface GigabitEthernet0/0/0
GigabitEthernet0/0/0
Service-policy output: WAN-EGRESS
Class-map: VOICE-RTP (match-any)
876543 packets, 124589376 bytes
30 second offered rate 96000 bps, drop rate 0 bps
Match: dscp ef (46)
Priority: 10% (10000 kbps), burst bytes 250000, b/w exceed drops: 0
Conform: 876543 packets / 124589376 bytes
Exceed: 0 packets / 0 bytes
The "Exceed: 0 packets" line is what you want for the voice class. Anything above zero means the priority queue's built-in policer is dropping packets - your voice budget is too small or you have a runaway sender.
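If you collect this output from many routers, the Exceed check is easy to automate. The Python sketch below pulls the Exceed packet count per class out of text shaped like the sample above. It is an assumption that your platform prints the same "Class-map:" and "Exceed:" lines; output formats vary by release, so treat the patterns as a starting point. The function name is hypothetical.

```python
import re


def exceed_packets(show_output: str) -> dict[str, int]:
    """Map each class name to its Exceed packet count, where one is reported."""
    counts: dict[str, int] = {}
    current_class = None
    for line in show_output.splitlines():
        class_match = re.search(r"Class-map:\s+(\S+)", line)
        if class_match:
            current_class = class_match.group(1)
            continue
        exceed_match = re.search(r"Exceed:\s+(\d+)\s+packets", line)
        if exceed_match and current_class is not None:
            counts[current_class] = int(exceed_match.group(1))
    return counts


sample = """
Class-map: VOICE-RTP (match-any)
  Priority: 10% (10000 kbps), burst bytes 250000, b/w exceed drops: 0
  Conform: 876543 packets / 124589376 bytes
  Exceed: 0 packets / 0 bytes
"""
print(exceed_packets(sample))  # {'VOICE-RTP': 0}
```

Any non-zero value returned for the voice class is worth an alert, since it means the priority policer has dropped voice packets.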
Codec Choice and Bandwidth Implications
| Codec | Payload bandwidth | Per call at Layer 3 (payload + ~16 kbps RTP/UDP/IP) | Per call with ~16 kbps Layer 2 allowance | Quality |
|---|---|---|---|---|
| G.711 (PCMU/PCMA) | 64 kbps | ~80 kbps | ~96 kbps | Toll quality; minimal codec CPU cost |
| G.722 | 64 kbps | ~80 kbps | ~96 kbps | Wideband (HD voice); same bandwidth as G.711 |
| G.729 | 8 kbps | ~24 kbps | ~40 kbps | Acceptable; significant compression artifacts |
| Opus (narrowband) | 6-24 kbps | ~22-40 kbps | ~38-56 kbps | Configurable; modern default |
| Opus (wideband) | 20-48 kbps | ~36-64 kbps | ~52-80 kbps | HD voice; default for WebRTC |
The bandwidth-quality trade-off: G.711 is the safest for compatibility (every endpoint speaks it) but expensive on bandwidth. G.729 saves bandwidth significantly but requires DSP licenses on transcoders and quality is noticeably worse. Modern WebRTC apps default to Opus, which auto-tunes between narrowband and wideband based on link conditions.
Practical implication for QoS sizing: do not size the priority queue for G.711 if your endpoints negotiate G.729 or Opus most of the time. Monitor actual usage and tune the percent allocation to match.
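One way to turn an observed codec mix into a priority queue size is a weighted sum, sketched here in Python. The payload rates come from the codec table; the 32 kbps overhead is the ~16 kbps RTP/UDP/IP plus ~16 kbps Layer 2 allowance from the G.711 breakdown, which assumes 20 ms packetization. The codec mix in the example is hypothetical, as are the names.

```python
# Payload rates in kbps, taken from the codec table.
PAYLOAD_KBPS = {"g711": 64, "g722": 64, "g729": 8}
# Assumed per-call overhead: ~16 kbps RTP/UDP/IP + ~16 kbps Layer 2 at 50 packets/s.
OVERHEAD_KBPS = 32


def blended_priority_kbps(concurrent_calls: int, mix_percent: dict[str, int]) -> float:
    """Priority bandwidth for a call count split across codecs by percentage."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("codec mix must add up to 100 percent")
    weighted = sum(pct * (PAYLOAD_KBPS[codec] + OVERHEAD_KBPS)
                   for codec, pct in mix_percent.items())
    return concurrent_calls * weighted / 100


print(blended_priority_kbps(100, {"g711": 100}))             # 9600.0
print(blended_priority_kbps(100, {"g711": 20, "g729": 80}))  # 5120.0
```

In this hypothetical mix, the queue needs roughly half of what an all-G.711 estimate would reserve, which is bandwidth the other classes could otherwise be guaranteed.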
Video Conferencing Specifics
Modern video conferencing (Zoom, Teams, Webex) uses adaptive video that scales bandwidth based on link conditions. A typical Zoom 1080p call:
- Audio: 16-64 kbps (Opus)
- Video: 1.2-2.4 Mbps for HD, drops to 600-1200 kbps under congestion
- Screen share: variable, can spike to 4-6 Mbps for high-resolution screens
Mark video conferencing as AF41 (DSCP 34) and place it in a guaranteed-bandwidth CBWFQ class with WRED. The class should have enough headroom for typical concurrent video calls but not so much that other traffic starves.
Sizing example: 50 simultaneous HD Teams calls at 2 Mbps each = 100 Mbps. On a 200 Mbps WAN that's 50 percent of capacity - allocate 50 percent to video conferencing class. On a 1 Gbps WAN that's 10 percent.
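The sizing example is a single division, shown here as a Python sketch so it can be rerun with your own call counts and link speeds. The function name is hypothetical, and the per-call rate is whatever you measure or assume for your platform.

```python
def video_class_percent(calls: int, mbps_per_call: float, wan_mbps: float) -> float:
    """Share of the WAN link needed for the given number of concurrent video calls."""
    return calls * mbps_per_call * 100 / wan_mbps


print(video_class_percent(50, 2, 200))   # 50.0
print(video_class_percent(50, 2, 1000))  # 10.0
```

Running the same function with the congested per-call rate gives a floor for the class, which helps when a full allocation would crowd out other traffic.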
Voice in the Cloud Era: SaaS UC
Modern UC platforms (Microsoft Teams, Zoom Phone, Webex Calling) place all call control in the cloud. The branch network only has to carry the media (RTP) that flows directly between endpoints. Implications for QoS:
- Mark RTP at the access port (or the application client marks itself)
- Internet break-out matters: the SD-WAN should steer voice over the lowest-jitter path to the cloud
- Cloud-direct paths (Microsoft Direct Routing, Cisco Webex Calling) bypass much of the WAN; QoS focus shifts to the local edge
- The cloud provider does not honor your DSCP markings; QoS ends at the SD-WAN cloud on-ramp
For SD-WAN voice QoS, see the forthcoming article on QoS in modern networks and the SD-WAN cluster pillar.
Wireless Voice: WMM and 802.11 Access Categories
Voice on Wi-Fi uses 802.11e WMM (Wi-Fi Multimedia) with four access categories:
| WMM Access Category | Maps to DSCP | Used for |
|---|---|---|
| AC_VO (Voice) | EF (46) | VoIP RTP |
| AC_VI (Video) | AF41 (34), CS5 (40) | Video conferencing, streaming |
| AC_BE (Best Effort) | BE (0) | Default |
| AC_BK (Background) | CS1 (8) | Scavenger |
The Catalyst 9800 maps DSCP from incoming wired traffic into WMM categories on the air, and CoS-marked frames from wireless clients into DSCP for upstream forwarding. Auto QoS handles most of this automatically. See C9800 QoS Configuration: Auto QoS, DSCP Mapping, and Wireless Profiles for the wireless-specific configuration.
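The table above can be expressed as a simple lookup, sketched here in Python. It covers only the DSCP values listed in the table; falling back to AC_BE for anything else is an assumption made for this example, since real controllers apply their own configurable mapping tables.

```python
# DSCP decimal value -> WMM access category, per the table above.
DSCP_TO_WMM = {
    46: "AC_VO",  # EF: VoIP RTP
    34: "AC_VI",  # AF41: video conferencing
    40: "AC_VI",  # CS5: streaming video
    0: "AC_BE",   # best effort
    8: "AC_BK",   # CS1: scavenger
}


def wmm_access_category(dscp: int) -> str:
    """Return the WMM access category for a DSCP value (AC_BE if unlisted)."""
    return DSCP_TO_WMM.get(dscp, "AC_BE")


print(wmm_access_category(46))  # AC_VO
print(wmm_access_category(24))  # AC_BE (CS3 is not in the table; assumed default)
```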
Troubleshooting Voice Quality
The diagnostic chain for "voice sounds bad":
- Verify markings. Capture at the source. Is RTP actually marked DSCP EF? If not, classification is broken.
- Verify trust. Run show mls qos interface on the access port. Is conditional trust working? If the port is untrusted, the marking is being stripped.
- Verify queueing. Run show policy-map interface on each WAN egress. Is the priority class hitting? Is the class dropping (Exceed counter)?
- Verify bandwidth math. How many concurrent calls? How much WAN does that consume? Is the priority percent allocation enough?
- Verify path latency. traceroute and ping. Is the round-trip under 300 ms? Is jitter (variance) reasonable?
A common cause of voice quality complaints is a misconfigured trust boundary that lets non-voice traffic into the priority queue, starving real voice. Audit the access ports first.
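When you have send and receive timestamps from a capture, jitter can be quantified rather than eyeballed. The Python sketch below uses the smoothed interarrival jitter estimator defined for RTP in RFC 3550 (each new deviation moves the estimate by one sixteenth). It works in milliseconds instead of RTP timestamp units, and the function name and sample values are hypothetical.

```python
def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Smoothed interarrival jitter in the style of RFC 3550, in milliseconds."""
    if len(send_ms) != len(recv_ms):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # Difference between receive spacing and send spacing for this packet pair.
        deviation = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(deviation) - jitter) / 16
    return jitter


# Packets sent every 20 ms; the third one arrives 16 ms late.
print(interarrival_jitter([0, 20, 40], [100, 120, 156]))  # 1.0
```

The smoothing means a single late packet barely moves the figure, while sustained variation pushes it toward the size of the deviations, which is the value to compare against the receiver's jitter buffer.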
Summary
Voice and video QoS comes down to LLQ for voice (strict priority + built-in policer), CBWFQ for video and signaling (guaranteed bandwidth + WRED), and rigorous trust boundary discipline at the access edge. Standard markings (DSCP EF for RTP, CS3 for signaling, AF41 for video conferencing, CS5 for streaming) and standard bandwidth allocations (~10 percent for voice, ~25 percent for video conferencing) cover most enterprise scenarios.
The configuration is well-understood; the trust discipline is what fails in production. Audit access ports periodically, monitor priority queue drops, and verify codec choice matches your bandwidth math. Bookmark this article alongside the QoS cluster pillar and lab any change before pushing to production.