QoS Queueing, Policing, and Shaping Compared

LLQ, CBWFQ, policing, and shaping each do something different. This article covers the three primitives, when to use which, hierarchical shaping for sub-rate WAN handoffs, and the production patterns.

Queueing, policing, and shaping are the three traffic-management primitives every Cisco QoS deployment uses. They sound similar in casual conversation - "they limit traffic, right?" - but each does something different and they are not interchangeable. Confusing them is the second most common QoS mistake (after trust-boundary failures), and the symptoms are subtle: drops in unexpected places, queues growing without bound, traffic that gets policed before it gets shaped.

This article walks through what each primitive does, where it applies, when to use which, the Cisco IOS XE configuration, and the standard production patterns. If you are configuring QoS for the first time or auditing a deployment that "feels off," this is the reference.

The Three Primitives

Queueing
  What it does: Decides the order in which packets leave a congested interface
  Excess traffic: Stays in queues until served (or is dropped if the queue is full)
  Where it applies: Egress only (output direction)

Policing
  What it does: Enforces a hard rate limit; drops or re-marks excess
  Excess traffic: Dropped (or re-marked)
  Where it applies: Ingress or egress

Shaping
  What it does: Smooths bursts to fit a target rate; buffers excess
  Excess traffic: Buffered and delayed
  Where it applies: Egress only

Queueing answers "in what order do these waiting packets leave?" Policing and shaping both answer "how do we keep traffic at or below rate X?" but with very different mechanics: policing drops, shaping buffers.

Queueing: Scheduling on Congested Interfaces

An interface is "congested" when more packets want to leave than the interface can transmit in a given time slice. Without congestion, queueing does not matter - packets transmit in arrival order. With congestion, the queueing scheduler chooses what leaves next.

Cisco supports several queueing schedulers, in order of historical introduction:

FIFO (First-In-First-Out)
  Behavior: Single queue; no priority
  Use in 2026: Default for uncongested interfaces; never on production WAN

Priority Queueing (PQ)
  Behavior: Four queues; higher always served first; can starve lower
  Use in 2026: Legacy; deprecated

Custom Queueing (CQ)
  Behavior: 16 queues with byte-count round-robin
  Use in 2026: Legacy; deprecated

Weighted Fair Queueing (WFQ)
  Behavior: Per-flow queues with proportional service based on weight
  Use in 2026: Default on slow serial; rarely tuned in 2026

Class-Based Weighted Fair Queueing (CBWFQ)
  Behavior: Per-class queues with configured bandwidth guarantees
  Use in 2026: Modern default for non-real-time classes

Low-Latency Queueing (LLQ)
  Behavior: CBWFQ plus a strict-priority queue with built-in policer
  Use in 2026: Modern default when voice or video shares with data

LLQ is the dominant queueing strategy in modern Cisco deployments. It gives voice a strict priority queue (lowest latency, lowest jitter) but applies a built-in policer to prevent the priority queue from starving everything else if voice traffic explodes. The remaining bandwidth divides among other classes via CBWFQ proportional service.

LLQ Configuration

policy-map WAN-EGRESS
 class VOICE
  priority percent 10                  ! Strict priority, capped at 10%
 class VIDEO
  bandwidth percent 30                 ! Guaranteed 30%
  random-detect dscp-based             ! WRED
 class TRANSACTIONAL
  bandwidth percent 25                 ! Guaranteed 25%
  random-detect
 class SCAVENGER
  bandwidth percent 1                  ! Tiny guarantee
 class class-default
  bandwidth percent 34
  fair-queue
  random-detect

interface GigabitEthernet0/0/0
 service-policy output WAN-EGRESS

The percentages must sum to no more than 100. The priority statement implicitly counts against the total. If you specify priority with neither a percentage nor a kbps rate, the priority queue is unbounded. Never do this in production; a misbehaving SIP gateway can starve everything else.
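The percentage arithmetic can be sanity-checked offline before touching a router. The following is a minimal Python sketch, not a Cisco tool: the `allocate` helper and the 1 Gbps reference bandwidth are assumptions made for illustration, while the percentages are copied from the WAN-EGRESS policy above.

```python
# Hypothetical helper: turn the WAN-EGRESS percentages into kbps figures.
LINK_KBPS = 1_000_000  # assumption: 1 Gbps reference bandwidth on the interface

# Percentages copied from the WAN-EGRESS policy-map.
ALLOCATIONS = {
    "VOICE (priority)": 10,
    "VIDEO": 30,
    "TRANSACTIONAL": 25,
    "SCAVENGER": 1,
    "class-default": 34,
}


def allocate(link_kbps, percents):
    """Return per-class kbps, refusing policies that exceed 100 percent."""
    total = sum(percents.values())
    if total > 100:
        raise ValueError(f"percentages sum to {total}, which exceeds 100")
    return {name: link_kbps * pct // 100 for name, pct in percents.items()}


for name, kbps in allocate(LINK_KBPS, ALLOCATIONS).items():
    print(f"{name:<18} {kbps:>9} kbps")
```

Running the same check against a planned change catches an over-subscribed policy before it reaches a lab.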

Bandwidth Statements

Three forms exist:

  • bandwidth percent X: X percent of the interface bandwidth (or the parent shaper rate in hierarchical policies)
  • bandwidth X (kbps): Absolute kbps guarantee
  • bandwidth remaining percent X: X percent of bandwidth remaining after priority queues are accounted for

The remaining-percent form is useful when priority bandwidth varies (e.g. voice scales with call count). The classes that take "remaining" do not need to be re-tuned every time the priority allocation changes.
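To see why the remaining-percent form needs no re-tuning, here is a small Python sketch. It is a simplification: it uses the configured priority rate as a stand-in for "what the priority queue takes", and the 50 Mbps link, the class names, and the 40/30/30 split are illustrative assumptions.

```python
def remaining_split(link_kbps, priority_kbps, remaining_percents):
    """Split whatever is left after the priority allocation by percentage."""
    remaining = link_kbps - priority_kbps
    return {name: remaining * pct // 100 for name, pct in remaining_percents.items()}


# Hypothetical split of the non-priority bandwidth.
SPLIT = {"VIDEO": 40, "TRANSACTIONAL": 30, "class-default": 30}

# The same SPLIT works unchanged when the voice allocation grows.
for priority_kbps in (5_000, 10_000):
    shares = remaining_split(50_000, priority_kbps, SPLIT)
    print(f"priority={priority_kbps} kbps -> {shares}")
```

Doubling the priority allocation shrinks every other class proportionally, yet the percentages in the policy stay the same.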

WRED: Graceful Degradation

WRED (Weighted Random Early Detection) randomly drops packets from a queue as it approaches full, biased toward higher drop precedence. The benefit: TCP flows respond to dropped packets by slowing down, which empties the queue gracefully rather than letting it fill and tail-drop everything.

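WRED works best on classes with lots of TCP traffic. Voice and other fixed-rate UDP flows do not benefit because they do not back off when packets drop; for voice, rely on the priority queue's policer instead. WRED on a video class, as in the LLQ example above, only makes sense when the video is rate-adaptive.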

class TRANSACTIONAL
 bandwidth percent 25
 random-detect dscp-based             ! Drop AF23 first, then AF22, then AF21
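The drop ordering in that comment falls out of the classic RED drop curve, sketched below in Python. This is a simplified model, not the router's implementation: the per-DSCP thresholds and the mark-probability denominator are illustrative assumptions, not platform defaults.

```python
def wred_drop_probability(avg_depth, min_th, max_th, mark_prob_denominator=10):
    """Classic RED curve: 0 below min, linear ramp to 1/denominator, then tail-drop."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return (avg_depth - min_th) / (max_th - min_th) / mark_prob_denominator


# Hypothetical (min, max) thresholds in packets. A higher drop precedence
# gets a lower minimum threshold, so it starts dropping earlier.
PROFILES = {"af21": (32, 40), "af22": (28, 40), "af23": (24, 40)}

for depth in (20, 26, 30, 36, 40):
    probs = {dscp: round(wred_drop_probability(depth, lo, hi), 4)
             for dscp, (lo, hi) in PROFILES.items()}
    print(f"avg depth {depth}: {probs}")
```

At any average depth between the thresholds, AF23 has the highest drop probability and AF21 the lowest, which matches "drop AF23 first".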

Policing: Hard Rate Limits

A policer enforces a maximum rate. Packets above the rate are dropped immediately or re-marked to a lower-priority DSCP. Cisco implementations use a token-bucket algorithm:

  • Tokens accumulate in a bucket at the configured rate (e.g. 10 Mbps).
  • Each arriving packet consumes tokens proportional to its size.
  • If enough tokens are available, the packet conforms (passes).
  • If not enough tokens, the packet exceeds (drop or re-mark, depending on config).
  • Bucket has a burst size that allows short bursts above rate; refills constantly at the rate.
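The steps above can be sketched as a single-rate, two-color token bucket in Python. This is an illustrative model rather than Cisco's implementation; the class name, the `offer` method, and the deliberately small 15,000-byte burst are assumptions chosen to make the behavior easy to see.

```python
class TokenBucketPolicer:
    """Single-rate, two-color policer: each packet either conforms or exceeds."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_sec = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # assumption: the bucket starts full
        self.last_time = 0.0

    def offer(self, now, packet_bytes):
        # Refill for the elapsed time, never beyond the burst size.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_sec)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "conform"
        return "exceed"  # no buffering: the packet is dropped or re-marked now


policer = TokenBucketPolicer(rate_bps=10_000_000, burst_bytes=15_000)
# Twenty 1500-byte packets arriving back to back at t=0.
results = [policer.offer(0.0, 1500) for _ in range(20)]
print(results.count("conform"), "conform,", results.count("exceed"), "exceed")
```

The burst size alone decides how many back-to-back packets pass; once the bucket is empty, everything exceeds until time passes and tokens refill.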

Configuration:

policy-map RATE-LIMIT-INBOUND
 class class-default
  police 10000000 1500000               ! 10 Mbps with 1.5 MB burst
   conform-action transmit
   exceed-action drop

interface GigabitEthernet0/0/0
 service-policy input RATE-LIMIT-INBOUND

Policing's superpower: instant rate enforcement without buffering. Use it for:

  • SLA enforcement at network boundaries (carriers police their customers)
  • Subscriber rate plans (ISP enforcing a customer's contracted rate)
  • Protection against traffic floods (per-source-IP policers)
  • The built-in policer on LLQ priority queues (prevents voice from running away)

Policing's weakness: dropping perfectly good packets. TCP responds by retransmitting and shrinking its congestion window, so the throughput of TCP traffic against a policer often settles well below the policed rate, especially when the burst size is small. If the sender is under your control and you can shape instead of police, do.

Re-marking Instead of Dropping

A common pattern: instead of dropping excess, re-mark to a lower-priority class:

class SCAVENGER
 police 5000000
  conform-action set-dscp-transmit cs1      ! Conform: keep CS1
  exceed-action set-dscp-transmit default   ! Exceed: re-mark to best effort (DSCP 0)

Excess scavenger traffic is not dropped - just demoted. If the network has spare capacity, the demoted traffic still gets through. If the network is congested, the demoted traffic is the first to drop. This "soft policing" gives you the rate enforcement without the hard cliff.
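A short Python sketch of the demote-not-drop behavior follows. It reuses the token-bucket idea but returns a DSCP value for every packet instead of a drop decision. The function name and the 3,000-byte burst are assumptions for illustration; the snippet above does not configure an explicit burst.

```python
DSCP_CS1 = 8      # scavenger marking
DSCP_DEFAULT = 0  # best effort


def remark_scavenger(packets, rate_bps, burst_bytes):
    """packets is a list of (arrival_time_s, size_bytes); returns one DSCP per packet."""
    tokens = float(burst_bytes)  # assumption: the bucket starts full
    last = 0.0
    marks = []
    for arrival, size in packets:
        tokens = min(burst_bytes, tokens + (arrival - last) * rate_bps / 8)
        last = arrival
        if size <= tokens:
            tokens -= size
            marks.append(DSCP_CS1)      # conform: keep CS1
        else:
            marks.append(DSCP_DEFAULT)  # exceed: demote, never drop
    return marks


burst = [(0.0, 1500)] * 4
print(remark_scavenger(burst, rate_bps=5_000_000, burst_bytes=3_000))
# prints [8, 8, 0, 0]
```

Every packet still comes out the other side; only the marking changes, and downstream queues decide what happens under congestion.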

Shaping: Smoothing Bursts

A shaper buffers excess traffic and releases it at the configured rate. Like a policer, it uses a token-bucket algorithm; unlike a policer, the action for excess is "buffer for later" instead of "drop."

Configuration:

policy-map SHAPE-WAN
 class class-default
  shape average 50000000               ! Shape to 50 Mbps average

interface GigabitEthernet0/0/0
 service-policy output SHAPE-WAN

Shape average smooths bursts to the configured rate. Shape peak (rarely used) allows bursting to a higher rate based on accumulated credit.

The classic shaping use case: your branch has a 1 Gbps physical interface but a 50 Mbps contracted Metro Ethernet handoff. Without shaping, you send bursts at 1 Gbps and the carrier polices the excess (drops). With shaping, you smooth output to 50 Mbps, so the carrier's policer sees conforming traffic and has nothing to drop. Any drops that remain happen in your own queues, where your policy decides what is lost.

Shaping has one obvious cost: latency. Buffered packets wait. For voice and other latency-sensitive traffic, this is a problem. The solution is hierarchical shaping.
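The latency cost is easy to quantify with a rough Python model. The sketch assumes a burst that arrives all at once and ignores the shaper's own burst allowance, so treat the numbers as an upper-bound illustration rather than a prediction; the function name is hypothetical.

```python
def shaper_departures(packet_sizes, shape_rate_bps):
    """All packets arrive at t=0; return the time each one finishes leaving the shaper."""
    clock = 0.0
    departures = []
    for size in packet_sizes:
        clock += size * 8 / shape_rate_bps  # serialization time at the shaped rate
        departures.append(clock)
    return departures


# One hundred 1500-byte packets hitting a 50 Mbps shaper at once.
times = shaper_departures([1500] * 100, 50_000_000)
print(f"packets delivered: {len(times)}")
print(f"first packet leaves after {times[0] * 1000:.2f} ms")
print(f"last packet leaves after {times[-1] * 1000:.2f} ms")
```

Nothing is lost, but the last packet in the burst waits roughly 24 ms. A voice packet stuck behind that burst inherits the same delay unless it is scheduled ahead of it.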

Hierarchical Shaping: The Production Pattern

Hierarchical shaping wraps a queueing policy inside a shaping policy. The shaper smooths to the contracted rate; the inner queueing policy applies LLQ + CBWFQ within the shaped pipe.

policy-map CHILD-WAN-EGRESS
 class VOICE
  priority percent 30
 class VIDEO
  bandwidth percent 30
 class class-default
  bandwidth percent 40
  fair-queue

policy-map PARENT-SHAPER
 class class-default
  shape average 50000000               ! Shape to 50 Mbps
  service-policy CHILD-WAN-EGRESS      ! Apply child queueing inside

interface GigabitEthernet0/0/0
 service-policy output PARENT-SHAPER

This pattern dominates production WAN edges. The parent shapes to the contracted rate (so the carrier never polices). The child applies LLQ inside the shaped pipe, so voice still gets priority queueing - but inside the 50 Mbps shaped pipe, not in the 1 Gbps physical interface.

The percentages in the child policy are percentages of the parent shape rate, not the physical interface. priority percent 30 in the child means 30 percent of 50 Mbps = 15 Mbps reserved for the priority queue.

When to Use Each: A Decision Matrix

Voice and other real-time traffic on a congested WAN
  Use: LLQ with built-in policer
  Why: Strict priority + protection against priority abuse

Multiple business apps competing for bandwidth
  Use: CBWFQ with bandwidth guarantees
  Why: Proportional service across classes

Sub-rate WAN handoff (1 Gbps interface, 50 Mbps contract)
  Use: Hierarchical shaping
  Why: Avoid carrier policing; preserve LLQ inside shaped pipe

SLA enforcement at network boundary
  Use: Policing
  Why: Hard rate limit with no buffering

Per-subscriber rate plans
  Use: Policing or hierarchical shaping per subscriber
  Why: Simple at scale

TCP traffic with bursty sources
  Use: Shaping (egress)
  Why: Smooths bursts; preserves TCP throughput

Protection against UDP floods
  Use: Policing
  Why: Hard cap; UDP does not back off

Demote-not-drop excess scavenger traffic
  Use: Policing with re-mark action
  Why: Soft enforcement; uses spare capacity when available

Verification

! See the policy structure and counters
Router# show policy-map interface GigabitEthernet0/0/0
 GigabitEthernet0/0/0
  Service-policy output: PARENT-SHAPER
    Class-map: class-default
      shape (average) cir 50000000, bc 200000, be 200000
        target shape rate 50000000
        Service-policy : CHILD-WAN-EGRESS
          Class-map: VOICE (match-any)
            12345 packets, 1234567 bytes
            30 second offered rate 12000 bps, drop rate 0 bps
            Match: dscp ef (46)
            Priority: 30% (15000 kbps), burst bytes 375000, b/w exceed drops: 0

! See output queue depth (software hold queue, not the TX-ring) on a specific interface
Router# show interfaces GigabitEthernet0/0/0
  Output queue: 0/40 (size/max)            ! Healthy
  Output queue: 38/40 (size/max)           ! Bordering on tail-drops

The "drop rate" line per class is the most important diagnostic. Zero drops with significant offered rate means the class has enough bandwidth. Growing drops mean the class is starved or, for the priority class, that traffic is exceeding the built-in policer.
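The figures in the sample output can be cross-checked against each other with a few lines of Python. This sketch assumes the shaper's bc is reported in bits and the priority burst in bytes, which is how the sample output reads; the function name is hypothetical.

```python
def check_hierarchical_output(cir_bps, bc_bits, priority_percent, priority_burst_bytes):
    """Derive the values implied by a hierarchical shaper's show output."""
    priority_bps = cir_bps * priority_percent // 100
    return {
        # Child percentages apply to the parent shape rate, not the physical link.
        "priority_kbps": priority_bps // 1000,
        # Shaper interval implied by bc and cir.
        "shaper_interval_ms": bc_bits / cir_bps * 1000,
        # How long the priority burst lasts at the priority rate.
        "priority_burst_ms": priority_burst_bytes * 8 / priority_bps * 1000,
    }


# Values copied from the sample output above.
print(check_hierarchical_output(50_000_000, 200_000, 30, 375_000))
```

With the sample values, the priority rate works out to 15000 kbps, the shaper interval to about 4 ms, and the priority burst to about 200 ms of traffic. A mismatch between the computed priority rate and the "Priority:" line usually means the child policy is attached to the wrong parent.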

Anti-Patterns

  • Bare priority in LLQ. Always use priority percent X or priority X in kbps. The unbounded form lets a runaway flow destroy the rest of the policy.
  • Policing where shaping would do. If the source is under your control, shape (preserve TCP throughput). Police only at boundaries you cannot trust.
  • Forgetting hierarchical shaping for sub-rate handoffs. Without it, carrier polices, you get drops, voice quality suffers.
  • WRED on UDP-only classes. UDP does not back off; WRED becomes random drops with no benefit. Tail-drop is fine for UDP-heavy classes.
  • Bandwidth percentages summing to over 100. Some platforms reject the policy outright; where Cisco does accept the config, behavior is unpredictable.

Summary

Queueing schedules order on congested interfaces. Policing drops excess. Shaping buffers excess. The three are complementary, not interchangeable. LLQ is the modern queueing default; hierarchical shaping with LLQ inside is the production WAN pattern; policing is for SLA enforcement and protecting priority queues from abuse.

Master the LLQ + CBWFQ + hierarchical shaping triple, and you have covered 90 percent of production QoS configurations. Bookmark this article alongside the QoS cluster pillar and the Cisco MQC walkthrough; lab every change before pushing to production. The penalty for misconfigured queueing/policing/shaping is voice quality issues that only show up under load.
