Every network makes implicit promises. When you start a video call, your device expects packets to arrive within milliseconds, in order, without gaps. When you download a file, your device expects all the bytes to eventually show up, but doesn't much care when.
These promises conflict. A network carrying both a video call and a large download can't always honor both—not because it lacks the technology, but because it lacks the capacity. Something has to give.
Quality of Service (QoS) is how networks decide which promises to keep and which to break.
The Triage Problem
Without QoS, networks use first-come, first-served. Packets queue up and get processed in arrival order. This seems fair until you realize it treats a voice packet—which becomes useless if delayed 200 milliseconds—exactly the same as a backup file chunk that nobody will notice arriving a second later.
The result: your voice call gets choppy because it's waiting behind a bulk transfer that doesn't care about waiting.
Different applications have genuinely different needs:
Voice and video conferencing need packets to arrive within about 150 milliseconds, with minimal variation in timing (jitter), and almost no loss. Voice in particular uses very little bandwidth, yet both are destroyed by delay.
Streaming video needs consistent bandwidth but tolerates latency because players buffer several seconds ahead. A 4K stream might need 25 Mbps sustained, but 200ms of delay is invisible.
File transfers and backups need bandwidth but don't care about latency or jitter. Whether a transfer takes 10 seconds or 15 seconds rarely matters.
Online gaming needs extremely low latency (under 50ms) but uses almost no bandwidth—often less than 100 Kbps.
QoS is triage. When demand exceeds capacity, something has to wait. QoS decides what.
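The per-application needs above can be summarized as a small table of targets. A sketch in code, using the article's rough figures (these are illustrative approximations, not formal SLA numbers):

```python
# Rough per-application targets from the list above; illustrative
# approximations, not formal SLA figures.
REQUIREMENTS = {
    "voice_conferencing": {"latency_ms": 150, "bandwidth": "low", "loss": "near zero"},
    "streaming_video": {"latency_ms": None, "bandwidth": "25 Mbps (4K)", "loss": "moderate"},
    "file_transfer": {"latency_ms": None, "bandwidth": "high", "loss": "retransmitted"},
    "gaming": {"latency_ms": 50, "bandwidth": "<100 Kbps", "loss": "low"},
}

def latency_sensitive(app: str) -> bool:
    """True if the class has a hard latency target (None means 'does not care')."""
    return REQUIREMENTS[app]["latency_ms"] is not None
```

The `None` entries capture the key asymmetry: some classes have a deadline, others only have a throughput appetite. Triage exists because one link must serve both.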
How Triage Works
QoS isn't one technology—it's a continuous process with four stages:
1. Classification: What Is This?
First, the network must identify what each packet is. Classification examines:
- Source and destination addresses and ports
- Protocol type
- DSCP values already marked in the packet header
- Application signatures (deep packet inspection)
- VLAN tags
A packet heading to port 5060 is probably SIP signaling (voice). A packet from your backup server is probably bulk data. A packet already marked with DSCP 46 is claiming to be voice traffic.
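A classifier built on those rules might look like the sketch below. The `Packet` fields and class labels are illustrative, not any vendor's API; real classifiers also consult protocol, VLAN tags, and application signatures:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst_port: int
    dscp: int  # 6-bit value from the IP header, 0 if unmarked

def classify(pkt: Packet) -> str:
    """Assign a traffic class from simple header fields (hypothetical rules)."""
    if pkt.dscp == 46:            # EF: the packet claims to be voice
        return "voice"
    if pkt.dst_port == 5060:      # SIP signaling
        return "voice"
    if pkt.dst_port in (80, 443): # generic web traffic
        return "default"
    return "default"
```

Note the word "claims": a DSCP 46 mark is just a header field. Trusting it unconditionally lets any host promote its own traffic, which is why marks are usually verified or rewritten at the edge.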
2. Marking: Tag It
Once classified, packets get tagged with priority indicators that downstream devices will recognize. The standard mechanism is DSCP (Differentiated Services Code Point)—6 bits in the IP header providing 64 possible values.
In practice, most networks use a handful of classes:
- EF (Expedited Forwarding, DSCP 46): Voice. Guaranteed low latency.
- AF4x (Assured Forwarding class 4): Video conferencing.
- AF3x: Critical business applications.
- AF2x: General traffic.
- Default (DSCP 0): Best effort. No promises.
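Because DSCP occupies the upper 6 bits of the IP ToS/Traffic Class byte (the lower 2 bits carry ECN), translating between DSCP values and the raw header byte is simple bit arithmetic. A minimal sketch using the classes above:

```python
# DSCP values for the common classes listed above.
DSCP_CLASSES = {
    "EF": 46,    # voice
    "AF41": 34,  # video conferencing
    "AF31": 26,  # critical business applications
    "AF21": 18,  # general traffic
    "BE": 0,     # best effort
}

def dscp_to_tos(dscp: int) -> int:
    """DSCP is the top 6 bits of the ToS byte: EF (46) becomes 0xB8."""
    return dscp << 2

def tos_to_dscp(tos: int) -> int:
    """Recover the DSCP value from a raw ToS/Traffic Class byte."""
    return tos >> 2
```

This is why packet captures often show voice traffic with ToS 0xB8: it is simply EF (46) shifted into place.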
Marking typically happens at the network edge. Your enterprise router marks traffic as it enters, and every internal device honors those marks.
3. Queuing: Separate the Lines
Instead of one queue where everything waits together, QoS creates multiple queues:
- A priority queue for voice (always served first)
- A high-priority queue for video
- A normal queue for business traffic
- A low-priority queue for bulk transfers
Packets go into queues based on their marks. Now voice packets aren't stuck behind backup traffic—they're in a different line entirely.
4. Scheduling: Who Goes Next?
With multiple queues, something must decide which queue gets served when. This is where the actual prioritization happens.
Strict Priority always empties higher-priority queues before touching lower ones. Voice always goes first. This guarantees minimal latency for priority traffic but can starve everything else if priority traffic is heavy.
Weighted Fair Queuing allocates bandwidth proportionally. High priority might get 50% of capacity, medium 30%, low 20%. Everything gets something, but priority traffic gets more.
Class-Based Weighted Fair Queuing combines both: strict priority for the most latency-sensitive traffic (voice), fair queuing for everything else. Voice always goes first, but video and data share the remaining bandwidth proportionally.
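A toy version of the queuing and scheduling stages can make the combination concrete. The sketch below uses strict priority for voice and plain weighted round robin as a simple stand-in for weighted fair queuing; class names and weights are assumptions, not a real router's configuration:

```python
import itertools
from collections import deque

# One queue per class, as in the queuing stage above.
queues = {c: deque() for c in ("voice", "video", "data", "bulk")}
weights = {"video": 5, "data": 3, "bulk": 2}  # roughly a 50/30/20 split

# One service round: each non-voice class appears as many times as its weight.
service_order = [c for c, w in weights.items() for _ in range(w)]
cursor = itertools.cycle(service_order)

def enqueue(traffic_class: str, packet) -> None:
    queues[traffic_class].append(packet)

def next_packet():
    """Return (class, packet) to transmit next, or None if all queues are idle."""
    if queues["voice"]:  # strict priority: voice always goes first
        return ("voice", queues["voice"].popleft())
    for _ in range(len(service_order)):  # weighted round robin for the rest
        name = next(cursor)
        if queues[name]:
            return (name, queues[name].popleft())
    return None
```

Even this toy exposes the starvation risk of strict priority: if `voice` never empties, the loop over `service_order` never runs, which is why production configurations cap the priority queue's bandwidth.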
Policing and Shaping: Rate Limits
What happens when a traffic class exceeds its allocation?
Policing drops the excess immediately. If video is limited to 10 Mbps and tries to send 15 Mbps, the extra 5 Mbps is dropped. Harsh but simple.
Shaping buffers the excess and releases it when capacity allows. This smooths traffic bursts instead of dropping packets, trading slightly higher latency for fewer lost packets.
Policing is typically used at network boundaries to enforce contracts. Shaping is used internally to smooth traffic flows.
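Both policing and shaping are usually built on a token bucket: tokens accumulate at the configured rate, and each packet spends tokens equal to its size. A simplified sketch, driven by explicit timestamps rather than a real clock (rates and burst sizes are illustrative):

```python
class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8       # token refill rate in bytes/second
        self.burst = burst_bytes       # bucket capacity
        self.tokens = burst_bytes      # start full
        self.last = 0.0

    def _refill(self, now: float) -> None:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def police(self, size: int, now: float) -> bool:
        """Policer: forward if tokens cover the packet, else drop immediately."""
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return True   # conforming: forward
        return False      # excess: drop

    def shape_delay(self, size: int, now: float) -> float:
        """Shaper: never drop; return how long to buffer before sending."""
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return 0.0
        wait = (size - self.tokens) / self.rate  # time until enough tokens accrue
        self.tokens = 0.0
        self.last = now + wait  # tokens are consumed as of the send time
        return wait
```

The two methods share the same accounting; the only difference is what happens to nonconforming packets, which is exactly the policing-versus-shaping distinction described above.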
Layer 2 vs. Layer 3
QoS exists at multiple layers:
Layer 2 (Ethernet) uses 802.1p priority tags—3 bits providing 8 priority levels. Switches use these within a local network.
Layer 3 (IP) uses DSCP markings that survive across routed networks. This is the primary QoS mechanism for anything beyond a single switch.
Layer 4-7 classification examines ports and application signatures to identify traffic that wasn't marked by the source. This catches traffic from devices that don't mark their own packets.
The End-to-End Problem
Here's the uncomfortable truth: your QoS configuration only controls your network.
Once packets leave your edge router, they enter your ISP's network—and ISPs typically wipe your DSCP markings at the door. Your carefully marked voice traffic becomes generic best-effort the moment it leaves your premises.
This is like giving someone a VIP wristband that only works at your venue. The next venue doesn't recognize it.
End-to-end QoS requires cooperation across the entire path, which is why it works well within enterprise networks but poorly across the public Internet. Some ISPs offer QoS-aware services for business customers, but consumer Internet is essentially best-effort everywhere.
What QoS Cannot Do
QoS cannot create bandwidth. If your link is 100 Mbps and you're trying to send 150 Mbps, QoS determines which 100 Mbps gets through and which 50 Mbps waits or drops. It's still 100 Mbps.
QoS is also nearly invisible when you have spare capacity. If your network typically runs at 40% utilization, there's always room for everything. QoS only matters during congestion—which might be 5% of the time, but that 5% is when users notice.
Implementation Reality
Identify what matters. What applications are critical? What are their actual requirements for latency, jitter, and loss?
Measure what exists. Flow monitoring reveals what traffic you actually have and how much bandwidth each type consumes.
Design classes simply. Four to six classes usually suffice: voice, video, critical data, normal data, bulk. More classes means more complexity with diminishing returns.
Mark at the edge. Trust markings from your own devices. Remark or ignore markings from untrusted sources.
Monitor continuously. Watch queue depths and drops. If priority queues are frequently full, you don't have a QoS problem—you have a capacity problem.
The Scavenger Class
One underused QoS strategy: the scavenger class. This is the opposite of priority—traffic that explicitly gets bandwidth only when nothing else wants it.
Windows updates. Backup jobs. Peer-to-peer traffic. Software downloads. These can happen whenever, so mark them as scavenger and let them fill in the gaps. They'll still complete, just without competing with anything that matters.
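On most systems, an application can volunteer for scavenger treatment by marking its own sockets. The sketch below uses CS1 (DSCP 8), the value traditionally used for the scavenger class; whether the network actually deprioritizes the flow depends entirely on downstream policy honoring the mark:

```python
import socket

# CS1 (DSCP 8) is the value traditionally used for scavenger traffic.
# DSCP occupies the top 6 bits of the ToS byte, so the raw byte is 8 << 2.
SCAVENGER_DSCP = 8

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, SCAVENGER_DSCP << 2)
# Any traffic sent on this socket now carries the scavenger mark
# (assuming the OS and downstream devices preserve it).
```

A backup client or update agent doing this loses nothing when the network is idle and gracefully yields whenever anything with a real deadline shows up.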