
What Is Low Latency Streaming? LL-HLS, WebRTC & CMAF

Q: What is the difference between low latency HLS and standard HLS?

Standard HLS packages video into segments of 6–10 seconds. Because players buffer several segments before starting playback, end-to-end latency is typically around 10–30 seconds. LL-HLS uses partial segments of 200–400ms and preload hints to reduce this to 1–3 seconds, while maintaining CDN compatibility and full adaptive bitrate support.

Q: Is WebRTC better than LL-HLS?

It depends on the use case. WebRTC achieves sub-500ms latency, significantly faster than LL-HLS. But WebRTC does not work with standard CDN infrastructure, requires SFU servers to scale beyond small groups, and offers only limited ABR support. LL-HLS is almost always the better choice for large-scale broadcasts. WebRTC is better for small, interactive sessions where sub-second latency is a hard requirement.

Q: What is glass-to-glass latency?

Glass-to-glass latency refers to the total delay from a camera capturing an event to a viewer seeing it on their screen. It includes encode latency, packaging latency, CDN propagation, and player buffer time combined.

Q: Can LL-HLS work with a standard CDN?

Yes. LL-HLS is fully compatible with standard CDN delivery — this is one of its primary advantages over WebRTC. The CDN must support HTTP/2 (for preload hints) and must not introduce additional buffering at the edge. Most major CDNs support LL-HLS delivery natively.

Q: What is CMAF and how is it different from HLS?

CMAF (Common Media Application Format) is a packaging standard, not a delivery protocol. It defines a single container format that both HLS and DASH players can consume, eliminating the need to package and store content twice. When its chunks are delivered with HTTP chunked transfer encoding, CMAF enables 3–5 second latency. LL-HLS typically uses CMAF (fragmented MP4) partial segments to achieve 1–3 second latency.

Q: What latency is acceptable for live sports streaming?

For live sports where viewers may also be watching on broadcast TV or following live commentary, latency above 10 seconds creates a spoiler risk from social media or second-screen apps. LL-HLS at 1–3 seconds is generally considered the expected standard for premium sports streaming in 2026. For betting and real-time fantasy sports integrations, where the video must stay in step with live odds and score data, 1–3 seconds is usually sufficient.

When millions of viewers watched the 2022 World Cup final through streaming platforms, some experienced the winning goal 30 seconds after others saw it on broadcast TV. That gap — invisible to most viewers until they are spoiled by a notification — is the latency problem that the streaming industry has spent years trying to solve.

Low latency streaming is no longer a niche technical concern. With the FIFA World Cup 2026 approaching, and with live sports betting, real-time auctions, interactive commerce, and social live streaming all building audiences that expect near-real-time delivery, the choice of protocol (and how you architect your live streaming infrastructure around it) has direct business consequences.

This guide explains what latency is in a streaming context, how it is measured, what causes it, and how the three main low latency approaches (the LL-HLS and WebRTC protocols and the CMAF packaging format) compare across latency, scale, CDN compatibility, and use case fit.

What Is Latency in Live Streaming?

In streaming, latency refers to the delay between an event happening at the camera and that event appearing on a viewer’s screen. This is commonly called glass-to-glass latency — the time from the camera lens to the viewer’s display.
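Player-side latency is usually estimated by comparing wall-clock times. For HLS-based streams, the `EXT-X-PROGRAM-DATE-TIME` tag associates the first frame of a segment with an absolute date and time, so a player can work out how far behind real time the frame on screen is. The sketch below assumes that this timestamp reflects capture time and that the encoder and viewer clocks are synchronized (for example via NTP); the function name and example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def glass_to_glass_seconds(program_date_time: datetime,
                           offset_in_segment_s: float,
                           displayed_at: datetime) -> float:
    """Estimate latency for the frame currently on screen.

    program_date_time: wall-clock time tagged on the segment's first frame
    offset_in_segment_s: position of the displayed frame within that segment
    displayed_at: viewer's wall-clock time when the frame is shown
    Assumes both clocks are synchronized.
    """
    captured_at = program_date_time + timedelta(seconds=offset_in_segment_s)
    return (displayed_at - captured_at).total_seconds()


# Segment tagged 12:00:00.000 UTC; the frame 1.5 s into it is shown at 12:00:03.700 UTC
pdt = datetime(2026, 6, 11, 12, 0, 0, tzinfo=timezone.utc)
shown = datetime(2026, 6, 11, 12, 0, 3, 700_000, tzinfo=timezone.utc)
print(glass_to_glass_seconds(pdt, 1.5, shown))  # 2.2
```

Without clock synchronization the result is meaningless, which is why test setups often film a running clock at the camera and on the viewer's screen side by side instead.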

Latency is measured in seconds or milliseconds and falls into four broadly understood tiers:

Latency Tier	Range	Typical Technology	Suitable For
Real-time	< 500ms	WebRTC	Video calls, interactive gaming, live auctions
Low Latency	1–3 seconds	LL-HLS, LL-DASH	Live sports, live commerce, social streaming
Reduced Latency	3–5 seconds	CMAF	Broadcast OTT, premium live events
Standard	6–30 seconds	Traditional HLS/DASH	VOD, replay, SVOD premieres

The right latency tier depends entirely on your use case. Not every platform needs sub-second delivery — and as we will cover, chasing sub-second latency when your use case only requires 3 seconds creates unnecessary architectural complexity and cost.

What Causes Latency in a Live Streaming Pipeline?

Latency does not come from a single source — it accumulates across every stage of the streaming pipeline. Understanding where it builds up is the only way to reduce it meaningfully.

Figure: The Latency Stack: Where Delay Is Added in Live Streaming

1. Encode Latency

The encoder converts raw video from a camera into a compressed digital stream. Modern encoders using H.264 or H.265 introduce 50–500ms of latency depending on configuration. Encoders optimized for low latency use faster preset settings at the expense of some compression efficiency.
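As an illustration of what tuning an encoder for low latency can look like, the sketch below assembles an FFmpeg command line for the libx264 H.264 encoder. `-preset`, `-tune zerolatency`, `-bf`, `-g` and `-sc_threshold` are existing FFmpeg/libx264 options; the input file, the RTMP ingest URL and the one-second GOP are assumptions, and the command is only built and printed, not run.

```python
import shlex


def low_latency_x264_command(input_url: str, output_url: str,
                             fps: int = 30, gop_seconds: float = 1.0) -> list[str]:
    """Build (but do not run) an FFmpeg command for low-latency H.264 encoding."""
    gop_frames = round(fps * gop_seconds)
    return [
        "ffmpeg", "-i", input_url,
        "-c:v", "libx264",
        "-preset", "veryfast",    # faster preset: less encode delay, lower compression efficiency
        "-tune", "zerolatency",   # x264 tuning: no lookahead, no B-frames, sliced threads
        "-bf", "0",               # B-frames off explicitly (frame reordering adds delay)
        "-g", str(gop_frames),    # keyframe interval; segments must start on a keyframe
        "-sc_threshold", "0",     # no extra keyframes on scene cuts, keeps GOPs regular
        "-c:a", "aac",
        "-f", "flv", output_url,  # assumes RTMP ingest into the packager
    ]


# Hypothetical input file and ingest URL
cmd = low_latency_x264_command("camera_feed.ts", "rtmp://ingest.example/live/key")
print(shlex.join(cmd))
```

The trade-off is the one described above: these settings shave encode delay at the cost of a higher bitrate for the same visual quality.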

2. Packaging / Segmentation Latency

The packager divides the encoded stream into segments that the CDN and player can request. Traditional HLS uses segments of 6–10 seconds — meaning the packager must wait for a full segment to complete before sending it. This is the single largest contributor to latency in standard HLS setups. LL-HLS and CMAF address this by introducing partial segments of 200–400ms that are transmitted as they are produced.
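A back-of-the-envelope calculation shows why segment size dominates. The HLS specification (RFC 8216) says a client SHOULD NOT start playback less than three target durations from the end of a live playlist, and the LL-HLS revision of the specification asks servers to set `PART-HOLD-BACK` to at least three part target durations. The sketch below applies that three-times rule to a 6-second segment and a 333ms partial segment; it covers packaging plus player hold-back only, with encode and CDN delay still to be added.

```python
def live_edge_hold_back_s(unit_duration_s: float, multiplier: float = 3.0) -> float:
    """Minimum distance behind the live edge at which a spec-following player starts."""
    return multiplier * unit_duration_s


standard_hls = live_edge_hold_back_s(6.0)   # full 6 s segments
ll_hls = live_edge_hold_back_s(0.333)       # 333 ms partial segments

print(standard_hls)        # 18.0
print(round(ll_hls, 3))    # 0.999
```

The same rule that puts a standard HLS player 18 seconds behind live puts an LL-HLS player about one second behind, which is why shrinking the delivery unit matters more than any other single change.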

3. CDN Propagation Latency

Content must travel from your origin server to a CDN edge node close to the viewer. This is largely determined by network distance and the CDN’s PoP coverage. A well-configured CDN with an origin shield layer minimizes this to tens of milliseconds for most regions.

4. Player Buffer

The video player maintains a buffer — a small reserve of pre-downloaded segments — to protect against brief network interruptions. A larger buffer means more stable playback but higher latency. Most players default to 3–10 seconds of buffer. For low latency streaming, players like hls.js and Shaka Player must be explicitly configured to reduce buffer targets, or the latency gains from the protocol are lost at the final step.
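Besides shrinking the buffer, low latency players typically include a live-sync mechanism that plays slightly faster than real time when the player falls too far behind the live edge; hls.js, for example, exposes options such as `liveSyncDuration` and `maxLiveSyncPlaybackRate`. The function below is a hypothetical, simplified sketch of that idea, not the algorithm any particular player uses; the 2-second target, 0.25-second tolerance and 1.1x cap are assumed values.

```python
def catch_up_playback_rate(current_latency_s: float,
                           target_latency_s: float = 2.0,
                           tolerance_s: float = 0.25,
                           max_rate: float = 1.1) -> float:
    """Pick a playback rate that pulls latency back toward the target."""
    drift_s = current_latency_s - target_latency_s
    if drift_s <= tolerance_s:
        return 1.0  # on target (or ahead of it): play at normal speed
    # Speed up in proportion to the drift, capped so the change stays hard to notice
    return min(max_rate, 1.0 + 0.05 * drift_s)


for latency in (2.1, 3.0, 10.0):
    print(latency, catch_up_playback_rate(latency))
```

Capping the rate keeps audio pitch and motion changes subtle; a player that drifts far beyond the target would normally seek back to the live edge rather than rely on speed-up alone.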

Key point: Glass-to-glass latency is the sum of all four stages. Optimizing only the protocol without tuning the encoder, packager, and player buffer often produces no real-world improvement.
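A simple budget makes the point concrete. The per-stage figures below are illustrative assumptions chosen from the ranges quoted in this section, not measurements; the only difference between the two pipelines is the player buffer.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    encode_s: float
    packaging_s: float        # roughly one partial segment for LL-HLS (assumption)
    cdn_s: float
    player_buffer_s: float

    def glass_to_glass_s(self) -> float:
        return self.encode_s + self.packaging_s + self.cdn_s + self.player_buffer_s


# Hypothetical LL-HLS pipeline with a tuned player buffer
tuned = LatencyBudget(encode_s=0.1, packaging_s=0.333, cdn_s=0.05, player_buffer_s=1.0)
# Same encoder, packager and CDN, but the player keeps a 6 s default buffer
untuned = LatencyBudget(encode_s=0.1, packaging_s=0.333, cdn_s=0.05, player_buffer_s=6.0)

print(round(tuned.glass_to_glass_s(), 2))    # 1.48
print(round(untuned.glass_to_glass_s(), 2))  # 6.48
```

With an untuned player, the LL-HLS pipeline lands in the "standard" latency tier even though the encoder, packager and CDN are all optimized.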

What Is LL-HLS?

Low-Latency HLS (LL-HLS) is an extension of Apple's HTTP Live Streaming protocol, introduced by Apple in 2019. It reduces the latency of standard HLS from 6–30 seconds down to 1–3 seconds by addressing the primary cause of HLS latency: large segment sizes.

Standard HLS requires the packager to finish an entire segment (typically 6 seconds) before delivering it to the CDN and player. LL-HLS introduces two mechanisms that break this constraint:

Partial segments (CMAF chunks): The stream is divided into chunks of 200–400ms that are published as they are produced, not when a full segment is complete.
Preload hints: The manifest names the next chunk before it exists. The player requests it right away, the server holds that request open (LL-HLS delivery relies on HTTP/2), and the chunk is delivered the moment it is ready.
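To make the two mechanisms concrete, the sketch below writes a simplified LL-HLS media playlist: published partial segments appear as `EXT-X-PART` tags, followed by an `EXT-X-PRELOAD-HINT` for the part that is still being produced. The tag and attribute names come from the HLS specification; the helper, file names and durations are hypothetical, and a production playlist carries more (for example `EXT-X-VERSION`, `EXT-X-MAP`, `EXT-X-PROGRAM-DATE-TIME`, and the parts of recent completed segments).

```python
def ll_hls_playlist(seq: int, published_parts: int,
                    part_s: float = 0.33334, target_s: int = 4) -> str:
    """Render a simplified LL-HLS media playlist (hypothetical helper)."""
    live = seq + 1  # media sequence number of the segment still being produced
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_s}",
        # Blocking playlist reloads; start about three parts behind the live edge
        f"#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK={3 * part_s:g}",
        f"#EXT-X-PART-INF:PART-TARGET={part_s}",
        f"#EXT-X-MEDIA-SEQUENCE:{seq}",
        f"#EXTINF:{target_s:.3f},",  # last complete segment
        f"seg{seq}.mp4",
    ]
    # Partial segments of the live segment that are already published
    for i in range(published_parts):
        lines.append(f'#EXT-X-PART:DURATION={part_s},URI="seg{live}.part{i}.mp4"')
    # Announce the next part before it exists, so the player can request it early
    lines.append(f'#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg{live}.part{published_parts}.mp4"')
    return "\n".join(lines)


print(ll_hls_playlist(seq=100, published_parts=2))
```

Each time a part is published, the hinted URI becomes a regular `EXT-X-PART` entry and the hint moves forward by one, so a player is always waiting on content that is only a fraction of a second from existing.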

The result is that a viewer’s player is always requesting content that is only milliseconds from being produced — rather than waiting for a full segment to be packaged and delivered.

Figure: LL-HLS Architecture: How Partial Segments Reduce Latency

Where LL-HLS works well

Large-scale live events: sports, concerts, awards shows with hundreds of thousands to millions of viewers
Live commerce and social live streaming where chat synchronization matters
OTT platforms that already use HLS and want to reduce latency without a full infrastructure overhaul
Any scenario where standard CDN delivery is required — LL-HLS is fully CDN-compatible

Where LL-HLS falls short

Sub-second latency requirements — LL-HLS reliably achieves 1–3s but rarely below 1s in production
Interactive applications where bidirectional communication is needed — LL-HLS is one-way broadcast only

What Is WebRTC?

WebRTC (Web Real-Time Communication) is an open standard developed for real-time peer-to-peer audio and video communication, built natively into all modern web browsers. It is the technology behind video conferencing tools, browser-based gaming, and interactive live events.

Unlike HLS-based protocols, WebRTC does not package content into segments. It transmits a continuous stream of encoded data directly between peers — or through a relay server — with no buffering for segment completion. This is why WebRTC consistently achieves sub-500ms glass-to-glass latency in production environments.

Figure: WebRTC Architecture: P2P vs SFU for Live Streaming

WebRTC topology options

WebRTC scales through two primary architectures:

Peer-to-peer (P2P): Direct connection between broadcaster and viewer. Works for very small groups (under 20 participants) with minimal infrastructure cost. Not viable for large audiences.
Selective Forwarding Unit (SFU): A server receives the stream and forwards it to many recipients. Maintains low latency while scaling to thousands of viewers. Requires specialized infrastructure — standard CDN caching does not apply.
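The upload arithmetic shows why P2P breaks down: in a P2P topology the broadcaster sends a separate copy of the stream to every viewer, whereas with an SFU it sends one copy and the server performs the fan-out. The sketch assumes a single 2.5 Mbps stream (an illustrative figure) and ignores simulcast layers and protocol overhead; the function names are hypothetical.

```python
def broadcaster_upload_mbps(viewers: int, bitrate_mbps: float, topology: str) -> float:
    """Upload bandwidth the broadcaster needs under each topology."""
    if topology == "p2p":
        return viewers * bitrate_mbps   # one separately encrypted copy per peer connection
    if topology == "sfu":
        return bitrate_mbps             # one copy to the SFU, which forwards it
    raise ValueError(f"unknown topology: {topology}")


def sfu_egress_mbps(viewers: int, bitrate_mbps: float) -> float:
    """Egress the SFU layer must carry: every viewer still gets their own copy."""
    return viewers * bitrate_mbps


print(broadcaster_upload_mbps(20, 2.5, "p2p"))   # 50.0
print(broadcaster_upload_mbps(20, 2.5, "sfu"))   # 2.5
print(sfu_egress_mbps(10_000, 2.5))              # 25000.0
```

The load does not disappear with an SFU; it moves to the server tier, which must relay every viewer's copy in real time without CDN caching.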

Where WebRTC works well

Interactive live events: auctions, Q&A sessions, live shopping with real-time host interaction
Video conferencing and collaborative platforms
Esports and gaming with audience participation requirements
Scenarios where sub-500ms latency is a hard requirement

Where WebRTC falls short

Mass broadcast scale — SFU infrastructure becomes expensive and complex above ~50,000 viewers
Standard CDN delivery — WebRTC streams cannot be cached at CDN edge nodes in the traditional sense
ABR flexibility — WebRTC has limited adaptive bitrate configuration compared to HLS-based protocols
Codec flexibility — browsers constrain available codec options, limiting compression efficiency

What Is CMAF?

The Common Media Application Format (CMAF) is not a protocol but a media container and packaging standard, proposed by Apple and Microsoft and standardized by MPEG as ISO/IEC 23000-19. It defines a single, unified container format (based on fragmented MP4) that can be used with both HLS and MPEG-DASH, eliminating the need to package content separately for each protocol.

CMAF achieves 3–5 second latency when its chunks are delivered with HTTP chunked transfer encoding, so each chunk can be sent as soon as it is encoded rather than when the whole segment is finished; LL-HLS pursues the same goal with its partial segments. A single CMAF-packaged stream can be delivered to both HLS and DASH players without re-encoding or re-packaging, making it highly efficient for platforms that need to serve a wide range of devices and players.

Where CMAF works well

Multi-format OTT delivery where content must reach HLS (Apple/iOS) and DASH (Android, smart TVs) devices from a single source
Broadcast and premium OTT where 3–5 second latency is acceptable
Reducing encoding and storage costs — one package, two delivery formats
CDN-friendly delivery at scale

Where CMAF falls short

Not suitable on its own for use cases that consistently need less than 3 seconds of latency; those call for LL-HLS (or LL-DASH) delivery, or WebRTC
Player support can vary — not all CMAF configurations are consistently supported across older devices

Protocol Comparison: Latency Benchmarks and Key Characteristics

The following table reflects production benchmarks from 2025–2026 deployments, not theoretical minimums. Actual numbers vary based on encoder configuration, CDN setup, and player buffer settings.

	LL-HLS	WebRTC	CMAF
Typical latency	1–3 seconds	< 500ms	3–5 seconds
Best-case latency	~1 second (tuned)	80–200ms	~2.5 seconds
CDN compatible	✓ Yes — fully	✗ No (SFU required)	✓ Yes — fully
Max scale	Millions of viewers	~10K–50K (SFU)	Millions of viewers
ABR support	✓ Full ABR ladder	Limited	✓ Full ABR ladder
Browser playback	Native in Safari; MSE-based players (e.g., hls.js) elsewhere	✓ Built-in	Via HLS or DASH players (native Safari, hls.js, Shaka Player, dash.js)
Bidirectional	✗ One-way only	✓ Yes	✗ One-way only
Codec flexibility	H.264, H.265, AV1	H.264, VP8/VP9 (limited)	H.264, H.265, AV1
Infrastructure cost	Low–Medium (standard CDN)	High (SFU servers)	Low–Medium (standard CDN)
Best use case	Live sports, OTT, live commerce	Video calls, interactive events	Broadcast OTT, multi-format

Figure: Protocol Comparison Infographic: LL-HLS vs WebRTC vs CMAF

How CDN Architecture Interacts With Each Protocol

One of the most overlooked aspects of the low latency protocol decision is CDN compatibility. The protocols behave very differently when placed behind a CDN — and this affects both latency and cost at scale.

LL-HLS and CDN

LL-HLS is designed for CDN delivery. Partial segments are small HTTP objects that CDN edge nodes cache and serve just as they do with standard HLS segments. A well-configured CDN with origin shield can serve millions of LL-HLS viewers simultaneously while the origin only handles a fraction of the total requests. The key configuration requirement is that the CDN must support HTTP/2 (for preload hints) and must not buffer partial segments before forwarding them.
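A rough request count shows how the shield keeps origin load flat as the audience grows. The sketch assumes every viewer fetches one partial segment per part duration for a single rendition, and that each cache tier collapses simultaneous requests for the same object into one upstream fetch; playlist reloads and additional ABR renditions, which multiply these numbers in practice, are left out. The audience size, PoP count, 250ms part duration and function name are hypothetical.

```python
def part_request_rates(viewers: int, edge_pops: int, part_s: float,
                       origin_shield: bool) -> tuple[float, float]:
    """Return (edge requests/s, origin requests/s) for one rendition's parts."""
    edge_rps = viewers / part_s
    # Without a shield every PoP pulls each new part from the origin;
    # with a shield the PoPs pull from the shield, which pulls once from origin.
    origin_rps = (1 if origin_shield else edge_pops) / part_s
    return edge_rps, origin_rps


print(part_request_rates(1_000_000, 100, 0.25, origin_shield=True))   # (4000000.0, 4.0)
print(part_request_rates(1_000_000, 100, 0.25, origin_shield=False))  # (4000000.0, 400.0)
```

Edge load scales with viewers, but origin load scales only with the number of new objects per second, which is what lets a modest origin serve a very large LL-HLS audience.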

Pairing LL-HLS with delivery acceleration and an efficient transcoder that produces a proper ABR ladder gives you the full low latency stack — fast encode → small segments → cached edge delivery → optimized player.

WebRTC and CDN

WebRTC does not integrate with traditional CDN infrastructure. Standard CDN edge nodes cache HTTP objects — but WebRTC streams are transmitted over UDP using SRTP (not HTTP), making them uncacheable by conventional CDN architecture.

At scale, WebRTC requires a network of SFU servers rather than CDN edge nodes. These SFU servers receive the stream and forward it to many recipients, but they are not the same as a CDN — they do not cache content, they relay it in real time. This makes WebRTC infrastructure significantly more expensive to scale than LL-HLS.

CMAF and CDN

Like LL-HLS, CMAF is fully CDN-compatible. CMAF chunks are HTTP objects that edge nodes cache normally. The key advantage is that a single CMAF stream can be served to both HLS players (Apple devices) and DASH players (Android, smart TVs) from the same cached content, reducing CDN storage and egress overhead compared to maintaining two separate HLS and DASH streams.

Which Protocol Should You Use?

The answer depends on three factors: how low your latency requirement actually is, how large your audience will be, and whether your viewers need to interact or just watch.

Use Case	Protocol	Reason
Live sports — millions of viewers	LL-HLS	CDN-friendly, 1–3s acceptable, handles massive scale
Live sports betting / real-time scores	LL-HLS	1–3s usually sufficient; WebRTC too complex at this scale
Interactive live auction / Q&A (< 10K)	WebRTC	Sub-second required; audience small enough for SFU
Large interactive event (10K–50K)	WebRTC + SFU	Latency-critical; requires SFU infra investment
OTT broadcast — global multi-device	CMAF	Single package, HLS + DASH, CDN-friendly, cost efficient
Live commerce / social streaming	LL-HLS	Chat sync at 1–3s acceptable; CDN scale needed
Video conferencing / collaboration	WebRTC	Bidirectional required; sub-500ms essential
VOD with replay of live events	CMAF or standard HLS	Latency irrelevant for VOD; efficiency matters more
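The table's logic can be condensed into a small decision helper. The thresholds (sub-second for WebRTC, 3 seconds for LL-HLS, 5 seconds for CMAF, and roughly 50,000 viewers as the practical ceiling for SFU-only delivery) are the figures used throughout this guide; the function itself is a hypothetical sketch to reason with, not a substitute for testing your own pipeline.

```python
def choose_protocol(max_latency_s: float, audience: int, interactive: bool) -> str:
    """Map requirements to a protocol using this guide's rules of thumb."""
    if interactive or max_latency_s < 1.0:
        # Sub-second or two-way: WebRTC, via an SFU beyond small groups
        if audience <= 50_000:
            return "WebRTC"
        # Too large for SFU-only delivery: split the audience
        return "Hybrid: WebRTC for participants, LL-HLS for viewers"
    if max_latency_s <= 3.0:
        return "LL-HLS"                 # 1-3 s, CDN-friendly, full ABR
    if max_latency_s <= 5.0:
        return "CMAF"                   # 3-5 s, one package for HLS and DASH
    return "CMAF or standard HLS"       # latency is not the constraint


print(choose_protocol(3.0, 2_000_000, interactive=False))
print(choose_protocol(0.5, 5_000, interactive=True))
```

A real decision also weighs device mix, existing infrastructure and budget, which a few thresholds cannot capture.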

Figure: Protocol Decision Flowchart: Which to Use for Your Use Case

One practical note: many large-scale live platforms now use a hybrid approach — LL-HLS for the passive broadcast audience (millions of viewers), combined with WebRTC for a small interactive layer (guest speakers, Q&A panellists, interactive elements). Use the live transcoding calculator to estimate the encoding cost for your ABR ladder.

Summary: LL-HLS vs WebRTC vs CMAF

	LL-HLS	WebRTC	CMAF
Latency	1–3 seconds	< 500ms	3–5 seconds
CDN delivery	✓ Yes	✗ No	✓ Yes
Viewer scale	Millions	Thousands	Millions
Interactive	✗ No	✓ Yes	✗ No
Infrastructure cost	Low	High	Low
Best for	Sports, OTT, commerce	Calls, auctions	Broadcast, multi-format

Frequently Asked Questions

What is the difference between low latency HLS and standard HLS?