visar.log
Technical notes from building things

TL Provider: Soft Refresh, IndexedDB Cache & the 404 Rewrite

The Problem

The video editor’s TL provider (a live streaming aggregator) had a brutal UX issue: every time you navigated away from TL to another provider (tango, fc2, sc) and back, the entire stream list was wiped and reloaded from scratch. initialize() cleared all state — videos, streamer map, co-streamer positions, scroll position — and showed a “Loading…” flash while hitting the API again.

For a list of 50-100 live streamers with resolved metadata, this was wasteful and disorienting. The list would rebuild in a different order as API responses and co-streamer resolutions arrived asynchronously.

The Architecture Before

  • Frontend: Svelte 5 SPA with a videoListStore (reactive state) holding videos[], streamerMap<alias, TlStreamer>, processedStreamIds, liveFilenameMap
  • Backend: Express proxy that resolves HLS master playlists from tango.me, rewrites segment URLs, and serves them locally
  • Provider switch: $effect watches the route param, calls initialize(provider) which zeroes everything, then loadTlStreams fetches from the API

The key data flow for TL: fetchStreams() returns { following, recommended } arrays of TlStreamer objects, each with streamerId, alias, masterListUrl (the HLS master playlist on tango.me), and metadata. Co-streamers are discovered per-streamer via fetchMultiBroadcast(streamId).

Step 1: In-Memory Snapshot for Instant Restore

Module-level variable in a new tl-cache.ts service. saveTlSnapshot(store) captures videos, streamerMap, processedStreamIds, liveFilenameMap, and listIdentifiers. restoreTlSnapshot(store) puts them back.
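A minimal sketch of what the snapshot half of tl-cache.ts might look like. The store shape and field types here are assumptions; the real store is Svelte 5 reactive state, and listIdentifiers is omitted for brevity.

```typescript
// Hypothetical sketch of the module-level snapshot in tl-cache.ts.
interface TlSnapshot {
  videos: unknown[];
  streamerMap: Map<string, unknown>;
  processedStreamIds: Set<string>;
  liveFilenameMap: Map<string, string>;
}

let snapshot: TlSnapshot | null = null;

export function saveTlSnapshot(store: TlSnapshot): void {
  // Shallow-copy the collections so later store mutations don't leak into the snapshot.
  snapshot = {
    videos: [...store.videos],
    streamerMap: new Map(store.streamerMap),
    processedStreamIds: new Set(store.processedStreamIds),
    liveFilenameMap: new Map(store.liveFilenameMap),
  };
}

export function restoreTlSnapshot(store: TlSnapshot): boolean {
  const snap = snapshot;
  if (!snap) return false; // no snapshot yet: caller falls back to a full load
  store.videos = [...snap.videos];
  store.streamerMap = new Map(snap.streamerMap);
  store.processedStreamIds = new Set(snap.processedStreamIds);
  store.liveFilenameMap = new Map(snap.liveFilenameMap);
  return true;
}
```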

The provider-switch $effect now saves before leaving TL and restores on return:

leaving TL → saveTlSnapshot(videoListStore)
entering TL (with snapshot) → initializeSoft('tl') → restoreTlSnapshot → softRefreshTlStreams
entering TL (no snapshot) → initialize('tl') → loadTlStreams (full load)

initializeSoft bumps the epoch (for stale-async detection) and sets the provider, but does NOT wipe data. The list appears instantly.
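The epoch mechanism can be illustrated in a few lines. The post doesn't show initializeSoft's real signature, so this is a hedged sketch; isStale is a name introduced here.

```typescript
// Sketch of epoch-based stale-async detection: an in-flight async result
// is discarded if the epoch changed while it was running.
let epoch = 0;
let provider: string | null = null;

export function initializeSoft(next: string): number {
  epoch += 1;       // invalidate results from any in-flight async work
  provider = next;  // switch provider without wiping any data
  return epoch;
}

export function isStale(startedAt: number): boolean {
  return startedAt !== epoch;
}
```

A loader captures the epoch when it starts and checks it before committing results; a provider switch mid-flight makes the check fail.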

Step 2: IndexedDB for liveUrl Persistence

The most important piece of metadata is the liveUrl — the resolved 720p sub-playlist URL derived from the masterListUrl. Resolving it requires a backend round-trip that fetches the master playlist from tango.me with auth cookies, parses the HLS manifest for the 720p variant, and constructs the full URL.

New IndexedDB store (tl-cache db, streamers object store, keyed by streamerId):

{ streamerId, masterListUrl, liveUrl: string | null, cachedAt: number }

On soft refresh, the classification loop checks IDB: if a streamer has the same masterListUrl and a cached liveUrl, use it directly — no backend call needed.
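That check is easy to express as a pure function. This is a sketch: canUseCachedLiveUrl is a name introduced here, and the entry shape mirrors the record above.

```typescript
// CachedEntry mirrors the IDB record shape shown above.
interface CachedEntry {
  streamerId: string;
  masterListUrl: string;
  liveUrl: string | null;
  cachedAt: number;
}

export function canUseCachedLiveUrl(
  entry: CachedEntry | undefined,
  masterListUrl: string,
): string | null {
  // Same master playlist URL and a previously resolved liveUrl:
  // reuse it and skip the backend round-trip entirely.
  if (entry && entry.masterListUrl === masterListUrl && entry.liveUrl) {
    return entry.liveUrl;
  }
  return null; // cache miss: the classification loop resolves it later
}
```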

Step 3: The Eager Walk — Co-Streamers + liveUrl in One Pass

Initially I had two separate fire-and-forget loops: fetchCoStreamersEagerly walked the list checking for co-streamers via fetchMultiBroadcast, and resolveAllLiveUrls walked the same list resolving liveUrls. They ran in parallel, independently.

Merged them into processStreamersEagerly — a single sequential pass that, for each streamer:

  1. Checks co-streamers (if streamId exists, gated by markStreamIdProcessed to avoid re-checking)
  2. Resolves liveUrl for each co-streamer found → stores in IDB
  3. Resolves liveUrl for the main streamer → stores in IDB
  4. 200ms delay, next streamer

One walk, everything gets cached politely.
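A condensed sketch of the merged walk, with the resolver, co-streamer lookup, and cache write injected so the sequencing stands on its own. All dependency names are assumptions, and the markStreamIdProcessed gating is omitted for brevity.

```typescript
// Sketch of the single-pass eager walk described above.
interface Streamer { streamerId: string; streamId?: string; masterListUrl: string; }

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

export async function processStreamersEagerly(
  streamers: Streamer[],
  deps: {
    fetchCoStreamers: (streamId: string) => Promise<Streamer[]>;
    resolveLiveUrl: (masterListUrl: string) => Promise<string | null>;
    putCached: (s: Streamer, liveUrl: string) => Promise<void>;
    delayMs?: number; // 200ms in the real flow
  },
): Promise<void> {
  for (const s of streamers) {
    // 1. discover co-streamers, 2. cache their liveUrls, 3. then the main streamer
    const coStreamers = s.streamId ? await deps.fetchCoStreamers(s.streamId) : [];
    for (const target of [...coStreamers, s]) {
      const liveUrl = await deps.resolveLiveUrl(target.masterListUrl);
      if (liveUrl) await deps.putCached(target, liveUrl); // only cache successes
    }
    await sleep(deps.delayMs ?? 200); // be polite to the backend
  }
}
```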

Step 4: The 404 Bug — Don’t Check Local HLS

The scroll-triggered refreshTlStreams (fires when the user scrolls near the bottom) had a 404 HEAD check meant to detect stale proxy sessions. For each existing streamer, it did HEAD /hls/${alias}/playlist.m3u8.

The problem: that local HLS playlist only exists after the user clicks to play a streamer. Every unplayed streamer — which is most of them — returned 404. The code treated 404 as “dead stream”, removed it from its position, and re-added it at the bottom. Following streamers at the top of the list would jump to the middle/bottom on every refresh.

Replaced with a three-way classification based on masterListUrl comparison:

  • New alias → append at bottom + eager walk
  • Same alias, different masterListUrl → different stream: remove old + append new
  • Same alias, same masterListUrl → skip (already live, liveUrl already cached)

No HTTP checks. The fetchStreams() API response IS the liveness signal.
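The classification reduces to a pure function over the existing entry (looked up by alias) and the fresh API result. A sketch; the function name and return labels are introduced here.

```typescript
// The three-way classification from the table above as a pure function.
type Classification = "new" | "changed" | "unchanged";

export function classifyStreamer(
  existing: { masterListUrl: string } | undefined,
  fresh: { masterListUrl: string },
): Classification {
  if (!existing) return "new";                // append at bottom + eager walk
  if (existing.masterListUrl !== fresh.masterListUrl) {
    return "changed";                         // different stream: remove old + append new
  }
  return "unchanged";                         // already live, keep position, cache holds
}
```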

Step 5: The liveUrl Cache Principle

This was the key insight that emerged during debugging. The masterListUrl (master playlist) can 404 while the resolved liveUrl (720p sub-playlist) still serves segments. They’re different URLs with different lifecycles.

This means:

  1. Never use masterListUrl resolution as a liveness check — a null result from resolveLiveUrl doesn’t mean the stream is dead
  2. Never overwrite a cached liveUrl with null — the old liveUrl might still work
  3. Only delete IDB entries when the stream disappears from the API entirely — and even then, only after 24h

The 24h guard on removeCached and sweepOrphans protects against a stream momentarily disappearing from the API (flaky endpoint, pagination issues) while its liveUrl is still actively serving segments.

Enforcement

  • putCached checks: if the new liveUrl is null but an existing entry has a liveUrl, the write is silently skipped
  • processStreamersEagerly only calls putCached on successful resolution
  • removeCached reads cachedAt timestamp, skips if younger than 24h
  • sweepOrphans iterates all entries, only deletes orphans older than 24h
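The write and delete guards are small enough to sketch as pure predicates. Names and shapes here are assumptions; the real putCached/removeCached wrap these checks in IDB transactions.

```typescript
// Guard logic from the enforcement rules above, extracted as pure functions.
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // the 24h guard

interface Entry { liveUrl: string | null; cachedAt: number; }

// putCached guard: never let a null overwrite a previously resolved liveUrl.
export function shouldWrite(existing: Entry | undefined, nextLiveUrl: string | null): boolean {
  if (nextLiveUrl === null && existing?.liveUrl) return false; // silently skip
  return true;
}

// removeCached / sweepOrphans guard: only delete entries older than 24h.
export function canDelete(entry: Entry, now: number): boolean {
  return now - entry.cachedAt >= MAX_AGE_MS;
}
```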

The Refresh Flow Summary (Before Rewrite)

Initial load (loadTlStreams):

fetchStreams() → build videos + streamerMap → fire-and-forget processStreamersEagerly

Soft refresh (returning from another provider):

restoreTlSnapshot → fetchStreams() → remove dead aliases → classify:
  new → append + processStreamersEagerly
  different masterListUrl → remove + append + processStreamersEagerly
  same masterListUrl + IDB cached liveUrl → use cache
  same masterListUrl + no cache → processStreamersEagerly
→ sweepOrphans (24h guard)

Scroll refresh (near bottom of list):

fetchStreams() → classify:
  new → append + processStreamersEagerly
  different masterListUrl → remove + append + processStreamersEagerly
  same masterListUrl → skip entirely

Files Changed

Backend (video-editor-backend):

  • tl-proxy.routes.ts — new POST /tl/resolve-live-url endpoint, reuses existing resolveLiveUrl() without creating proxy sessions

Frontend (video-editor-svelte):

  • tl-cache.ts (new) — IndexedDB wrapper + in-memory snapshot
  • constants.ts — TL_API.RESOLVE_LIVE_URL, TL_PAGE config object
  • tl-api.ts — liveUrl field on TlStreamer, resolveLiveUrl() function
  • videoList.svelte.ts — initializeSoft(), removeStreamers(), updateStreamerLiveUrl()
  • +page.svelte — soft refresh orchestration, processStreamersEagerly, snapshot save/restore, scroll refresh rewrite

Part 2: The 404 Black Screen Bug

Hours after shipping the soft refresh, a new problem surfaced: streams that had gone offline sat in the list as black screens. They weren’t being removed.

The root cause was a missing code path. The system was careful about protecting the liveUrl (never overwrite with null, 24h deletion guard), but it never checked the liveUrl. When a stream died on tango.me, the masterListUrl would 404 — but the code treated that as “transient, keep the cached liveUrl.” The cached liveUrl was just as dead, but nothing ever checked it.

First Fix: Wrong

Added a checkLiveUrl function — a backend endpoint that does a GET against the liveUrl on tango.me. Called it in processStreamersEagerly when resolveLiveUrl returned null: “masterListUrl failed, so check the cached liveUrl. If that also 404s, remove.”

Also added it to softRefreshTlStreams: “before restoring a cached liveUrl, verify it’s still alive.”

This was architecturally wrong for two reasons.

The Regression: Position Loss

The soft refresh liveUrl check was too aggressive. tango.me HLS endpoints may not behave identically on HEAD vs GET requests. The initial implementation used HEAD (later changed to GET), but the real problem was structural: checking every cached liveUrl on every provider switch meant any transient failure would yank followed streams from the top of the list. The removal + re-addition on the next refresh cycle put them in the middle as “new” streams.

Tried to fix it with hideStreamers() — removing from the video list but keeping the streamerMap entry to prevent re-addition. This was a band-aid over the wrong architecture.

The Correct Architecture: liveUrl as Source of Truth

The breakthrough was clarifying what “liveUrl is the source of truth” actually means in practice. It doesn’t mean “check the liveUrl during processing.” It means:

The only time a liveUrl gets checked is when it’s actually used. During video playback, the backend proxy fetches the liveUrl from tango.me to serve the m3u8 playlist. If tango.me returns 404 — a real, organic HTTP 404 — the stream is dead. That’s the signal. Not a proactive health check. Not a processing-time validation. The natural act of playing the video IS the liveness check.

The Rewrite

Ripped out the proactive checking and replaced it with three clear flows:

1. Processing (resolve + cache, never remove)

processStreamersEagerly resolves the liveUrl from the masterListUrl. If resolution succeeds, cache it. If it fails, fall back to the IDB cached liveUrl. No removal. Stream stays in the list — if it has no liveUrl, it’s unplayable until the next cycle resolves it. That’s fine.

2. 30-second refresh interval (duplicate detection)

Replaced scroll-triggered refresh with a 30s setInterval. The duplicate check is based on streamerId + masterListUrl: same pair → skip (stream stays exactly where it is), different → new stream, queue for processing. This is the fix for the position-loss bug — duplicates are never touched.
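The duplicate check can be sketched as a diff keyed on the (streamerId, masterListUrl) pair. diffFresh is a name introduced here.

```typescript
// Sketch of the 30s interval's duplicate detection.
interface Known { streamerId: string; masterListUrl: string; }

export function diffFresh(existing: Known[], fresh: Known[]): Known[] {
  // Same (streamerId, masterListUrl) pair → already in the list; never touched.
  const seen = new Set(existing.map((s) => `${s.streamerId}|${s.masterListUrl}`));
  // A new id, or a known id with a different masterListUrl, is queued for processing.
  return fresh.filter((s) => !seen.has(`${s.streamerId}|${s.masterListUrl}`));
}
```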

3. Video playback (organic 404 removal)

When HLS.js fires a 404 error on a TL stream, the frontend calls checkLiveUrl(liveUrl) — a real GET against tango.me. If tango.me confirms 404, the stream is removed from the video list, the streamerMap, and IndexedDB. If it’s alive (transient proxy issue), retry. The 404 has to come from tango.me, not from the local proxy.
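A hedged sketch of that handler, with checkLiveUrl and the removal side effects injected. The real code lives in the player's HLS error path; these dependency names are assumptions.

```typescript
// Sketch of the organic-404 removal flow: the 404 must come from tango.me,
// not from the local proxy.
export async function handleTlPlayback404(
  liveUrl: string,
  deps: {
    checkLiveUrl: (url: string) => Promise<number>; // HTTP status from tango.me itself
    removeStream: (liveUrl: string) => void;        // list + streamerMap + IDB removal
    retry: () => void;                              // transient proxy issue: reload
  },
): Promise<void> {
  const status = await deps.checkLiveUrl(liveUrl);
  if (status === 404) {
    deps.removeStream(liveUrl); // organic 404 from tango.me: the stream is dead
  } else {
    deps.retry();               // the local proxy hiccuped, not tango.me
  }
}
```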

The IDB Rules (Final)

Reduced to two sentences: store on successful resolution. Remove on 404 from tango.me (immediate) or when 24 hours old (sweep).

Part 3: The PWA Background Problem

A day after the rewrite, a new issue: leave the PWA in the background for a while, come back, and the old streams at the top of the list are black. The new ones added to the end work fine. The stale streams should be getting removed — they’re dead on tango.me — but they just sit there.

Three compounding failures

1. No visibility change detection. The browser suspends background tabs. When the tab wakes, the HLS.js instances are in a broken state, but nothing forces them to reload. The 30s setInterval resumes, but it only adds new streamers — it never re-checks existing ones.

2. loadStream early-returns on same filename. The video player caches which filename is loaded on each element via el.dataset.loadedFilename. When preloadAdjacent fires after waking, it calls loadStream for the same filenames — but loadStream sees the match and skips the reload. The broken HLS.js instances from before the sleep persist.

3. Passive player 404s are silently swallowed. The HLS error handler had if (!isActivePlayer) { hls.destroy(); return; } — preloaded videos that got 404s just destroyed their HLS instance without triggering any removal. Only the active player ran the checkLiveUrl → remove flow. The “three videos loaded at once means three videos checked at once” assumption was wrong.

Net result: the video shows black, no 404 fires because nothing reloads, and even if one did, only the active player would act on it.

The fix: replace the timer with a processing queue

The 30s timer was fundamentally the wrong abstraction. It said “every 30 seconds, check for new streams from the API.” What we actually needed was “continuously process all streams, prioritizing new ones from the API.”

New architecture — a single async loop that runs while on TL:

while on TL:
  Phase 1: fetch endpoint → process new/changed (resolve liveUrl + co-streamers)
  Phase 2: reprocess existing (check liveUrls against tango.me)
  wait minimum 30s from cycle start
  repeat

Phase 1 is identical to the old logic: hit the endpoint, diff against the current list, process anything new or changed.

Phase 2 is entirely new. For each existing streamer that Phase 1 did not just process:

  1. Check the cached liveUrl against tango.me (checkLiveUrl). If alive → done, stream is fine.
  2. If dead → try resolving a new liveUrl from masterListUrl. If the new one is alive → update cache, keep the stream.
  3. Only if both the cached liveUrl AND the freshly resolved one are confirmed 404 → remove from list, memory, and IndexedDB.

The “both must be 404” rule is critical. The cached liveUrl is the source of truth. The endpoint liveUrl (from masterListUrl) is the fallback. A resolveLiveUrl returning null (can’t parse the master playlist) is NOT a 404 — we can’t confirm death, so the stream stays for 24h via the IDB sweep.
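The “both must be 404” rule reduces to a small decision table. This is a sketch: Probe and phase2Action are names introduced here, with null modeling a check or resolution that could not be completed.

```typescript
// null models an unconfirmable probe (e.g. resolveLiveUrl failing to parse
// the master playlist), which is NOT a 404.
type Probe = "alive" | "dead" | null;

export function phase2Action(
  cachedLiveUrl: Probe,
  resolvedLiveUrl: Probe,
): "keep" | "refresh" | "remove" {
  if (cachedLiveUrl === "alive") return "keep";      // source of truth still serves
  if (resolvedLiveUrl === "alive") return "refresh"; // fallback works: update the cache
  if (cachedLiveUrl === "dead" && resolvedLiveUrl === "dead") {
    return "remove";                                 // both confirmed 404: really dead
  }
  return "keep"; // death unconfirmed: leave it to the 24h IDB sweep
}
```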

The active player is not special

The old architecture had a separate 404 handling path in the VideoPlayer: HLS.js fires 404 → checkLiveUrl → onLiveUrlDead callback → remove stream → navigate to next. Passive players silently swallowed errors. Active players had their own removal logic.

This was wrong. The queue is the single authority on stream removal. The VideoPlayer’s job is to play whatever’s in the list, not to decide what belongs there.

New error handling for TL in VideoPlayer:

HLS 404 → destroy the HLS instance. That’s it.

No checkLiveUrl, no onLiveUrlDead callback, no active/passive distinction. When the queue determines a stream is dead and removes it from the list, a reactive $effect in VideoPlayer detects the current video is gone and navigates to the first remaining video.
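The navigation fallback is essentially a pure lookup that the $effect wraps. A sketch under assumed names; the real code reads the reactive videos list.

```typescript
// Sketch of the reactive fallback: when the queue removes the current video,
// the player navigates to the first remaining one.
interface Video { filename: string; }

export function nextAfterRemoval(videos: Video[], current: string): string | null {
  // Current video still in the list: nothing to do.
  if (videos.some((v) => v.filename === current)) return current;
  // Current video was removed by the queue: fall back to the first remaining video.
  return videos[0]?.filename ?? null;
}
```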

Session restore via IDB

One more gap: the OS kills the PWA (memory pressure, phone restart). The in-memory snapshot is gone. When the app restarts, it hits the endpoint and builds a fresh list. But processNewStreamer falls back to IndexedDB when resolveLiveUrl fails — and the IDB is full of stale liveUrls from the dead session.

The fix: before doing anything else, walk every IDB entry and checkLiveUrl against tango.me. Dead entries get purged immediately. By the time the endpoint fetch runs and processNewStreamer falls back to IDB, only live liveUrls remain.

This is Phase 0 of the queue — runs once on start, before initial processing or the loop.
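Phase 0 might look like this, with the tango.me check and the IDB delete injected. A sketch under assumed names.

```typescript
// Sketch of the startup cache validation: purge dead entries before
// anything falls back to them.
interface Entry { streamerId: string; liveUrl: string | null; }

export async function purgeDeadEntries(
  entries: Entry[],
  deps: {
    checkLiveUrl: (url: string) => Promise<boolean>; // true = alive on tango.me
    remove: (streamerId: string) => Promise<void>;   // delete from IndexedDB
  },
): Promise<number> {
  let purged = 0;
  for (const e of entries) {
    // Entries with no resolved liveUrl have nothing to validate.
    if (e.liveUrl && !(await deps.checkLiveUrl(e.liveUrl))) {
      await deps.remove(e.streamerId); // stale leftover from the dead session
      purged++;
    }
  }
  return purged;
}
```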

Let the queue breathe

The initial implementation had a 30-second floor between cycles (REFRESH_GATE_MS). It was a safety net against hammering the endpoint. But with 200ms delays between each item, the queue already paces itself. Processing 80 streams takes ~16 seconds. Add the endpoint fetch and IDB operations, and a full cycle is naturally 20-30 seconds for a typical list. A small list (5 streams) cycles faster, but the endpoint can handle that — it’s a single GET.

Removed the artificial floor entirely. The queue runs at its natural pace: as fast as the network and the 200ms per-item delay allow.

Takeaway

Four lessons.

First: identify which piece of data is most valuable and protect it. The liveUrl outlives the masterListUrl that produced it. Build the cache around that.

Second: don’t proactively check what you can check organically. The proxy already hits the liveUrl during playback — that’s the real 404 signal. Adding a separate health-check loop introduced false positives, timing bugs, and position-loss regressions.

Third: a timer is the wrong abstraction when what you need is a work queue. The 30s timer only looked forward (new streams from the API), never backward (are existing streams still alive?). A continuous processing queue naturally covers both directions. New items from the API get priority, then the queue circles back to verify existing items. No stale state accumulates because the queue eventually reaches everything.

Fourth: persistent state needs startup validation. IndexedDB survives app restarts but the data it holds may not. Walking the cache on startup — before anything depends on it — turns stale persistence from a liability into a clean slate.