YouTube Real-Time Bilingual Subtitles
A Chrome extension and Go WebSocket backend that capture a YouTube video's audio and overlay real-time bilingual subtitles, using Deepgram for speech-to-text (STT) and a Google Translate/DeepL failover mechanism for translation.

Project Overview
This project is a decoupled, real-time subtitle translation system designed to break down language barriers in YouTube videos and livestreams. A Chrome extension intercepts the video's audio with the Web Audio API and streams it in chunks over a WebSocket to a Go backend. The backend obtains low-latency speech-to-text (STT) results from Deepgram and translates them through the Google Translate and DeepL APIs, with load balancing and failover between the two. The resulting dual-language subtitles are then rendered dynamically over the player.
Technical Challenges & Solutions
Frontend Dynamic Audio Interception and Buffered WebSocket Streaming
To feel real-time, the extension must intercept the audio track of YouTube's HTML5 <video> element without loss and continuously transmit audio chunks to the backend. Long-running capture and streaming of this kind is prone to memory leaks and stream stuttering.
Multi-Translation API Load Balancing and Failover
The free tiers of machine translation services such as Google Translate and DeepL quickly hit their rate limits under a stream of high-frequency, short-sentence requests, at which point translations fail immediately.
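One way to keep translations flowing when a provider is rate limited is a round-robin pool that falls through to the next provider on error. The Go sketch below is illustrative only: the Translator interface, the Pool type, and stubProvider are hypothetical names, and the stubs stand in for real Google Translate and DeepL clients, whose actual API calls are not shown.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Translator is a hypothetical interface; real Google Translate or DeepL
// clients would be wrapped to satisfy it.
type Translator interface {
	Name() string
	Translate(text, targetLang string) (string, error)
}

// Pool spreads requests across providers in round-robin order and
// fails over to the next provider when one returns an error.
type Pool struct {
	mu        sync.Mutex
	providers []Translator
	next      int // index of the provider that gets the next request first
}

func NewPool(providers ...Translator) *Pool {
	return &Pool{providers: providers}
}

// Translate tries every provider at most once, starting at the round-robin cursor.
func (p *Pool) Translate(text, targetLang string) (string, error) {
	n := len(p.providers)
	if n == 0 {
		return "", errors.New("no translation providers configured")
	}

	// Advance the cursor under the lock so concurrent callers start on different providers.
	p.mu.Lock()
	start := p.next
	p.next = (p.next + 1) % n
	p.mu.Unlock()

	var lastErr error
	for i := 0; i < n; i++ {
		prov := p.providers[(start+i)%n]
		out, err := prov.Translate(text, targetLang)
		if err == nil {
			return out, nil
		}
		lastErr = fmt.Errorf("%s: %w", prov.Name(), err)
	}
	return "", fmt.Errorf("all providers failed, last error: %w", lastErr)
}

// stubProvider simulates a provider; fail=true mimics a rate-limit error.
type stubProvider struct {
	name string
	fail bool
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Translate(text, targetLang string) (string, error) {
	if s.fail {
		return "", errors.New("rate limited")
	}
	return fmt.Sprintf("%s(%s): %s", s.name, targetLang, text), nil
}

func main() {
	pool := NewPool(stubProvider{name: "google", fail: true}, stubProvider{name: "deepl"})
	out, err := pool.Translate("hello", "zh")
	fmt.Println(out, err) // the failing "google" stub is skipped: deepl(zh): hello <nil>
}
```

Because the cursor advances on every call, healthy providers share the load roughly evenly, while a failing provider costs only one extra attempt per request. A production version would likely also add per-provider cooldowns after a rate-limit response, which this sketch omits.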
Dynamic Flicker-Free Subtitle Rendering
Because STT results can keep updating mid-sentence (interim results), repainting the DOM on every update causes severe subtitle flickering and ruins the viewing experience.
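The rendering itself lives in the extension, but the number of repaints can also be reduced before updates ever reach the browser. As a hedged illustration, the Go sketch below shows a backend-side filter that forwards an interim result only when its text has actually changed and always forwards a non-empty final result. Transcript and InterimFilter are hypothetical names, and the Final field is assumed to be populated from a flag such as Deepgram's is_final; this is one possible mitigation, not necessarily the approach used in this project.

```go
package main

import "fmt"

// Transcript is a hypothetical, simplified STT result. Final marks a
// segment that the recognizer will no longer revise.
type Transcript struct {
	Text  string
	Final bool
}

// InterimFilter suppresses updates that would not change what is on screen.
type InterimFilter struct {
	last string // text of the most recently forwarded interim result
}

// ShouldForward reports whether tr needs to be sent to the renderer.
func (f *InterimFilter) ShouldForward(tr Transcript) bool {
	if tr.Final {
		f.last = "" // the next segment starts from a clean state
		return tr.Text != ""
	}
	if tr.Text == "" || tr.Text == f.last {
		return false // empty or unchanged interim: nothing new to paint
	}
	f.last = tr.Text
	return true
}

func main() {
	f := &InterimFilter{}
	stream := []Transcript{
		{Text: "hello", Final: false},
		{Text: "hello", Final: false}, // duplicate interim, dropped
		{Text: "hello wor", Final: false},
		{Text: "hello world", Final: true},
	}
	for _, tr := range stream {
		if f.ShouldForward(tr) {
			fmt.Printf("forward final=%v %q\n", tr.Final, tr.Text)
		}
	}
}
```

With the sample stream above, three of the four results are forwarded. The same idea pairs naturally with sending only final results to the translation APIs, which would also lower the request rate against their limits.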
Architecture
The frontend (a Manifest V3 extension) handles audio interception, WebSocket communication, and the rendering of draggable, flicker-free subtitles. It also persistently stores the full translation history. The Go backend maintains the low-latency WebSocket connection, transcribes the audio stream with Deepgram, and wraps multiple translation APIs in a round-robin pool so that translation continues uninterrupted.
Learnings
Developing this system deepened my understanding of streaming media processing and of the architectural difficulties of real-time applications. The work ranged from handling sample rate conversion when intercepting YouTube's <video> audio via the Web Audio API, to tuning the stability of WebSocket communication, to designing an automatic failover mechanism that does not depend on any single translation provider. Building a tool that adds lag-free bilingual subtitles to live streams was immensely rewarding.