What AI model does the CC generator use for transcription?

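The tool uses OpenAI's Whisper model, running locally in your browser through Transformers.js (WebGPU where available, with a WebAssembly fallback). Whisper is a speech recognition model trained on 680,000 hours of multilingual audio. The model itself covers nearly 100 languages, and this tool offers more than 30 of them in its language dropdown.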

Is my audio uploaded to a server?

No. The Whisper model runs entirely in your browser. Your audio never leaves your device. Transcription happens locally using your CPU/GPU, ensuring complete privacy.

What audio formats are supported?

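The tool accepts audio files (MP3, WAV, OGG, OPUS, FLAC, AAC, WMA, M4A, WebM) and video files (MP4, WebM, MOV, AVI). For video files, only the audio stream is processed. Decoding relies on your browser's built-in codecs, so support for less common formats can vary by browser.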

What caption formats can I export?

You can export captions as SRT (SubRip Subtitle), the most widely supported format for video players and video editors, or as VTT (WebVTT), the standard for web video (the HTML5 <video> element) and streaming platforms.

How accurate is the automatic transcription?

Whisper is highly accurate for clear speech in major languages. Accuracy depends on audio quality, background noise, speaker accents, and the language. Clear studio-quality recordings in English typically achieve near-perfect accuracy. Noisy or accented speech may require manual correction in the caption editor after generation.

Can I edit the captions after they are generated?

Yes. After transcription, you can edit the text of each caption segment, adjust the start and end timestamps, split or merge segments, and add or remove individual lines before exporting.

Is there a file size or duration limit?

There is no server-imposed limit, but very long audio files (over an hour) may take significant time to process, depending on your device's processing power. The model runs locally, so performance scales with your hardware.

Free AI Closed Caption Generator - SRT & VTT Subtitles Online

Need to generate subtitles for your videos or audio files? Our AI CC Generator uses OpenAI's Whisper model to automatically create SRT and VTT closed caption files with accurate timestamps. Fine-tune your output with professional formatting controls — set characters per line and lines per cue for broadcast-ready subtitles. Everything runs locally in your browser — no uploads, no accounts, complete privacy for your media.

What is a Closed Caption Generator and How Does It Work?

A closed caption generator converts spoken audio into timed text files that can be overlaid on video content. Unlike simple transcription, caption generators produce precisely timestamped segments formatted to industry standards — ready to import into video editors, upload to YouTube, or embed in web pages.

Our tool uses Whisper, OpenAI's state-of-the-art automatic speech recognition model, trained on 680,000 hours of multilingual audio data. It processes audio in 30-second chunks, generating text with precise start and end timestamps for each sentence segment. You can watch captions appear in real-time as they are decoded, then export in SRT or VTT format.

How to Generate Closed Captions: Step-by-Step Guide

Using our free AI subtitle generator takes just a few steps:

Select the spoken language: Choose the language being spoken in the audio from the dropdown (defaults to English)
Upload a file: Drag and drop an audio or video file into the drop zone, or click to browse
Watch live generation: The AI model loads on first use (cached for future visits), then processes your media — caption text appears in real-time with a progress indicator
Configure formatting: Choose SRT or VTT format, adjust characters per line (default 42) and lines per cue (default 2) for your target platform
Review and edit: Switch to the Editor tab to correct any errors in the generated captions
Export: Copy the captions to clipboard or save as an .srt/.vtt file — the suggested filename matches your source file for automatic subtitle detection by video players like VLC

SRT vs VTT: Which Subtitle Format Should You Use?

Our tool supports the two most widely used subtitle file formats. You can switch between them instantly without reprocessing — the same timestamp data is reformatted on the fly:

SRT (SubRip Subtitle): The most universally supported subtitle format. Uses numbered entries and a comma before the milliseconds (00:00:01,500). Compatible with virtually all video players, editors, and platforms including YouTube, Premiere Pro, DaVinci Resolve, and VLC. Choose SRT when you need maximum compatibility.
VTT (WebVTT): The web-native subtitle format designed for HTML5 video. Starts with a WEBVTT header and uses a period before the milliseconds (00:00:01.500). Required for HTML5 <track> elements and commonly used on web platforms. Choose VTT when embedding subtitles in web pages or web applications.
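
To make the difference between the two formats concrete, here is a minimal TypeScript sketch that writes the same cue data as SRT and as VTT. The CaptionCue shape and the function names are assumptions made for this illustration, not the tool's actual code.

```typescript
// Hypothetical cue shape for this example; times are in seconds.
interface CaptionCue {
  start: number;
  end: number;
  text: string;
}

// Format seconds as HH:MM:SS<separator>mmm.
// SRT uses "," before the milliseconds, VTT uses ".".
function formatTimestamp(seconds: number, separator: "," | "."): string {
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

// SRT: numbered entries separated by blank lines.
function toSrt(cues: CaptionCue[]): string {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${formatTimestamp(cue.start, ",")} --> ${formatTimestamp(cue.end, ",")}\n${cue.text}\n`)
    .join("\n");
}

// VTT: a WEBVTT header line, a blank line, then unnumbered cues.
function toVtt(cues: CaptionCue[]): string {
  const body = cues
    .map((cue) =>
      `${formatTimestamp(cue.start, ".")} --> ${formatTimestamp(cue.end, ".")}\n${cue.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

Because both functions read the same cue list, switching formats is only a re-serialization step and needs no new transcription.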

Professional Subtitle Formatting Controls

Unlike basic caption generators, our tool gives you control over how your subtitles are formatted — essential for producing readable, professional-quality captions:

Characters per line (default: 42): Controls the maximum width of each subtitle line. 42 characters is a widely used professional guideline: wide enough to be readable but short enough to fit on screen without obscuring the video. Long sentences are automatically wrapped at word boundaries, and if the wrapped text needs more lines than the lines-per-cue setting allows, it is split into multiple cues with interpolated timestamps.
Lines per cue (default: 2): Controls how many lines each subtitle entry can have. Two lines is the usual convention for TV and streaming. Use 1 line for a minimal, unobtrusive look, or 3 lines for dense content like lectures. When a wrapped sentence exceeds this limit, additional cues are created automatically with proportionally calculated timing.

These settings apply instantly to both the preview and the exported file — change them at any time without reprocessing.
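
For readers who want to see what wrapping at word boundaries means in practice, the following TypeScript sketch shows one simple greedy approach. The function name and the greedy strategy are assumptions for illustration; the tool's own wrapping logic may differ.

```typescript
// Greedy word wrap: keep adding words to the current line until the next
// word would push it past maxChars, then start a new line.
// A single word longer than maxChars is left on a line of its own.
function wrapAtWordBoundaries(text: string, maxChars: number): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    if (current.length === 0) {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

With a limit of 10 characters, "the quick brown fox jumps" wraps into three lines: "the quick", "brown fox", and "jumps".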

Key Features of Our AI Caption Generator

Real-time streaming: See caption text appear as it is decoded — no waiting for the entire file to finish processing
Instant format switching: Toggle between SRT and VTT at any time — no reprocessing needed
Smart line wrapping: Long sentences are automatically split into multiple cues with interpolated timestamps, respecting your characters-per-line and lines-per-cue settings
Built-in editor: Switch to the Editor tab to fix errors, adjust text, or fine-tune your captions before exporting
Translate to English: Enable the "Translate to English" checkbox to generate English captions from non-English audio
VLC-compatible filenames: Save dialog suggests the same filename as your source video — place the .srt file next to your video and VLC picks it up automatically
Accurate timestamps: Each caption segment includes start and end times derived from the timestamp tokens that the Whisper model emits alongside the text

Common Use Cases for AI Caption Generation

Content creators, educators, and businesses use AI-generated subtitles for a wide range of purposes:

YouTube Videos: Generate SRT files and upload them as custom captions via YouTube Studio (Subtitles → Add Language → Upload file → "With timing"). Custom captions replace YouTube's often inaccurate auto-captions, improving accessibility, viewer retention, and SEO — YouTube indexes caption text for search rankings.
Social Media Content: Create subtitles for Instagram Reels, TikTok, and Facebook videos. A large share of social media video is watched with the sound off, so captions are essential for engagement.
E-Learning and Training: Add captions to educational videos, online courses, lectures, and corporate training materials for accessibility compliance and improved comprehension.
Podcasts and Webinars: Generate subtitle files for video podcasts and recorded webinars to improve accessibility and discoverability.
Video Editing Workflow: Import SRT/VTT files into Premiere Pro, DaVinci Resolve, Final Cut Pro, or CapCut as a starting point — much faster than typing subtitles from scratch.
Accessibility Compliance: Help meet WCAG 2.1, ADA, and Section 508 captioning requirements by providing captions for the video content on your website or application.
Foreign Language Translation: Generate English captions from foreign-language audio using the translate feature — useful for subtitling international content.
Local Video Playback: Save the .srt file with the same name as your video file — players like VLC, MPC-HC, and mpv automatically load matching subtitle files.

How the AI Caption Generation Pipeline Works

For technically curious users, here is a breakdown of what happens when you upload a file:

Step 1: Audio Extraction and Preprocessing

The uploaded file is decoded using the Web Audio API. For video files (MP4, WebM, MOV, AVI), the audio track is automatically extracted. The audio is then downmixed to mono, resampled to 16 kHz (the format Whisper expects), and converted to a Float32Array of PCM samples.
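
In the tool, the Web Audio API does this conversion. Purely to illustrate the arithmetic involved, here is a TypeScript sketch of a mono downmix and a basic linear-interpolation resampler. Both function names are hypothetical, and linear interpolation is a simplification chosen for brevity, not the browser's resampling method.

```typescript
// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const frameCount = channels.length > 0 ? channels[0].length : 0;
  const mono = new Float32Array(frameCount);
  for (const channel of channels) {
    for (let i = 0; i < frameCount; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Linear-interpolation resampler. Simple but unfiltered: a production
// resampler would low-pass the signal first to avoid aliasing.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Going from 48 kHz to 16 kHz keeps one output sample for every three input samples, so a one-minute stereo file shrinks to 960,000 mono samples.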

Step 2: Chunked Processing with Streaming

Long audio is automatically split into 30-second chunks with 5-second overlapping strides. As each chunk is processed, decoded words stream to the UI in real-time via the WhisperTextStreamer, so you see text appearing as it's generated.
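
The chunk boundaries can be pictured with a short TypeScript sketch. It assumes that the 5-second stride is the overlap shared by neighbouring chunks; some pipelines apply the stride on both sides of each chunk, which would shorten the hop between chunks further. The names planChunks and ChunkRange are hypothetical.

```typescript
// Hypothetical description of one chunk, in sample indices.
interface ChunkRange {
  startSample: number;
  endSample: number; // exclusive
}

// Plan overlapping windows over the audio. Each window is chunkSeconds long
// and starts (chunkSeconds - overlapSeconds) after the previous one.
function planChunks(
  totalSamples: number,
  sampleRate: number,
  chunkSeconds: number,
  overlapSeconds: number
): ChunkRange[] {
  const windowSize = Math.round(chunkSeconds * sampleRate);
  const hop = windowSize - Math.round(overlapSeconds * sampleRate);
  if (windowSize <= 0 || hop <= 0) {
    throw new Error("chunk length must be positive and longer than the overlap");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalSamples; start += hop) {
    const end = Math.min(start + windowSize, totalSamples);
    ranges.push({ startSample: start, endSample: end });
    if (end === totalSamples) break; // this window already reaches the end
  }
  return ranges;
}
```

Under these assumptions, 70 seconds of 16 kHz audio is covered by three windows: 0 to 30 s, 25 to 55 s, and 50 to 70 s.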

Step 3: Whisper Inference with Timestamps

Each audio chunk is converted to a log-Mel spectrogram and fed through the Whisper encoder-decoder transformer. The model generates text tokens autoregressively with timestamp tokens, producing both the transcribed text and precise timing information for each sentence segment.
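
Whisper marks segment boundaries with special timestamp tokens such as <|2.40|>, measured from the start of the current chunk. The inference library normally turns these into segments for you; the TypeScript sketch below only illustrates the idea. The function name, the TimedSegment shape, and the chunkOffsetSeconds parameter are assumptions for this example.

```typescript
// Hypothetical segment shape; times are in seconds.
interface TimedSegment {
  start: number;
  end: number;
  text: string;
}

// Walk through the decoded string, pairing each run of text with the
// timestamp token before it (start) and after it (end).
// chunkOffsetSeconds shifts chunk-relative times onto the full timeline.
function parseTimestampedText(decoded: string, chunkOffsetSeconds: number): TimedSegment[] {
  const tokenPattern = /<\|(\d+\.\d+)\|>/g;
  const segments: TimedSegment[] = [];
  let previousTime: number | null = null;
  let textStart = 0;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(decoded)) !== null) {
    const time = parseFloat(match[1]);
    const text = decoded.slice(textStart, match.index).trim();
    if (previousTime !== null && text.length > 0) {
      segments.push({
        start: chunkOffsetSeconds + previousTime,
        end: chunkOffsetSeconds + time,
        text,
      });
    }
    previousTime = time;
    textStart = match.index + match[0].length;
  }
  return segments;
}
```

For example, the string "<|0.00|> Hello world.<|2.40|><|2.40|> Second one.<|5.00|>" yields two segments, one from 0 to 2.4 seconds and one from 2.4 to 5 seconds.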

Step 4: Caption Formatting and Cue Splitting

The raw timestamped chunks are formatted into your selected output format (SRT or VTT). Long sentences are wrapped at word boundaries respecting the characters-per-line setting. When wrapped text exceeds the lines-per-cue limit, the chunk is split into multiple cues with proportionally interpolated timestamps — ensuring each cue displays at the correct time.
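
The cue-splitting step can be sketched as follows in TypeScript. This example assumes the text has already been wrapped into lines and that each cue's share of the time is proportional to its character count; the source only says the timing is proportional, so the weighting, the function name, and the TimedCue shape are all assumptions.

```typescript
// Hypothetical cue shape; times are in seconds.
interface TimedCue {
  start: number;
  end: number;
  text: string;
}

// Group wrapped lines into cues of at most linesPerCue lines, then give
// each cue a slice of [start, end] proportional to its character count.
function splitIntoCues(
  lines: string[],
  linesPerCue: number,
  start: number,
  end: number
): TimedCue[] {
  const perCue = Math.max(1, Math.floor(linesPerCue));
  const groups: string[][] = [];
  for (let i = 0; i < lines.length; i += perCue) {
    groups.push(lines.slice(i, i + perCue));
  }
  const totalChars = groups.reduce((sum, g) => sum + g.join(" ").length, 0);
  if (totalChars === 0) return [];
  const cues: TimedCue[] = [];
  let elapsedChars = 0;
  for (const group of groups) {
    const cueStart = start + (end - start) * (elapsedChars / totalChars);
    elapsedChars += group.join(" ").length;
    const cueEnd = start + (end - start) * (elapsedChars / totalChars);
    cues.push({ start: cueStart, end: cueEnd, text: group.join("\n") });
  }
  return cues;
}
```

Each cue starts exactly where the previous one ends, and the last cue ends at the original segment's end time, so no gap or overlap is introduced.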

Understanding the Whisper AI Model

Our tool uses Whisper Base, a transformer-based encoder-decoder model compact enough for browser deployment:

Architecture: Encoder-decoder transformer trained end-to-end on speech recognition, with log-Mel spectrogram input features
Model Size: Approximately 150 MB in quantized ONNX format — balancing accuracy and download size for browser use
Training Data: Trained on 680,000 hours of multilingual and multitask supervised data collected from the web
Language Support: Supports transcription in over 30 languages including English, Spanish, French, German, Chinese, Japanese, Korean, Russian, Arabic, and many more
Timestamp Precision: Generates sentence-level timestamps essential for accurate subtitle timing and cue splitting
Lazy Loading: The model only downloads when you first upload a file (not on page load), and is cached in your browser for instant access on future visits

Supported Audio and Video Formats

The tool accepts a wide range of media file formats:

Audio: MP3, WAV, OGG, FLAC, AAC, WMA, M4A, WebM audio
Video: MP4, WebM, MOV, AVI — audio track is automatically extracted for captioning

All audio is internally converted to 16 kHz mono PCM, the input format Whisper expects. The Web Audio API handles format conversion and resampling automatically.

Free Online Caption Generator: Privacy and Security

Complete Privacy Protection

Our free AI caption generator runs all inference locally in your browser using Transformers.js, with WebGPU acceleration where available and a WebAssembly (WASM) fallback otherwise. No audio or video is ever uploaded to a server, no cloud processing occurs, and no account is required. The Whisper model (~150 MB) is downloaded once and cached in your browser for instant access on all future visits.
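
A WebGPU-first choice with a WASM fallback might look roughly like the TypeScript sketch below. The function name is hypothetical, and the check is deliberately simplified: it only tests whether the browser exposes navigator.gpu, whereas a fuller check would also await navigator.gpu.requestAdapter() and fall back if no adapter is returned.

```typescript
type InferenceDevice = "webgpu" | "wasm";

// Accepts a navigator-like object so the check can also run outside a browser.
// In a page you would pass the global navigator.
function chooseInferenceDevice(nav: { gpu?: unknown } | undefined): InferenceDevice {
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? "webgpu" : "wasm";
}
```

Either way the model runs on your own device; the choice only affects speed.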

AI Caption Generator vs Alternative Approaches

Approach	Pros	Cons	Best For
AI CC Generator (This Tool)	Fast, free, 30+ languages, private, formatting controls, SRT & VTT	May need manual correction for noisy audio	Quick captioning with privacy requirements
Manual Subtitling	Perfect accuracy, full timing control	Extremely slow (5-10x real-time), expensive	Professional broadcast or cinema subtitles
Cloud Caption Services	High accuracy, speaker labels, auto-punctuation	Audio uploaded to third-party servers, subscription costs	Enterprise use where privacy is not a concern
YouTube Auto-Captions	Free, automatic for uploaded videos	Only works on YouTube, limited export options, variable quality	YouTube-only content with low accuracy requirements

Frequently Asked Questions

How large is the AI model and how long does download take?

The Whisper model is approximately 150 MB. It only downloads when you first upload a file — not on page load. Download time depends on your connection speed — typically 15 seconds to a minute. After the first download, the model is cached in your browser and loads instantly on all subsequent visits.
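
As a rough check on those figures, download time is the file size in megabits divided by the connection speed. The TypeScript helper below is a hypothetical illustration that ignores latency and protocol overhead.

```typescript
// Estimate download time in seconds.
// sizeMegabytes is in MB, speedMbps in megabits per second (1 byte = 8 bits).
function estimateDownloadSeconds(sizeMegabytes: number, speedMbps: number): number {
  if (speedMbps <= 0) throw new Error("speed must be positive");
  return (sizeMegabytes * 8) / speedMbps;
}
```

For a 150 MB model, an 80 Mbps connection gives about 15 seconds and a 20 Mbps connection gives about 60 seconds, which matches the range quoted above.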

How long does caption generation take?

On modern hardware with WebGPU, Whisper processes audio faster than real-time — a 60-second recording typically takes 5-10 seconds to caption. You can watch the text appear in real-time as it's being decoded, with a progress indicator showing overall completion.

Can I switch between SRT and VTT without reprocessing?

Yes. The format toggle instantly converts the same timestamp data between SRT and VTT formats. No reprocessing is needed — it is purely a formatting change. Your formatting settings (characters per line, lines per cue) are preserved across format switches.

What do the characters per line and lines per cue settings do?

Characters per line (default 42) controls how wide each subtitle line is; 42 is a widely used professional guideline. Lines per cue (default 2) controls how many lines each subtitle entry can have; 2 is the usual convention for TV and streaming. When a wrapped sentence needs more lines than the limit allows, the tool automatically splits it into multiple cues with interpolated timestamps.

Can VLC automatically load the generated subtitles?

Yes. When you save, the tool suggests the same filename as your source video with the .srt or .vtt extension. Place the subtitle file in the same folder as your video — VLC and most other video players will detect and load it automatically.
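
The filename matching is simple to picture: keep the video's base name and swap the extension. Here is a minimal TypeScript sketch; the function name is hypothetical and the tool's own logic may handle more edge cases.

```typescript
// Replace the source file's extension with the subtitle extension.
// Names without an extension (or dotfiles like ".hidden") get it appended.
function suggestSubtitleFilename(sourceName: string, format: "srt" | "vtt"): string {
  const dot = sourceName.lastIndexOf(".");
  const base = dot > 0 ? sourceName.slice(0, dot) : sourceName;
  return `${base}.${format}`;
}
```

So holiday.mp4 becomes holiday.srt, and my.talk.final.webm becomes my.talk.final.vtt when VTT is selected.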

Can I translate audio to English captions?

Yes. Enable the "Translate to English" checkbox to have Whisper translate non-English speech directly into English captions with accurate timestamps. This is a built-in capability of the Whisper model.
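
In code terms, the checkbox most likely just switches the Whisper task from transcription to translation. The TypeScript sketch below builds an options object using the option names that Transformers.js documents for its speech recognition pipeline (language, task, return_timestamps, chunk_length_s, stride_length_s). How the tool actually wires up its settings is an assumption, as are the function and interface names.

```typescript
// Option names follow the Transformers.js speech recognition pipeline;
// the interface itself is a hypothetical wrapper for this example.
interface TranscriptionOptions {
  language: string;
  task: "transcribe" | "translate";
  return_timestamps: boolean;
  chunk_length_s: number;
  stride_length_s: number;
}

// Build the options from the two user-facing controls:
// the language dropdown and the translate checkbox.
function buildTranscriptionOptions(
  spokenLanguage: string,
  translateToEnglish: boolean
): TranscriptionOptions {
  return {
    language: spokenLanguage,
    task: translateToEnglish ? "translate" : "transcribe",
    return_timestamps: true,
    chunk_length_s: 30,
    stride_length_s: 5,
  };
}
```

Note that the language setting still names the language being spoken, even when the output is English.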

Are my files uploaded anywhere?

No. Your media never leaves your device. All processing (audio decoding, AI inference, timestamp generation, and caption formatting) happens entirely within your browser. The only network transfer is the one-time model download; no server ever receives your audio or video.

Can I edit the generated captions?

Yes. Switch to the Editor tab to make corrections, adjust text, or refine the generated captions. The editor provides a separate editable copy — your original generated captions are preserved in the Captions tab.

What languages are supported?

The tool supports over 30 languages including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, and many more. You must select the spoken language from the dropdown — the language you choose tells the AI what language to expect.

Does it work offline?

After the initial model download, the tool works with locally stored files without an internet connection. The model is cached in your browser storage.

A Note on Accuracy

AI caption generation produces highly accurate results for clear speech but is not perfect. Background noise, heavy accents, overlapping speakers, and domain-specific terminology may reduce accuracy. Use the built-in Editor to review and correct the captions for critical use cases. The formatting controls (characters per line, lines per cue) help ensure your subtitles meet professional display standards regardless of content.

Why Choose Our Free AI CC Generator?

Complete Privacy: All AI processing happens locally in your browser — media is never uploaded to any server
SRT & VTT Support: Industry-standard subtitle formats with instant switching
Professional Formatting: Configurable characters per line and lines per cue for broadcast-ready subtitles
Smart Cue Splitting: Long sentences automatically split into multiple cues with interpolated timestamps
State-of-the-Art AI: OpenAI Whisper model for high-accuracy speech recognition with timestamps
Real-time Streaming: Watch captions appear as they are decoded — no waiting for the entire file
30+ Languages: Generate captions in over 30 languages with translation to English
Built-in Editor: Fix errors and refine captions before exporting
VLC Auto-Detection: Matching filename suggestion for automatic subtitle loading in video players
No Account Required: No registration, no login, no usage limits
Audio & Video: Accepts audio files (MP3, WAV, OGG, FLAC) and video files (MP4, WebM, MOV)
WebGPU Accelerated: Uses GPU acceleration when available for faster processing
Model Caching: One-time download, instant loading on all future visits