AI CC Generator

AI CC Generator: Free Online SRT & VTT Closed Caption Creator


Need to generate subtitles for your videos or audio files? Our AI CC Generator uses OpenAI's Whisper model to automatically create SRT and VTT closed caption files with accurate timestamps. Fine-tune your output with professional formatting controls — set characters per line and lines per cue for broadcast-ready subtitles. Everything runs locally in your browser — no uploads, no accounts, complete privacy for your media.

What is a Closed Caption Generator and How Does It Work?

A closed caption generator converts spoken audio into timed text files that can be overlaid on video content. Unlike simple transcription, caption generators produce precisely timestamped segments formatted to industry standards — ready to import into video editors, upload to YouTube, or embed in web pages.

Our tool uses Whisper, OpenAI's state-of-the-art automatic speech recognition model, trained on 680,000 hours of multilingual audio data. It processes audio in 30-second chunks and generates text with start and end timestamps for each segment (typically a phrase or sentence). You can watch captions appear in real time as they are decoded, then export in SRT or VTT format.

How to Generate Closed Captions: Step-by-Step Guide

Using our free AI subtitle generator takes just a few steps:

  1. Select the spoken language: Choose the language being spoken in the audio from the dropdown (defaults to English)
  2. Upload a file: Drag and drop an audio or video file into the drop zone, or click to browse
  3. Watch live generation: The AI model loads on first use (cached for future visits), then processes your media — caption text appears in real-time with a progress indicator
  4. Configure formatting: Choose SRT or VTT format, adjust characters per line (default 42) and lines per cue (default 2) for your target platform
  5. Review and edit: Switch to the Editor tab to correct any errors in the generated captions
  6. Export: Copy the captions to clipboard or save as an .srt/.vtt file — the suggested filename matches your source file for automatic subtitle detection by video players like VLC
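The filename suggestion in step 6 amounts to swapping the source file's extension for the subtitle extension. Here is a minimal TypeScript sketch, assuming only the last extension is replaced. `suggestSubtitleName` is a hypothetical helper, not the tool's actual code.

```typescript
// Hypothetical helper: derive a subtitle filename from the source media name,
// replacing only the last extension so players like VLC can match the pair.
type SubtitleFormat = "srt" | "vtt";

function suggestSubtitleName(sourceName: string, format: SubtitleFormat): string {
  const dot = sourceName.lastIndexOf(".");
  // No dot, or only a leading dot (e.g. ".hidden"), means "no extension".
  const base = dot > 0 ? sourceName.slice(0, dot) : sourceName;
  return `${base}.${format}`;
}

console.log(suggestSubtitleName("My Clip.mp4", "srt")); // My Clip.srt
console.log(suggestSubtitleName("talk.final.mov", "vtt")); // talk.final.vtt
```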

SRT vs VTT: Which Subtitle Format Should You Use?

Our tool supports the two most widely used subtitle file formats. You can switch between them instantly without reprocessing — the same timestamp data is reformatted on the fly:

  • SRT (SubRip Text): The most universally supported subtitle format. Uses numbered entries with comma-separated milliseconds (00:00:01,500). Compatible with virtually all video players, editors, and platforms including YouTube, Premiere Pro, DaVinci Resolve, and VLC. Choose SRT when you need maximum compatibility.
  • VTT (WebVTT): The web-native subtitle format designed for HTML5 video. Uses a WEBVTT header and dot-separated milliseconds (00:00:01.500). Required for HTML5 <track> elements and commonly used on web platforms. Choose VTT when embedding subtitles in web pages or web applications.
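Because both formats are rendered from the same cue data, switching between them is only a formatting choice. The sketch below is a minimal TypeScript illustration, assuming cues are stored as start/end times in seconds plus already-wrapped lines. `Cue`, `formatTimestamp`, and `renderSubtitles` are hypothetical names.

```typescript
// A cue holds start/end times in seconds plus its (already wrapped) text lines.
interface Cue {
  start: number;
  end: number;
  lines: string[];
}

type SubtitleFormat = "srt" | "vtt";

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT).
function formatTimestamp(seconds: number, format: SubtitleFormat): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const sep = format === "srt" ? "," : ".";
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms, 3)}`;
}

// Render the same cue list as a complete SRT or VTT document.
function renderSubtitles(cues: Cue[], format: SubtitleFormat): string {
  const blocks = cues.map((cue, i) => {
    const timing = `${formatTimestamp(cue.start, format)} --> ${formatTimestamp(cue.end, format)}`;
    // SRT needs a sequence number before each cue; in VTT a cue identifier is optional.
    const header = format === "srt" ? `${i + 1}\n` : "";
    return `${header}${timing}\n${cue.lines.join("\n")}`;
  });
  const body = blocks.join("\n\n") + "\n";
  return format === "vtt" ? `WEBVTT\n\n${body}` : body;
}

const cues: Cue[] = [{ start: 1.5, end: 3.2, lines: ["Hello and welcome", "to the show."] }];
console.log(renderSubtitles(cues, "srt"));
console.log(renderSubtitles(cues, "vtt"));
```

The SRT output starts with the cue number 1 and the timing line 00:00:01,500 --> 00:00:03,200, while the VTT output starts with the WEBVTT header followed by 00:00:01.500 --> 00:00:03.200.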

Professional Subtitle Formatting Controls

Unlike basic caption generators, our tool gives you control over how your subtitles are formatted — essential for producing readable, professional-quality captions:

  • Characters per line (default: 42): Controls the maximum width of each subtitle line. 42 characters is a widely used broadcast and streaming guideline, wide enough to be readable but short enough to fit on screen without obscuring the video. Long sentences are automatically wrapped at word boundaries, and if the wrapped text needs more lines than the lines-per-cue setting allows, it is split into multiple cues with interpolated timestamps.
  • Lines per cue (default: 2): Controls how many lines each subtitle entry can have. Two lines is the TV and streaming standard. Use 1 line for a minimal, unobtrusive look, or 3 lines for dense content like lectures. When a wrapped sentence exceeds this limit, additional cues are created automatically with proportionally calculated timing.
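A greedy word wrapper is one simple way to honor both settings: fill each line up to the character limit, then group the lines into cues. This TypeScript sketch assumes greedy line filling, since the tool's exact wrapping strategy is not documented here. `wrapText` and `groupLines` are hypothetical names.

```typescript
// Greedy word wrap: fill each line up to maxChars, breaking only at whitespace.
// A single word longer than maxChars is kept whole on its own line.
function wrapText(text: string, maxChars: number): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    if (current === "") {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Group wrapped lines into cues holding at most linesPerCue lines each.
function groupLines(lines: string[], linesPerCue: number): string[][] {
  const groups: string[][] = [];
  for (let i = 0; i < lines.length; i += linesPerCue) {
    groups.push(lines.slice(i, i + linesPerCue));
  }
  return groups;
}

console.log(groupLines(wrapText("the quick brown fox jumps", 10), 2));
// [["the quick", "brown fox"], ["jumps"]]
```

Each inner array becomes one cue, so any group beyond the first is what produces an additional cue.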

These settings apply instantly to both the preview and the exported file — change them at any time without reprocessing.

Key Features of Our AI Caption Generator

  • Real-time streaming: See caption text appear as it is decoded — no waiting for the entire file to finish processing
  • Instant format switching: Toggle between SRT and VTT at any time — no reprocessing needed
  • Smart line wrapping: Long sentences are automatically split into multiple cues with interpolated timestamps, respecting your characters-per-line and lines-per-cue settings
  • Built-in editor: Switch to the Editor tab to fix errors, adjust text, or fine-tune your captions before exporting
  • Translate to English: Enable the "Translate to English" checkbox to generate English captions from non-English audio
  • VLC-compatible filenames: Save dialog suggests the same filename as your source video — place the .srt file next to your video and VLC picks it up automatically
  • Accurate timestamps: Each caption segment includes start and end times derived from the timestamp tokens Whisper predicts alongside the transcribed text

Common Use Cases for AI Caption Generation

Content creators, educators, and businesses use AI-generated subtitles for a wide range of purposes:

  • YouTube Videos: Generate SRT files and upload them as custom captions via YouTube Studio (Subtitles → Add Language → Upload file → "With timing"). Custom captions replace YouTube's often inaccurate auto-captions, improving accessibility and viewer retention, and can help discoverability because the caption text describes what is said in your video.
  • Social Media Content: Create subtitles for Instagram Reels, TikTok, and Facebook videos. Many social media videos are watched with the sound off, so captions are essential for engagement.
  • E-Learning and Training: Add captions to educational videos, online courses, lectures, and corporate training materials for accessibility compliance and improved comprehension.
  • Podcasts and Webinars: Generate subtitle files for video podcasts and recorded webinars to improve accessibility and discoverability.
  • Video Editing Workflow: Import SRT/VTT files into Premiere Pro, DaVinci Resolve, Final Cut Pro, or CapCut as a starting point — much faster than typing subtitles from scratch.
  • Accessibility Compliance: Help meet WCAG 2.1, ADA, and Section 508 requirements by providing captions for all video content on your website or application.
  • Foreign Language Translation: Generate English captions from foreign-language audio using the translate feature — useful for subtitling international content.
  • Local Video Playback: Save the .srt file with the same name as your video file — players like VLC, MPC-HC, and mpv automatically load matching subtitle files.

How the AI Caption Generation Pipeline Works

For technically curious users, here is a breakdown of what happens when you upload a file:

Step 1: Audio Extraction and Preprocessing

The uploaded file is decoded using the Web Audio API. For video files (MP4, WebM, MOV, AVI), the audio track is extracted automatically, provided the browser can decode the container and codec. The audio is then resampled to 16 kHz mono, the format Whisper expects, and converted to a Float32Array of PCM samples.
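What "resampled to 16 kHz mono" means can be shown with a plain function over raw channel data. This is a simplified TypeScript sketch, not the tool's code: in the browser this step would normally be delegated to the Web Audio API (for example, rendering through an OfflineAudioContext created at 16000 Hz, which is an assumption about the implementation). `toMono16k` is a hypothetical name, and its naive linear interpolation favors clarity over audio quality.

```typescript
// Downmix any number of channels to mono, then resample to 16 kHz with linear
// interpolation. Illustrative only: production resamplers low-pass filter
// first to avoid aliasing, which this sketch skips.
const TARGET_RATE = 16000;

function toMono16k(channels: Float32Array[], sampleRate: number): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const n = channels[0].length;

  // 1. Downmix: average the channels sample by sample.
  const mono = new Float32Array(n);
  for (const ch of channels) {
    for (let i = 0; i < n; i++) mono[i] += ch[i] / channels.length;
  }

  // 2. Resample: read the mono signal at positions spaced sampleRate / 16000 apart.
  const ratio = sampleRate / TARGET_RATE;
  const outLength = Math.floor((n * TARGET_RATE) / sampleRate);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, n - 1);
    const frac = pos - i0;
    out[i] = mono[i0] * (1 - frac) + mono[i1] * frac;
  }
  return out;
}
```

For one second of 48 kHz stereo input, the function returns 16,000 mono samples.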

Step 2: Chunked Processing with Streaming

Long audio is automatically split into 30-second chunks with 5-second overlapping strides. As each chunk is processed, decoded words stream to the UI in real-time via the WhisperTextStreamer, so you see text appearing as it's generated.
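The chunk layout can be sketched as a list of time windows. This TypeScript sketch assumes the Hugging Face-style convention in which the 5-second stride is kept as context on each side of a chunk, so consecutive 30-second windows advance by 30 − 2 × 5 = 20 seconds. `chunkWindows` is a hypothetical helper.

```typescript
interface ChunkWindow {
  start: number; // seconds
  end: number; // seconds
}

// Compute overlapping analysis windows. Assumes strideSec of context on each
// side of a chunk, so windows advance by chunkSec - 2 * strideSec.
function chunkWindows(totalSec: number, chunkSec = 30, strideSec = 5): ChunkWindow[] {
  const step = chunkSec - 2 * strideSec;
  if (step <= 0) throw new Error("stride too large for chunk length");
  const windows: ChunkWindow[] = [];
  let start = 0;
  for (;;) {
    const end = Math.min(start + chunkSec, totalSec);
    windows.push({ start, end });
    if (end >= totalSec) break;
    start += step;
  }
  return windows;
}

console.log(chunkWindows(70));
// [{ start: 0, end: 30 }, { start: 20, end: 50 }, { start: 40, end: 70 }]
```

The overlap gives the decoder context at chunk edges; text decoded in the overlapping regions then has to be reconciled when chunk outputs are merged.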

Step 3: Whisper Inference with Timestamps

Each audio chunk is converted to a log-Mel spectrogram and fed through the Whisper encoder-decoder transformer. The model generates text tokens autoregressively with timestamp tokens, producing both the transcribed text and precise timing information for each sentence segment.
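When Whisper decodes with timestamps, it emits timestamp tokens in 0.02-second steps, relative to the start of the 30-second window, around each segment's text, and tokenizers commonly render them as strings like <|2.40|>. The tool most likely receives already-parsed chunks from the Transformers.js pipeline, so this hypothetical `parseTimestampedText` only illustrates how those tokens map to segment times. The regex-based parsing is a simplification.

```typescript
interface Segment {
  start: number; // seconds on the file's timeline
  end: number;
  text: string;
}

// Parse Whisper-style timestamped text such as
// "<|0.00|> Hello world.<|2.40|><|2.40|> How are you?<|4.80|>" into segments.
// Timestamps are relative to the chunk, so chunkOffsetSec shifts them onto
// the full file's timeline.
function parseTimestampedText(decoded: string, chunkOffsetSec = 0): Segment[] {
  // The closing timestamp sits in a lookahead so it can also open the next segment.
  const pattern = /<\|(\d+\.\d+)\|>([^<]*)(?=<\|(\d+\.\d+)\|>)/g;
  const segments: Segment[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(decoded)) !== null) {
    const text = match[2].trim();
    if (text === "") continue; // skip the empty gap between adjacent timestamps
    segments.push({
      start: chunkOffsetSec + parseFloat(match[1]),
      end: chunkOffsetSec + parseFloat(match[3]),
      text,
    });
  }
  return segments;
}

console.log(parseTimestampedText("<|0.00|> Hello world.<|2.40|><|2.40|> How are you?<|4.80|>", 20));
```

With a chunk offset of 20 seconds, this yields "Hello world." from 20 to 22.4 s and "How are you?" from 22.4 to 24.8 s.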

Step 4: Caption Formatting and Cue Splitting

The raw timestamped chunks are formatted into your selected output format (SRT or VTT). Long sentences are wrapped at word boundaries respecting the characters-per-line setting. When wrapped text exceeds the lines-per-cue limit, the chunk is split into multiple cues with proportionally interpolated timestamps — ensuring each cue displays at the correct time.
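The cue-splitting step can be sketched as follows. The sketch assumes a chunk's duration is divided in proportion to each cue's character count; the page says only "proportionally interpolated", so the weighting is an assumption. `splitIntoCues` is hypothetical and expects lines that have already been wrapped to the characters-per-line limit.

```typescript
interface TimedCue {
  start: number;
  end: number;
  lines: string[];
}

// Split one timed chunk (already wrapped into lines) into cues of at most
// linesPerCue lines, sharing the chunk's duration in proportion to each
// cue's character count.
function splitIntoCues(start: number, end: number, lines: string[], linesPerCue: number): TimedCue[] {
  const groups: string[][] = [];
  for (let i = 0; i < lines.length; i += linesPerCue) {
    groups.push(lines.slice(i, i + linesPerCue));
  }
  const charCount = (g: string[]) => g.reduce((sum, line) => sum + line.length, 0);
  const total = groups.reduce((sum, g) => sum + charCount(g), 0);
  const duration = end - start;

  const cues: TimedCue[] = [];
  let cursor = start;
  groups.forEach((g, idx) => {
    // Fall back to equal shares if every line is empty.
    const weight = total > 0 ? charCount(g) / total : 1 / groups.length;
    // The last cue ends exactly at the chunk end to avoid rounding drift.
    const cueEnd = idx === groups.length - 1 ? end : cursor + duration * weight;
    cues.push({ start: cursor, end: cueEnd, lines: g });
    cursor = cueEnd;
  });
  return cues;
}

console.log(splitIntoCues(10, 16, ["aaaa", "bb", "cccccc"], 2));
// [{ start: 10, end: 13, lines: ["aaaa", "bb"] }, { start: 13, end: 16, lines: ["cccccc"] }]
```

Both cues hold six characters, so the six-second chunk is split evenly at 13 s.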

Understanding the Whisper AI Model

Our tool uses Whisper Base, a transformer-based encoder-decoder model small enough for browser deployment:

  • Architecture: Encoder-decoder transformer trained end-to-end on speech recognition, with log-Mel spectrogram input features
  • Model Size: Approximately 150 MB in quantized ONNX format — balancing accuracy and download size for browser use
  • Training Data: Trained on 680,000 hours of multilingual and multitask supervised data collected from the web
  • Language Support: Supports transcription in over 30 languages including English, Spanish, French, German, Chinese, Japanese, Korean, Russian, Arabic, and many more
  • Timestamp Precision: Generates segment-level timestamps essential for accurate subtitle timing and cue splitting
  • Lazy Loading: The model only downloads when you first upload a file (not on page load), and is cached in your browser for instant access on future visits
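Lazy loading like this is commonly implemented by memoizing the loader's promise, so the download starts on the first request and later callers share it. Here is a generic TypeScript sketch under that assumption. `lazy` is a hypothetical helper; in the browser it might wrap a Transformers.js call such as pipeline("automatic-speech-recognition", ...) with a Whisper Base model ID, which is an assumption about how the tool is wired. Persisting the downloaded files across visits is a separate layer (Transformers.js can store model files in the browser cache) and is not handled by this function.

```typescript
// Memoize an async loader so the expensive work (e.g. downloading a model)
// starts on first use and every later caller shares the same promise.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let promise: Promise<T> | null = null;
  return () => {
    if (promise === null) {
      promise = loader().catch((err) => {
        promise = null; // allow a retry if the first load fails
        throw err;
      });
    }
    return promise;
  };
}
```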

Supported Audio and Video Formats

The tool accepts a wide range of media file formats:

  • Audio: MP3, WAV, OGG, FLAC, AAC, WMA, M4A, WebM audio
  • Video: MP4, WebM, MOV, AVI — audio track is automatically extracted for captioning

All audio is internally converted to 16kHz mono PCM format for optimal Whisper performance. The Web Audio API handles decoding and resampling automatically, so the formats that work in practice depend on the codecs your browser can decode.

Free Online Caption Generator: Privacy and Security

Complete Privacy Protection

Our free AI caption generator processes all inference locally in your browser using Transformers.js with WebGPU acceleration (WASM fallback). No audio or video is ever uploaded to servers, no cloud processing occurs, and no account is required. The Whisper model (~150 MB) is downloaded once and cached in your browser for instant access on all future visits.
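A WebGPU-with-WASM-fallback choice can be made with the standard WebGPU entry point, navigator.gpu.requestAdapter(). This TypeScript sketch is illustrative. `pickBackend` is a hypothetical helper, and feeding its result into the Transformers.js pipeline's `device` option (Transformers.js v3 accepts values such as "webgpu") is an assumption about how the tool is wired.

```typescript
type Backend = "webgpu" | "wasm";

// Only the piece of the WebGPU API this check needs.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Read navigator defensively so the function also loads outside a browser.
function currentNavigator(): NavigatorLike | undefined {
  return (globalThis as unknown as { navigator?: NavigatorLike }).navigator;
}

// Prefer WebGPU when the browser exposes navigator.gpu and grants an adapter;
// otherwise fall back to WebAssembly.
async function pickBackend(nav: NavigatorLike | undefined = currentNavigator()): Promise<Backend> {
  if (!nav || !nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```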

AI Caption Generator vs Alternative Approaches

  • AI CC Generator (This Tool). Pros: fast, free, 30+ languages, private, formatting controls, SRT & VTT. Cons: may need manual correction for noisy audio. Best for: quick captioning with privacy requirements.
  • Manual Subtitling. Pros: highest accuracy, full timing control. Cons: extremely slow (5-10x real time), expensive. Best for: professional broadcast or cinema subtitles.
  • Cloud Caption Services. Pros: high accuracy, speaker labels, auto-punctuation. Cons: audio uploaded to third-party servers, subscription costs. Best for: enterprise use where privacy is not a concern.
  • YouTube Auto-Captions. Pros: free, automatic for uploaded videos. Cons: only works on YouTube, limited export options, variable quality. Best for: YouTube-only content with low accuracy requirements.

Frequently Asked Questions

How large is the AI model and how long does download take?

The Whisper model is approximately 150 MB. It only downloads when you first upload a file — not on page load. Download time depends on your connection speed — typically 15 seconds to a minute. After the first download, the model is cached in your browser and loads instantly on all subsequent visits.
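The quoted range is simple arithmetic on the model size: megabytes times 8 gives megabits, and dividing by the connection speed in megabits per second gives seconds. Here is a tiny TypeScript sketch, assuming a sustained link with no protocol overhead. `downloadSeconds` is a hypothetical helper.

```typescript
// Rough download time: megabytes -> megabits (x8), divided by link speed in Mbps.
// Ignores protocol overhead and assumes the link stays saturated.
function downloadSeconds(sizeMB: number, linkMbps: number): number {
  return (sizeMB * 8) / linkMbps;
}

console.log(downloadSeconds(150, 20)); // 60
console.log(downloadSeconds(150, 80)); // 15
```

So "15 seconds to a minute" corresponds to roughly 80 down to 20 Mbps. On a 10 Mbps connection, the same download would take about two minutes.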

How long does caption generation take?

On modern hardware with WebGPU, Whisper processes audio faster than real-time — a 60-second recording typically takes 5-10 seconds to caption. You can watch the text appear in real-time as it's being decoded, with a progress indicator showing overall completion.

Can I switch between SRT and VTT without reprocessing?

Yes. The format toggle instantly converts the same timestamp data between SRT and VTT formats. No reprocessing is needed — it is purely a formatting change. Your formatting settings (characters per line, lines per cue) are preserved across format switches.

What do the characters per line and lines per cue settings do?

Characters per line (default 42) controls how wide each subtitle line is; 42 is a common broadcast and streaming guideline. Lines per cue (default 2) controls how many lines each subtitle entry can have; 2 is standard for TV and streaming. When a sentence is too long, the tool automatically splits it into multiple cues with interpolated timestamps.

Can VLC automatically load the generated subtitles?

Yes. When you save, the tool suggests the same filename as your source video with the .srt or .vtt extension. Place the subtitle file in the same folder as your video — VLC and most other video players will detect and load it automatically.

Can I translate audio to English captions?

Yes. Enable the "Translate to English" checkbox to have Whisper translate non-English speech directly into English captions with accurate timestamps. This is a built-in capability of the Whisper model.
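The checkbox most likely maps onto Whisper's task setting: "transcribe" keeps the spoken language, while "translate" outputs English. This TypeScript sketch assumes the tool calls the Transformers.js automatic-speech-recognition pipeline. `buildAsrOptions` is hypothetical. The option names (`language`, `task`, `return_timestamps`, `chunk_length_s`, `stride_length_s`) are ones Transformers.js documents for that pipeline, but they should be checked against the library version in use.

```typescript
// Options for a Transformers.js automatic-speech-recognition pipeline call.
// Values mirror the settings described on this page.
interface AsrOptions {
  language: string;
  task: "transcribe" | "translate";
  return_timestamps: boolean;
  chunk_length_s: number;
  stride_length_s: number;
}

function buildAsrOptions(spokenLanguage: string, translateToEnglish: boolean): AsrOptions {
  return {
    language: spokenLanguage,
    // Whisper's "translate" task always produces English text.
    task: translateToEnglish ? "translate" : "transcribe",
    return_timestamps: true,
    chunk_length_s: 30,
    stride_length_s: 5,
  };
}

console.log(buildAsrOptions("french", true));
```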

Are my files uploaded anywhere?

No. Your media never leaves your device. All processing — audio decoding, AI inference, timestamp generation, and caption formatting — happens entirely within your browser. There is no server involved at any point.

Can I edit the generated captions?

Yes. Switch to the Editor tab to make corrections, adjust text, or refine the generated captions. The editor provides a separate editable copy — your original generated captions are preserved in the Captions tab.

What languages are supported?

The tool supports over 30 languages including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, and many more. Select the spoken language from the dropdown (English by default) so the model knows which language to expect.

Does it work offline?

After the initial model download, the tool works with locally stored files without an internet connection. The model is cached in your browser storage.

A Note on Accuracy

AI caption generation produces highly accurate results for clear speech but is not perfect. Background noise, heavy accents, overlapping speakers, and domain-specific terminology may reduce accuracy. Use the built-in Editor to review and correct the captions for critical use cases. The formatting controls (characters per line, lines per cue) help ensure your subtitles meet professional display standards regardless of content.

Why Choose Our Free AI CC Generator?

  • Complete Privacy: All AI processing happens locally in your browser — media is never uploaded to any server
  • SRT & VTT Support: Industry-standard subtitle formats with instant switching
  • Professional Formatting: Configurable characters per line and lines per cue for broadcast-ready subtitles
  • Smart Cue Splitting: Long sentences automatically split into multiple cues with interpolated timestamps
  • State-of-the-Art AI: OpenAI Whisper model for high-accuracy speech recognition with timestamps
  • Real-time Streaming: Watch captions appear as they are decoded — no waiting for the entire file
  • 30+ Languages: Generate captions in over 30 languages with translation to English
  • Built-in Editor: Fix errors and refine captions before exporting
  • VLC Auto-Detection: Matching filename suggestion for automatic subtitle loading in video players
  • No Account Required: No registration, no login, no usage limits
  • Audio & Video: Accepts audio files (MP3, WAV, OGG, FLAC) and video files (MP4, WebM, MOV)
  • WebGPU Accelerated: Uses GPU acceleration when available for faster processing
  • Model Caching: One-time download, instant loading on all future visits