video-content-extractor

Social Media | Safe | Claude | Codex

How to Install

This skill comes from a community source. Check the original listing for install instructions.

General Claude Code install: copy SKILL.md to ~/.claude/skills/

Video Content Extractor

Overview

Automatically extracts key frames from MP4 video files at configurable time intervals, performs OCR text recognition on each frame, and generates a structured Markdown report. The report includes video metadata (duration, resolution, codecs) and frame-by-frame OCR transcripts with timestamp references.

This skill is designed for Codex CLI and requires FFmpeg and Tesseract OCR to be installed on the local machine.

When to Use This Skill

Use when you need to extract text content from video presentations, lectures, or screencasts.
Use when you want to create searchable transcripts from video files without embedded subtitles.
Use when you need to analyze video content programmatically and generate structured summaries.
Use when the user asks to "read what is on screen" or "extract the content from this video."

How It Works

Step 1: Analyze Video Metadata

The skill uses ffprobe to extract video metadata: duration, resolution, frame rate, codec information, and file size.
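
A minimal sketch of this step, assuming ffprobe is asked for JSON output and the result is reduced to the fields listed above. The function names (ffprobe_command, summarize_probe, probe_video) and the keys of the returned dictionary are illustrative, not taken from the skill's own script.

```python
import json
import subprocess


def ffprobe_command(video_path):
    """Build an ffprobe call that prints container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        video_path,
    ]


def summarize_probe(probe):
    """Reduce parsed ffprobe JSON to the metadata fields used in the report."""
    streams = probe["streams"]
    video = next(s for s in streams if s.get("codec_type") == "video")
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    # avg_frame_rate is a fraction such as "30000/1001"; "0/0" means unknown.
    num, den = (int(part) for part in video["avg_frame_rate"].split("/"))
    fmt = probe["format"]
    return {
        "duration_s": float(fmt["duration"]),
        "size_bytes": int(fmt["size"]),
        "resolution": f'{video["width"]}x{video["height"]}',
        "fps": round(num / den, 3) if den else None,
        "video_codec": video["codec_name"],
        "audio_codec": audio["codec_name"] if audio else None,
    }


def probe_video(video_path):
    """Run ffprobe (must be on PATH) and return the summarized metadata."""
    result = subprocess.run(
        ffprobe_command(video_path),
        capture_output=True, text=True, check=True,
    )
    return summarize_probe(json.loads(result.stdout))
```

Keeping the parsing separate from the subprocess call makes the summary logic easy to check against a saved ffprobe output.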

Step 2: Extract Key Frames

Using FFmpeg, the skill captures frames at the configured interval (default: every 30 seconds). Each frame is saved as a timestamped JPEG image.
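
One way this step could look, assuming one FFmpeg call per capture time with -ss placed before -i to seek, and -frames:v 1 to write a single JPEG. The file naming scheme (frame_00h00m30s.jpg), the -q:v 2 quality setting, and the helper names are assumptions for illustration; the skill's actual script may differ.

```python
import subprocess
from pathlib import Path


def frame_timestamps(duration_s, interval_s=30):
    """Capture times in seconds: 0, interval, 2*interval, ... below the duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = []
    t = 0
    while t < duration_s:
        times.append(t)
        t += interval_s
    return times


def frame_name(t):
    """Timestamped file name for the frame captured at t seconds (hypothetical scheme)."""
    hours, rem = divmod(int(t), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"frame_{hours:02d}h{minutes:02d}m{seconds:02d}s.jpg"


def frame_command(video_path, t, out_dir):
    """FFmpeg call that seeks to t seconds and writes one JPEG frame."""
    return [
        "ffmpeg", "-y",
        "-ss", str(t),
        "-i", str(video_path),
        "-frames:v", "1",
        "-q:v", "2",
        str(Path(out_dir) / frame_name(t)),
    ]


def extract_frames(video_path, duration_s, out_dir, interval_s=30):
    """Write one frame per interval into out_dir and return the image paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for t in frame_timestamps(duration_s, interval_s):
        subprocess.run(frame_command(video_path, t, out_dir),
                       check=True, capture_output=True)
        paths.append(Path(out_dir) / frame_name(t))
    return paths
```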

Step 3: OCR Text Recognition

Each extracted frame is processed by Tesseract OCR. If the skill's default page segmentation mode (PSM) returns no meaningful text, the skill retries the frame with fully automatic page segmentation.
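
The fallback can be sketched as follows. The listing does not say which PSM the skill tries first or how it decides that text is "meaningful", so both are assumptions here: the first pass uses --psm 6 (a single uniform block of text), the retry uses --psm 3 (Tesseract's fully automatic page segmentation without orientation detection), and text counts as meaningful when it contains at least three letters or digits.

```python
import subprocess


def tesseract_command(image_path, lang, psm):
    """Tesseract call that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path, lang, psm):
    """Run Tesseract (must be on PATH) and return the raw recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_meaningful(text, min_chars=3):
    """Assumed heuristic: at least min_chars letters or digits (CJK included)."""
    return sum(ch.isalnum() for ch in text) >= min_chars


def ocr_with_fallback(image_path, lang="chi_sim+eng",
                      first_psm=6, fallback_psm=3, run=run_tesseract):
    """OCR one frame, retrying with the fallback PSM if the first pass is empty."""
    text = run(image_path, lang, first_psm)
    if not is_meaningful(text):
        text = run(image_path, lang, fallback_psm)
    return text.strip()
```

The run parameter exists only so the fallback logic can be exercised without Tesseract installed.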

Step 4: Generate Markdown Report

All extracted data is assembled into a structured Markdown document.
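
The listing does not show the report layout, so the following is only a plausible shape: a metadata section followed by one subsection per frame, headed by its timestamp. The heading names, the build_report signature, and the placeholder for frames with no text are all assumptions.

```python
def format_timestamp(seconds):
    """Render a capture time in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_report(title, metadata, frames):
    """Assemble a Markdown report.

    metadata: dict of label -> value.
    frames: iterable of (seconds, image_path, ocr_text) tuples.
    """
    lines = [f"# {title}", "", "## Video Metadata", ""]
    lines += [f"- {label}: {value}" for label, value in metadata.items()]
    lines += ["", "## Frames", ""]
    for seconds, image_path, text in frames:
        lines += [
            f"### {format_timestamp(seconds)}",
            "",
            f"![frame]({image_path})",
            "",
            text or "_No text detected._",
            "",
        ]
    return "\n".join(lines)
```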

Examples

Example 1: Basic Extraction

Agent prompt: Use the video-content-extractor skill to extract content from lecture.mp4

The run produces lecture.md and a lecture_frames/ directory.

Example 2: Custom Interval

Parameters: video_path, output_dir, interval (seconds), lang

Extract every 60 seconds with English-only OCR:

python scripts/extract_video.py recording.mp4 ./output 60 eng

Example 3: Bilingual Content

Extract every 15 seconds into the current directory with the default Chinese + English OCR (chi_sim+eng):

python scripts/extract_video.py lecture.mp4 . 15 chi_sim+eng
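
For reference, the four positional parameters used in these examples could be parsed roughly like this. It is a sketch only: whether extract_video.py uses argparse, and whether output_dir defaults to the current directory, is not stated in the listing. The interval default of 30 and the lang default of chi_sim+eng follow the defaults mentioned elsewhere in this document.

```python
import argparse


def parse_args(argv=None):
    """Parse: video_path [output_dir] [interval] [lang]."""
    parser = argparse.ArgumentParser(prog="extract_video.py")
    parser.add_argument("video_path", help="path to the MP4 file")
    parser.add_argument("output_dir", nargs="?", default=".",
                        help="where the report and frames are written (assumed default)")
    parser.add_argument("interval", nargs="?", type=int, default=30,
                        help="seconds between captured frames")
    parser.add_argument("lang", nargs="?", default="chi_sim+eng",
                        help="Tesseract language spec, e.g. eng or chi_sim+eng")
    args = parser.parse_args(argv)
    if args.interval <= 0:
        parser.error("interval must be a positive number of seconds")
    return args
```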

Best Practices

Use shorter intervals (10-15s) for fast-paced content with frequent text changes.
Use longer intervals (30-60s) for presentation slides or slow lectures to reduce duplicate frames.
For Chinese content, ensure the Tesseract Simplified Chinese language pack (chi_sim) is installed.

Limitations

Requires FFmpeg and Tesseract OCR to be installed and accessible via PATH.
Tesseract OCR accuracy depends on video quality, text size, and font clarity.
Does not extract audio or perform speech-to-text transcription.
Frame extraction is time-based (not scene-change-based), which may produce near-duplicate frames.
Large videos with short intervals can generate many frames, so ensure sufficient disk space.
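
To gauge the last point before a run, the frame count and disk usage can be estimated up front. The 200 KB average JPEG size below is a placeholder assumption; real sizes depend on resolution and JPEG quality.

```python
import math


def estimated_frames(duration_s, interval_s):
    """Number of frames when capturing at 0, interval, 2*interval, ... below the duration."""
    return math.ceil(duration_s / interval_s)


def estimated_disk_mb(duration_s, interval_s, avg_frame_kb=200):
    """Rough disk usage in MiB for the extracted frames (avg_frame_kb is a guess)."""
    return estimated_frames(duration_s, interval_s) * avg_frame_kb / 1024


# A two-hour video sampled every 10 seconds:
print(estimated_frames(7200, 10))   # 720
print(estimated_disk_mb(7200, 10))  # 140.625
```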

Security and Safety Notes

This skill only reads video files and writes extracted frames and Markdown reports.
It does NOT send any data over the network; all processing is local.
FFmpeg and Tesseract are invoked with fixed, pre-vetted arguments.
The skill does not modify or delete the original video file.

Common Pitfalls

Problem: Tesseract returns garbled text.
Solution: Ensure the correct language pack is installed. Run tesseract --list-langs to verify.

Problem: FFmpeg fails with "not found".
Solution: Make sure FFmpeg is on PATH. Run ffmpeg -version to verify.

Problem: OCR is slow on large videos.
Solution: Increase the interval parameter to reduce the number of frames processed.
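
The first two checks can also be automated before a run. This is a sketch with hypothetical helper names; it assumes that tesseract --list-langs prints a header line followed by one language code per line, and it reads stderr as a fallback because some older Tesseract releases print the list there.

```python
import shutil
import subprocess


def missing_tools(tools=("ffmpeg", "ffprobe", "tesseract")):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_langs(list_langs_output):
    """Parse `tesseract --list-langs` output, skipping the header line."""
    lines = [line.strip() for line in list_langs_output.splitlines() if line.strip()]
    return set(lines[1:])


def installed_langs():
    """Ask Tesseract which language packs are installed."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_langs(result.stdout or result.stderr)


def missing_langs(lang_spec, installed):
    """Language codes in a spec such as 'chi_sim+eng' that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]
```

A caller would typically stop early with a clear message if missing_tools() or missing_langs(lang, installed_langs()) returns anything.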

Related Skills

@media-summarizer - For summarizing video content using visual and audio cues.
@document-ocr - For OCR on static images or scanned documents without video processing.