
video-content-extractor

Social Media | Safe | Claude | Codex

How to Install

This skill comes from a community source. Check the original listing for install instructions.

General Claude Code install: copy SKILL.md to ~/.claude/skills/

Video Content Extractor

Overview

Automatically extracts key frames from MP4 video files at a configurable time interval, runs OCR on each frame, and generates a structured Markdown report. The report includes video metadata (duration, resolution, codecs) and frame-by-frame OCR transcripts with timestamp references.

This skill is designed for Codex CLI and requires FFmpeg and Tesseract OCR to be installed on the local machine.

When to Use This Skill

  • Use when you need to extract text content from video presentations, lectures, or screencasts.
  • Use when you want to create searchable transcripts from video files without embedded subtitles.
  • Use when you need to analyze video content programmatically and generate structured summaries.
  • Use when the user asks to "read what is on screen" or "extract the content from this video."

How It Works

Step 1: Analyze Video Metadata

The skill uses ffprobe to extract video metadata: duration, resolution, frame rate, codec information, and file size.
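A minimal sketch of how this step could look, assuming ffprobe is asked for JSON output. The function names (ffprobe_command, parse_metadata, probe) and the returned field names are illustrative and are not taken from the skill's own script.

```python
import json
import subprocess


def ffprobe_command(video_path):
    """Build an ffprobe call that prints container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        video_path,
    ]


def parse_metadata(ffprobe_json):
    """Pick the fields a report needs out of ffprobe's JSON output."""
    info = json.loads(ffprobe_json)
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    audio = next((s for s in streams if s.get("codec_type") == "audio"), {})
    fmt = info.get("format", {})
    return {
        "duration_s": float(fmt.get("duration", 0.0)),
        "size_bytes": int(fmt.get("size", 0)),
        "width": video.get("width"),
        "height": video.get("height"),
        # ffprobe reports the frame rate as a fraction string such as "30/1"
        "frame_rate": video.get("avg_frame_rate"),
        "video_codec": video.get("codec_name"),
        "audio_codec": audio.get("codec_name"),
    }


def probe(video_path):
    """Run ffprobe (must be on PATH) and return the parsed metadata."""
    result = subprocess.run(
        ffprobe_command(video_path), capture_output=True, text=True, check=True
    )
    return parse_metadata(result.stdout)
```

Calling probe("lecture.mp4") would return a dictionary that can be passed straight into the report step. Keeping the parser separate from the subprocess call makes it testable without a video file.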

Step 2: Extract Key Frames

Using FFmpeg, the skill captures frames at the configured interval (default: every 30 seconds). Each frame is saved as a timestamped JPEG image.
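One way to realize interval-based capture is to compute the capture times up front and call FFmpeg once per time. The sketch below assumes that approach; the frame_HH-MM-SS.jpg naming scheme and the JPEG quality setting are hypothetical choices, not the skill's documented behavior.

```python
import subprocess
from pathlib import Path


def frame_times(duration_s, interval_s=30):
    """Return capture times 0, interval, 2*interval, ... below the duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times = []
    t = 0
    while t < duration_s:
        times.append(t)
        t += interval_s
    return times


def frame_name(t):
    """Hypothetical naming scheme: frame_HH-MM-SS.jpg."""
    hours, rest = divmod(int(t), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"frame_{hours:02d}-{minutes:02d}-{seconds:02d}.jpg"


def extract_command(video_path, t, out_path):
    """Build an FFmpeg call that seeks to t seconds and writes one JPEG."""
    return [
        "ffmpeg", "-y",          # overwrite an existing frame file
        "-ss", str(t),           # seek before decoding the input
        "-i", str(video_path),
        "-frames:v", "1",        # write a single video frame
        "-q:v", "2",             # assumed JPEG quality (lower is better)
        str(out_path),
    ]


def extract_frames(video_path, duration_s, out_dir, interval_s=30):
    """Run FFmpeg once per capture time and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for t in frame_times(duration_s, interval_s):
        out_path = out_dir / frame_name(t)
        subprocess.run(extract_command(video_path, t, out_path),
                       check=True, capture_output=True)
        paths.append(out_path)
    return paths
```

A single FFmpeg run with an fps filter would also work and is usually faster; the per-timestamp form is shown here because it makes the timestamp in each file name explicit.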

Step 3: OCR Text Recognition

Each extracted frame is processed by Tesseract OCR. If the skill's default page segmentation mode (PSM) returns no meaningful text, the skill retries the frame with fully automatic page segmentation.
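The retry logic can be sketched as follows. Two things here are assumptions: the primary mode is taken to be PSM 6 (a single uniform block of text), since the skill's actual default is not stated, and "meaningful text" is approximated by a hypothetical minimum count of letters and digits. PSM 3 is Tesseract's fully automatic page segmentation.

```python
import subprocess


def tesseract_command(image_path, lang, psm):
    """Build a Tesseract call that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path, lang, psm):
    """Run Tesseract (must be on PATH) and return its text output."""
    result = subprocess.run(
        tesseract_command(image_path, lang, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_meaningful(text, min_chars=3):
    """Assumed heuristic: at least min_chars letters or digits."""
    return sum(ch.isalnum() for ch in text) >= min_chars


def ocr_with_fallback(image_path, lang="chi_sim+eng", primary_psm=6,
                      fallback_psm=3, run=run_tesseract):
    """Try the primary PSM first; retry with the fallback PSM if the result is empty-ish."""
    text = run(image_path, lang, primary_psm)
    if is_meaningful(text):
        return text.strip()
    return run(image_path, lang, fallback_psm).strip()
```

The run parameter exists only so the retry logic can be exercised without Tesseract installed, for example by passing a stub function.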

Step 4: Generate Markdown Report

All extracted data is assembled into a structured Markdown document.
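As an illustration, the assembly step might look like the sketch below. The report layout (a metadata list followed by one heading per frame) and the function names are assumptions; the skill's actual Markdown structure is not documented here.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_report(title, metadata, frames):
    """Assemble a Markdown report.

    metadata: dict mapping a label to a display value.
    frames: list of (seconds, image_path, ocr_text) tuples.
    """
    lines = [f"# {title}", "", "## Metadata", ""]
    for label, value in metadata.items():
        lines.append(f"- **{label}**: {value}")
    lines += ["", "## Frames", ""]
    for seconds, image_path, text in frames:
        lines.append(f"### {format_timestamp(seconds)}")
        lines.append("")
        lines.append(f"![frame]({image_path})")
        lines.append("")
        # Keep a visible placeholder so empty frames are not silently dropped
        lines.append(text.strip() or "_No text detected._")
        lines.append("")
    return "\n".join(lines)
```

Writing the returned string to lecture.md next to the lecture_frames/ directory keeps the relative image links valid.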

Examples

Example 1: Basic Extraction

Agent prompt: Use the video-content-extractor skill to extract content from lecture.mp4

The skill generates lecture.md and a lecture_frames/ directory.

Example 2: Custom Interval

Parameters: video_path, output_dir, interval (seconds), lang

Extract every 60 seconds with English-only OCR:

python scripts/extract_video.py recording.mp4 ./output 60 eng

Example 3: Bilingual Content

Extract with default Chinese + English OCR:

python scripts/extract_video.py lecture.mp4 . 15 chi_sim+eng
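The positional parameters used in the examples above could be parsed along these lines. This is a sketch, not the contents of scripts/extract_video.py: it assumes that every argument after video_path is optional, and it takes the defaults from this page (30 seconds, chi_sim+eng), with the current directory as an assumed default output location.

```python
import argparse


def build_parser():
    """Positional arguments in the order video_path, output_dir, interval, lang."""
    parser = argparse.ArgumentParser(prog="extract_video.py")
    parser.add_argument("video_path", help="path to the MP4 file")
    parser.add_argument("output_dir", nargs="?", default=".",
                        help="where the report and frames are written")
    parser.add_argument("interval", nargs="?", type=int, default=30,
                        help="seconds between captured frames")
    parser.add_argument("lang", nargs="?", default="chi_sim+eng",
                        help="Tesseract language codes joined with '+'")
    return parser


# Equivalent to: python scripts/extract_video.py recording.mp4 ./output 60 eng
args = build_parser().parse_args(["recording.mp4", "./output", "60", "eng"])
```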

Best Practices

  • Use shorter intervals (10-15s) for fast-paced content with frequent text changes.
  • Use longer intervals (30-60s) for presentation slides or slow lectures to reduce duplicate frames.
  • For Chinese content, ensure Tesseract Chinese language pack is installed (chi_sim).

Limitations

  • Requires FFmpeg and Tesseract OCR to be installed and accessible via PATH.
  • Tesseract OCR accuracy depends on video quality, text size, and font clarity.
  • Does not extract audio or perform speech-to-text transcription.
  • Frame extraction is time-based (not scene-change-based), which may produce near-duplicate frames.
  • Large videos with short intervals can generate many frames - ensure sufficient disk space.
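To judge disk usage before a run, the frame count can be estimated from the duration and the interval. The sketch below assumes one frame at time 0 and one per interval after that; the 200 KB average JPEG size is a placeholder to replace with a measured value.

```python
import math


def estimate_frames(duration_s, interval_s):
    """Number of frames when capturing at 0, interval, 2*interval, ... below the duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return math.ceil(duration_s / interval_s)


def estimate_disk_mb(duration_s, interval_s, avg_frame_kb=200):
    """Rough disk usage in MB; avg_frame_kb is an assumed average JPEG size."""
    return estimate_frames(duration_s, interval_s) * avg_frame_kb / 1024


# A two-hour video sampled every 10 seconds
frames = estimate_frames(7200, 10)
size_mb = estimate_disk_mb(7200, 10)
```

For this example the estimate is 720 frames and roughly 140 MB at the assumed frame size.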

Security and Safety Notes

  • This skill only reads video files and writes extracted frames and Markdown reports.
  • It does NOT send any data over the network - all processing is local.
  • FFmpeg and Tesseract are invoked with fixed, pre-vetted arguments.
  • The skill does not modify or delete the original video file.

Common Pitfalls

  • Problem: Tesseract returns garbled text
    Solution: Ensure the correct language pack is installed. Run tesseract --list-langs to verify.

  • Problem: FFmpeg fails with "not found"
    Solution: Make sure FFmpeg is on PATH. Run ffmpeg -version to verify.

  • Problem: OCR is slow on large videos
    Solution: Increase the interval parameter to reduce frames processed.
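The first two checks can be automated before a run. The helper names below are hypothetical, and the parser assumes that tesseract --list-langs prints one header line followed by one language code per line.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe", "tesseract"), which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def parse_langs(list_langs_output):
    """Parse the output of `tesseract --list-langs`, skipping the header line."""
    lines = list_langs_output.strip().splitlines()
    return {line.strip() for line in lines[1:] if line.strip()}


def missing_langs(lang_spec, installed):
    """Return the requested language codes (joined with '+') that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]
```

For example, missing_langs("chi_sim+eng", installed) returns an empty list when both language packs are present, and missing_tools() returns an empty list when all three executables are on PATH.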

Related Skills

  • @media-summarizer - For summarizing video content using visual and audio cues.
  • @document-ocr - For OCR on static images or scanned documents without video processing.

Details

Category: Content → Social Media
Source: community
Stars: N/A
Risk Level: Safe
