
Local Audio Transcription

Transcribe voice messages locally using Whisper.cpp

This tool transcribes voice messages and audio files locally using Whisper.cpp. There is no cloud API; everything runs on your machine. When someone sends a voice message on Telegram, the agent automatically transcribes it and responds to the text.

Setup

brew install whisper-cpp ffmpeg
whisper-cli --download-model small
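
Before editing the config, it can help to confirm that both tools are on your PATH and that the model file sits where the config will look for it. The following is a minimal sketch, not part of OpenClaw: the function names check_tools and check_model are hypothetical, and the default model path is simply the one used in the config on this page.

```shell
#!/bin/sh
# Path the config on this page expects; override with MODEL=... if yours differs.
MODEL="${MODEL:-$HOME/.local/share/whisper-cpp/ggml-small.bin}"

# Print one status line per tool; return non-zero if any is missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Report whether the model file exists and is non-empty.
check_model() {
  if [ -s "$1" ]; then
    echo "ok: model at $1"
  else
    echo "missing: model at $1"
    return 1
  fi
}

# Report status without aborting, so both checks always run.
check_tools ffmpeg whisper-cli || true
check_model "$MODEL" || true
```

If the model is reported missing, check where the download step actually placed ggml-small.bin and either move it or change the -m path in the config to match.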

Then add the following to your OpenClaw config:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{
          "type": "cli",
          "command": "sh",
          "args": [
            "-c",
            "ffmpeg -i \"$1\" -ar 16000 -ac 1 /tmp/openclaw_audio.wav -y 2>/dev/null && whisper-cli -m ~/.local/share/whisper-cpp/ggml-small.bin -f /tmp/openclaw_audio.wav --no-timestamps 2>/dev/null",
            "--",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 120
        }]
      }
    }
  }
}

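When audio comes in, ffmpeg converts it to 16 kHz mono WAV (-ar 16000 -ac 1), and whisper-cli transcribes that file locally, printing plain text because of --no-timestamps. In the args list, the "--" entry becomes $0 of the sh -c script, so the {{MediaPath}} value arrives as $1. Both commands send stderr to /dev/null, so remove the 2>/dev/null redirects while debugging. The small model balances speed and accuracy well. Use medium or large if you want better accuracy and don't mind waiting; point -m at the matching model file and consider raising timeoutSeconds.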
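
The inline command writes every conversion to the same /tmp/openclaw_audio.wav, so two voice messages processed at the same time could overwrite each other's file. One way around that is a small wrapper script that uses a private temporary directory per call. This is a sketch under assumptions: the transcribe function and the script name transcribe.sh are hypothetical, and it relies on mktemp -d, which is available on macOS and most Linux systems. The ffmpeg and whisper-cli flags are the same ones used in the config on this page.

```shell
#!/bin/sh
# Model path taken from the config on this page; override with MODEL=... if needed.
MODEL="${MODEL:-$HOME/.local/share/whisper-cpp/ggml-small.bin}"

# Convert one audio file to 16 kHz mono WAV and print its transcript on stdout.
transcribe() {
  input=$1
  # Private temp directory per call, so parallel runs never share a WAV file.
  workdir=$(mktemp -d) || return 1
  wav="$workdir/audio.wav"

  if ffmpeg -i "$input" -ar 16000 -ac 1 -y "$wav" 2>/dev/null; then
    whisper-cli -m "$MODEL" -f "$wav" --no-timestamps 2>/dev/null
    status=$?
  else
    echo "ffmpeg could not convert: $input" >&2
    status=1
  fi

  # Clean up the temp directory whether or not transcription succeeded.
  rm -rf "$workdir"
  return "$status"
}

# Run only when a path is passed, e.g.: sh transcribe.sh voice.ogg
if [ "$#" -gt 0 ]; then
  transcribe "$1"
fi
```

To try it, save the script (for example as transcribe.sh), run it by hand on a sample audio file, and check that the transcript prints. If it behaves as expected, the config could set "command" to "sh" and "args" to the script path followed by "{{MediaPath}}"; treat that wiring as an untested suggestion rather than a documented OpenClaw pattern.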
