Local Audio Transcription
Transcribe voice messages locally using Whisper.cpp
Transcribes voice messages and audio files locally using Whisper.cpp. No cloud API is involved; everything runs on your machine. When someone sends a voice message on Telegram, the agent automatically transcribes it and responds to the text.
Setup
brew install whisper-cpp ffmpeg
whisper-cli --download-model small
Then add to OpenClaw config:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{
          "type": "cli",
          "command": "sh",
          "args": [
            "-c",
            "ffmpeg -i \"$1\" -ar 16000 -ac 1 /tmp/openclaw_audio.wav -y 2>/dev/null && whisper-cli -m ~/.local/share/whisper-cpp/ggml-small.bin -f /tmp/openclaw_audio.wav --no-timestamps 2>/dev/null",
            "--",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 120
        }]
      }
    }
  }
}
When audio comes in, ffmpeg converts it to 16 kHz mono WAV and Whisper transcribes it locally. The small model balances speed and accuracy well. Use medium or large if you want better accuracy and don't mind waiting longer for each transcription.
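To try the two steps outside OpenClaw, the pipeline can be sketched as a small shell function. This is only an illustration, not part of OpenClaw: the function name `transcribe`, the `FFMPEG_BIN`, `WHISPER_BIN`, and `WHISPER_MODEL` overrides, and the per-call temp directory are all assumptions made here. The ffmpeg and whisper-cli flags are the same ones used in the config.

```shell
#!/bin/sh
# Sketch of the convert-then-transcribe pipeline as a standalone function.
# Hypothetical names (not from OpenClaw): transcribe, FFMPEG_BIN,
# WHISPER_BIN, WHISPER_MODEL.

transcribe() {
    input=$1
    ffmpeg_bin=${FFMPEG_BIN:-ffmpeg}
    whisper_bin=${WHISPER_BIN:-whisper-cli}
    model=${WHISPER_MODEL:-"$HOME/.local/share/whisper-cpp/ggml-small.bin"}

    # A private temp directory keeps concurrent runs from overwriting
    # each other's WAV file (the config uses one fixed path in /tmp).
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/openclaw_audio.XXXXXX") || return 1
    wav="$workdir/audio.wav"

    # Step 1: convert the input to 16 kHz mono WAV.
    # Step 2: transcribe it; the transcript goes to stdout.
    "$ffmpeg_bin" -i "$input" -ar 16000 -ac 1 "$wav" -y 2>/dev/null &&
        "$whisper_bin" -m "$model" -f "$wav" --no-timestamps 2>/dev/null
    status=$?

    # Clean up the temp directory whether or not the steps succeeded.
    rm -rf "$workdir"
    return "$status"
}
```

After sourcing the file, `transcribe voice.ogg` should print the transcript, and `WHISPER_MODEL` can point at a medium or large model file to compare accuracy against speed.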