We keep building. UPDATES:
1. Language detection for Turkish and 5 other languages
Added Turkish, Italian, Dutch, Indonesian, Thai, and Hebrew to detectSimpleLanguage in analyze-tweet so the bot replies in the user's language instead of defaulting to English.
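A rough sketch of how this kind of detection could work is shown here. It is not the actual detectSimpleLanguage code: the function name detectSimpleLanguageSketch, the stopword lists, and the two-hit threshold are all assumptions made for illustration. Thai and Hebrew are recognised by their Unicode blocks, the other four by counting common words.

```typescript
// Hypothetical sketch; the real detectSimpleLanguage may work differently.
type Lang = "tr" | "it" | "nl" | "id" | "th" | "he" | "en";

// Tiny illustrative stopword lists (assumed, far from complete).
const STOPWORDS: { lang: Lang; words: string[] }[] = [
  { lang: "tr", words: ["ve", "bir", "bu", "için", "değil", "çok"] },
  { lang: "it", words: ["il", "che", "non", "per", "sono", "questo"] },
  { lang: "nl", words: ["de", "het", "een", "niet", "van", "dat"] },
  { lang: "id", words: ["yang", "dan", "ini", "tidak", "dengan", "untuk"] },
];

function detectSimpleLanguageSketch(text: string): Lang {
  // Thai and Hebrew have their own Unicode blocks, so a script check is enough.
  if (/[\u0E00-\u0E7F]/.test(text)) return "th";
  if (/[\u0590-\u05FF]/.test(text)) return "he";

  // Latin-script languages: count stopword hits per language.
  const tokens = text
    .toLowerCase()
    .split(/[\s.,!?;:"()]+/)
    .filter((t) => t.length > 0);

  let best: Lang = "en"; // default when nothing matches convincingly
  let bestHits = 1; // assumed threshold: need at least 2 hits to leave English
  for (const entry of STOPWORDS) {
    const hits = tokens.filter((t) => entry.words.indexOf(t) !== -1).length;
    if (hits > bestHits) {
      best = entry.lang;
      bestHits = hits;
    }
  }
  return best;
}

console.log(detectSimpleLanguageSketch("Bu çok güzel bir video ve çok ilginç"));
```

Short tweets with fewer than two recognised words fall back to English in this sketch, which matches the old default behaviour.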
2. Multi-frame video analysis (new endpoint: video-frames)
Instead of relying on a single thumbnail frame, the bot now downloads the video, extracts 4 to 6 frames at evenly spaced intervals using ffmpeg, and sends them all to Claude Vision in one call. This gives the AI a much fuller picture of what happens in the video (scenes, uniforms, watermarks, locations, sequence of events). ffmpeg is now installed on the server to support this.
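The frame selection can be sketched roughly as follows. The helper names (chooseFrameCount, frameTimestamps, ffmpegFrameArgs), the one-frame-per-10-seconds rule, and the file names are assumptions, not the bot's actual code. The ffmpeg options used (-ss, -i, -frames:v, -q:v, -y) are standard ones.

```typescript
// Hypothetical helpers; names and the frame-count rule are assumptions.

// Pick how many frames to grab: roughly one per 10 s, clamped to the 4-6 range.
function chooseFrameCount(durationSec: number): number {
  return Math.min(6, Math.max(4, Math.round(durationSec / 10)));
}

// Evenly spaced timestamps, taken at the midpoint of each equal segment
// so no frame is sampled at the exact start or end of the video.
function frameTimestamps(durationSec: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    out.push((durationSec * (i + 0.5)) / count);
  }
  return out;
}

// Argument list for one ffmpeg call that writes a single JPEG at time t.
function ffmpegFrameArgs(input: string, t: number, output: string): string[] {
  return ["-y", "-ss", t.toFixed(2), "-i", input, "-frames:v", "1", "-q:v", "2", output];
}

const duration = 60; // seconds; in practice this would come from probing the file
const stamps = frameTimestamps(duration, chooseFrameCount(duration));
stamps.forEach((t, i) => {
  console.log("ffmpeg " + ffmpegFrameArgs("video.mp4", t, "frame_" + i + ".jpg").join(" "));
});
```

In a Node.js service each argument list could be passed to child_process.execFile, and the resulting JPEGs attached as multiple images in a single vision request.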
3. Garbage transcript detection (video-transcribe)
Whisper was hallucinating on non-English audio, producing gibberish like "Julie you don't know 再見 ..." (for Russian speech) and "冬瓜汤完成" (for noise). The bot now detects mixed scripts, excessive repetition, and low-information transcripts and discards them instead of letting them poison the pipeline.
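The three checks could look something like the sketch below. The function name garbageReason, the script ranges, and every threshold (12 characters, 6 tokens, 50% repetition) are assumptions chosen for illustration and would need tuning against real transcripts.

```typescript
// Hypothetical sketch of the three heuristics; thresholds are assumed values.
const SCRIPT_RANGES: { name: string; re: RegExp }[] = [
  { name: "latin", re: /[A-Za-z\u00C0-\u024F]/ },
  { name: "cyrillic", re: /[\u0400-\u04FF]/ },
  { name: "cjk", re: /[\u3040-\u30FF\u4E00-\u9FFF]/ },
  { name: "arabic", re: /[\u0600-\u06FF]/ },
];

// Returns the reason a transcript should be discarded, or null if it looks usable.
function garbageReason(transcript: string): string | null {
  const text = transcript.trim();

  // 1. Mixed scripts: more than one writing system in the same transcript.
  const scripts = SCRIPT_RANGES.filter((s) => s.re.test(text)).map((s) => s.name);
  if (scripts.length > 1) return "mixed-scripts";

  // 2. Low information: too few non-whitespace characters to be useful.
  if (text.replace(/\s+/g, "").length < 12) return "low-information";

  // 3. Excessive repetition: one token makes up more than half of the text.
  const tokens = text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length >= 6) {
    const counts: { [token: string]: number } = Object.create(null);
    let max = 0;
    for (const t of tokens) {
      counts[t] = (counts[t] || 0) + 1;
      if (counts[t] > max) max = counts[t];
    }
    if (max / tokens.length > 0.5) return "repetition";
  }
  return null;
}

console.log(garbageReason("Julie you don't know 再見"));
console.log(garbageReason("冬瓜汤完成"));
```

One known weakness of the mixed-script rule as written here: a genuine transcript that quotes a brand name in another script would also be discarded, so a real implementation might only flag a transcript when the minority script exceeds some share of the text.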
4. Video context search (dialectical-verify)
When multi-frame analysis identifies specific clues (media watermarks like "Mash", location markers like "Dubai skyline", uniform types like "paramedic"), the bot now runs targeted search queries using those identifiers (e.g., "Mash 2021 Russia paramedics elevator assault") instead of generic descriptions.
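As a minimal sketch of that idea, the query can be assembled from whatever identifiers the frame analysis returned, falling back to the generic description when too few were found. The VideoClues shape, the buildContextQuery name, and the two-identifier minimum are assumptions, not the bot's actual data model.

```typescript
// Hypothetical clue structure; field names are assumptions.
interface VideoClues {
  watermarks: string[];
  locations: string[];
  uniforms: string[];
  incidentTypes: string[];
}

// Join the distinct identifiers into one targeted query.
// Falls back to the generic description when fewer than two clues exist.
function buildContextQuery(clues: VideoClues, fallbackDescription: string): string {
  const parts = clues.watermarks.concat(clues.locations, clues.uniforms, clues.incidentTypes);
  const seen: { [key: string]: boolean } = Object.create(null);
  const unique: string[] = [];
  for (const raw of parts) {
    const p = raw.trim();
    const key = p.toLowerCase();
    if (p.length > 0 && !seen[key]) {
      seen[key] = true; // case-insensitive de-duplication
      unique.push(p);
    }
  }
  return unique.length >= 2 ? unique.join(" ") : fallbackDescription;
}

const query = buildContextQuery(
  { watermarks: ["Mash"], locations: ["Russia"], uniforms: ["paramedic"], incidentTypes: ["elevator assault"] },
  "video of people fighting in an elevator"
);
console.log(query);
```

Putting the watermark first keeps the most distinctive identifier at the front of the query, though whether word order matters depends on the search backend in use.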
5. Updated claim decomposition prompt (dialectical-verify)
Added instructions telling the AI to use video analysis details (watermarks, locations, uniforms, incident types) to formulate specific, searchable claims rather than treating video descriptions as throwaway context.