December 9, 2025

Argmax builds an amazing turn-key developer experience that enabled us to build a unique set of features faster for our customers without having to worry about scaling infrastructure.
-Paul Veugen, Founder and CEO at Detail Technologies

Detail relies on accurate word-level transcript timestamps generated by Argmax WhisperKit Pro to power its flagship text-based video editing experience. High accuracy is non-negotiable for this feature, because timestamp errors would leave words cut off in the edited video.
Whisper's word-level timestamps are notoriously unreliable out of the box, even with OpenAI's official algorithm, because that algorithm approximates word timing from a set of attention heads inside the model that were selected by eye ("eyeballed"). Argmax engineers programmatically searched all attention heads and quantitatively selected the most accurate ones, yielding the highest-accuracy word-level timestamps anyone can derive from any Whisper model using OpenAI's official algorithm.
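The exact search procedure Argmax used is not described here. As a rough illustration only, quantitative head selection can be sketched as ranking each (layer, head) pair by how far the word start times it produces deviate from reference timestamps, then keeping the best few. The function names `rank_heads` and `select_heads`, the mean-absolute-error metric, and the toy numbers are all hypothetical assumptions, not Argmax's implementation.

```python
from statistics import mean

# Hypothetical sketch: NOT Argmax's implementation.
# Assumes each candidate attention head, identified by (layer, head), has already
# produced word start times (seconds) for an utterance with known reference times.

def rank_heads(per_head_starts, reference_starts):
    """Return (head, mean absolute error) pairs, most accurate head first."""
    scores = {}
    for head, predicted in per_head_starts.items():
        if len(predicted) != len(reference_starts):
            raise ValueError(f"head {head}: word count mismatch")
        # Mean absolute error between predicted and reference word start times.
        scores[head] = mean(abs(p - r) for p, r in zip(predicted, reference_starts))
    return sorted(scores.items(), key=lambda item: item[1])

def select_heads(per_head_starts, reference_starts, k):
    """Keep the k heads with the lowest timestamp error."""
    return [head for head, _ in rank_heads(per_head_starts, reference_starts)[:k]]

# Toy example with made-up numbers for three words.
reference = [0.00, 0.42, 0.90]
candidates = {
    (0, 1): [0.05, 0.40, 0.95],
    (3, 2): [0.30, 0.80, 1.40],
    (5, 0): [0.00, 0.44, 0.88],
}
print(select_heads(candidates, reference, k=2))  # [(5, 0), (0, 1)]
```

In practice such a search would presumably average errors over a large labeled validation set rather than a single utterance, so that the selected heads generalize beyond one recording.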

Detail also auto-edits raw footage into podcasts that switch camera views automatically based on detected speaker turns. These speaker turns are detected using Argmax SpeakerKit Pro. Detail migrated from AWS Transcribe to Argmax SpeakerKit Pro because it consistently delivered significantly lower latency while also improving speaker detection accuracy*.

Find out more about the Argmax SDK today.
*AWS vs. Argmax accuracy and speed results can be found in this peer-reviewed Interspeech 2025 research paper.