Customer

Detail

December 9, 2025

Detail
  • Detail, Apple's pick for iPad App of the Year 2025, uses the Argmax SDK to build flagship features such as automatic video captioning and scene transitions.
  • Detail migrated from AWS Transcribe to Argmax SpeakerKit Pro for speaker diarization while building its automatic scene transition feature. The migration reduced latency for a 2-minute video from ~25 seconds to less than 1 second while increasing accuracy.*
  • Detail also migrated from whisper.cpp to Argmax WhisperKit Pro for video captioning, realizing improved reliability as well as reduced latency and memory usage.

What the CEO Has to Say

Argmax builds an amazing turn-key developer experience that enabled us to build a unique set of features faster for our customers without having to worry about scaling infrastructure.

-Paul Veugen, Founder and CEO at Detail Technologies

Detail on iOS App Store

Text-based Video Editing

Detail relies on the accurate word-level transcript timestamps generated by Argmax WhisperKit Pro to enable its flagship text-based video editing experience. High accuracy is non-negotiable for this feature, because errors in timestamps would lead to cut-off words in the edited video.
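To see why timestamp accuracy matters here, consider a minimal sketch of how a text edit could map to video cuts. This is an illustration only, not Detail's or Argmax's implementation: the `Word` type, the `keep_ranges` function, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the clip
    end: float    # seconds from the beginning of the clip


def keep_ranges(words, deleted):
    """Return merged (start, end) ranges covering every word whose index is not in `deleted`."""
    ranges = []
    prev_kept = False
    for i, word in enumerate(words):
        if i in deleted:
            prev_kept = False
            continue
        if prev_kept:
            # Extend the current range through this word, including the pause before it.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
        prev_kept = True
    return ranges


# Hypothetical transcript with made-up timings.
words = [
    Word("So", 0.00, 0.20),
    Word("um", 0.25, 0.60),
    Word("welcome", 0.70, 1.10),
    Word("back", 1.15, 1.40),
]

# Deleting "um" in the transcript leaves two video ranges to keep.
print(keep_ranges(words, deleted={1}))  # [(0.0, 0.2), (0.7, 1.4)]
```

In this sketch every cut boundary is a word timestamp, so an error of even a tenth of a second on "welcome" would clip the start of that word in the edited video.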

Whisper's word-level timestamps are notoriously unreliable out of the box, even when using OpenAI's official algorithm, because that algorithm relies on an "eye-balled selection" of attention heads inside the model for a rough approximation. Argmax engineers programmatically searched all attention heads and quantitatively selected the most accurate ones, yielding the highest-accuracy word-level timestamps anyone can derive from any Whisper model using OpenAI's official algorithm.
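As a toy illustration of what a quantitative head search could look like, the sketch below ranks heads by how closely their word start times match a reference alignment. It is not Argmax's actual procedure: the scoring metric (mean absolute error), the per-head scoring, the `(layer, head)` keys, and all numbers are assumptions made for this example.

```python
def mean_abs_error(predicted, reference):
    """Average absolute difference, in seconds, between predicted and reference times."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def select_heads(per_head_predictions, reference, top_k):
    """Return the `top_k` head identifiers with the lowest timestamp error."""
    scored = sorted(
        per_head_predictions.items(),
        key=lambda item: mean_abs_error(item[1], reference),
    )
    return [head for head, _ in scored[:top_k]]


# Hypothetical reference word start times (seconds), e.g. from a hand-checked alignment.
reference = [0.00, 0.25, 0.70, 1.15]

# Hypothetical word start times derived from each (layer, head) pair on the same audio.
per_head = {
    (3, 1): [0.02, 0.27, 0.69, 1.16],
    (4, 0): [0.10, 0.40, 0.90, 1.30],
    (5, 2): [0.00, 0.24, 0.72, 1.15],
}

print(select_heads(per_head, reference, top_k=2))  # [(5, 2), (3, 1)]
```

The point of measuring rather than eyeballing is that the ranking is reproducible and can be rerun for each model variant.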

Text-based Video Editing in Detail

Automatic Speaker Detection

Detail auto-edits raw footage into podcasts that automatically switch the camera view based on detected speaker turns. The speaker turns are detected using Argmax SpeakerKit Pro. Detail migrated from AWS Transcribe to Argmax SpeakerKit Pro because it delivered consistently and significantly lower latency while improving speaker detection accuracy.*
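A minimal sketch of how speaker turns could drive camera switches is shown below. It is an assumption-laden illustration, not Detail's editing logic: the `(start, end, speaker)` segment format, the `camera_cuts` function, and the `min_shot` threshold are hypothetical.

```python
def camera_cuts(segments, min_shot=1.0):
    """Turn diarization segments into a list of (time, speaker) camera switches.

    `segments` is a list of (start, end, speaker) tuples sorted by start time.
    Segments shorter than `min_shot` seconds are ignored so that brief
    interjections do not cause a distracting cut.
    """
    cuts = []
    for start, end, speaker in segments:
        if cuts and cuts[-1][1] == speaker:
            continue  # The camera is already on this speaker.
        if end - start < min_shot:
            continue  # Too short to justify switching the camera.
        cuts.append((start, speaker))
    return cuts


# Hypothetical diarization output for a two-person podcast.
segments = [
    (0.0, 4.2, "A"),
    (4.2, 4.6, "B"),  # brief interjection
    (4.6, 9.0, "A"),
    (9.0, 15.5, "B"),
]

print(camera_cuts(segments))  # [(0.0, 'A'), (9.0, 'B')]
```

In a flow like this, diarization latency directly gates how quickly the auto-edit can be shown, and a misattributed turn puts the camera on the wrong person.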

Automatic Speaker Detection in Detail

Find out more about Argmax SDK today.

*AWS vs. Argmax accuracy and speed results can be found in this peer-reviewed Interspeech 2025 research paper.
