Customize Speech-to-Text

October 22, 2025

TL;DR

  • Argmax Pro SDK now supports Custom Vocabulary, an advanced feature that lets developers quickly customize or personalize speech-to-text models by providing a list of contextual keywords at runtime.
  • Use cases range from proper spelling of people, company and product names in meeting transcriptions to industry or occupation-specific jargon in field service or front-line work.
  • Open-source, reproducible benchmarks show that Argmax's keyword recognition accuracy improves from 64% to 92% (F1) with this feature, matching the top commercial cloud API and outperforming the rest.
  • Try it on superwhisper-2.6.2+, Hyprnote or Argmax Playground today!

Custom Vocabulary

Clean audio recordings of casual conversations without any names or jargon are easy to transcribe. In fact, most speech-to-text systems today do an almost perfect job under these conditions. However, most systems break under realistic in-the-wild settings, such as the one below. Custom Vocabulary works by registering a list of contextual keywords with the transcription system to enable a dedicated "keyword search".

Apple Native API gets 0/4 correct as tested in the Voice Memos app on iOS 26.
For the same recording, Argmax Pro SDK gets 4/4 in the Argmax Playground.

Key differentiators of Argmax Custom Vocabulary compared to similar features in commercial cloud APIs include:

  • Model-agnostic: Detected keywords can be merged with any external transcription result, unlike Keyword Prompting in Whisper and other commercial APIs.
  • Keyword limit: Developers envisioned applications that require scaling well beyond the ~100-keyword limit that Whisper and other proprietary cloud APIs impose. The Argmax Pro SDK Custom Vocabulary supports 1000 keywords.
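
To make the model-agnostic point concrete, here is a minimal Python sketch of how time-stamped keyword detections could be merged into a word-level transcript from any other system. It is an illustration only, not the Argmax Pro SDK API: the `Word` and `KeywordHit` types and the `merge_keywords` function are hypothetical, and the sketch assumes both systems report start and end times in seconds and that keyword hits do not overlap each other.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class KeywordHit:
    keyword: str  # spelling taken from the registered vocabulary
    start: float  # seconds
    end: float    # seconds


def merge_keywords(words: list[Word], hits: list[KeywordHit]) -> list[Word]:
    """Replace transcript words that overlap a keyword hit with the keyword.

    Hypothetical helper: assumes `words` is sorted by time and hits do not
    overlap one another.
    """
    merged: list[Word] = []
    i = 0
    for hit in sorted(hits, key=lambda h: h.start):
        # Keep every word that ends before the hit begins.
        while i < len(words) and words[i].end <= hit.start:
            merged.append(words[i])
            i += 1
        # Drop the words whose time span overlaps the hit.
        while i < len(words) and words[i].start < hit.end:
            i += 1
        merged.append(Word(hit.keyword, hit.start, hit.end))
    # Keep whatever follows the last hit.
    merged.extend(words[i:])
    return merged


if __name__ == "__main__":
    transcript = [
        Word("thanks", 0.0, 0.4), Word("to", 0.4, 0.5),
        Word("arg", 0.5, 0.8), Word("max", 0.8, 1.1), Word("team", 1.1, 1.5),
    ]
    hits = [KeywordHit("Argmax", 0.5, 1.1)]
    print(" ".join(w.text for w in merge_keywords(transcript, hits)))
```

Because the merge relies only on timestamps, the base transcript can come from any engine that reports word timings.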

How can developers determine the contextual keywords to register in the vocabulary? Here are just a few examples:

  • Meetings: From the calendar invite, the attendees' names, the companies they represent, and those companies' most popular products.
  • Videos: The title and description, plus OCR results from the video if it contains slides with technical jargon.
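
As a rough illustration of the meeting case, the Python sketch below collects candidate keywords from calendar-invite metadata, removes duplicates, and caps the list at the 1000-keyword limit mentioned above. The `invite` dictionary layout, its field names, and `build_vocabulary` are assumptions made for this example; how the resulting list is actually registered with the SDK is not shown here.

```python
MAX_KEYWORDS = 1000  # keyword limit stated in this post


def build_vocabulary(invite: dict, max_keywords: int = MAX_KEYWORDS) -> list[str]:
    """Build a deduplicated keyword list from (hypothetical) invite metadata."""
    candidates = (
        invite.get("attendees", [])
        + invite.get("companies", [])
        + invite.get("products", [])
    )
    seen: set[str] = set()
    vocabulary: list[str] = []
    for term in candidates:
        cleaned = " ".join(term.split())  # trim and collapse whitespace
        key = cleaned.casefold()          # case-insensitive duplicate check
        if cleaned and key not in seen:
            seen.add(key)
            vocabulary.append(cleaned)
    # Earlier sources (attendees first) win if the list must be truncated.
    return vocabulary[:max_keywords]


if __name__ == "__main__":
    invite = {
        "attendees": ["Ada Lovelace", "ada  lovelace "],
        "companies": ["Acme Corp"],
        "products": ["Acme Widget", ""],
    }
    print(build_vocabulary(invite))
```

Ordering the sources by importance matters only when the candidate list exceeds the limit, since truncation keeps the earliest entries.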

Achieving high accuracy on names and jargon makes all the difference between toy utility software and critical infrastructure for high-stakes use cases such as AI medical scribes, virtual meeting transcription and even personal dictation software.

Just for fun, here is a challenging test with Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch in Argmax Playground.

Benchmarks

To the best of our knowledge, there is no publicly available dataset for measuring speech-to-text accuracy on keyword recognition, e.g. proper name spelling. To fill this gap, we reannotated the popular earnings22 dataset with people, company and product names as keywords. For the first version of this dataset, we curated ~1000 audio clips, each 15 seconds long and containing at least one name. We reviewed every sample manually, which led to many corrections of the original ground-truth transcripts: challenging parts of the audio had been annotated as inaudible, and many names had been annotated incorrectly. After manually verifying and fixing the names, including cross-referencing people and company names through LinkedIn searches, we arrived at a high-quality test set. On this set, the accuracy of Argmax Pro SDK, as measured by F1-score, improves from 64% to 92% when the new Custom Vocabulary feature is activated!
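
For readers unfamiliar with the metric, the Python sketch below shows one common way to compute a keyword-level F1-score from per-clip keyword sets. It is a simplified illustration under our own assumptions (case-insensitive exact matching, counts pooled across clips); the `keyword_f1` function is hypothetical and the actual benchmark scoring may differ in its details.

```python
def keyword_f1(clips: list[tuple[list[str], list[str]]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all clips.

    Each clip is a pair: (ground-truth keywords, keywords found in the
    system's transcript). Matching here is case-insensitive and exact.
    """
    tp = fp = fn = 0
    for expected, predicted in clips:
        expected_set = {k.casefold() for k in expected}
        predicted_set = {k.casefold() for k in predicted}
        tp += len(expected_set & predicted_set)  # keywords correctly found
        fp += len(predicted_set - expected_set)  # keywords wrongly produced
        fn += len(expected_set - predicted_set)  # keywords missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    clips = [
        (["Argmax", "Deepgram"], ["argmax"]),
        (["Nvidia"], ["Nvidia", "Intel"]),
    ]
    precision, recall, f1 = keyword_f1(clips)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so a system cannot score well by emitting every registered keyword everywhere or by emitting almost none.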

Please refer to OpenBench for more details on these publicly reproducible benchmarks.

In these benchmarks, the keyword accuracy of Argmax significantly surpasses AssemblyAI, Whisper Large v2 (OpenAI API) and Whisper Large v3 Turbo (OpenAI OSS), and matches Deepgram. Argmax improved from 0.88 to 0.92 between October and November, and we expect it to keep improving over the next few months.

Day 0 Support on superwhisper!

This feature has been in alpha testing for the past week, and several customers have already shipped the stable version today! If you are a superwhisper user, update to 2.6.2 to get Custom Vocabulary added to the Nvidia Parakeet models powered by Argmax Pro SDK!
