Nvidia Frontier Speech Models on Argmax SDK

June 19, 2025

TL;DR

  • Argmax SDK now supports on-device deployment of frontier speech-to-text models open-sourced by Nvidia, such as Parakeet v2, which significantly surpasses Whisper Large v3 on speed, accuracy, and compactness at the same time.
  • Argmax optimized these Nvidia models to run inference at near-peak utilization on the Apple Neural Engine, so that battery life remains unaffected despite the extreme speed.
  • Argmax SDK reproduces OpenASR leaderboard results on several datasets that place Nvidia Parakeet v2 at the top accuracy rank ahead of proprietary cloud APIs.
  • You can try it out in our TestFlight app now and ship it with your app using Argmax Pro SDK today! Sign up for access on our website!

Argmax TestFlight Demo App: Parakeet running on iPhone at a speed factor of 125

Frontier Accuracy

Whisper Large v3 has been the workhorse of speech-to-text, with widespread industry adoption. In 2025, some proprietary models started surpassing Whisper on the accuracy frontier per the OpenASR leaderboard, and open-source models were briefly pushed out of the top ranks. Thanks to Nvidia, open-source models quickly reclaimed the top rank with Nvidia Parakeet v2. We are excited to announce that Nvidia Parakeet models are now supported on Argmax Pro SDK!

We have reimplemented and optimized Nvidia Parakeet for Apple Silicon, achieving near-peak utilization of 10+ TFLOPS on the Apple Neural Engine. In the process, we have reproduced the OpenASR leaderboard results with our reimplementation of Parakeet v2, which ranks first as of May 2025, ahead of top cloud speech-to-text API providers. This model runs seamlessly on iOS 17 and newer as well as macOS 14 and newer.

Contrary to common belief, on-device AI does not require trading away accuracy by using "small models". In this case, Nvidia Parakeet v2 deployed on Apple devices with Argmax Pro SDK surpasses proprietary models deployed on top cloud APIs running on Nvidia GPUs. Android support is coming later this summer.

Compact Size

The Parakeet v2 model on Argmax Pro SDK works without any compression on macOS. Nonetheless, we also built a compressed version by applying Argmax's proprietary compression techniques to Parakeet v2, retaining accuracy while reducing the model download size from 1.2 GB to 0.4 GB.

0.4 GB is equivalent to ~3 hours of high-quality audio (320 kbps) for video editing or ~15 hours of low-quality audio (64 kbps) for virtual meetings. You download the model only once, so you never have to upload (potentially sensitive) user audio data again, and your users break even in terms of data transfer after just a few hours of usage!
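
The break-even figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `hours_of_audio` is ours, it assumes constant-bitrate audio, and it reads "0.4 GB" as binary gigabytes (with decimal gigabytes the results come out slightly lower, at about 2.8 and 13.9 hours).

```python
def hours_of_audio(download_bytes: float, bitrate_kbps: float) -> float:
    """Hours of constant-bitrate audio whose total size equals download_bytes."""
    total_bits = download_bytes * 8
    seconds = total_bits / (bitrate_kbps * 1000)  # kbps = 1000 bits per second
    return seconds / 3600


GIB = 1024 ** 3
model_bytes = 0.4 * GIB  # assumption: "0.4 GB" interpreted as 0.4 GiB

for label, kbps in [("high-quality (320 kbps)", 320), ("low-quality (64 kbps)", 64)]:
    print(f"{label}: {hours_of_audio(model_bytes, kbps):.1f} hours")
```

Under these assumptions the compressed model costs about as much data as 3 hours of 320 kbps audio or 15 hours of 64 kbps audio, which matches the figures quoted above.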

Faster than Cloud APIs

At Argmax, we optimize for scalable deployment to all Apple Silicon devices in use today, instead of optimizing for contrived speed records on the latest and greatest hardware. This is why we are excited to report that even the least capable Apple Silicon Mac from 5 years ago, the M1 MacBook Air with 8 GB of RAM, is able to surpass a speed factor of 100. Nvidia Parakeet v2 on Argmax Pro SDK is more than an order of magnitude faster than Whisper Large v3 (and its Turbo variant) using the same SDK. It is also faster than most speech-to-text cloud API providers per Artificial Analysis benchmarks.
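
To put these speed factors in concrete terms, here is a small sketch. It assumes the common definition of speed factor as seconds of audio transcribed per second of processing time; the post does not spell out its definition, and the function names are hypothetical.

```python
def speed_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Assumed definition: seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def transcription_seconds(audio_seconds: float, factor: float) -> float:
    """Wall-clock time needed to transcribe audio_seconds at a given speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


one_hour = 3600.0
for factor in (100, 125):
    secs = transcription_seconds(one_hour, factor)
    print(f"speed factor {factor}: 1 hour of audio in {secs:.1f} s")
```

Under this definition, a one-hour recording takes about 36 seconds at a speed factor of 100 and about 29 seconds at 125.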

Try it hands-on in less than 2 minutes with our TestFlight app. If you want to integrate these capabilities into your product, sign up for access on our website!