June 19, 2025
TL;DR
Whisper Large v3 has been the workhorse of speech-to-text with widespread industry adoption. In 2025, some proprietary models started surpassing Whisper in accuracy on the OpenASR leaderboard, and open-source models were briefly pushed out of the top ranks. Thanks to Nvidia, open-source models quickly reclaimed the top rank with Nvidia Parakeet v2. We are excited to announce that Nvidia Parakeet models are now supported on Argmax Pro SDK!
We have reimplemented and optimized Nvidia Parakeet for Apple Silicon, achieving near-peak utilization of 10+ TFLOPS on the Apple Neural Engine. In the process, we have reproduced the OpenASR leaderboard results with our reimplementation of Parakeet v2, which ranks first as of May 2025, ahead of top cloud speech-to-text API providers. This model runs on iOS 17 and newer as well as macOS 14 and newer.
Contrary to common belief, on-device AI does not require trading away accuracy by using "small models". In this case, Nvidia Parakeet v2 deployed on Apple devices with Argmax Pro SDK surpasses proprietary models deployed on top cloud APIs running on Nvidia GPUs. Android support is coming later this summer.
The Parakeet v2 model on Argmax Pro SDK works without any compression on macOS. Nonetheless, we also built a compressed version by applying Argmax's proprietary compression techniques to Parakeet v2, retaining accuracy while reducing the model download size from 1.2 GB to 0.4 GB.
0.4 GB is equivalent to ~3 hours of high-quality audio (320 kbps) for video editing or ~15 hours of low-quality audio (64 kbps) for virtual meetings. You download the model only once, so you never have to upload (potentially sensitive) user audio data again, and your users break even in terms of data transfer after just a few hours of usage!
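The break-even figures above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of Argmax Pro SDK: the function name hours_of_audio and the bytes_per_gb parameter are hypothetical, and it assumes constant-bitrate audio, 1 kbps = 1000 bits per second, and 1 GB read as 2^30 bytes.

```python
def hours_of_audio(size_gb: float, bitrate_kbps: float, bytes_per_gb: float = 2**30) -> float:
    """Hours of constant-bitrate audio whose total size equals `size_gb`.

    Assumptions (not from the original post): 1 kbps = 1000 bits/s and,
    by default, 1 GB = 2**30 bytes. Pass bytes_per_gb=1e9 for decimal GB.
    """
    total_bits = size_gb * bytes_per_gb * 8
    seconds = total_bits / (bitrate_kbps * 1000)
    return seconds / 3600


if __name__ == "__main__":
    # Compare the 0.4 GB compressed model download against audio upload volume.
    for kbps in (320, 64):
        print(f"{kbps} kbps: {hours_of_audio(0.4, kbps):.1f} hours")
```

Under these assumptions the 0.4 GB download matches about 3.0 hours of 320 kbps audio and about 14.9 hours of 64 kbps audio. Reading GB as 10^9 bytes instead gives about 2.8 and 13.9 hours, so the rounded figures hold either way.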
At Argmax, we optimize for scalable deployment to all Apple Silicon devices in use today, instead of optimizing for contrived speed records on the latest and greatest hardware. This is why we are excited to report that even the least capable Apple Silicon Mac from 5 years ago, the M1 MacBook Air with 8 GB of RAM, can surpass a speed factor of 100. Nvidia Parakeet v2 on Argmax Pro SDK is more than an order of magnitude faster than Whisper Large v3 (and its Turbo variant) using the same SDK. It is also faster than most speech-to-text cloud API providers per Artificial Analysis benchmarks.
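To put a speed factor of 100 in concrete terms, here is a small sketch. It assumes that speed factor means seconds of audio transcribed per second of wall-clock time, which is a common definition but is not spelled out in this post; the function name processing_seconds is hypothetical and not an SDK call.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock seconds needed to transcribe `audio_seconds` of audio.

    Assumes speed factor = audio duration / processing time
    (an assumption for illustration, not a definition from the post).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


if __name__ == "__main__":
    one_hour = 3600.0
    print(f"1 hour of audio at 100x: {processing_seconds(one_hour, 100):.0f} s")
```

Under that reading, a one-hour recording takes about 36 seconds to transcribe at a speed factor of 100.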
Try it hands-on in under 2 minutes with our TestFlight app. If you want to integrate these capabilities into your product, sign up for access on our website!