At WWDC 2025, Apple introduced SpeechAnalyzer, modernizing its on-device speech recognition framework with a new proprietary Apple model.
In our benchmarks, Apple matches the speed and accuracy of mid-tier OpenAI Whisper models on long-form conversational speech transcription.
Developers looking for a free offering with this specific mid-tier speed-accuracy trade-off can pick either Apple SpeechAnalyzer or a smaller model in Argmax WhisperKit, depending on their other requirements. We publish a comprehensive feature set comparison below.
For those with even more demanding requirements, Argmax Pro SDK offers frontier accuracy for speech AND speaker recognition while achieving ~5x higher transcription speed compared to either framework.
Benchmarks
| SDK | Model | Error Rate (↓) | Speed Factor (↑) | Size (↓) |
|---|---|---|---|---|
| Argmax WhisperKit | openai/whisper-base.en | 15.2 | 111 | 145 MB |
| Apple SpeechAnalyzer | Apple SpeechTranscriber | 14.0 | 70 | 133 MB |
| Argmax WhisperKit | openai/whisper-small.en | 12.8 | 35 | 216 MB |
| Argmax Pro SDK | nvidia/parakeet-v2 | 11.8 | 359 | 420 MB |

↓: Lower is better · ↑: Higher is better
Speed Factor (↑)
Speed factor indicates the number of seconds of input audio processed by the transcription system in one second of wall-clock time, e.g., a speed factor of 60 means the system can process 1 minute of audio in 1 second.
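For illustration, the speed factor is simply audio duration divided by wall-clock processing time. The helper below is a hypothetical sketch, not part of our benchmark harness:

```swift
/// Speed factor = seconds of input audio processed per second of wall-clock time.
/// A value of 60 means 1 minute of audio is transcribed in 1 second.
func speedFactor(audioDurationSeconds: Double, wallClockSeconds: Double) -> Double {
    audioDurationSeconds / wallClockSeconds
}

// Example from the definition above: 60 s of audio processed in 1 s of wall-clock time.
print(speedFactor(audioDurationSeconds: 60, wallClockSeconds: 1)) // 60.0
```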
All results are computed on an M4 Mac mini running macOS 26 Beta Seed 1. Apple results are obtained through this open-source benchmark script and can be easily reproduced. Argmax results are obtained in our Playground app on TestFlight and can be reproduced even more easily.
Error Rate (↓)
This is the Word Error Rate (WER) metric computed on a random 10% subset of the earnings22 dataset, consisting of ~12 hours of English conversations from earnings calls with analysts. We picked this dataset because Apple cites long-form conversational speech as the primary improvement with their new SpeechTranscriber model.
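For readers who want to sanity-check the metric itself, here is a minimal sketch of a standard WER computation: word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. It is illustrative only and omits the text normalization applied in real benchmark scoring:

```swift
/// Minimal Word Error Rate: word-level edit distance divided by the
/// number of words in the reference transcript.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ").map(String.init)
    let hyp = hypothesis.lowercased().split(separator: " ").map(String.init)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    guard !hyp.isEmpty else { return 1 } // empty hypothesis -> every reference word is a deletion

    // Standard dynamic-programming edit distance over words.
    var dist = Array(repeating: Array(repeating: 0, count: hyp.count + 1), count: ref.count + 1)
    for i in 0...ref.count { dist[i][0] = i }
    for j in 0...hyp.count { dist[0][j] = j }
    for i in 1...ref.count {
        for j in 1...hyp.count {
            let substitutionCost = ref[i - 1] == hyp[j - 1] ? 0 : 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                   // deletion
                dist[i][j - 1] + 1,                   // insertion
                dist[i - 1][j - 1] + substitutionCost // substitution or match
            )
        }
    }
    return Double(dist[ref.count][hyp.count]) / Double(ref.count)
}

// Example: 1 substitution over 7 reference words -> WER ≈ 0.14 (14%).
print(wordErrorRate(reference: "revenue grew ten percent year over year",
                    hypothesis: "revenue grew two percent year over year"))
```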
Size (↓)
This is the total download size of the model in megabytes (MB). If installed, Apple’s model can be found here:
Apple SpeechTranscriber model assets are downloaded into /System/Library/AssetsV2
This post covers only the first stage of our benchmarks: offline transcription. We will include Apple SpeechAnalyzer in our upcoming real-time streaming transcription benchmarks, which will also cover top cloud speech-to-text API providers. Achieving high accuracy and low latency at the same time in real-time streaming mode is a hard problem, and we are curious to see what Apple cooked!
Argmax will integrate Apple
We were slightly disappointed to see that Apple’s model still requires a download and does not come pre-installed with iOS or macOS. However, if Apple SpeechAnalyzer is widely adopted, a newly installed app may find that the model was already downloaded by another app on the same device (including Apple’s first-party apps) and skip the download! This removes a significant obstacle to on-device deployment: the latency from app install to first inference, which is dominated by model download time.
For this purpose, Argmax will integrate Apple SpeechAnalyzer so that Argmax WhisperKit and Argmax Pro SDK users can also benefit from a pre-downloaded model while their Argmax model downloads in the background.
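To make the intent concrete, here is a minimal sketch of this fallback pattern. The names below (Transcriber, appleTranscriber, argmaxTranscriber, argmaxModelIsReady) are hypothetical placeholders for illustration, not the actual WhisperKit, Pro SDK, or SpeechAnalyzer APIs:

```swift
import Foundation

// Hypothetical abstraction over any transcription backend.
protocol Transcriber {
    func transcribe(audioURL: URL) async throws -> String
}

struct TranscriptionFallback {
    let appleTranscriber: Transcriber    // backed by Apple SpeechAnalyzer, likely already on device
    let argmaxTranscriber: Transcriber   // backed by an Argmax model, possibly still downloading
    let argmaxModelIsReady: () -> Bool   // e.g. a check that the Argmax model download finished

    /// Use the Argmax model once its download completes; until then, fall back to
    /// Apple SpeechAnalyzer so the user gets transcripts immediately after install.
    func transcribe(audioURL: URL) async throws -> String {
        if argmaxModelIsReady() {
            return try await argmaxTranscriber.transcribe(audioURL: audioURL)
        } else {
            return try await appleTranscriber.transcribe(audioURL: audioURL)
        }
    }
}
```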