Apple SpeechAnalyzer and Argmax WhisperKit

June 20, 2025

TL;DR

  • At WWDC 2025, Apple introduced SpeechAnalyzer, modernizing its on-device speech recognition framework with a new proprietary Apple model.
  • In our benchmarks, Apple matches the speed and accuracy of mid-tier OpenAI Whisper models on long-form conversational speech transcription.
  • Developers looking for a free offering with this specific mid-tier speed-accuracy trade-off can pick either Apple SpeechAnalyzer or a smaller model in Argmax WhisperKit, depending on their other requirements. We publish a comprehensive feature set comparison below.
  • For those with even more demanding requirements, Argmax SDK offers frontier accuracy for speech AND speaker recognition while achieving ~5x higher transcription speed compared to either framework.

Benchmarks

| SDK | Model | Error Rate (↓) | Speed Factor (↑) |
|---|---|---|---|
| Argmax WhisperKit | openai/whisper-base.en | 15.2 | 111 |
| Apple SpeechAnalyzer | Apple SpeechTranscriber | 14.0 | 70 |
| Argmax WhisperKit | openai/whisper-small.en | 12.8 | 35 |
| Argmax Pro SDK | nvidia/parakeet-v2 | 11.7 | 359 |

↑: higher is better. ↓: lower is better.

Speed Factor (↑)

Speed factor indicates the number of seconds of input audio processed by the transcription system in one second of wall-clock time, e.g., a speed factor of 60 means that a system can process 1 minute of audio in 1 second.
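As a concrete illustration (this is not the benchmark script itself, just the arithmetic behind the metric):

```python
def speed_factor(audio_duration_s: float, wall_clock_s: float) -> float:
    """Seconds of input audio transcribed per second of wall-clock time."""
    return audio_duration_s / wall_clock_s

# A system that transcribes 60 seconds of audio in 1 second
# of wall-clock time has a speed factor of 60.
print(speed_factor(60.0, 1.0))  # 60.0
```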

All results are computed on an M4 Mac mini running macOS 26 Beta Seed 1. Apple results are obtained through this open-source benchmark script and can be easily reproduced. Argmax results are obtained in our Playground app on TestFlight and can be reproduced even more easily.

Error Rate (↓)

This is the Word Error Rate (WER) metric computed on a random 10% subset of the earnings22 dataset, consisting of ~12 hours of English conversations from earnings calls with analysts. We picked this dataset because Apple cites long-form conversational speech as the primary improvement in its new SpeechTranscriber model.
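For reference, WER is the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the hypothesis, divided by the number of reference words. A minimal, illustrative implementation follows; real scoring pipelines typically apply text normalization first:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# One substitution ("the" -> "a") over six reference words:
print(round(word_error_rate("the cat sat on the mat",
                            "the cat sat on a mat"), 3))  # 0.167
```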

Feature Set Comparison

| Feature | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK |
|---|---|---|---|
| Offline Transcription | | | |
| Real-time Transcription | | | |
| Word Timestamps | | | |
| Voice Activity Detection | | | |
| Speaker Diarization | | | |
| Diarized Transcripts | | | |
| Language Detection | | | |
| Languages Supported | 10 | 100 | 100 |
| Model Support | Apple | Whisper: OpenAI models or your fine-tuned version | Best available on the market: from any model vendor or your custom model |
| Model Updates | At Apple’s discretion | Developer’s discretion | Developer’s discretion |

Deployment Considerations

| Consideration | Apple SpeechTranscriber Model | Argmax Models |
|---|---|---|
| Is the model pre-installed in the operating system? | No | No |
| Is the model automatically downloaded during first use? | Yes | Yes |
| Does the model increase my app download size? | No | No |
| Does the model itself increase my app’s memory usage? | No | No |
| Compute Engine | Neural Engine + CPU (hardcoded) | Neural Engine (GPU and CPU usage is configurable if desired) |
| iOS compatibility | iOS 26 and newer | iOS 17 and newer |
| macOS compatibility | macOS 26 and newer | macOS 14 and newer |

Support Considerations

| | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK |
|---|---|---|---|
| Debug-ability | Source not available | Open-source | Open-core |
| How are issues reported & fixed? | 1) File a Feedback Assistant ticket and check if the issue gets fixed in the next OS update | 1) Self-troubleshoot 2) File a GitHub issue 3) Get help on Discord | 1) Get priority support on Slack |
| How fast can issues be fixed? | Next OS update at the earliest | Immediate hot-fix possible | Immediate hot-fix possible |

Commercial Considerations

| | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK |
|---|---|---|---|
| Cost | Free | Free | Pricing |
| License | Apple Proprietary | MIT | Argmax Standard License |

Keyword Recognition Accuracy

The Error Rate results above provide high-level insight into the speech-to-text accuracy of Apple and Argmax systems on general vocabulary. However, many real-world use cases depend disproportionately on accuracy for a known set of important keywords instead.

The following benchmark demonstrates the keyword recognition accuracy of Apple SpeechAnalyzer (iOS 26), Apple SFSpeechRecognizer (pre-iOS 26), and Argmax SDK. Keywords are defined as people, company, and product names that occur in the earnings22 dataset. Please see this GitHub repository for details.

See OpenBench on GitHub for details and steps to reproduce

Notably, Apple's new SpeechAnalyzer API (iOS 26) lacks the Custom Vocabulary feature that lets developers improve accuracy on known, registered keywords, while Apple's older SFSpeechRecognizer API (pre-iOS 26) offers this feature and surpasses the new API in keyword accuracy.

Argmax with Custom Vocabulary surpasses both by a significant margin and even matches top cloud APIs in keyword recognition accuracy.
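As an illustrative sketch of how such a metric can be scored (the actual benchmark scoring lives in the GitHub repository linked above and may differ, e.g. by aligning words before matching), keyword recall can be computed as the fraction of keyword occurrences in the reference transcript that also appear in the hypothesis:

```python
from collections import Counter

def keyword_recall(reference: str, hypothesis: str, keywords: set[str]) -> float:
    """Fraction of keyword occurrences in the reference that are
    also present in the hypothesis (case-insensitive, whole words)."""
    ref_counts = Counter(w for w in reference.lower().split() if w in keywords)
    hyp_counts = Counter(w for w in hypothesis.lower().split() if w in keywords)
    total = sum(ref_counts.values())
    if total == 0:
        return 1.0  # no keywords to recognize
    matched = sum(min(c, hyp_counts[w]) for w, c in ref_counts.items())
    return matched / total

# "whisperkit" is missed (split into two words), "parakeet" is recognized:
kw = {"parakeet", "whisperkit"}
print(keyword_recall("parakeet beats whisperkit",
                     "parakeet beats whisper kit", kw))  # 0.5
```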

Argmax will integrate Apple SpeechAnalyzer

We were slightly disappointed to see that Apple’s model still requires a download and does not come pre-installed with iOS or macOS. However, if Apple SpeechAnalyzer is widely adopted, a newly installed app may find that the model was already downloaded by another app on the same device (including Apple’s first-party apps) and skip the download entirely. This removes a significant obstacle to on-device deployment: the latency from app install to first inference, which is dominated by model download time.

For this purpose, Argmax will integrate Apple SpeechAnalyzer so that Argmax WhisperKit and Argmax SDK users can also benefit from a pre-downloaded Apple model while their Argmax model downloads in the background.

Browse Apple SpeechAnalyzer documentation.

Start with Argmax WhisperKit on GitHub.

Get access to Argmax SDK.