Apple SpeechAnalyzer and Argmax WhisperKit

June 20, 2025

TL;DR

  • At WWDC 2025, Apple introduced SpeechAnalyzer to modernize its on-device speech recognition framework with a new proprietary Apple model.
  • In our benchmarks, Apple matches the speed and accuracy of mid-tier OpenAI Whisper models on long-form conversational speech transcription.
  • Developers looking for a free offering with this specific mid-tier speed-accuracy trade-off can pick either Apple SpeechAnalyzer or a smaller model in Argmax WhisperKit, depending on their other requirements. We publish a comprehensive feature set comparison below.
  • For those with even more demanding requirements, Argmax Pro SDK offers frontier accuracy for speech AND speaker recognition while achieving ~5x higher transcription speed compared to either framework.

Benchmarks

SDK | Model | Error Rate (↓) | Speed Factor (↑) | Size (↓)
Argmax WhisperKit | openai/whisper-base.en | 15.2 | 111 | 145 MB
Apple SpeechAnalyzer | Apple SpeechTranscriber | 14.0 | 70 | 133 MB
Argmax WhisperKit | openai/whisper-small.en | 12.8 | 35 | 216 MB
Argmax Pro SDK | nvidia/parakeet-v2 | 11.8 | 359 | 420 MB

↓: Lower is better ↑: Higher is better

Speed Factor (↑)

Speed factor is the number of seconds of input audio that the transcription system processes in one second of wall-clock time. For example, a speed factor of 60 means that a system can process 1 minute of audio in 1 second.
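
To make the definition concrete, here is a minimal Python sketch that computes a speed factor and converts the speed factors reported in the benchmark table into wall-clock time per hour of audio. The function names are illustrative only and are not part of any SDK mentioned in this post.

```python
def speed_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of input audio processed per second of wall-clock time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


def processing_time(audio_seconds: float, factor: float) -> float:
    """Wall-clock seconds needed to transcribe audio at a given speed factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


# Speed factors copied from the benchmark table in this post.
REPORTED_SPEED_FACTORS = {
    "openai/whisper-base.en": 111,
    "Apple SpeechTranscriber": 70,
    "openai/whisper-small.en": 35,
    "nvidia/parakeet-v2": 359,
}

if __name__ == "__main__":
    one_hour = 3600.0
    for model, factor in REPORTED_SPEED_FACTORS.items():
        seconds = processing_time(one_hour, factor)
        print(f"{model}: {seconds:.1f} s per hour of audio")
```

At a speed factor of 70, for instance, one hour of audio takes roughly 51 seconds to transcribe.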

All results are computed on an M4 Mac mini running macOS 26 Beta Seed 1. Apple results are obtained through this open-source benchmark script and can be easily reproduced. Argmax results are obtained in our Playground app on TestFlight and can be reproduced even more easily.

Error Rate (↓)

This is the Word Error Rate (WER) metric computed on a random 10% subset of the earnings22 dataset, consisting of ~12 hours of English conversations from earnings calls with analysts. The reason for picking this dataset is that Apple mentions long-form conversational speech as the primary improvement with their new SpeechTranscriber model.
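
For readers unfamiliar with the metric, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is not the benchmark's actual scoring code. It assumes whitespace tokenization with no text normalization and reports the result as a percentage, which may differ from how the numbers in the table were produced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the quarter was strong", "the quarter is strong"))
```

Real evaluations usually normalize casing, punctuation, and number formatting before scoring, and those choices can shift WER noticeably.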

Size (↓)

This is the total download size of the model in megabytes (MB). If installed, Apple’s SpeechTranscriber model assets can be found in /System/Library/AssetsV2.

Feature Set Comparison

Feature | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK
Offline Transcription
Real-time Transcription
Word Timestamps
Voice Activity Detection
Speaker Diarization
Diarized Transcripts
Language Detection
Languages Supported | 10 | 100 | 100
Model Support | Apple | Whisper: OpenAI models or your fine-tuned version | Best available on the market: from any model vendor or your custom model
Model Updates | At Apple’s discretion | Developer’s discretion | Developer’s discretion

Deployment Considerations

Consideration | Apple SpeechTranscriber Model | Argmax Models
Is the model pre-installed in the operating system? | No | No
Is the model automatically downloaded during first use? | Yes | Yes
Does the model increase my app download size? | No | No
Does the model itself increase my app’s memory usage? | No | No
Compute Engine | Neural Engine + CPU (Hardcoded) | Neural Engine (GPU and CPU usage is configurable if desired)
iOS compatibility | iOS 26 and newer | iOS 17 and newer
macOS compatibility | macOS 26 and newer | macOS 14 and newer

Support Considerations

Consideration | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK
Debug-ability | Source not available | Open-source | Open-core
How are issues reported & fixed? | 1) File a Feedback Assistant ticket and check if the issue gets fixed in the next OS update | 1) Self-troubleshoot 2) File a GitHub issue 3) Get help on Discord | 1) Get priority support on Slack
How fast can issues be fixed? | Next OS update at the earliest | Immediate hot-fix possible | Immediate hot-fix possible

Commercial Considerations

Consideration | Apple SpeechAnalyzer | Argmax WhisperKit | Argmax Pro SDK
Cost | Free | Free | Pricing
License | Apple Proprietary | MIT | Argmax Standard License

Comprehensive Benchmarks

This post covers just the first stage of our benchmarks: offline transcription. We will include Apple SpeechAnalyzer in our upcoming real-time streaming transcription benchmarks, which will also include top cloud speech-to-text API providers. Achieving high accuracy and low latency at the same time in real-time streaming mode is a hard problem, and we are curious to see what Apple cooked!

Argmax will integrate Apple

We were slightly disappointed to see that Apple’s model still requires a download and does not come pre-installed with iOS or macOS. However, if Apple SpeechAnalyzer is widely adopted, a newly installed app will find that the model was previously downloaded by another app, including Apple’s first-party apps, on the same device, and skip the download! This removes a significant obstacle for on-device deployment: the latency from app install to first inference, which is dominated by model download time.

For this purpose, Argmax will integrate Apple SpeechAnalyzer so that Argmax WhisperKit and Argmax Pro SDK users may also benefit from a pre-downloaded model while their Argmax model is being downloaded for them.
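
The fallback idea described above can be sketched as a simple selection rule. This is a hypothetical illustration, not the Argmax integration itself: `Backend` and `pick_backend` are invented names, and the sketch assumes the app can already tell whether each model is present on the device.

```python
from enum import Enum


class Backend(Enum):
    ARGMAX = "argmax"  # the developer's chosen Argmax model
    APPLE = "apple"    # Apple's model, if already on the device
    WAIT = "wait"      # nothing usable yet; wait for a download


def pick_backend(argmax_model_ready: bool, apple_model_cached: bool) -> Backend:
    """Choose which model serves a request at this moment (hypothetical rule)."""
    if argmax_model_ready:
        return Backend.ARGMAX
    if apple_model_cached:
        # Bridge the gap between app install and first inference while the
        # Argmax model is still downloading.
        return Backend.APPLE
    return Backend.WAIT


if __name__ == "__main__":
    # Fresh install on a device where another app already triggered
    # the Apple model download.
    print(pick_backend(argmax_model_ready=False, apple_model_cached=True))
```

Under this rule the Apple model serves only as a bridge: once the Argmax model finishes downloading, requests switch over to it.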

Browse Apple SpeechAnalyzer documentation.

Start with Argmax WhisperKit on GitHub.

Get access to Argmax Pro SDK.
