June 23, 2025
TL;DR
We have been fans of the pyannoteAI team and their work ever since we started building SpeakerKit for on-device speaker diarization earlier this year. As part of that effort, we benchmarked 5 systems (open-source and proprietary) across 13 datasets with a unified evaluation methodology and found that pyannote-3.1 consistently achieved lower error rates than the others. Our benchmarks are published at Interspeech 2025 and the code is open-source.
In these benchmarks, pyannoteAI's commercial model (denoted PyAnnote-AI in the plot), served via their cloud API, ranked first on quality with a significantly lower Diarization Error Rate (DER). Meanwhile, SpeakerKit matched the DER of open-source pyannote-3.1 on almost all datasets, as expected. The first cohort of SpeakerKit customers told us that the system is already faster than they expected and that future improvements should come as additional features or even higher accuracy (lower DER).
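For readers new to the metric, DER is conventionally defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The snippet below is a minimal sketch of that standard definition only. The function name and the example durations are illustrative assumptions, not values or code from our benchmark suite.

```python
def diarization_error_rate(
    missed: float,
    false_alarm: float,
    confusion: float,
    total_reference: float,
) -> float:
    """Standard DER: all error durations over total reference speech.

    All arguments are durations in seconds. This is an illustrative
    helper, not the scoring code used in the benchmarks.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical example: 100 s of reference speech, 10 s of total error.
der = diarization_error_rate(
    missed=3.0, false_alarm=2.0, confusion=5.0, total_reference=100.0
)
print(f"DER = {der:.1%}")
```

Because false alarms count as errors but are not part of the reference speech, DER can exceed 100% on difficult audio.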
We are excited to announce that pyannoteAI's commercial model is now available on Argmax Marketplace as a SpeakerKit-compatible model upgrade!
Many of our current and prospective customers have been asking about real-time streaming diarization. Although some products are on the market today, we do not think this technology has reached commercial grade yet. We will ship our version when bleeding-edge accuracy meets our bar. In the meantime, please enjoy ultra-fast inference by rerunning speaker diarization after each and every transcribed sentence.
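The rerun-after-each-sentence pattern can be sketched as follows. This is a hedged illustration, not the Argmax SDK API: `rerun_after_each_sentence`, the `Segment` tuple, and the `fake_diarize` stand-in are all hypothetical names, and a real application would pass its own diarizer in place of the stand-in.

```python
from typing import Callable, List, Sequence, Tuple

# (start_seconds, end_seconds, speaker_label); hypothetical result shape.
Segment = Tuple[float, float, str]


def rerun_after_each_sentence(
    audio: Sequence[float],
    sentence_end_samples: Sequence[int],
    diarize: Callable[[Sequence[float]], List[Segment]],
) -> List[List[Segment]]:
    """Run a full diarization pass over all audio heard so far,
    once per transcribed sentence boundary."""
    snapshots = []
    for end in sentence_end_samples:
        # Each pass sees the whole prefix, so earlier speaker labels
        # can be revised as more context arrives.
        snapshots.append(diarize(audio[:end]))
    return snapshots


def fake_diarize(samples: Sequence[float], sample_rate: int = 16000) -> List[Segment]:
    """Stand-in diarizer (assumption): labels everything as one speaker."""
    return [(0.0, len(samples) / sample_rate, "SPEAKER_0")]


# Three seconds of silence at 16 kHz, with sentences ending at 1 s and 3 s.
audio = [0.0] * 48000
for snapshot in rerun_after_each_sentence(audio, [16000, 48000], fake_diarize):
    print(snapshot)
```

The trade-off in this design is repeated work on the audio prefix in exchange for full-context accuracy on every pass, which is only practical when a single pass is fast.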
This commercial model is just the first chapter of our partnership. In the future, we intend to make other breakthrough technologies from pyannoteAI easy to deploy on device everywhere with Argmax SDK. If you are interested in evaluating this technology, please submit this form to get started.