Argmax Local Server

August 8, 2025

TL;DR

  • Argmax Local Server brings the same best-in-market latency and accuracy as Argmax Pro SDK but is much easier to integrate on macOS.
  • Easy to work with in any language, with clients for Python, JavaScript, Rust, and more
  • If you already use a cloud API for real-time transcription, keep your implementation as is and add 3 lines of code to migrate
  • Feature-complete for AI Meeting Notes apps with multi-stream real-time transcription of system audio and microphone

Multi-stream Real-time Transcription

Run multiple independent audio input and transcript output streams without additional memory consumption. Argmax Local Server supports an unlimited number of sessions, and transcription stays real-time or faster for the first two sessions. Multiple streams are necessary when transcribing both system audio (e.g. Google Meet audio from remote participants) and the system microphone (local participants in the same meeting).
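As a rough illustration of why two independent streams matter for meeting notes, the sketch below interleaves transcript segments from a system-audio stream and a microphone stream into a single timeline. The `Segment` shape and the `merge_streams` helper are hypothetical and chosen only for this example; they are not the server's actual output format.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # ASSUMPTION: hypothetical shape, not the server's real message format.
    start: float  # seconds since the meeting started
    source: str   # "system" (remote participants) or "microphone" (local)
    text: str


def merge_streams(*streams):
    """Interleave segments from several streams into one timeline.

    Each input stream must already be ordered by start time.
    """
    return list(heapq.merge(*streams, key=lambda seg: seg.start))


system = [
    Segment(0.0, "system", "Can everyone hear me?"),
    Segment(4.2, "system", "Great, let's start."),
]
microphone = [
    Segment(2.1, "microphone", "Yes, loud and clear."),
]

timeline = merge_streams(system, microphone)
for seg in timeline:
    print(f"[{seg.start:5.1f}s] {seg.source}: {seg.text}")
```

Keeping the two sources separate until this point preserves who said what: remote participants arrive on the system-audio stream and local participants on the microphone stream.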

WebSocket API & Cloud API Compatibility

Argmax Local Server has a WebSocket API that is compatible with cloud APIs like Deepgram. This enables seamless hybrid deployment, e.g. Windows devices may use the cloud API while all Mac devices run on-device in an Electron app with a single inference client implementation. The demo video above shows an Electron app using Argmax Local Server with ZERO changes to app code when migrating from Deepgram. This is possible by simply switching the inference host URL endpoint from the remote host (api.deepgram.com) to localhost.
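The host switch described above can be sketched as a single URL-building function shared by the cloud and on-device paths. This is a minimal sketch under stated assumptions: the local port (`50060`) is a placeholder, and the assumption that the local server accepts the same `/v1/listen` path and query parameters as Deepgram's streaming endpoint should be checked against the Argmax Local Server documentation.

```python
from urllib.parse import urlencode, urlunsplit

CLOUD_HOST = "api.deepgram.com"  # remote host named in this article
LOCAL_HOST = "localhost:50060"   # ASSUMPTION: placeholder port, not a documented default
LISTEN_PATH = "/v1/listen"       # Deepgram streaming path; ASSUMED to match locally


def listen_url(on_device: bool, params: dict) -> str:
    """Build the WebSocket URL; only the scheme and host differ per target."""
    scheme = "ws" if on_device else "wss"
    host = LOCAL_HOST if on_device else CLOUD_HOST
    return urlunsplit((scheme, host, LISTEN_PATH, urlencode(params), ""))


params = {"encoding": "linear16", "sample_rate": 16000}
print(listen_url(on_device=True, params=params))
print(listen_url(on_device=False, params=params))
```

Because everything after the host is identical, the rest of the client (connecting, sending audio frames, reading transcript messages) can stay the same for both targets.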

Other Apps Do Not Slow Down

Last year, before Argmax Local Server hit the market, many apps like Granola tried on-device inference and decided against it because CPU and GPU resource contention with other apps led to slowdowns and user complaints.

In our mission to make on-device the obvious architectural choice for audio model inference infrastructure, we solved this problem. See 1:31 in the video above for details on how.

Other Major Features

Automatic voice activity detection avoids unnecessary processing of each stream when there is ambient noise but no one is speaking. Real-time speaker diarization will be included soon. For now, we recommend using SpeakerKit via Argmax Pro SDK to diarize prerecorded audio.
