Customer

ModMed

November 12, 2025

ModMed
  • Over 40,000 providers in the US use ModMed to help drive clinical and operational success for their practices.
  • ModMed selected Argmax SDK to power ModMed Scribe's real-time ambient listening capabilities. A demo video is published here. Argmax has been live in production since March 2025 with zero downtime and serves virtually 100% of the speech-to-text inference traffic.
  • Argmax enabled ModMed to reduce transcript finalization latency from many seconds in the cloud to less than 500 milliseconds (p95) on the clinician's device, independent of session length and internet connectivity. This is a major upgrade to the user experience: clinicians can start reviewing the suggested visit note content based on the conversation that just ended.

What The CEO Has to Say

"Argmax has been an instrumental partner with ModMed on our ambient listening AI solution. Our custom language models need to deliver high accuracy, performance, and battery life on mobile devices. Their technologies deliver, and their research keeps us at the frontier of what is the state of the art."

-Daniel Cane, Co-founder and co-CEO at ModMed


ModMed AI Scribe built on Argmax SDK running on a clinician's iPad (Source)

Before Argmax

Natively built on a large repository of structured, specialty-specific data, ModMed Scribe goes beyond SOAP note transcription to suggest specific visit note content that helps healthcare providers complete downstream clinical workflows like prescriptions and patient handouts. ModMed Scribe was designed to:

  • Listen to and understand each medical specialty’s language (e.g., dermatology, ophthalmology)
  • Deliver more accurate transcription and better identify provider-desired downstream actions

To achieve these objectives, ModMed fine-tuned Whisper Large v3, the leading speech-to-text model of 2024, during that year. After deciding to deploy their model on-device, they evaluated the most promising open-source tools and concluded that Argmax WhisperKit was the only one that could run Whisper Large v3 on iOS fast enough, and for extended sessions, without impacting device health.

Around the same time, they also discovered Argmax SDK, which offers a turn-key Real-time Transcription API and custom model optimization support, and decided to upgrade to the Enterprise Plan. The upgrade enabled ModMed to refocus their engineering efforts on improving their model and the user experience while trusting Argmax for the rest.

After 6 months in Production with Argmax

After deploying ModMed's medical specialty-tuned Whisper Large v3 model in early 2025, Argmax advised and collaborated with ModMed's research team as they adopted Parakeet, the then-newly released frontier speech-to-text model from Nvidia. Argmax was the first developer platform to port and optimize Nvidia Parakeet for Apple Silicon. As a result, ModMed was able to upgrade to this model without any time-to-market delay.

Enterprise-level SLAs

ModMed upgraded to Enterprise-level support with SLAs on customer-selected metrics such as end-of-session latency, accuracy, and battery life. To satisfy the SLA, Argmax set up a dedicated hardware lab with the device types that ModMed's users rely on, specifically configured to run an inference workload that proxies ModMed's production inference workload.
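To make a percentile-based latency metric concrete, here is a minimal sketch of how a p95 figure, like the sub-500-millisecond number quoted earlier, could be computed from a batch of measurements. Everything in it is an assumption for illustration: the function name, the nearest-rank percentile method, the sample values, and the use of 500 ms as a threshold. It does not describe how Argmax or ModMed actually measure or enforce their SLA.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical end-of-session latencies in milliseconds (made-up values).
latencies_ms = [310, 295, 420, 380, 350, 460, 330, 405, 370, 610,
                340, 360, 390, 315, 325, 445, 300, 355, 410, 385]

# Hypothetical threshold, borrowed from the p95 figure quoted in this article.
TARGET_P95_MS = 500

p95 = percentile_nearest_rank(latencies_ms, 95)
meets_target = p95 < TARGET_P95_MS
print(f"p95 = {p95} ms, meets target: {meets_target}")
```

Note that a single slow session (610 ms in this made-up data) does not push the p95 over the threshold, which is why percentile targets are commonly preferred over worst-case targets for user-facing latency.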

Argmax generally configures a subset of test devices to run Developer Beta versions of iOS in order to detect potential issues as early as possible. Once an issue is detected, Argmax addresses it before the affected iOS version is released publicly and starts impacting real users. In one notable case, Argmax detected issues on Developer Beta Seed 1 of iOS 26 on Day 2 of its release and worked with Apple to resolve a low-level Neural Engine bug before the first Public Beta Seed 3 hit the market, averting a major incident.
