Deepgram AI Voice

from $0.0048/min

About Deepgram

Deepgram is an enterprise-grade Voice AI platform that provides real-time speech-to-text (STT), text-to-speech (TTS), and voice agent APIs. It targets slow, inaccurate, or costly transcription with end-to-end deep learning models that return results with under 300 milliseconds of latency, and it does not require training a model on your own audio first.

What it does

Deepgram offers a unified API that converts audio to text (with streaming and batch options), generates natural-sounding speech, and orchestrates voice agents with built-in turn detection and interruption handling. It supports 45+ languages, speaker diarization, custom vocabulary, and automatic punctuation. Models like Nova-3 handle background noise, crosstalk, and far-field audio out of the box.

Who it's for

This API is designed for developers building voice-enabled apps (e.g., voice assistants, call analytics, live captioning), contact centers needing real-time call transcription, and media companies transcribing podcasts or videos at scale. It's less suited for one-off manual transcription jobs where a human editor is preferred, or for extremely low-budget hobby projects that don't need sub-second latency.

Real use cases

  • Real-time captioning for live events and webinars
  • Automated call scoring and sentiment analysis in contact centers
  • Voice agents for customer support, built on Flux models
  • Batch transcription of recorded meetings, interviews, and video content

Key features

  • Real-Time Streaming — Transcribe audio as it's spoken via WebSocket API with sub-300ms latency
  • Batch Processing — Upload pre-recorded files for asynchronous transcription
  • Custom Vocabulary — Add industry jargon, names, or acronyms to improve accuracy
  • Speaker Diarization — Identifies who spoke when in multi-person audio
  • Punctuation & Formatting — Automatic capitalization, commas, and periods for readable transcripts
  • Language Support — 45+ languages including English, Spanish, Mandarin, and Arabic
  • Voice Agent API — Single unified API for STT, TTS, and LLM orchestration with turn detection
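
As a rough illustration of how the batch-processing, diarization, punctuation, and language options above might map onto a single request, here is a minimal Python sketch that only builds the HTTP request and does not send it. The endpoint URL, the `Token` authorization scheme, and the query parameter names (`model`, `language`, `diarize`, `punctuate`) are assumptions not stated on this page, so check Deepgram's current API reference before relying on them. `build_transcription_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: endpoint for pre-recorded (batch) transcription; verify in the docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(api_key, audio_url, model="nova-3",
                                language="en", diarize=True, punctuate=True):
    """Build (but do not send) a POST request for a hosted audio file.

    Hypothetical helper: parameter names are assumed, not taken from this page.
    """
    params = {
        "model": model,
        "language": language,
        "diarize": str(diarize).lower(),      # "true" / "false"
        "punctuate": str(punctuate).lower(),
    }
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # ASSUMPTION: auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transcription_request("YOUR_API_KEY", "https://example.com/call.wav")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual network call. Keeping request construction separate from sending makes the option mapping easy to inspect and test without credentials.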

Deepgram Pricing

Deepgram pricing: from $0.0048/min. Billing model: Freemium.

Free Tier

Includes $200 in free credits to get started. No credit card required. Access to all public models with limited concurrency (up to 50 concurrent REST API requests and up to 50 concurrent WSS connections for STT).

Pay As You Go

No minimums, no expiration. $0.0048/min for Nova-3 Monolingual (pre-recorded), $0.0065/min for Flux English (streaming). Higher concurrency limits: up to 150 concurrent WSS connections for STT.

Growth

Pre-paid annual credits (starting at $4K/year) save up to 20% vs pay-as-you-go. Includes increased concurrency: up to 225 concurrent WSS connections for STT, and up to 60 for TTS and the Voice Agent API.
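
One way a client might stay under the per-plan streaming limits quoted above is to gate work behind a semaphore. The following is a minimal sketch using only Python's standard `asyncio` module. It does not call Deepgram. `run_with_limit`, the plan keys, and the `fake_stream` placeholder are hypothetical names for illustration, and the limits are simply the WSS-for-STT figures listed on this page.

```python
import asyncio

# Concurrent WSS connections for STT, as quoted on this page.
PLAN_WSS_STT_LIMITS = {"free": 50, "pay_as_you_go": 150, "growth": 225}


async def run_with_limit(jobs, plan="free"):
    """Run zero-argument coroutine functions without exceeding the plan limit.

    Hypothetical helper: each job stands in for one streaming session.
    """
    semaphore = asyncio.Semaphore(PLAN_WSS_STT_LIMITS[plan])

    async def guarded(job):
        # At most `limit` jobs can be inside this block at any moment.
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_stream():
    # Placeholder for opening a connection and streaming audio.
    await asyncio.sleep(0.01)
    return "done"


results = asyncio.run(run_with_limit([fake_stream] * 120, plan="free"))
print(len(results))
```

With 120 jobs on the free plan, the first 50 start immediately and the rest wait for a slot, so the client never holds more than 50 sessions open at once.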

All plans come with community and Discord support; premium SLAs available on Growth and Enterprise. Contact sales for custom models and enterprise deployment.
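
To make the per-minute rates concrete, here is a small Python sketch of the arithmetic, using only the figures quoted on this page. The dictionary keys and function names are hypothetical. Two assumptions are worth flagging: the Growth saving is "up to 20%", so the real discount may be lower, and free credits are assumed to be consumed at the pay-as-you-go rate.

```python
# Pay-as-you-go rates in USD per minute, as listed on this page.
PAYG_RATES_PER_MIN = {
    "nova-3-monolingual-prerecorded": 0.0048,
    "flux-english-streaming": 0.0065,
}


def estimate_cost(minutes, rate_key, discount=0.0):
    """Estimated USD cost for `minutes` of audio at the listed rate.

    `discount` is a fraction (0.20 means 20% off); the page only promises
    "up to 20%" on Growth, so treat 0.20 as a best case.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return minutes * PAYG_RATES_PER_MIN[rate_key] * (1.0 - discount)


def minutes_covered(credits_usd, rate_key):
    """Minutes of audio a credit balance covers (ASSUMES pay-as-you-go rate)."""
    return credits_usd / PAYG_RATES_PER_MIN[rate_key]


key = "nova-3-monolingual-prerecorded"
print(round(estimate_cost(10_000, key), 2))        # 48.0
print(round(estimate_cost(10_000, key, 0.20), 2))  # 38.4
print(round(minutes_covered(200, key)))            # 41667
```

Under these assumptions, the $200 free credit covers roughly 41,667 minutes (about 694 hours) of pre-recorded Nova-3 Monolingual audio.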

Frequently asked questions

Is there a free plan or free trial for Deepgram?
Yes, Deepgram offers a free tier with $200 in credits to start. No credit card is required to sign up.

How much does Deepgram cost per minute?
Deepgram's pay-as-you-go pricing starts at $0.0048/min for Nova-3 Monolingual (pre-recorded). Streaming rates for Flux English start at $0.0065/min. Growth plans offer up to 20% savings with annual pre-paid credits.

What is Deepgram used for?
Deepgram is used for real-time speech-to-text, text-to-speech, and building voice agents. Common use cases include live captions, call analytics, voice assistants, and batch transcription of audio files.

What are the best alternatives to Deepgram?
Top alternatives include Google Cloud Speech-to-Text, Amazon Transcribe, AssemblyAI, and Rev AI. Deepgram is known for its sub-300ms latency, unified Voice Agent API, and strong out-of-the-box accuracy without pre-training.

Does Deepgram support multiple languages?
Yes, Deepgram supports over 45 languages, including English, Spanish, Mandarin, and Arabic. Nova-3 Multilingual and Flux Multilingual handle multiple languages in a single conversation with automatic language detection.

What is a key limitation of Deepgram?
Deepgram is optimized for developer and enterprise use cases requiring low latency and scale. It may not be ideal for one-off manual transcription or for users who need a simple web-based editor without an API.