We’re expanding the Async text-to-speech lineup with two new models built for different production needs: Async Professional v1.0 and Async Flash v1.5.
Together with Async Flash v1.0, the lineup now gives developers a clearer way to choose the right TTS model based on quality, latency, language coverage, and workflow compatibility.
Async Professional v1.0 is our highest-quality English text-to-speech model to date, built for natural-sounding speech and improved pronunciation accuracy.
Async Flash v1.5 is our next-generation low-latency streaming model for real-time voice applications. It delivers improved responsiveness and speech quality for conversational experiences.
Async Flash v1.0 remains the best option for teams that need broad multilingual coverage, timestamp generation, synchronous TTS workflows, or compatibility with existing Flash v1.0 integrations.
In short, each Async model is optimized for a different use case, making it easier to choose the right balance of quality, latency, and language support.
A quick look at the Async TTS model lineup
If you’re building real-time voice agents, customer support automation, podcasts, audiobooks, or multilingual applications, the Async model lineup now gives you a dedicated option optimized for your specific use case.
What’s new in the Async TTS lineup?
Async now offers three text-to-speech models for three different production needs: high-quality English generation, low-latency multilingual conversations, and broad language coverage.
That matters because TTS requirements change depending on the application. A podcast narration workflow doesn’t have the same constraints as a live customer support agent. An audiobook project may prioritize natural pacing and pronunciation accuracy, while a phone agent needs fast streaming, reliable text handling, and responsive turn-taking.
The expanded Async lineup is designed around these tradeoffs. Developers can now choose a model based on the job their application needs to do: generate polished English speech, power real-time multilingual conversations, or support broader language and endpoint compatibility.
Async Professional v1.0 is built for high-quality English TTS
Async Professional v1.0 is the quality-first model in the Async TTS lineup. It’s designed for English voice experiences where pronunciation accuracy, natural delivery, and reliable text handling directly affect the user experience.
Natural English speech generation
Async Professional v1.0 is optimized for natural English speech generation and strong pronunciation accuracy. This makes it a strong fit for products where the voice is part of the experience, not just a utility layer.
For example, audiobooks, narration, podcasts, and premium voice assistants all depend on speech that feels polished over longer listening sessions. In these workflows, small issues with pacing, pronunciation, or delivery can become noticeable quickly.
Automatic text normalization
One of the main improvements in Async Professional v1.0 is automatic text normalization. The model can handle dates, numbers, currencies, abbreviations, and structured content during generation, helping developers reduce the need for separate preprocessing pipelines.
This is especially useful when the input text is dynamic. A voice assistant may need to read account balances, calendar dates, product names, or formatted IDs. A narration workflow may include headings, lists, numbers, and abbreviations in the same script.
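To make the "separate preprocessing pipeline" concrete, here is a minimal sketch of the kind of hand-written normalization step that teams often run before sending text to a TTS model. It is not Async's normalization logic: the function names, the abbreviation table, and the currency rule are all hypothetical and cover only a couple of simple cases.

```python
import re

# Hypothetical abbreviation table; a real pipeline would need far more entries
# and context-aware rules.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately"}

# Matches amounts such as $20, $1,250 or $1,250.50 (US-style formatting only).
CURRENCY_PATTERN = re.compile(r"\$(\d+(?:,\d{3})*)(?:\.(\d{2}))?")


def expand_currency(match: re.Match) -> str:
    """Rewrite a dollar amount into a spoken-style phrase (digits are kept)."""
    dollars = int(match.group(1).replace(",", ""))
    cents = int(match.group(2)) if match.group(2) else 0
    spoken = f"{dollars} {'dollar' if dollars == 1 else 'dollars'}"
    if cents:
        spoken += f" and {cents} {'cent' if cents == 1 else 'cents'}"
    return spoken


def normalize(text: str) -> str:
    """Apply the currency rule, then expand known abbreviations."""
    text = CURRENCY_PATTERN.sub(expand_currency, text)
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return text


print(normalize("Dr. Lee owes $1,250.50 today"))
```

Even this small sketch leaves dates, phone numbers, IDs, and number-to-word conversion unhandled, which is the kind of maintenance burden that model-side normalization is meant to reduce.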
Low-latency streaming for quality-first use cases
Although Async Professional v1.0 is built around output quality, it still maintains low-latency streaming performance. That makes it suitable for both content generation and real-time conversational applications where English voice quality matters.
This is useful for teams that don’t want to choose between natural speech and responsive delivery. If the product experience depends on premium English output but still needs to feel interactive, Async Professional v1.0 is the model to evaluate first.
Best use cases for Async Professional v1.0
Async Professional v1.0 is best for voice assistants, conversational AI, audiobooks, narration, podcasts, and premium English voice experiences.
Choose Async Professional v1.0 when English speech quality is the priority and your application needs output that feels polished, accurate, and ready for longer listening experiences.
Async Flash v1.5 is built for real-time multilingual voice agents
Async Flash v1.5 is the low-latency streaming model in the Async TTS lineup. It’s built for real-time voice applications where speech needs to feel fast, natural, and responsive during live conversations.
Low-latency streaming for live conversations
Real-time voice applications depend on timing. If the generated speech takes too long to start, the conversation can feel delayed, even when the voice quality is strong.
Async Flash v1.5 is designed for low-latency streaming, making it a strong fit for conversational products where fast response time is part of the user experience. This includes voice agents, phone agents, AI assistants, and support automation workflows where users expect the system to respond naturally in the moment. For teams designing a streaming TTS system, Flash v1.5 is the model to evaluate when fast turn-taking and conversational responsiveness are central to the experience.
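When evaluating a streaming model, the number that usually matters most is how long the first audio chunk takes to arrive. The sketch below measures that for any iterable of audio byte chunks. It does not call the Async API: `fake_stream` is a hypothetical stand-in for a real streaming response, and the chunk size and delay are arbitrary.

```python
import time
from typing import Iterable, Iterator, Optional


def measure_stream(chunks: Iterable[bytes]) -> dict:
    """Consume a stream of audio chunks and record basic timing figures."""
    start = time.perf_counter()
    first_chunk_s: Optional[float] = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_s is None:
            # Time until the first chunk is available to play.
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    return {
        "first_chunk_s": first_chunk_s,
        "total_s": time.perf_counter() - start,
        "total_bytes": total_bytes,
    }


def fake_stream(n_chunks: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 320


stats = measure_stream(fake_stream())
print(f"first chunk after {stats['first_chunk_s']:.3f}s, "
      f"{stats['total_bytes']} bytes in {stats['total_s']:.3f}s")
```

Swapping `fake_stream()` for a real streaming response iterator gives a simple, repeatable way to compare models under the same network conditions.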
Six-language support for multilingual agents
Async Flash v1.5 supports English, Spanish, French, German, Italian, and Portuguese. That makes it a practical choice for teams building multilingual voice agents across major markets without needing the full 15-language coverage of Async Flash v1.0.
For developers building a multilingual voice agent, this creates a more focused option: use Flash v1.5 when your application needs real-time multilingual performance across these six languages, and use Flash v1.0 when broader language coverage is the priority.
Better handling of structured text
Flash v1.5 introduces major improvements in pronunciation, text normalization, voice cloning quality, and conversational responsiveness. It also performs better on structured text commonly found in production voice agents.
That includes dates, phone numbers, currencies, account numbers, abbreviations, and IDs.
This matters because voice agents often need to speak dynamic information during a live conversation. A support agent may confirm an order number, read a phone number, explain a payment amount, and reference an account ID in the same interaction. If that text is not handled cleanly, the experience can quickly feel robotic or unreliable.
More consistent voice cloning quality
Flash v1.5 also improves voice cloning consistency and intonation compared to earlier generations. For production voice agents, consistency matters because users may hear the same cloned voice across many conversations, languages, or support scenarios.
A more consistent voice helps the experience feel stable, especially in customer-facing workflows where the assistant’s voice becomes part of the brand experience.
Best use cases for Async Flash v1.5
Async Flash v1.5 is best for voice agents, real-time AI assistants, customer support automation, phone agents, and multilingual conversational AI.
Choose Async Flash v1.5 when low-latency streaming, multilingual support, and conversational responsiveness are more important than maximum language coverage.
Async Flash v1.0 remains the broadest compatibility model
Async Flash v1.0 remains the best choice for developers who need broad multilingual support, synchronous workflows, timestamp generation, or compatibility with existing Flash v1.0 integrations.
While Async Professional v1.0 and Async Flash v1.5 are optimized for more specific use cases, Flash v1.0 continues to serve teams that need the widest coverage across languages and endpoints.
15-language support
Async Flash v1.0 supports 15 languages:
- English
- French
- Spanish
- German
- Italian
- Portuguese
- Arabic
- Russian
- Romanian
- Japanese
- Hebrew
- Armenian
- Turkish
- Hindi
- Chinese
That makes it the best fit when language coverage matters more than using the latest specialized model. For example, a multilingual education platform, global content workflow, or international customer experience may need broader coverage than the six languages available in Flash v1.5.
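The language lists in this article can be turned into a small coverage check. The sketch below is only a lookup over the languages named above. The model labels are descriptive strings, not confirmed API identifiers, and treating Async Professional v1.0 as English-only is an assumption based on its description as an English model.

```python
# Language sets as listed in this article; verify against the current docs.
FLASH_V1_5_LANGUAGES = {
    "English", "Spanish", "French", "German", "Italian", "Portuguese",
}
FLASH_V1_0_LANGUAGES = FLASH_V1_5_LANGUAGES | {
    "Arabic", "Russian", "Romanian", "Japanese", "Hebrew",
    "Armenian", "Turkish", "Hindi", "Chinese",
}

# Labels are for display only (not API model IDs).
MODEL_LANGUAGES = {
    "Async Professional v1.0": {"English"},  # assumption: English-only
    "Async Flash v1.5": FLASH_V1_5_LANGUAGES,
    "Async Flash v1.0": FLASH_V1_0_LANGUAGES,
}


def models_covering(required: set) -> list:
    """Return the models whose language set includes every required language."""
    return [
        model
        for model, supported in MODEL_LANGUAGES.items()
        if required <= supported
    ]


print(models_covering({"English", "Japanese"}))
print(models_covering({"Spanish", "German"}))
```

An empty result means none of the three models covers the requested set, which is a signal to check the model documentation for newly added languages.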
Synchronous generation and timestamp APIs
Async Flash v1.0 supports all Async API endpoints, including synchronous generation and timestamp APIs.
That matters for teams building workflows where streaming is not the only requirement. Some applications need generated audio returned in a synchronous flow. Others need timestamps to align spoken audio with captions, transcripts, visual elements, or editing workflows.
For these cases, Flash v1.0 remains the most compatible option in the lineup.
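As an illustration of the caption use case, the sketch below groups word-level timestamps into SubRip (SRT) caption cues. The input shape, a list of dictionaries with `word`, `start`, and `end` in seconds, is an assumption made for this example and may not match the actual timestamp API response.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list, max_words: int = 7) -> str:
    """Group word timestamps into numbered caption cues of up to max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        text = " ".join(w["word"] for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical timestamp data, shaped as assumed above.
sample = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
]
print(words_to_srt(sample, max_words=1))
```

Splitting on a fixed word count keeps the sketch short. A production captioner would more likely break cues on punctuation or pauses between words.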
Existing integration compatibility
Flash v1.0 also remains important for teams already building on existing Async integrations. Not every application needs to migrate to a newer model immediately, especially if the current workflow depends on broad endpoint support or a larger language set.
For existing applications, the safest approach is to treat Async Professional v1.0 and Async Flash v1.5 as specialized additions rather than automatic replacements.
Best use cases for Async Flash v1.0
Async Flash v1.0 is best for broad multilingual deployments, applications requiring timestamp generation, synchronous TTS workflows, and existing Flash v1.0 integrations.
Choose Async Flash v1.0 when language coverage, endpoint compatibility, or integration stability matters more than using the latest low-latency or highest-quality model.
How to choose the right Async TTS model
The right Async TTS model depends on the constraint that matters most for your application: speech quality, real-time responsiveness, language coverage, or endpoint compatibility.
A simple way to make the decision: choose Professional for English quality, Flash v1.5 for live multilingual conversations, and Flash v1.0 for coverage and compatibility.
Before implementing, review the model documentation for the latest supported languages, features, and endpoint availability.
How Async compares with other TTS APIs
The TTS market has moved quickly, especially as more teams build voice agents, AI assistants, customer support bots, and content generation workflows. Most major providers now offer high-quality speech generation, streaming support, or multilingual voices, but they don’t always organize model choice around the same production tradeoffs.
OpenAI’s speech API, for example, offers built-in voices and real-time audio streaming for developers building voice experiences. ElevenLabs offers multiple speech models, including low-latency options for real-time applications. Google Cloud Text-to-Speech provides a broad cloud API with neural voice tiers, while Amazon Polly offers multiple TTS engines and enterprise-friendly AWS infrastructure.
Async’s approach is more focused on helping developers choose the right model for the job. Instead of positioning one model as the answer for every workflow, the Async TTS lineup separates model choice into three clear production paths: English quality, real-time multilingual conversation, and broad language or endpoint compatibility.
Async vs broad cloud TTS providers
Broad cloud TTS providers are often strongest when teams already rely on a larger cloud ecosystem. They can be a good fit for infrastructure-heavy teams that want speech synthesis alongside other cloud services.
Async is more focused on production voice workflows where model selection needs to be direct. Developers can choose Async Professional v1.0 for higher-quality English speech, Async Flash v1.5 for low-latency multilingual conversations, or Async Flash v1.0 for wider language and endpoint compatibility.
Async vs voice-first AI platforms
Voice-first AI platforms often focus on realistic voices, cloning, dubbing, creator workflows, or agent experiences. These platforms can be powerful, but the right choice depends on whether the team is optimizing for content, real-time interaction, language support, or API control.
Async fits best when developers want a clear TTS model lineup that maps directly to production requirements. Professional v1.0 is built for polished English output, Flash v1.5 is built for responsive multilingual agents, and Flash v1.0 remains available for teams that need broader coverage or existing endpoint support.
Where Async fits best
Async is a strong fit for teams building with text-to-speech in production and trying to balance quality, latency, multilingual support, and compatibility.
Use Async when you want to build around a dedicated voice API rather than forcing every TTS workflow through the same model. The expanded lineup gives developers a cleaner way to match the model to the experience they’re creating, whether that is a premium English voice product, a real-time support agent, or a multilingual application with broader endpoint needs.
What developers should check before implementation
Before choosing a model, developers should review the latest model documentation for supported languages, features, and endpoint availability.
This is especially important if your application depends on a specific workflow, such as streaming generation, synchronous TTS, timestamp generation, or an existing Flash v1.0 integration.
For new builds, start with the use case:
- If the product is English-first and quality-sensitive, evaluate Async Professional v1.0.
- If the product is conversational and latency-sensitive, evaluate Async Flash v1.5.
- If the product needs broader multilingual coverage or timestamp support, use Async Flash v1.0.
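The three rules above can be sketched as a small decision helper. This is one interpretation of the guidance in this article, not an official selection algorithm. The function name, the flags, and the order in which the rules are checked are assumptions, and the model labels are descriptive strings, not confirmed API identifiers.

```python
from typing import Optional

# Display labels only (not API model IDs).
PROFESSIONAL_V1_0 = "Async Professional v1.0"
FLASH_V1_5 = "Async Flash v1.5"
FLASH_V1_0 = "Async Flash v1.0"


def recommend_model(
    *,
    english_first: bool = False,
    quality_sensitive: bool = False,
    latency_sensitive: bool = False,
    needs_broad_coverage: bool = False,
    needs_timestamps: bool = False,
) -> Optional[str]:
    """Return the model to evaluate first, or None if no rule applies."""
    # Compatibility requirements are hard constraints, so check them first.
    if needs_broad_coverage or needs_timestamps:
        return FLASH_V1_0
    # English-first, quality-sensitive products start with the quality model.
    if english_first and quality_sensitive:
        return PROFESSIONAL_V1_0
    # Conversational, latency-sensitive products start with the streaming model.
    if latency_sensitive:
        return FLASH_V1_5
    # No rule matched: review the model documentation before deciding.
    return None


print(recommend_model(latency_sensitive=True))
print(recommend_model(english_first=True, quality_sensitive=True))
```

Checking the compatibility flags first reflects the idea that a missing endpoint or language is a blocker, while quality and latency are preferences that can be weighed against each other.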
For existing applications, the safest approach is to treat the new models as more specialized options rather than automatic replacements. Flash v1.0 continues to support broad language coverage and all Async API endpoints, so teams with existing workflows can continue using it where compatibility matters.
Async Professional v1.0 and Async Flash v1.5 give developers more control over model selection. Instead of forcing every TTS workflow into the same model, you can now align the model with the actual product requirement: quality, latency, language coverage, or compatibility.
Start building with the right Async TTS model
Async Professional v1.0 and Async Flash v1.5 expand the Async TTS lineup with more specialized options for production voice applications.
Use Async Professional v1.0 when English voice quality matters most. Use Async Flash v1.5 when you need low-latency multilingual speech for live conversations. Stay with Async Flash v1.0 when broad language coverage, timestamp generation, synchronous workflows, or existing integration compatibility are the priority.
Start with the Async Voice API, review the model documentation, and choose the model that matches your application’s quality, latency, language, and endpoint requirements.
FAQ
What’s the difference between Async Professional v1.0 and Async Flash v1.5?
Async Professional v1.0 is optimized for the highest-quality English speech generation. Async Flash v1.5 is optimized for low-latency multilingual voice applications in English, Spanish, French, German, Italian, and Portuguese.
Which Async TTS model should I use for voice agents?
Use Async Flash v1.5 for real-time voice agents, phone agents, customer support automation, and multilingual conversational AI. It’s built for low-latency streaming, conversational responsiveness, and improved handling of structured text during live interactions.
Which Async TTS model supports the most languages?
Async Flash v1.0 supports the most languages in the Async TTS lineup. It supports 15 languages, making it the best fit for broad multilingual deployments and applications that need maximum language coverage.
Does Async handle text normalization automatically?
Yes. Async Professional v1.0 and Async Flash v1.5 automatically handle difficult text normalization scenarios such as dates, numbers, currencies, phone numbers, account numbers, abbreviations, IDs, and structured content.
When should I use Async Flash v1.0 instead of the newer models?
Use Async Flash v1.0 when your application needs broad language coverage, synchronous TTS workflows, timestamp generation, or compatibility with existing Flash v1.0 integrations. It remains the best option for maximum language and endpoint compatibility.
Where can developers check supported languages and endpoints?
Developers can review the Async model documentation for the latest supported languages, features, and endpoint availability. This is the best source to confirm compatibility before choosing a model for production.
