Supertone, a Seoul-based company specializing in speech AI, has released the latest version of its on-device text-to-speech (TTS) engine: Supertonic v3. The release broadens language coverage, adds finer control over vocal expression, and improves synthesis stability for both developers and end users.
Expanded Language Support and Improved Accuracy
Supertonic v3 supports 31 languages, roughly six times as many as its predecessor. This makes it easier for developers to build multilingual voice experiences into their applications, particularly for global markets where broad language coverage matters. Supertone has also improved reading stability, reducing how often the model fails to read its input correctly. Such reading failures, in which a TTS model skips, repeats, or garbles parts of the text, are a common source of disruption in voice-enabled applications.
Introducing Expression Tags for Enhanced Voice Control
One of the most notable additions in Supertonic v3 is support for expression tags, which let developers add emotional nuance to synthetic voices. With these tags, users can shape how a line is delivered, for example making it sound enthusiastic or sad, so that generated speech sounds more natural and fits its context. This is particularly valuable in education, entertainment, and customer service, where emotional tone strongly affects engagement.
Seamless Integration and Future Potential
Despite these upgrades, Supertone has maintained backward compatibility, so existing integrations built on earlier Supertonic versions continue to work unchanged. This continuity matters for businesses that depend on consistent voice experiences across platforms and products. Because Supertonic runs on-device rather than relying on cloud-based inference, v3 is also a privacy-conscious option. That consideration is increasingly important as data protection concerns grow.
With Supertonic v3, Supertone strengthens its position in the fast-moving AI voice market by combining broad language coverage, expressive control, and reliable on-device synthesis.