Microsoft Translator: Now translating 100 languages and counting!

Published

By , Principal Program Manager , Senior Program Manager

Photo of a young child writing on a chalkboard. The word “hello” is written on the chalkboard in multiple languages.

Today, we’re excited to announce that Microsoft Translator has added 12 new languages and dialects to the growing repertoire of Microsoft Azure Cognitive Services Translator, bringing us to a total of 103 languages!

The new languages, which are natively spoken by 84.6 million people, are Bashkir, Dhivehi, Georgian, Kyrgyz, Macedonian, Mongolian (Cyrillic), Mongolian (Traditional), Tatar, Tibetan, Turkmen, Uyghur, and Uzbek (Latin). With this release, the Translator service can translate text and documents to and from languages natively spoken by 5.66 billion people worldwide.

The road to 100+ languages

The core mission of Translator is to break the language barrier between people and cultures. To achieve this, we have continuously added languages and dialects to the service while ensuring that the machine translation quality of each supported language meets or exceeds the high quality bar we have set for it.

The evolution of Microsoft Translator

Microsoft Research first developed machine translation systems over 20 years ago. In 2003, a machine translation system translated the entire Microsoft Knowledge Base from English to Spanish, French, German, and Japanese, and the translated content was published on our website, making it the largest public-facing application of raw machine translation on the internet at the time.

Microsoft evolved the systems further based on statistical machine translation (SMT) models and made them available to the public through Windows Live Translator, the Translator API, and as a built-in function in Microsoft Office applications.

Over the years, we added translation systems for many of the world’s most spoken languages. As artificial intelligence (AI) technology evolved, we adopted neural machine translation (NMT) technology and migrated all machine translation systems to neural models based on transformer technology, achieving massive gains in translation fluency and accuracy.

While NMT technology significantly increased overall translation quality, the advent of transformer architecture opened new ways of creating machine translation models, enabling training with smaller amounts of material than before. Using multilingual transformer architecture, we could now augment training data with material from other languages, often in the same or a related language family, to produce models for languages with small amounts of data, commonly referred to as low-resource languages.
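To make the idea of augmenting training data with related languages concrete, here is a minimal sketch. It assumes a common multilingual NMT convention in which corpora are pooled and each source sentence is prefixed with a token naming the target language; the post does not say that Translator uses this exact scheme. The names `SentencePair`, `pool_corpora`, and `tag_for_multilingual_training`, the `<2xx>` tag format, and the placeholder sentences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned sentence pair from a parallel corpus (hypothetical type)."""
    source: str
    target: str
    source_lang: str
    target_lang: str


def pool_corpora(*corpora):
    """Concatenate several parallel corpora into one training pool."""
    pooled = []
    for corpus in corpora:
        pooled.extend(corpus)
    return pooled


def tag_for_multilingual_training(pairs):
    """Prefix each source sentence with a token naming the target language.

    Assumption: a single shared model learns all directions and uses the
    tag to decide which language to produce.
    """
    return [(f"<2{p.target_lang}> {p.source}", p.target) for p in pairs]


# Placeholder data: a small corpus for one language is pooled with a
# corpus for a related language so one model can train on both.
tatar = [SentencePair("hello", "<tatar text>", "en", "tt")]
bashkir = [SentencePair("hello", "<bashkir text>", "en", "ba")]
examples = tag_for_multilingual_training(pool_corpora(tatar, bashkir))
```

Because both corpora feed one model, whatever the languages share can be learned from either corpus, which is the intuition behind training a low-resource language alongside a related one.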

Even with all that technology available, it’s essential to have a body of digital documents in the target language together with their translations in another already-supported language, commonly referred to as parallel documents.

Line chart of the number of languages that Microsoft Translator has translated, from seven in 2007 to over 100 in 2021. The system used statistical machine translation (SMT) from 2007 until 2016. The adoption of neural machine translation (NMT) technology in 2016 helped to increase the quality of translation, and the adoption of transformer architecture in 2019 enabled the Microsoft team to build models for low-resource languages with smaller amounts of data.
Figure 1: The adoption of neural machine translation (NMT) technology in 2016 helped us increase the quality of translation, and transformer architecture, adopted in 2019, helped us build models for low-resource languages.

A collective effort in data gathering and evaluation

One of the biggest challenges when adding new languages is obtaining enough bilingual data to train a machine translation model. This data consists of high-quality human-translated content both in the language we want to add and in one of the languages the service already supports. For many languages, this bilingual data is hard to acquire, especially for digitally low-resourced or endangered languages.

We are fortunate to work with partners in language communities who have access to human-translated texts and can help us gather data for under-resourced languages. These community partners, often volunteers working with their respective communities, painstakingly collect bilingual sentences by consulting with community members and elders. They then evaluate the quality of the resulting machine translation models.

Our engagement with community partners started in 2010, when we coordinated with the disaster response community to build a translation system for Haitian Creole within 10 days of Haiti’s devastating earthquake. Since then, an increasing number of community partners have helped us create a multitude of language systems, such as Hmong Daw, Urdu, Swahili, Mayan, Otomí, Māori, and Inuktitut.

“Some of the best moments for our team are when we can light up translation for a new language, often built together with the community. Many languages have been neglected or even suppressed, and it’s incredibly gratifying for us to support these language communities.”

Arul Menezes, Distinguished Engineer, Machine Translation, Microsoft

Technical capabilities of Azure Cognitive Services Translator

Powered by Microsoft Translator, Azure Cognitive Services Translator makes it possible for businesses to expand their global reach, enabling them to communicate with customers and partners across language barriers and provide content to them in their native language quickly, reliably, and at a reasonable cost. It also helps break barriers in internal communication between employees in different countries.

Azure Cognitive Services Translator exposes NMT models in Microsoft products and to Translator customers through the Text Translation and Document Translation APIs. These APIs translate both plain text and complex documents from one language to another. Azure Cognitive Services Translator APIs are available in the public cloud and in the secure Microsoft Azure Government Cloud. In addition, the Text Translation API is available in Docker containers, allowing customers to process content on-premises to meet specific regulatory requirements.

Azure Cognitive Services Translator also includes the Custom Translator service, which enables users to use their own translation memory to build custom machine translation models that translate their domain-specific terminology as it’s used in their business and related industries. These custom machine translation models can be used through the Text and Document Translation APIs.
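As an illustration of calling the Text Translation API, the sketch below builds a request against the version 3.0 REST endpoint using only the Python standard library. It assumes the global public-cloud endpoint, a subscription key, and a resource region that you supply; the helper names `build_translate_request` and `translate` are hypothetical and not part of any Microsoft SDK. The optional `category` parameter is how a Custom Translator model is selected, and the value shown in the usage lines is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Global public-cloud endpoint of the Translator v3 REST API (assumption:
# sovereign clouds and containers use different hosts).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(texts, to_lang, key, region,
                            from_lang=None, category=None):
    """Build (but do not send) a Text Translation v3 request."""
    params = [("api-version", "3.0"), ("to", to_lang)]
    if from_lang:
        # Omitting "from" lets the service detect the source language.
        params.append(("from", from_lang))
    if category:
        # Category ID of a custom model built with Custom Translator.
        params.append(("category", category))
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


def translate(request):
    """Send the request and return the first translation of each input."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["translations"][0]["text"] for item in payload]


# Example: English to Georgian ("ka"), one of the newly added languages.
# The key, region, and category values are placeholders.
request = build_translate_request(
    ["Hello"], "ka", key="YOUR_KEY", region="YOUR_REGION",
    category="YOUR_CATEGORY_ID",
)
# translations = translate(request)  # requires a valid key and network access
```

Keeping request construction separate from the network call makes the URL, headers, and body easy to inspect before any billable request is sent.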

To translate audio or voice content, Azure Cognitive Services Translator is tightly integrated with Azure Cognitive Services Speech, powering speech translation and multi-device conversation via the Azure Speech SDK.

Azure Cognitive Services Translator and the products it supports are widely adopted by customers who want to localize website content and apps, translate conversations and content for business analytics, and translate content for forensic investigations, as well as other scenarios. The service seamlessly integrates into many Microsoft products and is readily available for everyone to use and create content in the language of their choice. Some of the Microsoft product integrations include Microsoft 365 for translating text and documents, the Microsoft Edge browser for translating whole webpages, SwiftKey for translating messages, LinkedIn for translating user-submitted content, the Translator app for having multilingual conversations on the move, and many more.

Commitment to language accessibility

How can technology make information accessible to people who don’t understand the language it was originally provided in? In an ever-shrinking world, how can we help people understand and appreciate each other’s culture?

Language barriers prevent access to vital information and drive our commitment to break those barriers. Translating text, documents, voice, and images from one language to another plays a significant role in achieving this.

When we reached our one-hundredth language, Microsoft broke the language barrier for 72% of the world’s population. We are both proud and humbled by this achievement.

Looking forward

Since our inception, the Translator team has endeavored to help bridge language barriers whenever and wherever it was needed, as seamlessly and as transparently as possible. As we look toward the future, we’re excited to improve our services, solutions, and quality to make content from around the globe accessible to everyone, removing the dividing force of language differences while maintaining awareness of culture, tradition, and belonging.

Join our language community effort

If your language community is interested in partnering with Microsoft to add your language to Translator and the products that are using it, and you have access to digital documents in your language and another commonly spoken language, please contact us using this form.
