Google Translate
Google is using AI to add 110 new languages to Google Translate, including Cantonese, NKo, and Tamazight.
Google Translate helps people communicate across language barriers and better understand the world around them. Google’s goal is to bring the tool to more people by applying the latest technologies. In 2022, it added 24 new languages using Zero-Shot Machine Translation, a method in which a machine learning model learns to translate into another language without ever seeing an example. Google also launched the 1,000 Languages Initiative, a pledge to build AI models that support the 1,000 most widely spoken languages in the world.
Google is now using AI to expand the range of languages it supports even further. Thanks to its PaLM 2 large language model, Google is adding 110 new languages to Google Translate, the largest expansion to date.
Translations for more than 614 million speakers
The new languages, which range from Cantonese to Qʼeqchiʼ, have more than 614 million speakers between them, opening up translation to around 8% of the world’s population. Some are major world languages with more than 100 million speakers. Others are spoken by small Indigenous communities, and a few have almost no native speakers but are the focus of active revival efforts. About a quarter of the new languages come from Africa, including Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof.
A few of the languages that Google Translate now supports are as follows:
- Afar is a tonal language spoken in Djibouti, Eritrea, and Ethiopia. It received more volunteer community contributions than any other language in this launch.
- Cantonese has long been one of the languages users most often request for Google Translate. Because Cantonese and Mandarin frequently overlap in writing, it is difficult to find data and train models.
- Manx is a Celtic language spoken on the Isle of Man. It almost completely disappeared after its last native speaker died in 1974, but thanks to an island-wide revival campaign, it now has thousands of speakers.
- NKo is a standardised form of the West African Manding languages that unifies many dialects into one common language. Its distinctive alphabet was invented in 1949, and a thriving community of researchers has since been building tools and technologies for it.
- Punjabi (Shahmukhi), the variant of Punjabi written in Perso-Arabic script, is the most widely spoken language in Pakistan.
- Tamazight (Amazigh) is a Berber language spoken across North Africa. Although it has many dialects, their written forms are often mutually intelligible. Google Translate supports text written in both the Tifinagh and Latin scripts.
- Tok Pisin is an English-based creole language spoken in Papua New Guinea. Because it overlaps with English, you may be able to guess what words mean when you translate into Tok Pisin.
How Google selects language varieties
When adding new languages to Translate, Google has to decide which language varieties and spelling conventions to support.
Languages vary widely by region, dialect, and spelling convention. Because many languages have no standard form, it is impossible to select the “correct” variety. Google has therefore prioritised the most commonly used variety of each language. Romani, for example, is spoken in many dialects throughout Europe. Text generated by Google’s models is closest to Southern Vlax Romani, a variety frequently encountered online, but it also mixes in elements of other varieties, such as Northern Vlax and Balkan Romani.
PaLM 2 was a key piece of the puzzle, helping Translate learn closely related languages more efficiently. These include languages close to Hindi, such as Awadhi and Marwadi, and French creoles, such as Seychellois Creole and Mauritian Creole. As the technology advances and Google continues to work with linguists and local speakers, it expects to support more language varieties and spelling conventions over time.
With the addition of 110 new languages, Google Translate now supports 243 languages in total, the biggest expansion in the service’s history. The new additions range from widely spoken languages such as Cantonese and Punjabi (Shahmukhi) to many African languages, including Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof. Together they open up translation to more than 614 million speakers, or around 8% of the world’s population.
The expansion is powered by Google’s PaLM 2 large language model, which was trained on large volumes of multilingual text, including parallel text. PaLM 2 helps Translate learn languages more quickly, especially closely related ones such as the languages close to Hindi and the various French creoles.
The programme is part of Google’s broader effort to support the 1,000 most widely spoken languages in the world, which aims to reduce language barriers and help connect communities around the globe.