Monday, May 27, 2024

Samsung Galaxy AI ASR: Always Listening, Always Responding

Samsung Galaxy AI ASR and TTS

As Samsung continues to lead the way in premium mobile AI experiences, this series visits Samsung Research centres worldwide to find out how Galaxy AI is helping more users reach their full potential. Galaxy AI now supports 16 languages, so more people can expand their language capabilities even when they are offline.

This is made possible by on-device translation included in features like Live Translate, Interpreter, Note Assist, and Browsing Assist. But what is involved in developing AI for a new language? This article looks at the difficulties the teams encountered when working with mobile AI and how they resolved them. To find out where one starts teaching AI to speak a new language, let us first travel to Indonesia.

According to the staff at Samsung R&D Institute Indonesia (SRIN), setting targets is the first stage. Excellent AI starts with relevant, high-quality data. Junaidillah Fadlil, head of AI at SRIN, says, “Every language demands a different way to process this, so we dive deep to understand the linguistic needs and the specific national conditions.” Galaxy AI now supports Bahasa Indonesia, thanks to his team. “Insight and science must guide local language development,” he adds, “so the first step in adding languages to Galaxy AI is for us to plan what information we need and can ethically and legally obtain.”

Automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) are the three main tasks carried out by Galaxy AI features like Live Translate. Each task requires a different set of data.
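To make the division of labour concrete, here is a minimal Python sketch of how three such stages could be chained for speech-to-speech translation. It is an illustration only: the `SpeechTranslator` class and the toy stage functions are hypothetical names, and real stages would be trained models rather than lookups. Nothing here reflects Samsung's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains three stages: speech -> text -> translated text -> speech."""
    asr: Callable[[bytes], str]   # audio in, source-language text out
    nmt: Callable[[str], str]     # source text in, target-language text out
    tts: Callable[[str], bytes]   # target text in, synthesized audio out

    def translate(self, audio: bytes) -> bytes:
        source_text = self.asr(audio)
        target_text = self.nmt(source_text)
        return self.tts(target_text)


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend the audio "is" its transcript


def toy_nmt(text: str) -> str:
    lexicon = {"selamat pagi": "good morning"}
    return lexicon.get(text.lower(), text)   # unknown text passes through


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")       # pretend these bytes are audio


pipeline = SpeechTranslator(asr=toy_asr, nmt=toy_nmt, tts=toy_tts)
print(pipeline.translate(b"Selamat pagi"))   # b'good morning'
```

Because each stage has its own input and output type, each one also needs its own kind of training data, which is the point the article goes on to make.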

For example, ASR requires large-scale speech recordings in multiple settings, each accompanied by a precise text transcription. Varying levels of background noise help account for the different settings. The team’s ASR lead, Muchlisin Adi Saputra, says that simply adding noise to recordings is insufficient.
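For context, "adding noise" usually means mixing a noise signal into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below shows that generic technique in plain Python, with a sine wave standing in for speech and Gaussian noise standing in for background sound. The function names are illustrative assumptions, and this is not SRIN's pipeline. Synthetic mixing like this is cheap, but it cannot reproduce the character of real environments.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mix has the requested SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # SNR(dB) = 20 * log10(rms_speech / rms_noise); solve for the noise gain.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 20 * math.log10(rms(speech) / rms(residual))


rng = random.Random(0)
# One second of a 220 Hz tone at 16 kHz as a stand-in for speech.
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The same utterance at three noise levels: quiet, moderate, very noisy.
noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 0)}
```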

What is Automatic Speech Recognition?

Automatic speech recognition (ASR) lets computers convert spoken language into text.

Here is a breakdown of ASR:

  • Automatic: ASR systems transcribe speech from recordings or in real time without manual intervention.
  • Speech Recognition: The technology analyses the audio to identify the words being spoken.
  • Text output: The recognised words are then converted into text.

ASR is growing more popular and has many uses:

  • Voice assistants: ASR lets you talk to devices through assistants such as Siri or Alexa.
  • Transcription: ASR can turn lectures, meetings, and interviews into text.
  • Closed captioning: ASR generates captions for videos and live streams in real time.
  • Interactive voice response (IVR) systems: ASR allows callers to respond by voice to automated phone menus.

ASR technology continues to improve in accuracy and in its handling of accents and background noise.
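ASR accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of reference words. The article does not say which metric Samsung uses, so the following is a generic Python sketch of WER with a hypothetical function name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word was output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the light", "turn of the lights"))  # 0.5
```

A lower WER is better. A value of 0.5 here means two of the four reference words were recognised incorrectly.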

“In addition to the language data gathered from approved third-party partners, we also need to record our own voices by going into coffee shops or business settings,” Saputra explains. “This enables us to faithfully capture distinctive noises from everyday life, such as keyboard clatter and people calling out.”

It’s also important to take into account how languages are always evolving. “We need to stay current with the newest slang and its usage, and we primarily find it on social media,” continues Saputra.

Next, NMT requires translation training data. “It’s difficult to translate Bahasa Indonesia,” notes the team’s NMT lead, Muhammad Faisal. Because the language makes extensive use of implicit and contextual meanings that depend on social and situational cues, the team requires a large number of translated texts that the AI can consult for new terms, foreign words, proper nouns, and idioms: any material that helps the AI grasp the context and rules of communication.

Then, for TTS, recordings that encompass a variety of voices and tones are needed, along with more context on how particular parts of words sound in various situations. According to TTS lead Hartis Abdurrahman, “well-made voice recordings could cover all the required phonemes (units of sound in speech) for the AI model and do half the job.” He adds, “If a voice actor performed exceptionally well in the previous round, the emphasis switches to improving the AI model to speak particular words clearly.”
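One way to picture "covering all the required phonemes" is as a set-coverage check: given the phonemes a model needs and the phoneme sequences of the recorded utterances, which phonemes are still missing? The Python sketch below assumes a tiny made-up inventory and simplified transcriptions purely for illustration. It is not an authoritative phoneme set for Bahasa Indonesia, nor the team's tooling.

```python
def phoneme_coverage(required, recorded_utterances):
    """Return (coverage ratio, sorted list of phonemes not yet recorded)."""
    required = set(required)
    if not required:
        raise ValueError("required phoneme inventory must not be empty")
    seen = set()
    for phonemes in recorded_utterances:
        seen.update(phonemes)
    missing = sorted(required - seen)
    covered = len(required) - len(missing)
    return covered / len(required), missing


# Hypothetical mini-inventory and toy transcriptions (assumptions).
required = {"a", "i", "u", "k", "t", "ng", "ny"}
recordings = [
    ["k", "a", "t", "a"],
    ["i", "k", "u", "t"],
    ["ny", "a", "ny", "i"],
]

ratio, missing = phoneme_coverage(required, recordings)
print(missing)   # ['ng']
```

A recording script could then be extended with sentences that contain the missing sounds before the next studio session.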

Here are a few specific examples of data used to teach AI new languages:

  • Text and translations: Machine translation requires parallel text data, which presents a sentence or document in both the source and target languages.
  • Speech transcripts and recordings: AI must also learn a language in spoken form. The AI is able to identify accents and speech patterns thanks to recordings matched with textual transcripts.
  • Social media data: Social media offers an abundance of colloquial language, slang, and contemporary patterns that help the AI remain abreast of real-world communication patterns.

Essentially, data serves as the AI’s instructor, giving it the knowledge required to understand the nuances of a foreign language and interact with others successfully.
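As a small illustration of what parallel text data looks like in practice, the Python sketch below stores sentence pairs and applies a length-ratio filter, a common heuristic for discarding misaligned pairs before training. The `ParallelPair` class, the `keep_pair` function, and the threshold of 3.0 are assumptions made for this example, not details from Samsung.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPair:
    """One training example: the same sentence in two languages."""
    source: str   # e.g. Bahasa Indonesia
    target: str   # e.g. English


def keep_pair(pair: ParallelPair, max_ratio: float = 3.0) -> bool:
    """Drop empty pairs and pairs whose word counts differ too much."""
    src_len = len(pair.source.split())
    tgt_len = len(pair.target.split())
    if src_len == 0 or tgt_len == 0:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio


corpus = [
    ParallelPair("Selamat pagi", "Good morning"),
    ParallelPair("Terima kasih", "Thank you"),
    ParallelPair("Terima kasih", ""),                 # missing translation
    ParallelPair("Ya", "Yes, of course, I would be very happy to help you"),
]

clean = [pair for pair in corpus if keep_pair(pair)]
print(len(clean))   # 2
```

A filter like this only catches crude problems. Idioms and context-dependent meanings still call for review by people who know both languages.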

Greater Together

Planning for such a large amount of data requires substantial resources, so SRIN collaborated extensively with linguists. “This challenge calls for ingenuity, resourcefulness, and proficiency in both machine learning and Bahasa Indonesia,” Fadlil says. “Samsung’s open collaboration philosophy, combined with our scale of operations and AI development history, was a key factor in completing the task.”

By collaborating with other Samsung Research centres worldwide, the SRIN team swiftly adopted best practices and overcame the challenges of setting data targets. Moreover, the collaboration advanced culture as well as technology. The SRIN team developed stronger ties and broadened their awareness of various cultures when they joined their colleagues in Bangalore, India, and observed the local fasting rituals.

Galaxy AI’s language expansion project took on a new significance for the team. “We are especially proud of our accomplishments because this was our first AI project. We plan to continue improving our models and output quality, so this won’t be our last,” Fadlil says in closing. “This expansion respects and incorporates our cultural identities through language, while also reflecting our values of openness.”

The next episode of The Learning Curve will visit Samsung R&D Institute Jordan to talk with the group in charge of Galaxy AI’s Arabic language project. Watch to discover how difficult it is to create and train an AI model for a language with several dialects.

Gowri Priya
Gowri Priya has been writing mobile-related articles for govindhtech since August 2023. She is a commerce graduate and an enthusiast of mobiles and their technologies.

