Saturday, April 26, 2025

DolphinGemma: An AI Model to Analyze Dolphin Vocalizations

For many years, decoding dolphin clicks, whistles, and burst pulses has been a scientific frontier. Imagine being able to listen to dolphins and understand their intricate communication patterns well enough to respond meaningfully.

On National Dolphin Day, Google, in partnership with researchers at Georgia Tech and the field research of the Wild Dolphin Project (WDP), is announcing progress on DolphinGemma, a groundbreaking artificial intelligence model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences. The approach pushes the limits of artificial intelligence, and of our potential connection with the marine world, in the pursuit of interspecies communication.

Researching dolphin society for decades

Deep context is essential to understanding any species, and WDP offers just that. Since 1985, it has run the world’s longest underwater dolphin research project, studying a particular community of wild Atlantic spotted dolphins (Stenella frontalis) in the Bahamas across generations. With decades of underwater video and audio painstakingly matched to individual dolphin identities, life histories, and observed behaviours, this non-invasive, “In Their World, on Their Terms” approach has produced a rich, one-of-a-kind dataset.

Observing and analysing the dolphins’ natural social interactions and communication is one of WDP’s main goals. In contrast to surface observation, working underwater allows researchers to directly link sounds to specific behaviours. For decades they have been associating sound types with behavioural contexts. Here are a few examples:

  • Signature whistles act as unique names, allowing mothers and calves to reunite.
  • Burst-pulse “squawks” are frequently heard during conflicts.
  • Click “buzzes” are often used during courtship or when chasing sharks.

Accurate interpretation depends on knowing which individual dolphins are involved. The ultimate goal of this observational work is to understand the structure and possible meaning of these natural sound sequences by searching for patterns and rules that might indicate language. This long-term study of natural communication forms the foundation of WDP’s research and provides crucial context for any AI investigation.

Introducing DolphinGemma

Dolphins’ intricate, natural communication is extremely difficult to analyse, but WDP’s large, labelled dataset offers a unique opportunity for state-of-the-art AI.

This is where DolphinGemma comes in. The AI model, created by Google, builds on specific Google audio technologies: the SoundStream tokenizer efficiently represents dolphin sounds, which are then processed by a model architecture suited to complex sequences. At roughly 400M parameters, the model is small enough to run directly on the Pixel phones WDP uses in the field.

This model expands on findings from Google’s Gemma collection of cutting-edge, lightweight open models, which are constructed using the same technology and research as Google’s Gemini models. Similar to how large language models for human language predict the next word or token in a sentence, DolphinGemma is an audio-in, audio-out model that has been extensively trained on WDP’s acoustic database of wild Atlantic spotted dolphins. It processes sequences of natural dolphin sounds to find patterns, structure, and ultimately predict the likely subsequent sounds in a sequence.
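The idea of predicting the next sound in a tokenized audio stream can be illustrated with a toy model. DolphinGemma’s actual architecture and SoundStream’s real token values are not public, so the tokens and the bigram statistics below are purely illustrative of the next-token-prediction framing:

```python
from collections import Counter, defaultdict

# Toy sketch only: DolphinGemma's internals are not public. A tokenizer such
# as SoundStream turns audio into discrete tokens; here we fake such tokens
# and use simple bigram counts to stand in for next-token prediction.

def train_bigram(token_sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in token_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Toy "recordings": each list is one tokenized sound sequence.
recordings = [[3, 7, 7, 2], [3, 7, 2, 5], [9, 3, 7, 7]]
model = train_bigram(recordings)
print(predict_next(model, 3))  # 7 — it follows 3 in every recording
```

A real model replaces the bigram table with a learned neural network, but the training objective, predicting the likely next token in a sequence, is the same.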

This field season, WDP is beginning to deploy DolphinGemma, which could bring immediate benefits. By identifying recurring sound patterns, clusters, and reliable sequences, a task that traditionally required enormous human labour, the model can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication. These patterns, supplemented with synthetic sounds the researchers created to denote objects the dolphins enjoy playing with, may eventually contribute to a shared lexicon for interactive communication.
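One simple way to frame the search for recurrent sound patterns is to count repeated n-grams across tokenized vocalization sequences. This is a minimal sketch under that assumption; WDP’s actual analysis pipeline is not public, and the token values here are made up:

```python
from collections import Counter

# Illustrative sketch only: finding recurrent sound patterns framed as
# counting repeated n-grams in tokenized sequences (fabricated tokens).

def recurrent_ngrams(sequences, n=3, min_count=2):
    """Return the n-grams that occur at least `min_count` times overall."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

sequences = [[1, 4, 4, 8, 2], [5, 1, 4, 4, 8], [1, 4, 4, 9]]
print(recurrent_ngrams(sequences))  # {(1, 4, 4): 3, (4, 4, 8): 2}
```

A learned model can go further, capturing longer-range and approximate regularities, but exact repeated subsequences like these are the most basic signal such an analysis would surface.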

Using Pixel phones to listen to and analyze dolphin sounds

In addition to analysing natural communication, WDP is pursuing a distinct, parallel path: using technology to explore possible two-way interaction in the ocean. In collaboration with the Georgia Institute of Technology, this effort produced the CHAT (Cetacean Hearing Augmentation Telemetry) system. CHAT, an underwater computer, is designed to establish a simpler, shared vocabulary rather than directly decode the dolphins’ intricate natural language.

The idea first relies on associating novel, synthetic whistles (created by CHAT and distinct from natural dolphin sounds) with specific objects the dolphins enjoy, such as seagrass, sargassum, or scarves the researchers wear. By demonstrating the system, researchers hope the naturally curious dolphins will learn to mimic the whistles to make these requests. As more of the dolphins’ natural sounds are characterised, they can eventually be incorporated into the system as well.

To enable this two-way interaction, the CHAT system must:

  • Hear the imitation accurately over ocean noise.
  • Identify in real time which whistle was imitated.
  • Inform the researcher, via underwater bone-conducting headphones, what the dolphin “requested.”
  • Allow the researcher to respond quickly by offering the correct object, reinforcing the association.
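The identification step above can be sketched as matching an incoming sound against the stored whistle templates. CHAT’s real signal processing is not public, so the feature vectors and similarity threshold below are hypothetical stand-ins; only the object names (seagrass, sargassum, scarf) come from the article:

```python
import math

# Hypothetical sketch of CHAT's matching step. Each synthetic whistle is
# reduced to a made-up feature vector, and an incoming imitation is matched
# to the closest template by cosine similarity.

TEMPLATES = {            # object name -> illustrative feature vector
    "sargassum": [0.9, 0.1, 0.3],
    "seagrass":  [0.2, 0.8, 0.4],
    "scarf":     [0.1, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_whistle(features, threshold=0.95):
    """Return the best-matching object, or None if nothing is close enough."""
    best, score = max(((name, cosine(features, vec)) for name, vec in TEMPLATES.items()),
                      key=lambda item: item[1])
    return best if score >= threshold else None

# An incoming imitation whose features sit close to the "seagrass" whistle:
print(identify_whistle([0.25, 0.75, 0.45]))  # seagrass
```

The threshold lets the system reject ocean noise that resembles no template, which matters as much in practice as recognising a correct imitation.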

A Google Pixel 6 handled real-time, high-fidelity analysis of dolphin sounds. Building on this work, the next generation of the system, built around a Google Pixel 9 (with research planned for summer 2025), integrates speaker and microphone functions and uses the phone’s advanced processing power to run deep learning models and template-matching algorithms concurrently.

For field research in the open ocean, using Pixel smartphones significantly reduces the need for bespoke hardware, improves system maintainability, lowers power consumption, and shrinks the device’s size and cost, all of which are critical benefits. And by using DolphinGemma’s predictive capabilities to help CHAT anticipate and identify likely imitations early in a vocalization sequence, researchers can respond to the dolphins faster, making interactions more fluid and rewarding.

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for govindhtech. She holds a postgraduate degree in business administration and is an enthusiast of Artificial Intelligence.