The Live API for Building Real-Time Interactions

Use the Live API to create real-time interactions. It gives developers the tools they need to build apps and intelligent agents that process text, audio, and video streams with very low latency. That speed is essential for genuinely engaging experiences, opening the way for real-time monitoring services, instructional platforms, and customer-support solutions.

Google also recently announced the preview launch of the Live API for Gemini models, a major step towards empowering developers to build scalable, reliable real-time applications. You can try the newest features through the Gemini API in Google AI Studio and Vertex AI.

What’s new in the Live API

The team has been paying close attention to developer feedback since the beta launch in December and has introduced new features and capabilities to make the Live API production-ready. See the Live API documentation for complete details:

Improved session control and reliability

  • Longer sessions via context compression: Enable context window compression with a sliding-window mechanism to manage context length automatically, allowing longer interactions than before and avoiding abrupt terminations due to context limits.
  • Session resumption: Preserve sessions across brief network outages. The Live API now stores session state server-side (for up to 24 hours) and provides handles (session_resumption) for reconnecting and picking up where you left off.
  • Graceful disconnect notification: Receive a GoAway server message when a connection is about to end, so you can handle the disconnect cleanly before termination.
  • Configurable turn coverage: Choose whether the Live API processes all audio and video input continuously or captures input only while the end user is speaking.
  • Configurable media resolution: Set the resolution of input media to trade off quality against token usage.
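The sliding-window compression described above can be sketched in plain Python: once the accumulated turns exceed a trigger budget, the oldest turns are dropped until the most recent ones fit within a target budget. This is a conceptual illustration only; the Live API performs compression server-side, and the `turns` structure and the `trigger_tokens`/`target_tokens` names are assumptions for this sketch, not SDK fields.

```python
from collections import deque

def compress_context(turns, trigger_tokens, target_tokens):
    """Conceptual sliding-window compression: if the total token count
    exceeds trigger_tokens, keep only the most recent turns that fit
    within target_tokens. (Illustrative only; the real API does this
    server-side.)"""
    total = sum(t["tokens"] for t in turns)
    if total <= trigger_tokens:
        return list(turns)
    kept = deque()
    budget = 0
    # Walk backwards so the most recent turns survive.
    for turn in reversed(turns):
        if budget + turn["tokens"] > target_tokens:
            break
        kept.appendleft(turn)
        budget += turn["tokens"]
    return list(kept)

turns = [{"role": "user", "tokens": 400},
         {"role": "model", "tokens": 600},
         {"role": "user", "tokens": 300},
         {"role": "model", "tokens": 500}]
print(compress_context(turns, trigger_tokens=1500, target_tokens=900))
```

Walking the history backwards is what makes the window "sliding": the newest context is always preserved while the oldest is discarded first.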

More control over interaction dynamics

  • Configurable voice activity detection (VAD): Adjust sensitivity levels or disable automatic VAD entirely, and use the new client events (activityStart, activityEnd) for manual turn control.
  • Configurable interruption handling: Decide whether user input should interrupt the model’s response.
  • Flexible session configuration: Update the system instructions and other setup parameters at any point during the session.
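To make the turn-detection idea concrete, here is a minimal energy-threshold sketch that emits activityStart/activityEnd style events from audio frames. The event names come from the client events mentioned above; the threshold heuristic and the per-frame energy representation are illustrative assumptions, not how the service's VAD actually works.

```python
def detect_activity(frames, threshold):
    """Emit (event, frame_index) pairs marking where speech starts and
    ends, based on a simple per-frame energy threshold. Conceptual
    stand-in for VAD; the heuristic is an assumption for illustration."""
    events = []
    speaking = False
    for i, energy in enumerate(frames):
        if energy >= threshold and not speaking:
            events.append(("activityStart", i))
            speaking = True
        elif energy < threshold and speaking:
            events.append(("activityEnd", i))
            speaking = False
    if speaking:
        # Close out an utterance still in progress at end of input.
        events.append(("activityEnd", len(frames)))
    return events

print(detect_activity([0.1, 0.6, 0.7, 0.2, 0.05], threshold=0.5))
```

With automatic VAD disabled, a client would send events like these itself to tell the API exactly where each user turn begins and ends.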

Richer output & features

  • Additional language and voice options: Choose from 30 new languages and two new voices for audio output; SpeechConfig now lets you set the output language.
  • Text streaming: Receive text responses incrementally as they are generated, so users see output sooner.
  • Token usage reporting: Understand consumption through detailed token counts, broken down by modality and by prompt/response stage, supplied in the usage metadata field of server messages.
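As a rough illustration of consuming such a breakdown, the sketch below aggregates per-modality token counts across the prompt and response stages. The field names used here (`prompt_tokens_details`, `response_tokens_details`, `modality`, `token_count`) are assumptions made for this example; consult the Live API reference for the actual usage-metadata schema.

```python
def summarize_usage(usage_metadata):
    """Sum token counts per modality across prompt and response stages
    of a usage-metadata payload. Field names are illustrative
    assumptions, not the guaranteed API schema."""
    totals = {}
    for stage in ("prompt_tokens_details", "response_tokens_details"):
        for entry in usage_metadata.get(stage, []):
            modality = entry["modality"]
            totals[modality] = totals.get(modality, 0) + entry["token_count"]
    return totals

usage = {
    "prompt_tokens_details": [
        {"modality": "TEXT", "token_count": 120},
        {"modality": "AUDIO", "token_count": 900},
    ],
    "response_tokens_details": [
        {"modality": "AUDIO", "token_count": 450},
    ],
}
print(summarize_usage(usage))  # {'TEXT': 120, 'AUDIO': 1350}
```

A breakdown like this makes it easy to see, for example, that audio dominates the token budget of a voice session.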

See the Live API in action: real-world applications

To help you get started on your next project, here are developers already putting the Live API’s capabilities to work in their applications:

Daily.co

The Pipecat open-source SDKs for Web, Android, iOS, and C++ now support the Live API.

Using the Live API’s capabilities through Pipecat, Daily has built Word Wrangler, a voice-based word-guessing game. Test your description skills in this AI-powered take on classic word games, and see how you can build one yourself!

Image credit: Google

LiveKit

LiveKit Agents integrates support for the Live API. This framework for building voice AI agents provides a fully open-source platform for developing server-side agentic applications.

Bubba.ai

Hey Bubba is a voice-first, agentic AI app designed specifically for truck drivers. The Live API enables smooth, multilingual voice interaction so drivers can operate hands-free. Key features include:

  • Searching for loads and providing load information.
  • Making phone calls to shippers and brokers.
  • Negotiating freight rates using market data.
  • Booking loads and confirming rate confirmations.
  • Finding and reserving truck parking, including confirming availability with motels by phone.
  • Scheduling appointments with shippers and receivers.

The Live API powers both driver interaction (using function calling and context caching for queries such as upcoming pickups) and Bubba’s ability to converse on phone calls for booking and negotiation. This makes Hey Bubba a complete AI tool for the USA’s largest and most diverse job sector.
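A feature like load search is typically wired up through function calling: the model is given a declared function it may invoke, and the client dispatches the resulting call to local code and returns the result. The sketch below shows that general pattern with a hypothetical `search_loads` tool; the function name, parameters, and handler are illustrative inventions for this example, not Hey Bubba’s actual implementation.

```python
# Hypothetical tool declaration in the JSON-schema style commonly used
# for Gemini function calling. Names and parameters are illustrative.
search_loads_tool = {
    "name": "search_loads",
    "description": "Find available freight loads near a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Pickup city"},
            "max_deadhead_miles": {"type": "integer"},
        },
        "required": ["origin"],
    },
}

def handle_tool_call(name, args, handlers):
    """Dispatch a model-issued function call to a local handler and
    return its result (which would be sent back to the model)."""
    return handlers[name](**args)

# Stub handler standing in for a real load-board lookup.
handlers = {
    "search_loads": lambda origin, max_deadhead_miles=50:
        f"2 loads near {origin} within {max_deadhead_miles} mi"
}
print(handle_tool_call("search_loads", {"origin": "Dallas"}, handlers))
```

In a live session the declaration is passed in the session config, the model emits a tool call mid-conversation, and the client replies with the handler’s result so the model can speak it back to the driver.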
