Introducing the Gemma 3n Preview, Veo 3, Imagen 4, and Veo 2 Updates


Google Releases New AI Models and Tools, Expanding the Frontiers of Generative Media and On-Device AI

Google is advancing its goal of accessible AI by announcing several new models and tools that emphasize advanced generative media production and efficient on-device capabilities.

Gemma 3n

A highlight is the Gemma 3n preview, a powerful, efficient, mobile-first AI model. Developed in partnership with leading mobile hardware companies such as Qualcomm Technologies, MediaTek, and Samsung System LSI, Gemma 3n is the first open model built on a new, state-of-the-art architecture. That architecture is optimized for fast, multimodal AI and is designed to enable genuinely personal and private experiences directly on devices such as phones, tablets, and computers.

Gemma 3n's primary features include:

  • Enhanced On-Device Performance & Efficiency: Compared with Gemma 3 4B, Gemma 3n starts responding about 1.5 times faster on mobile devices, with noticeably better quality and a smaller memory footprint. This is made possible by innovations such as Per-Layer Embeddings (PLE), which significantly reduce RAM usage and let models with raw parameter counts of 5B and 8B run with a dynamic memory footprint comparable to that of 2B and 4B models (only 2GB and 3GB, respectively).
  • Many-in-1 Flexibility: Thanks to MatFormer training, the model with a 4B active memory footprint natively includes a nested, state-of-the-art submodel with a 2B active memory footprint. This makes it possible to trade off quality against performance dynamically without hosting separate models. The 4B model also adds mix-and-match functionality for dynamically creating submodels suited to specific use cases.
  • Privacy-First & Offline Ready: Local execution enables features that protect user privacy and work reliably without an internet connection.
  • Enhanced Multimodal Understanding: Gemma 3n can understand and process text, audio, and images, and offers significantly improved video understanding. Its audio capabilities include high-quality Automatic Speech Recognition (transcription) and Translation (speech to translated text). It also accepts interleaved inputs across modalities, which lets it interpret complex multimodal interactions.
  • Enhanced Multilingual Capabilities: Multilingual performance is improved overall, particularly in Japanese, German, Korean, Spanish, and French.
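
The nested-submodel idea behind MatFormer training can be illustrated with a toy example. The sketch below is not Gemma 3n's implementation; it assumes a single feed-forward layer in which the smaller submodel simply reuses the first hidden units of the larger one, and every name in it (`make_ffn`, `ffn_forward`, `active_params`) is hypothetical.

```python
import random


def make_ffn(d_model, d_hidden, seed=0):
    """Create toy feed-forward weights (illustrative only, not Gemma 3n's)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_model)]
    return w_in, w_out


def ffn_forward(x, w_in, w_out, active_hidden):
    """Run the layer using only the first `active_hidden` hidden units.

    The smaller submodel is a prefix of the larger one, so both share
    a single set of weights (assumed nesting scheme for this sketch).
    """
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(w_in[j], x)))  # ReLU activation
        for j in range(active_hidden)
    ]
    return [
        sum(w_out[i][j] * hidden[j] for j in range(active_hidden))
        for i in range(len(w_out))
    ]


def active_params(d_model, active_hidden):
    """Count the weights actually touched for a given submodel size."""
    return 2 * d_model * active_hidden


w_in, w_out = make_ffn(d_model=4, d_hidden=8)
x = [1.0, -0.5, 0.25, 2.0]

full = ffn_forward(x, w_in, w_out, active_hidden=8)   # larger, higher-capacity path
small = ffn_forward(x, w_in, w_out, active_hidden=4)  # nested, cheaper path

print(active_params(4, 8), active_params(4, 4))  # prints "64 32"
```

In this toy setup, choosing `active_hidden` at call time is the quality-versus-cost dial: the same stored weights serve both sizes, which is the property the bullet on nested submodels describes.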

Gemma 3n offers an early look at architectural advances that will come to Android and Chrome with the next generation of Gemini Nano, which is built on the same architecture and is expected to become available later this year. Developers can start experimenting with Gemma 3n today through a preview in Google AI Studio for cloud-based exploration or Google AI Edge for on-device development.

Alongside these on-device advances, Google is launching new generative media models and tools intended to empower artists and creators by pushing the boundaries of media generation.

The new and updated creative tools include:

  • Veo 3: A new, state-of-the-art video generation model that can now generate audio as well, including dialogue, traffic noise, and birdsong. It excels at everything from text and image prompting to real-world physics and accurate lip syncing. Veo 3 is available to Ultra subscribers in the US through the Gemini app and Flow, and to enterprise users on Vertex AI.
  • Updates for Veo 2: Based on creator feedback, Veo 2 gains state-of-the-art reference-powered video for greater creative control and consistency, camera controls for precise movements, outpainting to extend the frame, and the ability to add or remove objects. Camera controls and reference-powered video are currently available in Flow.
  • Flow: An AI filmmaking tool built with and for creatives and designed around Veo. Flow combines Google DeepMind's most advanced models (Veo, Imagen, and Gemini) so that users can produce cinematic clips, scenes, and stories. Users can describe their shots in natural language and manage the elements of their story. Flow is now available to Google AI Pro and Ultra plan subscribers in the US.
  • Imagen 4: The newest image generation model, combining speed and precision to produce striking images with strong typography. Imagen 4 renders fine details with exceptional clarity, performs well across a variety of styles, supports a range of aspect ratios at up to 2K resolution, and is much improved at spelling and typography. It is currently available in the Gemini app, Whisk, and Vertex AI, as well as in Workspace apps such as Slides, Vids, and Docs. A faster version is planned for release soon.
  • Lyria 2: Access to this music generation model has been expanded, giving musicians powerful creative tools and room for open-ended exploration. Creators can access Lyria 2 through YouTube Shorts, and enterprises can use it on Vertex AI. Lyria RealTime, the interactive music generation model that powers MusicFX DJ, is available through an API and AI Studio.

Google emphasizes its commitment to developing AI responsibly. Outputs from Veo 3, Imagen 4, and Lyria 2 will continue to carry SynthID watermarks to help identify them as AI-generated. Today, Google is also releasing SynthID Detector, a verification portal that helps users identify AI-generated content by checking for these watermarks. The company says it takes a careful approach to open models and aims to refine its practices as the field of AI evolves. Its stated goal is to unleash human creativity and make it easier for artists and creators to realize their ideas.