NVIDIA Research Model Enables Rapid and Effective Dynamic Scene Reconstruction.
The model, called QUEEN, generates high-quality, low-bandwidth scene representations that can be used for streaming applications such as live media broadcasts, 3D video conferencing, and industrial robotics operations.
QUEEN
QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos.
The researchers create efficient representations of dynamic Gaussians for streamable free-viewpoint videos. The result, QUEEN, can render at about 350 frames per second, train in under 5 seconds, and capture dynamic scenes with excellent visual quality while reducing the model size to just 0.7 MB per frame.
Approach
Quantization-Sparsity Framework
![Quantization-Sparsity Framework](https://b3682738.smushcdn.com/3682738/wp-content/uploads/2024/12/image-21-1024x245.png?lossy=1&strip=1&webp=1)
At each time-step, this technique learns streamable residuals of the 3D Gaussian attributes. It uses a quantization-sparsity framework that compresses position residuals with sparsity and all other attributes with quantization, and learns all compressed latents in an end-to-end differentiable way. To accelerate per-frame training, an adaptive masking strategy additionally separates static and dynamic Gaussians along with their associated image regions.
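To make the idea concrete, below is a minimal PyTorch sketch of such a quantization-sparsity scheme for per-frame Gaussian residuals. It is an illustrative assumption, not the authors' implementation: all names (`QuantSparseResiduals`, `step_size`, `gate_logits`) are hypothetical, and the quantizer uses a simple straight-through estimator to stay differentiable end to end.

```python
# Illustrative sketch only: quantize attribute residuals, sparsify position residuals.
import torch
import torch.nn as nn


class QuantSparseResiduals(nn.Module):
    def __init__(self, num_gaussians: int, attr_dim: int):
        super().__init__()
        # Learnable per-frame residuals for positions and remaining attributes.
        self.pos_residual = nn.Parameter(torch.zeros(num_gaussians, 3))
        self.attr_residual = nn.Parameter(torch.zeros(num_gaussians, attr_dim))
        # Learned quantization step size for attribute residuals (assumed design).
        self.step_size = nn.Parameter(torch.full((attr_dim,), 0.01))
        # Per-Gaussian gate logits deciding which position residuals are kept.
        self.gate_logits = nn.Parameter(torch.zeros(num_gaussians))

    def forward(self, positions: torch.Tensor, attributes: torch.Tensor):
        # Scalar-quantize attribute residuals; the straight-through estimator
        # keeps the rounding step differentiable during training.
        q = torch.round(self.attr_residual / self.step_size) * self.step_size
        attr_q = self.attr_residual + (q - self.attr_residual).detach()

        # Soft gate in [0, 1]; at inference it can be thresholded so static
        # Gaussians transmit no position residual at all (sparsity).
        gate = torch.sigmoid(self.gate_logits).unsqueeze(-1)
        pos_sparse = gate * self.pos_residual

        # Updated per-frame Gaussian parameters.
        return positions + pos_sparse, attributes + attr_q

    def sparsity_loss(self) -> torch.Tensor:
        # L1-style penalty pushes most gates toward zero, so only the dynamic
        # Gaussians contribute position residuals that must be streamed.
        return torch.sigmoid(self.gate_logits).mean()
```

In a training loop, this loss term would be added to the rendering loss so that the model trades a small amount of reconstruction error for a much smaller per-frame bitstream.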
NVIDIA Research
QUEEN, an AI model developed by NVIDIA Research and the University of Maryland, is taking content streaming and engagement to a new level by enabling the streaming of free-viewpoint video, which allows users to see a 3D scene from any perspective.
QUEEN could be used to build immersive streaming applications that add dimension to business video conferences, teach cooking skills, or put sports fans on the pitch to watch their favorite teams play from any angle. It could also help with teleoperating robots in manufacturing facilities or warehouses.
The model will be showcased at NeurIPS, the annual AI research conference that kicks off on Tuesday, December 10. Attend NeurIPS 2024 with NVIDIA from December 10–15 at the Vancouver Convention Centre in Vancouver, British Columbia.
To stream free-viewpoint videos in near real time, the 3D scene must be reconstructed and compressed simultaneously. “QUEEN creates an optimized pipeline that sets a new standard for visual quality and streamability by balancing factors like compression rate, visual quality, encoding time, and rendering time.”
For Effective Streaming, Reduce, Reuse, and Recycle
Free-viewpoint videos are typically created from footage captured at several camera angles, such as a multicamera film studio setup, a warehouse security camera system, or a set of videoconferencing cameras in an office.
Previous AI techniques for creating free-viewpoint videos either sacrificed visual quality for smaller file sizes or required too much memory for livestreaming. QUEEN strikes a balance between the two, delivering high-quality visuals that can be easily transferred from a host server to a client’s device, even in dynamic scenes with flames, sparks, or furry animals. It also renders visuals faster than earlier techniques, supporting streaming use cases.
Many aspects of a scene remain unchanged in the majority of real-world settings. This indicates that a significant portion of pixels in a video remain constant from frame to frame. QUEEN records and reuses representations of these static portions to save computing time, concentrating instead on recreating the material that varies over time.
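A hypothetical sketch of this "reuse the static parts" idea is shown below: compare each Gaussian's parameters against the previous frame, keep the ones that barely changed, and re-encode only the dynamic subset. The function names and the threshold value are assumptions for illustration, not QUEEN's actual criteria.

```python
# Illustrative sketch: stream only the Gaussians that changed between frames.
import torch


def split_static_dynamic(prev_params: torch.Tensor,
                         new_params: torch.Tensor,
                         threshold: float = 1e-3) -> torch.Tensor:
    """Return a boolean mask marking Gaussians whose parameters changed."""
    change = (new_params - prev_params).abs().max(dim=-1).values
    return change > threshold  # True = dynamic, needs re-encoding


def update_frame(prev_params: torch.Tensor,
                 new_params: torch.Tensor,
                 threshold: float = 1e-3):
    dynamic = split_static_dynamic(prev_params, new_params, threshold)
    out = prev_params.clone()           # reuse static Gaussians as-is
    out[dynamic] = new_params[dynamic]  # refresh only the dynamic ones
    # Only the `dynamic` subset has to be compressed and streamed for this
    # frame; the static representation is already on the client.
    return out, dynamic
```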
NVIDIA Research evaluated QUEEN's performance on several benchmarks using an NVIDIA Tensor Core GPU and found that the model outperformed state-of-the-art techniques for online free-viewpoint video across a range of metrics. Given 2D videos of the same scene captured from different angles, QUEEN typically needs less than five seconds of training time to render free-viewpoint videos at around 350 frames per second.
This combination of speed and visual quality can support media coverage of concerts and sporting events, offering immersive virtual reality experiences or instant replays of key moments in a competition.
Robot operators manipulating physical objects in a warehouse could use QUEEN to perceive depth more accurately. And in a videoconferencing application, such as the 3D videoconferencing example shown at SIGGRAPH and NVIDIA GTC, presenters could use it to demonstrate tasks like origami or cooking while letting viewers choose the viewpoint that best supports their learning.
The QUEEN code will be provided on the project page and made available as open source shortly.
QUEEN is one of more than 50 NeurIPS posters and papers written by NVIDIA that highlight innovative AI research with possible uses in simulation, robotics, and healthcare.
The NeurIPS 2024 Test of Time Award went to the paper that first introduced GAN models, Generative Adversarial Nets. Bing Xu, a prominent engineer at NVIDIA, coauthored the paper, which has received over 85,000 citations.