NVIDIA Develops New AI and Simulation Tools to Advance Humanoid Development and Robot Learning
New Project GR00T workflows and AI world-model development technologies will accelerate robot dexterity, control, manipulation, and mobility.
At this week’s Conference on Robot Learning (CoRL) in Munich, Germany, NVIDIA unveiled new AI and simulation tools and workflows that will significantly speed up robotics developers’ work on AI-enabled robots, including humanoids.
The lineup includes the general availability of the NVIDIA Isaac Lab robot learning framework; six new humanoid robot learning workflows for Project GR00T, an initiative to speed up the development of humanoid robots; and new world-model development tools for video data curation and processing, including the NVIDIA Cosmos tokenizer and NVIDIA NeMo Curator for video processing.
The open-source Cosmos tokenizer gives robotics developers high-quality visual tokenization by breaking down images and videos into tokens at very high compression rates. It runs up to 12 times faster than existing tokenizers, while NeMo Curator curates video data up to 7 times faster than unoptimized pipelines.
In conjunction with CoRL, NVIDIA presented 23 papers and nine workshops on robot learning and released training and workflow guides for developers. Hugging Face and NVIDIA also announced that they are collaborating to help the developer community advance open-source robotics research with LeRobot, NVIDIA Isaac Lab, and NVIDIA Jetson.
Accelerating Robot Development With Isaac Lab
NVIDIA Omniverse, a platform for creating OpenUSD applications for industrial digitalization and physical AI simulation, serves as the foundation for Isaac Lab, an open-source robot learning framework.
Developers can use Isaac Lab to train robot policies at scale. This open-source unified robot learning framework applies to any embodiment, including humanoids, quadrupeds, and collaborative robots, and handles increasingly complex movements and interactions.
Isaac Lab is being adopted by leading commercial robot manufacturers, robotics application developers, and robotics research organizations around the world, including 1X, Agility Robotics, The AI Institute, Berkeley Humanoid, Boston Dynamics, Field AI, Fourier, Galbot, Mentee Robotics, Skild AI, Swiss-Mile, Unitree Robotics, and XPENG Robotics.
Project GR00T: Foundations for General-Purpose Humanoid Robots
Advanced humanoids are very challenging to build. They require multidisciplinary, multilayer technical approaches that enable the robots to perceive, move, and learn skills for interacting with humans and the environment.
Project GR00T is an initiative that aims to accelerate the worldwide ecosystem of humanoid robot developers by creating accelerated libraries, foundation models, and data pipelines.
Six new Project GR00T workflows give humanoid developers blueprints for realizing the most challenging humanoid robot capabilities. They are:
- GR00T-Gen for creating OpenUSD-based 3D environments driven by generative AI
- GR00T-Mimic for generating robot motion and trajectory
- GR00T-Dexterity for dexterous robot manipulation
- GR00T-Control for whole-body control
- GR00T-Mobility for robot locomotion and navigation
- GR00T-Perception for multimodal sensing
“Humanoid robots are the next wave of embodied AI,” said NVIDIA’s senior research manager of embodied AI. “To further the advancement and development of humanoid robot developers worldwide, NVIDIA research and engineering teams are working together throughout the organization and our developer ecosystem to build Project GR00T.”
New Development Tools for World Model Builders
Robot developers are now building world models, which are AI representations of the world that predict how objects and environments will respond to a robot’s actions. Building these world models is very compute- and data-intensive, requiring thousands of hours of curated, real-world image or video data.
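As a toy illustration of what "predicting how the environment responds to an action" means, the sketch below fits a one-dimensional linear model, next_state ≈ a * state + b * action, to recorded transitions by ordinary least squares. It is only a minimal sketch: real world models are learned from image or video data with large neural networks, and the function names, the linear form, and the synthetic data here are all assumptions made for illustration.

```python
def fit_linear_world_model(transitions):
    """Fit next_state ~ a * state + b * action by ordinary least squares.

    transitions: iterable of (state, action, next_state) floats.
    Returns the coefficients (a, b). Hypothetical helper for illustration.
    """
    s_ss = s_su = s_uu = s_sy = s_uy = 0.0
    for state, action, next_state in transitions:
        s_ss += state * state
        s_su += state * action
        s_uu += action * action
        s_sy += state * next_state
        s_uy += action * next_state

    # Solve the 2x2 normal equations:
    #   a * s_ss + b * s_su = s_sy
    #   a * s_su + b * s_uu = s_uy
    det = s_ss * s_uu - s_su * s_su
    if det == 0:
        raise ValueError("transitions do not determine both coefficients")
    a = (s_sy * s_uu - s_uy * s_su) / det
    b = (s_uy * s_ss - s_sy * s_su) / det
    return a, b


def predict(model, state, action):
    """Predict the next state for a given state and action."""
    a, b = model
    return a * state + b * action


# Synthetic transitions generated from assumed "true" dynamics.
TRUE_A, TRUE_B = 0.9, 0.5
samples = [(1.0, 0.0), (0.0, 1.0), (2.0, -1.0), (1.5, 2.0)]
transitions = [(s, u, TRUE_A * s + TRUE_B * u) for s, u in samples]

model = fit_linear_world_model(transitions)
print(model, predict(model, 1.0, 1.0))
```

The same interface (observe transitions, fit a model, query it for a predicted outcome) carries over to learned world models, where the state is a sequence of visual tokens rather than a single number.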
NVIDIA Cosmos tokenizers simplify the development of these world models by providing efficient, high-quality encoding and decoding. They set a new standard for minimal distortion and minimal temporal instability, enabling high-quality video and image reconstructions.
By offering high-quality compression and up to 12x faster visual reconstruction, the Cosmos tokenizer paves the way for scalable, robust, and efficient development of generative applications across a wide range of visual domains.
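To make "high compression rates" concrete, the following sketch counts how many latent tokens a clip produces when a tokenizer downsamples it in time and space. The 8x temporal and 8x spatial factors, the clip size, and the function names are illustrative assumptions, not figures from the announcement.

```python
import math


def token_grid(frames, height, width, temporal_factor, spatial_factor):
    """Return the (time, height, width) size of the latent token grid."""
    t = math.ceil(frames / temporal_factor)
    h = math.ceil(height / spatial_factor)
    w = math.ceil(width / spatial_factor)
    return t, h, w


def compression_ratio(frames, height, width, temporal_factor, spatial_factor):
    """Ratio of input pixel positions to latent tokens."""
    t, h, w = token_grid(frames, height, width, temporal_factor, spatial_factor)
    return (frames * height * width) / (t * h * w)


# Assumed example: a 32-frame 720p clip with 8x temporal and 8x spatial downsampling.
grid = token_grid(32, 720, 1280, temporal_factor=8, spatial_factor=8)
ratio = compression_ratio(32, 720, 1280, temporal_factor=8, spatial_factor=8)
print(grid, ratio)  # (4, 90, 160) 512.0
```

Under these assumed factors, each token stands in for an 8 x 8 x 8 block of pixel positions, so a downstream world model processes 512 times fewer positions than the raw clip contains.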
The humanoid robot company 1X has integrated the Cosmos tokenizer into the 1X World Model Challenge dataset.
Other humanoid and general-purpose robot developers, including XPENG Robotics and Hillbot, are using the NVIDIA Cosmos tokenizer to handle high-resolution images and videos.
NeMo Curator now includes a video processing pipeline. This lets robot developers process large volumes of text, image, and video data and improve the accuracy of their world models.
Because of its enormous volume, video data is difficult to curate: it requires scalable pipelines and efficient orchestration to balance load across GPUs. The models used for filtering, captioning, and embedding must also be optimized to maximize throughput.
NeMo Curator overcomes these obstacles by streamlining data curation and automating pipeline orchestration, which drastically cuts processing time. It supports linear scaling across multi-node, multi-GPU systems and can efficiently handle more than 100 petabytes of data. This simplifies AI development, lowers costs, and speeds up time to market.
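The load-balancing problem described above can be sketched with a generic greedy heuristic: sort clips from longest to shortest and hand each one to the GPU with the least work so far. This is not NeMo Curator's scheduler, whose internals are not described here. The function name, the use of clip duration as a stand-in for processing cost, and the sample durations are assumptions for illustration.

```python
import heapq


def balance_clips(clip_seconds, num_gpus):
    """Greedily assign clips to GPUs, longest clip first.

    clip_seconds: list of clip durations, used as a proxy for processing cost.
    Returns (assignment, loads): clip indices per GPU and total seconds per GPU.
    """
    # Min-heap of (current_load, gpu_id) so the least-loaded GPU pops first.
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}

    longest_first = sorted(enumerate(clip_seconds), key=lambda p: p[1], reverse=True)
    for clip_id, seconds in longest_first:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(clip_id)
        heapq.heappush(heap, (load + seconds, gpu))

    loads = {gpu: sum(clip_seconds[c] for c in clips) for gpu, clips in assignment.items()}
    return assignment, loads


# Assumed example: six clips of varying length spread over two GPUs.
clips = [120, 45, 300, 60, 90, 15]
assignment, loads = balance_clips(clips, num_gpus=2)
print(assignment)  # {0: [2, 5], 1: [0, 4, 3, 1]}
print(loads)       # {0: 315, 1: 315}
```

When the per-GPU loads stay even like this, adding GPUs shortens the wall-clock time roughly in proportion, which is the behavior "linear scaling" refers to.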
Advancing the Robot Learning Community at CoRL
The NVIDIA robotics team released almost two dozen research papers at CoRL. They cover integrating vision language models for better environmental understanding and task execution, temporal robot navigation, long-horizon planning strategies for complex multistep tasks, and skill acquisition from human demonstrations.
Two of the papers break new ground in humanoid robot control and synthetic data generation: HOVER, a robot foundation model for controlling humanoid robot locomotion and manipulation, and SkillGen, a system based on synthetic data generation for teaching robots with minimal human demonstrations.
Availability
NVIDIA Isaac Lab 1.2 is open source and available on GitHub. The NVIDIA Cosmos tokenizer is available now on GitHub and Hugging Face. NeMo Curator for video processing will be available at the end of the month.
The new NVIDIA Project GR00T workflows are coming soon and will make it easier for robot companies to build humanoid robot capabilities. Tutorials and guides, including a migration guide from Isaac Gym to Isaac Lab, are now available for researchers and developers learning to use Isaac Lab.