What Are AI 'World Models,' and Why Do They Matter?

World models, also known as world simulators, are attracting growing attention in artificial intelligence. They take inspiration from the mental models humans develop to understand the world around them. AI pioneer Fei-Fei Li's World Labs has raised $230 million to build "large world models," and DeepMind has hired one of the creators of OpenAI's video generator, Sora, to work on "world simulators." (Sora launched on Monday.) But what exactly are world models?

Understanding the Essence of World Models

World models draw on the intuitive way humans form mental representations of the world. Our brains take abstract input from the senses and turn it into a more concrete understanding of our surroundings. Consider a baseball batter, who has only milliseconds to decide how to swing, less time than it takes for visual signals to reach the brain. Batters can hit a 100-mile-per-hour fastball because they can instinctively predict where the ball will go. As David Ha and Jürgen Schmidhuber note, for professional players this all happens subconsciously: their muscles reflexively swing the bat at the right time and place, guided by their internal models' predictions. Some researchers believe this kind of subconscious reasoning is a prerequisite for human-level intelligence.

World models are also more than a theoretical concept. Take generative video: many AI-generated clips drift into the uncanny valley, with motion and physics that look subtly wrong. A world model with even a basic understanding of why objects behave as they do could depict that motion more convincingly. By training on a wide range of data, including photos, audio, video, and text, world models build internal representations of how the world works and learn to reason about the consequences of actions.

Applications and Promises of World Models

The applications of world models extend well beyond better video generation. Meta chief AI scientist Yann LeCun believes these models could one day be used for sophisticated forecasting and planning in both the digital and physical realms. In a recent talk, he described how a world model could help achieve a desired goal through reasoning. Given a starting representation of a "world" (such as a video of a dirty room) and an objective (a clean room), the model could work out a sequence of actions to reach that objective, not by simply replaying observed patterns but by understanding the underlying principles.

OpenAI, which considers Sora a world model, also highlights these capabilities. Sora can simulate actions like a painter leaving brush strokes on a canvas, and it can simulate video games as well; for example, it can render a Minecraft-like UI and game world. Future world models may even be able to generate 3D worlds on demand for gaming, virtual photography, and more. World Labs co-founder Justin Johnson emphasizes this potential: we already have the ability to create virtual, interactive worlds, he says, but only at significant cost. World models, in his view, will let people get not just an image or a clip out, but a fully simulated, vibrant, and interactive 3D world.
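LeCun's dirty-room example boils down to planning with a predictive model: imagine the outcome of each candidate action, then search for a sequence of actions that reaches the goal. The sketch below is a minimal illustration under loud assumptions: the "room" is a hypothetical row of four cells, and `predict` is a hand-written transition function standing in for a learned world model. The names `RoomState`, `predict`, `is_goal`, and `plan` are hypothetical, and the plain breadth-first search used here is a simple stand-in to show the idea, not LeCun's proposed architecture.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional

ACTIONS = ("left", "right", "clean")
ROOM_WIDTH = 4  # hypothetical room size, chosen only for this toy example


class RoomState(NamedTuple):
    """Toy 'world': the robot's cell index and the set of cells that are dirty."""
    robot: int
    dirty: FrozenSet[int]


def predict(state: RoomState, action: str) -> RoomState:
    """Hand-written stand-in for a learned world model (an assumption):
    returns the state that would follow from taking `action` in `state`."""
    if action == "left":
        return state._replace(robot=max(0, state.robot - 1))
    if action == "right":
        return state._replace(robot=min(ROOM_WIDTH - 1, state.robot + 1))
    if action == "clean":
        return state._replace(dirty=state.dirty - {state.robot})
    raise ValueError(f"unknown action: {action}")


def is_goal(state: RoomState) -> bool:
    """The objective: a clean room, i.e. no dirty cells left."""
    return not state.dirty


def plan(start: RoomState, max_depth: int = 20) -> Optional[List[str]]:
    """Breadth-first search over imagined futures. The planner never acts in
    the real room; it only queries `predict` to see where actions would lead,
    so the first goal state it reaches comes with a shortest action sequence."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if is_goal(state):
            return actions
        if len(actions) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = predict(state, action)
            if nxt not in seen:  # skip states we have already imagined
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None  # no plan found within max_depth


if __name__ == "__main__":
    dirty_room = RoomState(robot=0, dirty=frozenset({1, 3}))
    print(plan(dirty_room))
```

Running the script prints `['right', 'clean', 'right', 'right', 'clean']`: the planner finds that sequence by querying its model of the room rather than by acting in it. A real world model would learn something like `predict` from data and work on learned representations, where exhaustive search like this would not scale.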

Overcoming Technical Hurdles

Despite their allure, world models face numerous technical challenges. Training and running them requires massive amounts of compute, even by the standards of today's generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if its use becomes widespread.

World models, like all AI models, also hallucinate and can internalize biases in their training data. For instance, a world model trained mainly on videos of sunny weather in European cities might struggle to understand or accurately depict Korean cities in snowy conditions. A general lack of training data exacerbates these issues, as Alex Mashrabov, CEO of AI video startup Higgsfield, points out. Models have already shown limits in how well they generate people of certain types or races. Training data for a world model must be broad enough to cover diverse scenarios, yet specific enough to support a deep understanding of each one.

Cristóbal Valenzuela, CEO of AI startup Runway, also points to data and engineering issues that keep today's models from accurately capturing the behavior of a world's inhabitants. Models need to generate consistent maps of an environment and be able to navigate and interact within it.

If these major hurdles are overcome, however, Mashrabov believes world models could "more robustly" bridge AI with the real world. That could lead to breakthroughs not only in virtual world generation but also in AI decision-making and robotics, including more capable robots. With an advanced world model, an AI could develop its own understanding of whatever scenario it is placed in and begin to reason out possible solutions.