AI Pioneer Fei-Fei Li Unveils Real-Time Generative 'World Model' Capable of Rendering 3D Scenes from 2D Images

As of now, World Labs, founded by Fei-Fei Li, has raised about $230 million from investors including Nvidia, a16z, AMD, Adobe, and the venture capital arm of Databricks, at a valuation reported to be as high as $1 billion.

Fei-Fei Li, Co-founder and CEO of World Labs (Image source: Bloomberg)

TMTPOST -- Fei-Fei Li, the Stanford University computer science professor often hailed as the “Godmother of AI,” has introduced a breakthrough generative model that could redefine how artificial intelligence understands and recreates the physical world.

Li’s startup, World Labs, announced the launch of its Real-Time Frame Model (RTFM) on Oct. 17. RTFM is a highly efficient autoregressive diffusion Transformer trained end-to-end on large-scale video data. Its key innovation is the ability to generate realistic 2D images from new viewpoints using only one or a few input images, without relying on an explicit 3D representation.

Within the industry, RTFM is being described as “AI that has learned to render.” The system can simulate physical phenomena such as 3D geometry, reflections, and shadows, and can even reconstruct real-world environments from limited photo data.

According to Li, RTFM can generate persistent, 3D-consistent scenes in real time using a single NVIDIA H100 GPU, paving the way for interactive experiences in both real and imagined virtual spaces.

“Elegant, scalable approaches will ultimately prevail in AI,” Li’s team wrote in an accompanying article. “Generative world models are ideally positioned to benefit from the exponential decline in computing costs that has driven technological progress for decades.”

In response, former Google senior engineer Rui Diao noted that RTFM’s latest breakthrough effectively resolves the long-standing scalability challenges that have hindered world models.

Spatial intelligence refers to the ability of humans or machines to perceive, understand, and interact within three-dimensional space. The concept was first introduced by American psychologist Howard Gardner in his theory of multiple intelligences, describing the brain’s capacity to form a mental model of the external spatial world and manipulate it.

Spatial intelligence enables individuals to think in three dimensions, perceive both external and internal imagery, and recreate, transform, or modify these images. This allows people to navigate environments with ease, manipulate objects at will, and generate or interpret graphical information.

Broadly, spatial intelligence encompasses not only spatial orientation but also visual discrimination and visual reasoning. For machines, it refers to the ability to process visual data in three-dimensional space, make accurate predictions, and act upon them. This allows AI systems to operate and make decisions in complex 3D environments, overcoming the limitations of traditional 2D perception.

Fei-Fei Li has noted that visual capability sparked the Cambrian explosion, and that the evolution of the nervous system gave rise to intelligence. “We want AI that can act, not just see and speak,” she emphasizes.

With the rise of a new generation of generative AI, the combination of spatial intelligence and world models has emerged as a key pathway toward artificial general intelligence (AGI). Advanced world models can reconstruct, generate, and simulate persistent, interactive, and physically accurate environments in real time, and are poised to transform industries ranging from software to robotics.

Li and her team consider spatial intelligence and world models essential tools for overcoming AI’s technical barriers. Compared with existing technologies, they aim to maintain world model performance while reducing GPU resource requirements and enabling real-time interactions more efficiently.

Under current video architectures, generating a 60-frame-per-second 4K interactive stream would require producing over 100,000 tokens per second, roughly the length of Frankenstein or the first Harry Potter book. Sustaining this for an hour would mean attending to more than 100 million tokens of context, which is neither feasible nor economically viable with today’s infrastructure.
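The arithmetic behind these figures can be checked directly. The short sketch below takes the quoted lower bound of 100,000 tokens per second as given and computes the hourly total; the function and constant names are illustrative and do not come from World Labs.

```python
# Back-of-the-envelope check of the token figures quoted above.
# The only input taken from the article is the 100,000 tokens/second lower bound.
TOKENS_PER_SECOND = 100_000
SECONDS_PER_HOUR = 3_600
QUOTED_HOURLY_FLOOR = 100_000_000  # "more than 100 million" tokens


def tokens_per_hour(tokens_per_second: int) -> int:
    """Total tokens produced by one hour of continuous streaming."""
    return tokens_per_second * SECONDS_PER_HOUR


if __name__ == "__main__":
    total = tokens_per_hour(TOKENS_PER_SECOND)
    print(f"Tokens in one hour: {total:,}")
    print(f"Exceeds the quoted floor: {total > QUOTED_HOURLY_FLOOR}")
```

At the quoted rate, one hour of streaming comes to 360 million tokens, which is consistent with the "more than 100 million" figure.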

To address this, Li co-founded World Labs in 2024 alongside scholars Ben Mildenhall, Justin Johnson, and Christoph Lassner, and the company went on to develop RTFM, which offers three core advantages: efficiency, scalability, and persistence.

Efficiency is demonstrated by the fact that a single NVIDIA H100 GPU can run inference at interactive frame rates. Scalability comes from the end-to-end architecture, which can keep improving as data and computational power grow. Persistence is achieved through pose-aware frame memory and context scheduling, so that world scenes “never fade away” and interactions in simulated environments stay consistent over long sessions.
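One way to picture pose-aware memory is as a store of previously seen frames, each tagged with a camera pose, from which only the frames nearest to the current viewpoint are pulled in as context. The sketch below is a simplified illustration under that assumption, not World Labs' implementation: the names `PosedFrame`, `FrameMemory`, `select_context`, and `yaw_weight` are hypothetical, orientation is reduced to a single yaw angle, and the cost function is an arbitrary choice.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PosedFrame:
    """A stored frame tagged with a simplified camera pose (hypothetical)."""
    frame_id: int
    position: Tuple[float, float, float]  # camera location in world space
    yaw_degrees: float                    # orientation reduced to one angle


@dataclass
class FrameMemory:
    """Toy spatial memory: keeps every frame, retrieves only nearby ones."""
    frames: List[PosedFrame] = field(default_factory=list)

    def add(self, frame: PosedFrame) -> None:
        self.frames.append(frame)

    def select_context(self, position, yaw_degrees, k=2, yaw_weight=0.01):
        """Return the k stored frames whose poses are closest to the query."""
        def cost(frame: PosedFrame) -> float:
            distance = math.dist(frame.position, position)
            # Smallest absolute angle between the two headings, in [0, 180].
            yaw_gap = abs((frame.yaw_degrees - yaw_degrees + 180.0) % 360.0 - 180.0)
            return distance + yaw_weight * yaw_gap

        return sorted(self.frames, key=cost)[:k]


def build_example_memory() -> FrameMemory:
    memory = FrameMemory()
    memory.add(PosedFrame(0, (0.0, 0.0, 0.0), 0.0))
    memory.add(PosedFrame(1, (1.0, 0.0, 0.0), 90.0))
    memory.add(PosedFrame(2, (10.0, 0.0, 0.0), 0.0))
    memory.add(PosedFrame(3, (0.5, 0.0, 0.0), 180.0))
    return memory


if __name__ == "__main__":
    context = build_example_memory().select_context((0.2, 0.0, 0.0), 0.0, k=2)
    print([frame.frame_id for frame in context])
```

Because the context size `k` stays fixed however many frames have been stored, the cost of generating each new frame does not grow with the length of the session, which is the property the persistence claim depends on.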

In September 2024, World Labs announced it had raised $230 million in funding, led by a16z, NEA, and Radical Ventures. The round also drew participation from the venture arms of AMD, Adobe, and Databricks, as well as Shinrai Investments LLC and the venture arm of NVIDIA, the chipmaker headed by CEO Jensen Huang.

The company employs around 24 people, including four co-founders, among them Fei-Fei Li, with roughly one-third of the team of Chinese descent. Public reports indicate that World Labs reached a valuation of $1 billion just three months after its founding.

Looking ahead, investors say Fei-Fei Li’s team will first develop a large world model (LWM) for spatial intelligence, designed to deeply understand three-dimensional, physical, spatial, and temporal concepts. The model is expected to support augmented reality applications first and later be applied to robotics, where it could improve autonomous vehicles, automated factories, and humanoid robots.

Li has stated that the team aims to launch its first product as early as 2025, while acknowledging that many challenges remain, from business models to technical boundaries. “We are still at the very beginning,” she said, “but we believe our team will overcome these challenges.”

In parallel, Li is also developing the Behavior challenge, a competition intended to replicate the success of ImageNet, which helped catalyze the deep learning revolution and the broader AI boom. Her work on ImageNet is why Li is widely regarded as a driving force in “enabling AI to truly understand the world.”

The inspiration for Behavior arose from three major challenges in robot learning: the lack of standardized tasks, which makes comparing research difficult; the absence of a unified task framework, with many tasks being short and limited in scope; and a shortage of training data.

This October, Li officially released Behavior 1K, also known as the Behavior 1000 Challenge. It is a comprehensive simulation benchmark and training environment for embodied intelligence and robotics research, comprising 1,000 long-horizon tasks set in everyday household environments, each a real-world task that takes multiple steps to complete. Behavior provides an open-source training and evaluation platform, allowing researchers worldwide to train algorithms and compare results under consistent standards.

“What excites me even more is that we are at a civilizational turning point: language, spatial, visual, embodied intelligence, and other AI technologies are converging and beginning to truly transform human society,” Li said. “As long as we always keep human-centeredness at heart, these technologies can become a force for good for humanity.”

Li’s team indicated that World Labs will continue to enhance its model’s dynamic scene simulation and user interaction capabilities, and that larger-scale models are expected to deliver even stronger performance in the future.

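This article was published on TMTPost with the authorization of its author, zhangxinyue, and was edited by TMTPost. When republishing, please credit the source and author and link to this article.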