At Cannes, during WAICF, Yann LeCun does not comment on the latest news in generative artificial intelligence. He unfolds an architecture. A scientific trajectory built over several years that he presents as the only credible path toward human-level intelligence. The discourse is calm, methodical, almost pedagogical. The conclusion, however, is unambiguous: large language models do not constitute the right direction if the goal is to build systems capable of understanding and acting in the real world.

At the center of the presentation lies AMI—Autonomous Machine Intelligence—and, beyond that, the industrial strategy of AMI Lab.

The starting point is a fundamental distinction between generating and understanding. Autoregressive models, whether textual, visual, or multimodal, produce sequences. They predict the next word, the next pixel, the next value. They achieve a remarkable level of statistical coherence, but this coherence does not constitute an understanding of physical dynamics. Predicting a sequence is not modeling a system.

This limitation becomes apparent as soon as one leaves clean, discrete data to enter the real world, made of continuous, noisy, partially observable signals. Yann LeCun takes the example of a modern aircraft engine, equipped with over a thousand sensors. Temperature, pressure, vibration, flow. In such an environment, simple time series prediction—in other words, a generative approach—does not allow for optimizing overall operation. It does not enable planning interventions to maximize reliability, reduce CO₂ emissions, or improve performance. What is needed, he explains, is a global phenomenological model capable of anticipating the system’s dynamics as a whole.

AMI Lab’s initial target is precisely there: complex industrial systems. Power plants, steel mills, chemical plants, aircraft engines, where data flows are massive, continuous, and noisy, and where value comes from the ability to plan under physical constraints. The conversational agent market is already saturated; the market for predictive systems in the real world is not. The ambition is progressive: accumulate experience by applying these architectures to increasingly diverse data modalities, before ultimately reaching robotics. The market for intelligent robots is not yet mature, but it will be. AMI Lab wants to be ready when this shift occurs.

The scientific foundation of this strategy rests on JEPA—Joint Embedding Predictive Architecture. The principle consists of no longer reconstructing data in its raw form, but predicting the abstract representation of a missing part from its context. The work is done in the latent space, where relevant invariants are found, and not in the space of pixels or numerical values, saturated with noise and meaningless variations.

With Image-JEPA, self-supervised learning through masking enables robust representations with reduced computational cost. The logic is then extended to video with V-JEPA. It is no longer about predicting the next image, but the representation of the future. This nuance is essential: it allows focusing on the structural dynamics of a scene and ignoring unpredictable details.

It is in this framework that what Yann LeCun calls a form of “intuitive physics” emerges. Models learn to anticipate the plausible evolution of a situation. When the observed sequence violates this coherence—an impossible trajectory, a physically inconsistent interaction—the gap between the predicted representation and the actual observation increases. This is not symbolic reasoning, but dynamic coherence acquired through learning.

V-JEPA 2 then introduces large-scale learning from natural videos and, crucially, action conditioning. Understanding world dynamics is not enough; one must learn how actions modify it. The goal is no longer to generate realistic images, but to build an internal model capable of anticipating the consequences of a decision.

The LA-WM—the Latent Action World Model—takes an additional step. In real environments, not all actions are observed. The model must infer latent variables representing underlying dynamics. The encoder from V-JEPA is frozen; above it, a model learns transitions. Information is deliberately regularized—noise addition, sparsification, quantization—to avoid statistical shortcuts. The goal is to build a stable and usable representation for planning.

This question of representation stability leads to SIGReg, an isotropic Gaussian regularization method designed to preserve the informational richness of the latent space. Self-supervised learning often suffers from embedding collapse; imposing an isotropic distribution maintains representation diversity while stabilizing training.

The whole forms a coherent architecture oriented toward planning. In this framework, reinforcement learning is no longer the central method. It only intervenes when planning fails, to adjust the world model or the critic. The logic is that of model-based predictive control—an approach directly derived from systems engineering.

The distinction between System 1 and System 2 is mentioned in the discussion. System 1 corresponds to fast, predictive intuition, built from an internal world model. System 2—deliberative reasoning—can only emerge on this foundation. Intelligence is therefore not primarily linguistic; it is sensorimotor and predictive.

The final slide synthesizes the position in a particularly explicit manner.

  • Abandon generative models in favor of joint embedding architectures.
  • Abandon probabilistic models in favor of energy-based models. Abandon contrastive methods in favor of regularized methods.
  • Abandon reinforcement learning as the central paradigm in favor of planning based on an internal model.

And this conclusion, deliberately provocative: if the goal is human-level intelligence, one should not work on large language models.

In an ecosystem dominated by generative AI, this presentation charts a radically different trajectory. Where current competition focuses on increasing model size and conversational fluency, Yann LeCun emphasizes building world models capable of anticipating, planning, and acting.

The stakes are not only scientific. They are industrial. Optimizing complex physical systems today. Building the brains of robots tomorrow.

An intelligence that does not merely produce plausible text, but knows how to intervene in the real world.