McKinsey and Its 25,000 AI Agents: Hybrid Workforce or PR Operation?

A Figure That Demands Clarification

McKinsey claims “60,000 employees,” including “25,000 AI agents.” This formulation, reported by LeMagIT (study: “McKinsey: 60,000 employees, including 25,000 AI agents”), does more than describe technological adoption. It stages a change in the scale and status of AI within the organization. Counting “agents” as a segment of “employees” moves AI from the realm of tools to that of the workforce, and with that move come management, accountability, and governance. The central question is structural. Is this a metaphor designed to make an impression? Or is it a new way of accounting for digital production capacities that, once industrialized, occupy a place comparable to a labor force?

The End of the Myth of the Fully Autonomous Agent

The article emphasizes a point rarely acknowledged so directly: “100% autonomous” agents, capable of executing a complete process without supervision, remain in most cases more promise than operational reality. The cited example is revealing: an order-processing workflow combining an LLM and an agent reportedly achieves 99.2% accuracy. That level remains insufficient when the remaining 0.8% of errors translates into direct losses, customer incidents, or supply chain malfunctions. In other words, average performance can be impressive while remaining non-deployable when error tolerance is close to zero. This makes it necessary to bring humans back into the loop. Not as a patch. As an operating condition.
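The arithmetic behind this argument, and the human-in-the-loop response to it, can be sketched in a few lines of Python. The sketch is illustrative only: the volume of 100,000 orders, the `Order` type, the per-order `confidence` score (assumed to be calibrated, which is itself a strong assumption), and the 0.95 threshold are hypothetical, not details from the study. At 99.2% accuracy, 100,000 orders still yield about 800 faulty ones.

```python
from dataclasses import dataclass


def expected_errors(volume: int, accuracy: float) -> float:
    """Expected number of faulty outputs for a given volume and accuracy rate."""
    return volume * (1.0 - accuracy)


@dataclass
class Order:
    order_id: str
    confidence: float  # hypothetical, assumed-calibrated confidence score in [0, 1]


def route(order: Order, threshold: float = 0.95) -> str:
    """Auto-process confident orders; send the rest to a human reviewer."""
    return "auto" if order.confidence >= threshold else "human_review"


# 100,000 orders at 99.2% accuracy still leave roughly 800 faulty orders.
faulty = expected_errors(100_000, 0.992)
print(f"expected faulty orders: {faulty:.0f}")
print(route(Order("A1", 0.99)), route(Order("A2", 0.80)))
```

The threshold is a business decision, not a technical one: lowering it sends more orders to reviewers and reduces the number of undetected errors, at the cost of review time.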

Real Gains, But Deliberately Modest

McKinsey reports productivity gains in the range of 20 to 30% on its projects. This figure contrasts with the spectacular promises that often surround agentic AI. This framing, reported by LeMagIT, deserves attention as it redirects analysis toward the right debates. Where are the processes for which +30% genuinely changes competitiveness? Where does this gain remain marginal? What hidden costs absorb part of the value: quality control, IT integration, cybersecurity, compliance, change management, training, incident management? Finally, what portion of measured “productivity” corresponds to time reduction, and what portion corresponds to task displacement, from automated production to human validation?
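To make the task-displacement question concrete, here is a minimal Python sketch that compares the apparent saving with the net saving once human validation time is counted. The function name, its parameters, and the hour figures are hypothetical illustrations, not data from the study.

```python
def net_time_saving(baseline_hours: float, automated_hours: float,
                    validation_hours: float) -> float:
    """Share of baseline time saved once human validation of agent output is counted.

    All inputs are hours for the same unit of work (hypothetical figures).
    """
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    remaining = automated_hours + validation_hours
    return (baseline_hours - remaining) / baseline_hours


# Hypothetical task: 10 h manually, 4 h with the agent, plus 3 h of human review.
apparent = net_time_saving(10, 4, 0)   # ignores validation
net = net_time_saving(10, 4, 3)        # counts validation
print(f"apparent: {apparent:.0%}, net: {net:.0%}")
```

With these assumed figures, an apparent 60% saving shrinks to a net 30% once review time is counted: half of the "gain" is in fact work displaced from production to validation.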

Three Industrialization Zones: Knowledge, Go-to-Market, IT

The cited study structures use cases around three areas: knowledge management (internal research, legal, documentation access), marketing and sales (campaign generation, personalization), and IT, presented as the most “mature” zone. This hierarchy is coherent. IT already has standardized practices, metrics, pipelines, and testing. It’s an environment naturally compatible with controlled automation. Knowledge management is conducive to RAG but sensitive to confidentiality, source obsolescence, and attribution. Marketing offers rapid gains but exposes organizations to reputational risks and segmentation biases.

The “Run”: Absorbing L1 Doesn’t Mean Solving Complexity

In support, a service desk reportedly sees 80% of tickets handled by the agent at level 1. The figure is impressive, but it must be read in light of the actual structure of requests. A “handled” ticket isn’t necessarily a “resolved” ticket. ROI depends heavily on whether the agent merely absorbs simple requests that were already cheap to process, or actually reduces escalations to expert levels. The study notes that, even in these configurations, gains often plateau around 20–30%, because a non-negligible share of tickets still requires human intervention at higher levels, particularly when arbitration is needed, when a systemic incident must be diagnosed, or when non-standard cases must be handled.
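A short Python sketch shows how an 80% handling rate can coexist with a roughly 20% cost reduction. The ticket classes, shares, unit costs, and resolution rates are hypothetical assumptions chosen for illustration. The sketch also ignores the agent's own running cost and assumes that tickets the agent fails to resolve cost as much as before.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TicketClass:
    name: str
    share: float             # fraction of total ticket volume (shares sum to 1)
    human_cost: float        # cost per ticket when handled by humans (arbitrary units)
    agent_resolution: float  # fraction of this class the L1 agent truly resolves


def handled_share(classes: List[TicketClass]) -> float:
    """Share of volume the agent touches at all (the headline 'handled' figure)."""
    return sum(c.share for c in classes if c.agent_resolution > 0)


def human_cost_reduction(classes: List[TicketClass]) -> float:
    """Relative drop in human handling cost.

    Simplifying assumptions: the agent's own running cost is ignored, and
    tickets it fails to resolve cost as much as before.
    """
    baseline = sum(c.share * c.human_cost for c in classes)
    saved = sum(c.share * c.human_cost * c.agent_resolution for c in classes)
    return saved / baseline


# Hypothetical mix: many cheap L1 tickets, a few expensive expert tickets.
mix = [
    TicketClass("simple", share=0.8, human_cost=1.0, agent_resolution=0.9),
    TicketClass("complex", share=0.2, human_cost=14.0, agent_resolution=0.0),
]
print(f"handled: {handled_share(mix):.0%}, cost reduction: {human_cost_reduction(mix):.0%}")
```

The gap between the two printed figures comes from the cost structure, not from agent quality: the expensive expert tickets dominate the baseline and remain untouched.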

The “Build”: Shift Toward Spec-Driven Development

The most structural passage concerns software production. The study describes a “spec-driven” approach. Design is done in a design tool, then user stories are generated and reviewed in a ticketing tool. Architects provide the framing. Code is then generated in an assisted development environment, checked by an automated quality layer, and integrated through a standard CI/CD pipeline. The benefit isn’t only a faster cycle. It lies in the shift of the center of gravity: when code generation becomes abundant, quality is won upstream, in the precision of requirements, user stories, and architecture. The text even indicates that, when this fails, the correction is often made to the specification rather than to the code. This dynamic brings the developer role closer to skills historically associated with product owners and architects. It raises a fundamental question about reskilling.
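The feedback loop described here, where a failed quality check leads to a revised specification rather than hand-patched code, can be sketched as follows. This is a minimal Python illustration under stated assumptions: `generate`, `quality_gate`, and `refine_spec` are hypothetical callables standing in for the code generator, the automated quality layer, and the architects' revision of user stories; the toy stubs exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Spec:
    stories: Tuple[str, ...]  # user stories driving generation
    revision: int = 0


def spec_driven_cycle(
    spec: Spec,
    generate: Callable[[Spec], str],
    quality_gate: Callable[[str], List[str]],
    refine_spec: Callable[[Spec, List[str]], Spec],
    max_rounds: int = 3,
) -> Tuple[str, Spec]:
    """Regenerate from a corrected spec until the quality gate passes."""
    for _ in range(max_rounds):
        code = generate(spec)
        failures = quality_gate(code)
        if not failures:
            return code, spec
        # Fix the specification, not the generated code, then regenerate.
        spec = refine_spec(spec, failures)
    raise RuntimeError(f"quality gate still failing after {max_rounds} rounds")


# Hypothetical toy stubs for the generator, the gate, and the spec revision.
def toy_generate(spec: Spec) -> str:
    return "\n".join(f"# implements: {story}" for story in spec.stories)


def toy_gate(code: str) -> List[str]:
    return [] if "validate input" in code else ["no input validation"]


def toy_refine(spec: Spec, failures: List[str]) -> Spec:
    return Spec(spec.stories + ("validate input",), spec.revision + 1)


code, final_spec = spec_driven_cycle(Spec(("parse order",)), toy_generate, toy_gate, toy_refine)
print(final_spec.revision, code, sep="\n")
```

The design choice worth noting is that the generated code is treated as disposable: only the specification is versioned and corrected.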

“Employee Agents”: A Performative Metaphor That Creates an HR Subject

Why insist on the term “employees”? Because it turns a technical initiative into a managerial object. If the agent becomes “workforce,” it calls for practices of allocation, evaluation, control, and lifecycle management. The study mentions internal uses, such as preparing briefs for executives, and even HR processes such as semiannual consultant evaluations. There, an agent produces an initial memo and recommendations, then leaves the final decision to a human. One partner mentions a 50 to 60% time saving on these syntheses. This type of case makes questions of bias, equity, and accountability immediately concrete: who is accountable for the accuracy of recommendations? How are internal sources traced? What confidentiality rules apply? What is the risk of standardizing judgments based on historical data?

Governance: Accountability, Retraining, and “Exit Door” Between Agents

The study raises questions that many organizations postpone: who is responsible for output quality? Who manages evolution? Who decides on retraining? What happens when an agent gets stuck in an interaction with another agent? What “exit door” keeps a multi-agent workflow from drifting or looping? These points don’t reflect theoretical sophistication. They determine the ability to deploy sustainably, control risk, audit, and evolve without disruption. These are governance questions in the strict sense, as they require defined roles, procedures, alert thresholds, and an escalation doctrine.
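One possible form of the “exit door” is a turn budget combined with loop detection, with escalation to a human when either trips. The Python sketch below assumes two agents exchanging plain text messages, a hypothetical `DONE` sentinel to signal completion, and exact-repeat detection; real systems would likely need fuzzier similarity checks.

```python
from typing import Callable, Dict, Union

Agent = Callable[[str], str]


def run_dialogue(agent_a: Agent, agent_b: Agent, opening: str,
                 max_turns: int = 10) -> Dict[str, Union[str, int]]:
    """Alternate two agents; exit to a human on a repeated message or an exhausted budget.

    "DONE" is a hypothetical completion sentinel assumed for this sketch.
    """
    seen = set()
    message = opening
    for turn in range(max_turns):
        if message in seen:  # exact repeat: the exchange is looping
            return {"status": "escalated", "reason": "loop_detected", "turn": turn}
        seen.add(message)
        speaker = agent_a if turn % 2 == 0 else agent_b
        reply = speaker(message)
        if reply == "DONE":
            return {"status": "completed", "turn": turn}
        message = reply
    return {"status": "escalated", "reason": "max_turns", "turn": max_turns}


# Two agents that keep asking each other for clarification.
print(run_dialogue(lambda m: "please clarify", lambda m: "please provide details", "start"))
```

The two conditions cover different failures: loop detection catches agents stuck in a ping-pong, while the turn budget catches drift that never repeats exactly.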

Observability: Condition of Trust, But a Difficult Promise

McKinsey, through this study, emphasizes observability as a condition for large-scale trust: understanding what was done, why, with which sources, which versions, which intermediate decisions, and which controls. The text notes that this work is still largely manual, often relying on open-source frameworks, while awaiting more robust platforms. This point is crucial because it recalls an operational truth: agentic AI in production isn’t primarily a “prompt” issue, but a systems engineering issue. Without logging, supervision, reliability metrics, incident management, and audit capability, the organization accumulates invisible debt: agents useful in the short term, but ungovernable in the medium term.
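Here is a minimal sketch of a trace record covering those dimensions (action, rationale, sources, versions, intermediate decisions, controls), written as JSON lines in Python. All field names are hypothetical; chaining events through `parent_id` is one assumed way to record intermediate decisions, not a reference to any specific framework.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AgentTraceEvent:
    """One step of an agent run; all field names are hypothetical."""
    agent_id: str
    agent_version: str                 # which agent configuration ran
    model_version: str                 # which underlying model answered
    action: str                        # what was done
    rationale: str                     # why, as stated by the agent
    sources: List[str] = field(default_factory=list)    # documents consulted
    controls: List[str] = field(default_factory=list)   # checks applied
    parent_id: Optional[str] = None    # links intermediate decisions into a chain
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_log_line(event: AgentTraceEvent) -> str:
    """Serialize an event as one JSON line for an append-only audit log."""
    return json.dumps(asdict(event), sort_keys=True)


step = AgentTraceEvent("order-agent", "1.4.0", "model-x", "extract_order",
                       "order number found in email body",
                       sources=["mail-123"], controls=["schema_check"])
print(to_log_line(step))
```

Recording the agent version and the model version separately is deliberate: either can change independently, and an audit needs to know which one produced a given output.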

Critical Multi-Agent Deployments: Rare, But Strategic

Finally, the study acknowledges that large-scale critical deployments remain rare, though it cites one example: a bank modernizing COBOL with about a hundred interacting agents. This type of case is a signal. It positions agentic AI on major projects, where legacy renovation and accelerated transformation become strategic stakes. It also suggests a near future where the question will no longer be “should we have agents?” but “how do we prevent agents from multiplying without control, coherence, or audit capability?”

What This Announcement Imposes on Executives

Saying “25,000 agents” isn’t trivial. It pushes organizations to define what an agent means within the company. It also forces a distinction between processes where 99% accuracy suffices and those where the requirement is near-total. Next, human supervision must be formally designed, and observability must be funded. Finally, the lifecycle must be organized: versions, retraining, retirement, accountability. At this stage, McKinsey’s announcement works as a mirror: it reveals less an automatic revolution than a pressing need for method. Agentic AI opens a new organizational age, in which performance depends as much on generation as on control.
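The lifecycle requirement can be made concrete with a small agent registry in which every agent has a named accountable owner, a version, and a status whose transitions are restricted and logged. This Python sketch is illustrative: the three statuses, the allowed transitions, and the field names are assumptions, not a description of McKinsey's practice.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    ACTIVE = "active"
    RETRAINING = "retraining"
    RETIRED = "retired"


# Allowed lifecycle moves (an assumption); retirement is terminal in this sketch.
ALLOWED = {
    Status.ACTIVE: {Status.RETRAINING, Status.RETIRED},
    Status.RETRAINING: {Status.ACTIVE, Status.RETIRED},
    Status.RETIRED: set(),
}


@dataclass
class AgentRecord:
    agent_id: str
    owner: str      # named human accountable for the agent's outputs
    version: str
    status: Status = Status.ACTIVE
    history: List[str] = field(default_factory=list)

    def transition(self, new_status: Status, reason: str,
                   new_version: Optional[str] = None) -> None:
        """Apply an allowed status change, optionally bumping the version, and log it."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        if new_version is not None:
            self.version = new_version
        self.history.append(
            f"{self.status.value} -> {new_status.value} (v{self.version}): {reason}")
        self.status = new_status


record = AgentRecord("kb-search", owner="knowledge-team-lead", version="2.1")
record.transition(Status.RETRAINING, "answer quality drifted")
record.transition(Status.ACTIVE, "retrained on updated sources", new_version="2.2")
record.transition(Status.RETIRED, "replaced by a newer agent")
print(*record.history, sep="\n")
```

Refusing illegal transitions, such as reviving a retired agent without a new record, is what turns the registry from an inventory into a control.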
