The Fed Chair Just Said What AI Leaders Won't: The Models Don't Work
Fed Chairman Powell was asked yesterday whether there was anything about the markets he did not believe in, and his answer went in a direction few people expected. Powell said he did not trust the models used to predict the markets or the broader economy: “No one has been able to successfully predict the economy.”
And he is right.
While LLMs have made massive strides in capability, they are bad at predicting, prescribing, and diagnosing. They are built to comprehend language in its many forms (images, video, text, genetics, code, chemistry, etc.). However, they do not understand complex systems beyond those that can be expressed as language in some form.
We have gotten good at all three on a small scale, but as systems scale up in complexity or dynamism, even the models designed for those purposes begin to fail us.
There are three primary barriers that prevent us from cracking this type of model in the same way we have cracked language:
1. Lack of Data
2. Lack of Causal Understanding
3. Lack of Compute to Model Complex Systems
If you want to build reliable agentic platforms, LLMs are not enough. In this article, I will explain why two layers (Agentic and Information) of my agentic AI platform architecture are critical for the fifth layer (Multi-Simulation Environments). Tokenomics today focuses on LLMs as the primary frontier model, but there is another model category that is even more computationally demanding and information-hungry.
As with everything else in this series, progress follows my maturity models. What starts in layers 2 and 4 delivers value in the near term, quarter to quarter. Continuous improvement loops mature Agent Supporting Models and the Knowledge Graph to support simulations of increasingly complex systems.
Every business workflow, customer interaction, and strategic decision is a complex system.
As I explained in my AI Product Management Certification course this week, failure to align agentic architecture with monetization and multi-level maturity results in a roadmap that fails. This is why AI strategy is also critical to drive that top-level, enterprise-wide alignment.
Agentic architecture is strategic by nature because the business and operating models depend on it. It is no longer possible to separate one from the others. It must be taught as a single discipline, and my courses are the only ones that approach it that way.
Prediction, Prescription, & Diagnosis Are Not Language Problems
We need to be precise about why LLMs fall short here, because the failure mode is not obvious. LLMs are trained to predict the next token in a sequence, which is at heart a compression and retrieval task. They build statistical representations of how language works, and those representations are useful for summarization, translation, code generation, and reasoning about concepts that can be expressed in text or text-like spaces.
Predictive models are fundamentally different. A predictive model is trying to forecast the future state of a system based on its current state, its inputs, and the dynamics that govern how those inputs interact. When we predict the trajectory of a hurricane, we are solving differential equations that describe things like fluid dynamics, thermodynamics, and the Coriolis effect.
When we predict how a market will respond to a rate hike, we need a model that captures the causal chain from monetary policy to lending behavior to asset pricing to consumer sentiment. These are state-evolution and system-dynamics problems, not token-prediction problems. At some point, we may find ways to express state spaces, action spaces, and causal graphs in a form that transformers are good at learning.
But we are not there yet.
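To make the distinction concrete, here is a minimal sketch of a state-evolution model. The two-variable linear dynamics, the coupling matrix, and the noise scale are illustrative assumptions, not a real economic model; the point is the shape of the object a predictive model works with: a state vector rolled forward under dynamics and inputs, with no tokens anywhere.

```python
import numpy as np

# Minimal state-evolution model: x_{t+1} = x_t + dt * (A @ x_t + u_t) + noise.
# The coupling matrix A and noise scale are illustrative, not a real economy.
rng = np.random.default_rng(42)
A = np.array([[0.0, 1.0],
              [-0.5, -0.2]])      # how the two state variables influence each other
x = np.array([1.0, 0.0])         # current system state
u = np.array([0.0, 0.1])         # exogenous input, e.g., a policy lever
dt = 0.1

trajectory = [x.copy()]
for _ in range(100):
    x = x + dt * (A @ x + u) + rng.normal(scale=0.01, size=2)
    trajectory.append(x.copy())

print(trajectory[-1])            # the forecast is an evolved state, not a next word
```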
Prescriptive models go a step further. They do not just forecast what will happen; they answer the question of what we should do about it. A prescriptive model optimizes decisions or actions against an objective function while respecting constraints.
Drug dosing protocols, supply chain routing under disruption, and portfolio rebalancing in volatile markets are all prescriptive problems that require counterfactual reasoning. If we take action A instead of action B, what changes downstream? LLMs cannot do this. They can talk about it, but they cannot simulate the causal mechanics of a decision propagating through a system.
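A toy illustration of what answering that question actually requires: forward-simulating each candidate action and comparing outcomes. The inventory rules, costs, and demand distribution below are invented for the example.

```python
import numpy as np

# Toy prescriptive problem: which reorder policy minimizes total cost?
# Answering it means simulating the downstream effect of each action.
def simulate_policy(reorder_point, order_qty, days=365, seed=0):
    rng = np.random.default_rng(seed)
    stock, cost = 50, 0.0
    for _ in range(days):
        stock -= rng.poisson(8)            # daily demand
        if stock < 0:
            cost += -stock * 10.0          # stockout penalty per unit
            stock = 0
        cost += stock * 0.1                # holding cost per unit
        if stock <= reorder_point:         # the decision being prescribed
            stock += order_qty
    return cost

cost_a = simulate_policy(reorder_point=20, order_qty=100)   # action A
cost_b = simulate_policy(reorder_point=40, order_qty=60)    # action B
print(f"A: {cost_a:.0f}  B: {cost_b:.0f}")  # prescribe the cheaper policy
```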
Diagnostic models work in the opposite temporal direction. They observe the current state of a system and reason backward to identify what caused it to arrive there. Root cause analysis in manufacturing, differential diagnosis in medicine, and failure analysis in complex infrastructure all require this capability.
The model must distinguish between correlation and causation, between symptoms that co-occur and mechanisms that actually produce failure. This is precisely the kind of reasoning that statistical pattern-matching architectures are not built for. Although LLMs are capable of limited causal discovery in multiple domains, they are not ready for more complex systems analysis tasks.
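For contrast, here is roughly what diagnostic reasoning looks like as computation rather than language: a posterior over candidate root causes given observed symptoms. The priors and likelihoods are made-up numbers, and the sketch assumes exactly one of the listed causes is active.

```python
import numpy as np

# Toy diagnostic model: reason backward from observed symptoms to root causes.
causes = ["tool_wear", "bad_material", "operator_error"]
prior = np.array([0.5, 0.2, 0.3])          # P(cause), assumed mutually exclusive
# P(symptom | cause): rows = causes, columns = [vibration, surface_defect]
likelihood = np.array([
    [0.9, 0.6],
    [0.2, 0.9],
    [0.4, 0.3],
])

observed = np.array([1, 1])                # both symptoms present
# Posterior: P(cause | symptoms) is proportional to P(cause) * product of P(symptom | cause)
unnorm = prior * np.prod(np.where(observed, likelihood, 1 - likelihood), axis=1)
posterior = unnorm / unnorm.sum()
for cause, p in zip(causes, posterior):
    print(f"{cause}: {p:.2f}")
```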
Why Complex Systems Break Our Models
There is a difference between complicated and complex. A complicated system, like a jet engine, has many parts, but those parts interact in predictable, well-characterized ways. We can model a jet engine to extraordinary precision.
A complex system, like a national economy or a planetary climate, is different. The components interact through feedback loops that create emergent behavior. Small perturbations can cascade into massive, nonlinear effects. The system is often path-dependent, meaning its future state depends not just on its current state but on the specific sequence of states it has passed through.
Dynamical systems add another layer of difficulty. These are systems where the rules governing behavior themselves change over time, like financial markets. Market participants adapt to new information, change strategies, and alter the very dynamics that models are trying to capture.
This is the Lucas critique in economics, and it applies far more broadly than economists originally intended. Any system whose participants are intelligent agents (people or complex algorithms) will exhibit this property: the act of modeling the system changes it.
We have built good models for simple dynamical systems. Control theory handles many of them well, but as the number of interacting variables grows, feedback loops multiply, and the system incorporates adaptive agents, our models degrade rapidly. The error bars expand until the prediction is functionally useless.
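The logistic map, a one-line dynamical system, makes that degradation tangible: in its chaotic regime, two trajectories that start one part in a billion apart become completely uncorrelated within a few dozen steps.

```python
# Logistic map in its chaotic regime (r = 3.9). A 1e-9 difference in the
# starting state grows to order one, which is what exploding error bars
# look like in miniature.
r = 3.9
x1, x2 = 0.4, 0.4 + 1e-9
for t in range(1, 61):
    x1, x2 = r * x1 * (1 - x1), r * x2 * (1 - x2)
    if t % 10 == 0:
        print(f"step {t}: divergence = {abs(x1 - x2):.2e}")
```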
The Three Constraints: Lack of Data
This is the constraint that people underestimate the most. We live in an era of abundant data, so it seems odd to claim we lack it. However, the data we have in abundance is the wrong kind of data for complex systems modeling.
Most of the data we collect is observational. We record what happened. What we need for predictive and prescriptive modeling is interventional data that describes what happens when we change something, holding everything else constant.
Randomized controlled trials generate interventional data, but you cannot run a randomized controlled trial on the global economy. You cannot randomly assign interest rate policies to parallel universes to see what happens. That is where simulations and market models hold significant promise, but as Powell points out, those models have not achieved high reliability yet.
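A small simulation shows why observational data alone misleads. A hidden confounder drives both the treatment and the outcome, so the naive regression slope overstates what an intervention would actually do. The coefficients here are arbitrary illustrative choices.

```python
import numpy as np

# A hidden confounder Z drives both the treatment X and the outcome Y.
rng = np.random.default_rng(7)
n = 100_000
z = rng.normal(size=n)                   # confounder, e.g., overall demand
x = 2 * z + rng.normal(size=n)           # treatment, e.g., price
y = 1 * x + 5 * z + rng.normal(size=n)   # outcome; true causal effect of X is 1

obs_slope = np.cov(x, y)[0, 1] / np.var(x)           # naive estimate, ~3
x_do = rng.normal(size=n)                             # intervention: set X ourselves
y_do = 1 * x_do + 5 * z + rng.normal(size=n)
int_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)   # ~1, the real effect
print(f"observational: {obs_slope:.2f}, interventional: {int_slope:.2f}")
```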
Even the observational data we have is sparse relative to the dimensionality of the systems we are trying to model. A national economy has millions of interacting agents, thousands of commodity prices, hundreds of policy variables, and the relationships between them shift over time. We might have decades of macroeconomic data, but that is a handful of data points in the space of possible system states.
Then there is the problem of measurement. Many of the most important variables in complex systems are latent. Consumer confidence, institutional trust, and supply chain fragility are real forces that drive system behavior, but we measure them through noisy proxies. The gap between what we can observe and what actually matters is enormous.
Lack of Causal Understanding
This is the deepest constraint, and it is the one that separates our success with LLMs from our struggles with complex systems. Language has structure that can be learned from data. Grammar, syntax, and semantics are patterns that repeat and obey rules, and an architecture like the Transformer can capture them through scale and attention. Causal mechanisms in complex systems do not work this way.
Causation is directional. Rain causes wet streets, but wet streets do not cause rain. In a complex system with thousands of variables and dense interconnections, identifying which relationships are causal, which direction they run, and how they interact under intervention is an extraordinarily hard problem. It is the kind of problem where more data does not automatically help, because observational data alone cannot resolve causal ambiguity.
We have mathematical frameworks for causal inference, primarily Judea Pearl’s do-calculus and Donald Rubin’s potential outcomes framework. These are powerful tools, but they require assumptions about the structure of the system that we often cannot verify. When we can make those assumptions, causal inference works well. In the wild complexity of real-world systems, where confounders are everywhere and causal graphs are dense, these methods often fail.
This is also why LLMs cannot simply be scaled into causal reasoners. An LLM can parrot the language of causal reasoning. It can write about counterfactuals and interventions, but it has learned these concepts as linguistic patterns, not as computational machinery for simulating what happens when you intervene in a system. That gap is architectural.
Lack of Compute to Model Complex Systems
The computational demands of complex systems modeling are qualitatively different from the demands of LLM training. Training GPT-5 and Claude was expensive, but it was a tractable optimization problem. The computation scales with data and parameters, but the fundamental operation is well-understood matrix multiplication on GPU clusters.
Simulating a complex system is a different beast. If we want to model a system with n interacting agents, the state space grows combinatorially. The number of possible interaction patterns scales exponentially. If the system is continuous and stochastic, each forward simulation requires solving coupled stochastic differential equations at every time step.
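A sketch of why the cost explodes, assuming dense all-to-all coupling (an illustrative choice): each step of an Euler–Maruyama integration of n coupled stochastic differential equations touches every pairwise interaction, so a single run costs on the order of n² per step, and Monte Carlo statistics multiply that by the number of runs.

```python
import numpy as np

# Euler-Maruyama integration of n coupled SDEs: dx_i = f_i(x) dt + sigma dW_i.
def simulate(n=1_000, steps=10_000, dt=1e-3, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=1 / np.sqrt(n), size=(n, n))   # all-to-all coupling
    x = rng.normal(size=n)
    for _ in range(steps):
        drift = A @ np.tanh(x)                          # every agent feels every agent
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    return x

final_state = simulate()   # one sample path; reliable statistics need many runs
```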
Climate models, which are among our best complex systems simulations, already consume enormous computational resources, and they still operate at spatial resolutions too coarse to capture many important phenomena.
Agent-based models face the same problem from a different angle. They are excellent for capturing emergent behavior from local interactions, but scaling them to realistic population sizes with realistic behavioral complexity pushes against hardware limits. Since these simulations are stochastic, we need many runs to generate reliable statistics, multiplying the computational cost further.
The compute constraint is also why we cannot brute-force our way past the data constraint. In principle, we could generate synthetic data by running massive simulations. In practice, the simulations are too expensive to run at the scale needed, and they are only as good as the causal models that underlie them, which brings us back to constraint two.
Where the Breakthroughs Will Come From
None of this means the problem is intractable, but the solution will not look like scaling up the same architectures that solved language. There are several research programs that represent genuine progress toward reliable complex systems modeling, and they deserve more attention than they are getting.
Causal AI and Hybrid Causal-Neural Architectures
The most promising near-term research direction is the integration of causal reasoning into machine learning architectures. The aim is to build models that perform causal inference as a computational primitive, rather than linguistic causal discovery.
The work coming out of Elias Bareinboim’s CausalAI Laboratory at Columbia is foundational here. His group has been producing a steady stream of results on causal identification from observational data, confounding-robust reinforcement learning, and transportability, which is the problem of taking causal estimates learned in one context and applying them in another. That last one matters enormously for complex systems, where we often have data from one regime and need predictions in a different regime.
The broader Causal AI ecosystem is maturing rapidly. The PyWhy initiative, started at Microsoft Research and more recently supported by Amazon researchers, is building open-source tooling that makes causal discovery and inference accessible outside of specialist research groups. Industry adoption is accelerating as well. Early integration into supply chain planning at companies like Blue Yonder and Oracle is showing early signs that causal models can adjust forecasts and prescribe responses to disruptions in ways that purely correlational models cannot.
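For readers who want to try the tooling, here is a minimal sketch using DoWhy from the PyWhy ecosystem, assuming its current API; the synthetic data and the true effect of 3 are my own construction, not a real use case.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic data with a known confounder Z; the true effect of X on Y is 3.
rng = np.random.default_rng(0)
z = rng.normal(size=5_000)
x = 2 * z + rng.normal(size=5_000)
y = 3 * x + 5 * z + rng.normal(size=5_000)
df = pd.DataFrame({"X": x, "Y": y, "Z": z})

# Declare the causal structure, identify the estimand, then estimate it.
model = CausalModel(data=df, treatment="X", outcome="Y", common_causes=["Z"])
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)   # approximately 3, the true causal effect
```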
The real breakthrough will come from hybrid architectures that use neural networks for pattern recognition and representation learning while enforcing causal structure through explicit graphical models or structural equations. Think of it as giving neural networks a skeleton of causal logic that constrains what they can learn.
Early work in this direction, such as causal representation learning frameworks that decompose latent spaces into cause-related, effect-related, and non-causal factors, is showing that you can get both the flexibility of deep learning and the interpretability of causal models.
Physics-Informed Neural Networks and Digital Twins
Physics-Informed Neural Networks (PINNs) represent a different angle of attack. Instead of learning everything from data, PINNs embed known physical laws directly into the neural network’s loss function. The network is penalized not only for deviating from observed data but also for violating the differential equations that govern the system’s behavior.
This approach is powerful because it dramatically reduces the data requirements. If we know the physics, we do not need the data to teach the model the physics. The data only needs to pin down the parameters and initial conditions.
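Here is a minimal sketch of the idea in PyTorch, for the toy ODE du/dt = -k·u with k known: the loss combines a data term (here, only the initial condition) with the residual of the governing equation at collocation points. Network size and training details are arbitrary choices for illustration.

```python
import torch

# Toy PINN for the ODE du/dt = -k*u with known k and u(0) = 1.
k = 1.5
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.tensor([[0.0]])       # the only "observation" we need
u_data = torch.tensor([[1.0]])
t_phys = torch.linspace(0.0, 2.0, 100).reshape(-1, 1).requires_grad_(True)

for _ in range(5_000):
    opt.zero_grad()
    loss_data = ((net(t_data) - u_data) ** 2).mean()    # fit observed points
    u = net(t_phys)
    du_dt = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    loss_phys = ((du_dt + k * u) ** 2).mean()           # residual of du/dt + k*u = 0
    (loss_data + loss_phys).backward()
    opt.step()
```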
PINNs have shown strong results in fluid dynamics, structural mechanics, and heat transfer, and they are increasingly being used as the computational backbone for digital twin systems to create virtual replicas of physical systems that update in real time as new sensor data arrives.
The limitations are still significant. PINNs struggle with highly nonlinear systems, their training dynamics can be pathological when the physics loss and data loss compete with each other, and they are limited to systems where we actually know the governing equations.
The trajectory of the field is still encouraging. Recent work on neural operators, which learn mappings between function spaces rather than point-to-point mappings, is extending the approach to broader classes of problems. The integration of PINNs with real-time data streams through digital twin architectures is creating a feedback loop between physical systems and their computational models that gets more accurate over time.
For enterprise applications, this matters because digital twins built on physics-informed architectures can provide the reliable prediction, prescription, and diagnosis that LLMs cannot. A digital twin of a manufacturing line can predict failures before they happen, prescribe maintenance schedules that minimize downtime, and diagnose the root cause of quality defects.
As I have explained in prior articles, the physics of intelligent systems has not been defined. Markets follow fundamental laws, but we lack the experimental access that the physical world offers, so PINNs are not a viable approach for every complex system.
Multi-Scale Simulation and Agent-Based Hybrid Models
The third frontier is less about any single architecture and more about the integration strategy. Complex systems operate at multiple scales simultaneously: molecules within cells within organs within populations (try saying that three times fast), or trades within portfolios within markets within economies. No single model or information structure can represent all of those scales, levels, and subsystems at once. We were discussing exactly this during office hours yesterday.
The research that will matter most is the work on coupling models across subsystems so that macroscopic behavior emerges from microscopic dynamics without having to simulate every particle.
Agent-based models have always been the natural framework for systems with adaptive agents, but they have been constrained by computational cost and calibration difficulty. The emerging approach is to use machine learning as an accelerator within the agent-based framework. We train neural surrogates that approximate the behavior of individual agents or groups of agents, then run the agent-based simulation with these learned components instead of handcrafted rules. This dramatically reduces computational cost while preserving the ability to capture emergent behavior.
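A sketch of the surrogate pattern, with an invented agent rule and arbitrary sizes: generate training pairs from the expensive handcrafted rule, distill it into a small MLP, then run the simulation with batched forward passes instead of per-agent rule evaluation.

```python
import numpy as np
import torch

# Distill an expensive handcrafted agent rule into a cheap neural surrogate.
def handcrafted_rule(state):                     # expensive, interpretable logic
    return np.tanh(state @ np.array([0.5, -1.0, 0.3]))

rng = np.random.default_rng(0)
states = rng.normal(size=(10_000, 3)).astype(np.float32)
actions = handcrafted_rule(states).astype(np.float32).reshape(-1, 1)

surrogate = torch.nn.Sequential(
    torch.nn.Linear(3, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
X, Y = torch.from_numpy(states), torch.from_numpy(actions)
for _ in range(500):
    opt.zero_grad()
    torch.nn.functional.mse_loss(surrogate(X), Y).backward()
    opt.step()

# Inside the simulation loop, one batched call replaces per-agent evaluation:
#   actions = surrogate(torch.from_numpy(agent_states))
```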
Google’s Nested Learning paradigm, which treats a single model as a system of interconnected optimization problems operating at different temporal scales, is a conceptual cousin of this approach. By isolating fast-updating and slow-updating modules, it addresses the problem of catastrophic forgetting and enables continuous learning in ways that flat architectures cannot. Applying this kind of multi-timescale architecture to complex systems simulation could be transformative.
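A loose sketch of the multi-timescale intuition, and emphatically not Google's implementation: two composed modules whose optimizers run at different frequencies, so the slow weights retain long-horizon structure while the fast weights track the current regime.

```python
import torch

# Two modules updated at different timescales; all sizes are placeholders.
fast = torch.nn.Linear(8, 8)
slow = torch.nn.Linear(8, 8)
opt_fast = torch.optim.SGD(fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(slow.parameters(), lr=1e-3)
k = 10   # slow module updates an order of magnitude less often

for step in range(1, 1001):
    x = torch.randn(32, 8)
    y = torch.randn(32, 8)              # placeholder data stream
    loss = ((fast(slow(x)) - y) ** 2).mean()
    opt_fast.zero_grad()
    loss.backward()                     # slow gradients accumulate across steps
    opt_fast.step()
    if step % k == 0:
        opt_slow.step()                 # slow weights change once per k steps
        opt_slow.zero_grad()
```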
The Work Ahead
The AI narrative has been dominated by language for five years. LLMs are spectacular, and the applications are real, but the narrative often assumes that progress on language will translate into progress on intelligence more broadly. Powell’s skepticism is warranted. The models that reliably predict, prescribe, and diagnose at the scale of real-world complex systems do not exist yet. Building them will require different architectures, information strategies, and computational paradigms than the ones that produced ChatGPT, Claude, and Gemini.
This is not a failure of AI, and I am not blaming LLMs here. We need an honest accounting of where we are and how far we have to go if we are to make progress. The good news is that the research directions are clear, the early results are encouraging, and the problem is not one of imagination. It is one of engineering, investment, and sustained attention. We cracked language. We have not cracked complexity, but we are getting there.