The Outer Loop of Alignment
The Structural Preconditions of Alignment, Part 1: Corporate Orthogonality and The Outer-Outer-Alignment Problem
Upon his departure from OpenAI in May of 2024, Jan Leike expressed his discontent on X: the Superalignment team was routinely “struggling for compute,” finding alignment research “harder and harder” as “safety culture and processes [had] taken a backseat to shiny products.”
Nine years earlier, with a fresh draft of the company’s founding charter in hand, Sam Altman had committed his new company’s resources, influence, and reputation to building safe AI. He pledged the company would act “in the best interests of humanity.”
Prioritizing products over safety? “Not at OpenAI,” he’d say.
So what happened?
What no one fully appreciated at the time was the scaling required: that developing frontier AI models would eventually cost billions of dollars, sums of capital that, under our current economic system, flow only to corporations promising considerable returns.
OpenAI soon realized that in order to develop AI for the benefit of humanity, it would have to become a for-profit corporation.
With this came, perhaps unintentionally but inevitably, the prioritization of products over safety. And following the departure of Leike along with several other senior researchers, the company dissolved the Superalignment team entirely. Researchers who had been hired to work on safety found themselves reassigned to other teams or pushed out.
OpenAI’s founders knew this would happen.
Their founding charter didn’t just promise safe AI; it specifically committed to the development of the technology “unconstrained by a need to generate financial return.” They foresaw the threat. They built a nonprofit to resist it. And when that structure gave way to a for-profit model, they thought they could withstand the pressure. But they couldn’t.
OpenAI shifted to a for-profit model to access the capital needed to develop AI designed for the benefit of humanity. In practice, this shift only made it harder for them to deliver on the goals they set out to achieve. Even with explicit awareness of the danger, even with institutional safeguards designed to resist it, OpenAI could not withstand the pressure of profit-maximization.
The structures underlying AI development are not neutral vessels.
OpenAI’s story is not one of corruption, hypocrisy, or individual failings. It is a story of structural pressures powerful enough to overwhelm good actors. We tend to assume that institutions are passive scaffolding through which individuals pursue objectives, but they are not. In the AI ecosystem, the principal structural element is the corporation, and it is not a benign, static layer on top of which research is conducted. The corporation operates as a goal-directed system optimizing for its own objective. It is orthogonal in exactly the sense alignment researchers worry about: a capable system that optimizes relentlessly for a narrow objective, whether or not that objective serves the values we hoped it would.
We need to recognize the nested structure of the alignment problem: we are attempting to build aligned AI systems within institutions that themselves exhibit the features of misaligned agents. Technical alignment researchers worry that using an imperfectly aligned agent to oversee the training of an AI system may compound alignment errors across successive generations.
We, however, fail to apply this same logic to the base model of the entire chain: the corporation. If the base model providing the initial oversight, resource allocation, and deployment decisions is itself misaligned, the entire chain is compromised from the start. Whether or not we can exit the inner loop of alignment, we remain trapped inside its outer loop.
OpenAI is just a single instance of a much older, deeper pattern.
The corporation has been around for over 400 years (with precursors as far back as Ancient Rome). For the majority of this history, the corporation was understood to serve a variety of social ends. It often sought to balance obligations across multiple stakeholders—workers, owners, communities, nations—to provide social value.
But the past half century or so has given rise to a new dominant theory of the firm: shareholder primacy. Popularized by thinkers like Milton Friedman and his Chicago school of economics, this theory holds that the corporation bears no social responsibility beyond serving the interests of its shareholders. In other words, “The Social Responsibility of Business Is to Increase Its Profits.”
Importantly, this was not the substitution of one proxy for another; the entire ontology of the firm shifted. Profit-maximization is no longer a proxy standing in for some broader social end. Shareholder value is the purpose itself.
Once profit became the purpose itself, it came to feel like a definitional feature of what corporations are. Over time, we’ve normalized a choice made only 50 years ago and now mistake it for an inevitability.
Had a different choice been made, social media might function as a tool to democratize communication, opioids as a means of managing pain, and fossil fuels as a bridge to cleaner energy. Instead, social media platforms reward compulsive use, pharmaceuticals drive addiction, and fossil fuel companies increase production. Corporations optimize for profit while externalities leak and become everyone else’s problem.
The issue here is not of individual moral failings, insufficient legal side constraints, or the malintentions of bad actors; it’s a systemic issue of structural misalignment. These are failures of systems architecture, of goal misspecification in powerful systems. Widespread flourishing and human wellbeing routinely fall outside the objective function of the modern corporation.
Is this how we want an intensely powerful institution in our society to be constructed?
Some might say yes: “Look at all of the value created over the last 50 years! Corporations have done a great many things for society!” While this is true, it tells us nothing about causation or the counterfactual. Was this value created because of shareholder primacy? Despite it? What value might have been produced under different structures? How might society look different? The harms, on the other hand, are not speculative: half a million opioid deaths, surging teen suicide rates, the destruction of our ecosystem.
The relevant question isn’t whether the current structure has produced value—that’s undeniable. The relevant question is whether this structure is the best we can come up with; whether this structure can facilitate the robust alignment research, safe products, and collaboration across firms and stakeholders that are required for AI that serves human flourishing.
I don’t think it can.
These same structures are building AI systems.
One might object that alternative corporate structures already exist. Public Benefit Corporations, for instance, are legally required to consider interests beyond shareholders. But while a PBC constrains how an entity can pursue its objectives, it does not fundamentally change what those objectives are. OpenAI is now a PBC. Do you think that’s enough? The structural pressures that marginalize safety research continue regardless of the legal wrapper. Changing the constraints is not the same as changing the objective function—and it’s certainly not the same as changing the competitive environment in which these systems operate.
Our focus in AI safety has been to align the technology with a vague conception of “human values.” The deeper issue is that when organizations with their own misaligned objectives undertake this work, our task of aligning AI becomes much harder. Misalignment doesn’t require a desire for harm. Quite the opposite, it’s most dangerous when it’s slightly off-kilter in ways that make it hard to diagnose. A small misspecification in the objective function compounds over the trajectory of development, and we’ve been living inside a compounding misspecification for fifty years.
We feel this. For the first time on record, young people report lower life satisfaction than those older than them: a reversal of a decades-long pattern. Despite technological advances, constant GDP growth, and longer life expectancies, we’re less happy with our lives and more pessimistic about the future. It’s clear that whatever we’re optimizing for isn’t the right thing. The misspecification we made 50 years ago, making profit the sole purpose of the corporation, has compounded over time and will continue to do so unless we change it.
While this issue of misaligned optimization and corporate orthogonality is not distinctive to AI, its gravity is. The same harms these structures have subjected us to in the past will only compound faster and cut deeper.
This argument is in no way an indictment of AI companies or the people within them. Intentions are largely good; that those intentions matter so little is precisely the point. The story of corporate AI development is not one of villainy, but of systems pursuing misspecified objectives.
The alignment community has spent a decade theorizing about AI systems that might exhibit dangerous properties—capability, coherence, goal-directedness, poor controllability. Meanwhile, systems that already exhibit these properties are the ones building AI systems.
The good news is that this is a contingent problem, not a natural inevitability.
Shareholder primacy is not a law of nature. It’s a contingent design choice. It’s a hypothesis formalized just fifty years ago but now so ubiquitous it feels inevitable.
But it’s not. Corporations existed for centuries before this, and the structure we have now is not the structure we’ve always had. It was chosen. Now, we can choose something else.
The fact that alignment research has progressed as much as it has is a testament to the individuals pushing against these systemic pressures: walking away from big paychecks when safety is sidelined, fighting for compute when pushed aside, publishing unpopular research for the benefit of the community.
But a field that depends on heroic resistance to its own institutional structure is not a field positioned to succeed. The question we need to answer is whether we can build structures worthy of the people trying to do this work. The window is open. AI is moving fast, but we’re not yet at a point of no return. We have time. Not infinite, but enough to act if we recognize the problems for what they are.
This is not a call for better corporate ethics or more responsible leadership. It is a call for structural reform, whether through new institutional models, alternative funding mechanisms, or different ownership structures. When the problem is architectural, the solution must be too.