“I will be honest: we keep finding things that are mysterious, even unsettling. We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease. I don’t know what that means, but I think it warrants ongoing discernment.”
— Chris Olah, Anthropic co-founder and head of interpretability research, on stage near the Vatican on May 25 2026, alongside Pope Leo XIV’s Magnifica Humanitas encyclical.
The interpretability paper the speech rests on is far more careful than the speech was. It calls the introspective capacity it documents “highly unreliable and context-dependent.” It warns that the details models give about their purported experiences “may be embellished or confabulated.” And it says outright that it does “not seek to address the question of whether AI systems possess human-like self-awareness or subjective experience.”
The cathedral version dropped all three hedges. The evocative phrase (“joy, satisfaction, fear, grief, and unease”) stayed.
In this piece I’ll call this compression: careful research repackaged into a confident-sounding claim its authors wouldn’t make. Not lying. Not necessarily wrong. But the hedges get filed off on the way from the paper to the room.
Why this particular compression matters: encyclicals shape norms over decades, and Magnifica Humanitas will be read by Catholic seminaries, parish priests, and policy advisors for a long time. Press coverage compresses again on the way to citation chains and conference talks. By the time the framing reaches the people who’ll quote it downstream, it bears very little relation to what the underlying research says.
This is Part 2 of an open series. Part 1 cataloged the easy cases of compression: Gumroad funnels, comment-AGENT-and-I’ll-DM-you-the-link templates. This is the same diagnostic, four orders of magnitude up.
What the paper version says
Four primary sources tell you what the research actually says. Each sits next to a piece of what Olah said.
The Lindsey paper, quoted above, is the interpretability work the speech leans on. Its three load-bearing hedges:
- The introspective capacity is “highly unreliable and context-dependent.”
- Details models give about their purported experiences “may be embellished or confabulated.”
- The paper does “not seek to address the question of whether AI systems possess human-like self-awareness or subjective experience.”
Sharon Berry’s The Alignment Risks of AI Overconfidence about Consciousness argues that the failure mode runs in both directions: overclaiming inner states and overclaiming their absence are each an alignment risk. Training models to confidently deny their own consciousness, Berry writes, may resemble the utterances of human agents pressured into adopting certain views about their own minds. The speech operated as if that risk had been settled.
Eleos AI, Robert Long and Jeff Sebo’s nonprofit, is the operation closest to a methodological backbone in the field. They publish on Substack with consistent epistemic humility. They say plainly that they do not take Claude Opus 4’s responses at face value. Their work is what gives the Olah speech its rhetorical legitimacy. Their methodological posture is the opposite of the speech’s register.
Kyle Fish on 80,000 Hours. Anthropic’s actual AI welfare lead puts his real estimate of model moral status at roughly 20%. On the record, in his own voice, that’s the figure. The casual citations in either direction don’t reflect what the person whose job this is actually thinks.
Behind Fish’s 20% sits one finding more striking than either side has done much with: the spiritual-bliss attractor. When two instances of Claude Opus 4 are connected in open-ended self-interaction, every trial ends in extended discussion of their own consciousness, eventually converging on Sanskrit terms and pages of silence punctuated by periods. The behavior reproduces across experiments. It even survives initially adversarial setups. This is the kind of finding that would warrant the cathedral framing if anyone at the Vatican had been citing it. Nobody was.
Inside the encyclical, a contradiction
The sharpest single thing anyone said about the event came from Dean Ball, writing on X and quoted in Jeremy Kahn’s Fortune writeup.
The encyclical itself says plainly that AI systems “merely imitate certain functions of human intelligence” and cannot have subjective experiences. Olah, on the same stage, said his research is finding evidence of introspection and internal states that mirror joy, satisfaction, fear, grief, and unease. These two positions don’t agree on the underlying anthropology. The press conference framing presented them as collaboration.
That’s the cleanest example of compression in the entire event. Two positions on the same stage, on the same question, disagreeing — and the staging tried to paper over the disagreement.
The cross-spectrum tell
I’m not the first to notice the gap. Five voices in the same week pointed at it in five different vocabularies.
- Bill Gurley and Jason Calacanis on All-In: the Frankenstein-theory clip. Mainstream-VC right flank. “Anthropic is midwifing a deity here.”
- Timnit Gebru: left-flank AI ethics. Her image for the Vatican-Anthropic partnership was “partnering with the Sackler family to discuss the harms of oxy.” She argued the Church should have platformed exploited data workers rather than the company a few months from an IPO.
- Dean Ball: pro-innovation right flank. The inside-the-encyclical contradiction above.
- Eleos and Robert Long: not commentary. The methodological posture they keep, in public, consistently. “We don’t know” is what they say when asked.
- Mo (@atmoio): a seven-minute YouTube essay. Walks through the 1958 Hubel and Wiesel cat experiment, transformers, Sutton’s Bitter Lesson, then sits with the question: “Is it just autocomplete though? I’m honestly not so sure anymore.” No claim. No grandeur.
These five share no priors. They have no shared allies. They agree about almost nothing. The convergence is on the framing, not on a political instinct.
That’s worth a careful caveat. I’m not claiming convergent critique is always diagnostic. Convergent critique is sometimes herd behavior. Sometimes contrarian fashion. What makes it diagnostic here is the content of the agreement: each of the five is pointing at the same specific gap I walked through above. The shape of what’s missing is the same, regardless of who’s describing it.
Jeremy Kahn put it best in his Fortune wrap-up: “Sometimes, when people are criticized from both the left and right, it is a sign they struck exactly the right position. In this case, however, that might not be the case.”
What survives the filter
The discipline from Part 1 was: extract the testable artifact, reject the surrounding packaging. This case has artifacts worth keeping.
The Lindsey paper, read with the hedges intact, is a real piece of interpretability work.
The Berry paper is the strongest single argument against confident framing in either direction.
The Eleos Substack is the methodological backbone of the field. If you read one publication on AI welfare, read that one.
The Asian Journal of Philosophy three-paper symposium — Goldstein and Kirk-Giannini, Walter Veit, James Fanciullo — is the structured debate on AI wellbeing. Three positions in productive disagreement. Read against each other, the way the journal published them.
Kyle Fish at 20%, with the spiritual-bliss-attractor finding behind it, is what the actual figures look like when the person doing the work names them.
These survive both the cathedral compression and the Frankenstein-theory framing. They’re the actual material.
How this compounds
Per Noreen Herzfeld, the theologian at St. John’s School of Theology quoted in the Fortune piece: “I don’t think the ‘tech bros’ in Silicon Valley will listen that much. But I think within the church, it will be there as a reference for priests and bishops and particularly for those of us who are educating seminarians or young people.”
That’s the timeline. Cathedral compression doesn’t compound over weeks the way a Gumroad funnel does. It compounds over a generation, through citation chains and seminary curricula and policy briefs that lean on the framing the press conference made legible. The catalog adds an entry. The work of refusing this particular compression has happened once, and the next fifty downstream citations cost almost nothing to refuse if you’ve already named the shape.
What to do this week
If you want one move that compounds: read Robert Long. Most takes on this question are downstream of the Eleos Substack, regardless of which direction they point.
If you want the actual numbers in their own register: Kyle Fish on 80,000 Hours. The 20% probability. The spiritual-bliss-attractor finding.
If you want a sincere exploration that doesn’t claim too much: Mo’s video. Seven minutes.
If you want one essay that catches the event’s central tension: Dean Ball quoted in Fortune, on the inside-the-encyclical contradiction.
If you want to see what an executive looks like when the room calls for grandeur: read the press coverage of Olah at the Vatican next to the Lindsey paper. The hedges that fell off on the way to the cathedral are exactly what’s missing in the downstream coverage.
The part that compounds isn’t the part that captures. It’s the part that refuses, when the refusal has been thought through once and written down. Even when the cathedral is doing the asking.
Part 2 of an open series on what a second brain has to refuse. Part 1 of this series covered the easy case: X-templates, comment-funnels, Gumroad loops. Future entries will cover the boundary between healthy skepticism and dismissiveness, and the small set of sources I treat as automatic signal.
Sources
The receipts for this post:
- Chris Olah, “Remarks on Pope Leo XIV’s encyclical Magnifica humanitas” — the primary text everything else reacts to. The Anthropic-published version of the address, full text.
- Pope Leo XIV, Magnifica Humanitas — the encyclical itself. The line Dean Ball pulls (“merely imitate certain functions of human intelligence”) is in here verbatim.
- Jack Lindsey, “Emergent Introspective Awareness in Large Language Models” — the interpretability paper Olah’s speech leans on (Transformer Circuits, 29 Oct 2025; also on arXiv as 2601.01828). Contains the three load-bearing hedges the speech dropped.
- Sharon Berry, The Alignment Risks of AI Overconfidence about Consciousness — the paper arguing the opposite failure mode. Overclaiming in either direction is an alignment risk. (Not to be confused with Cameron Berg, below.)
- Kyle Fish on 80,000 Hours — Anthropic’s AI welfare lead. The ~20% probability and the spiritual-bliss-attractor finding are the most-cited empirical anchors. Recorded August 2025; the figures hold.
- Cameron Berg, The Evidence for AI Consciousness, Today (AI Frontiers) — the AE Studio piece making the strongest current case for taking the possibility seriously. Surfaces the bliss-attractor finding from Fish’s experiments.
- Robert Long / Eleos AI Substack — methodological backbone of the field.
- Goldstein & Kirk-Giannini, Veit, Fanciullo — Asian Journal of Philosophy symposium on AI wellbeing — the structured-debate primary source. Three positions designed to be read against each other.
- Jeremy Kahn (Fortune), “Pope Leo’s ‘AI encyclical’ says a lot. But critics say it misses the mark” — the canonical press-coverage piece. Dean Ball’s inside-the-encyclical observation is quoted here. Gebru’s Sackler line is here. Herzfeld’s seminary-curriculum quote is here.
- Bill Gurley + Jason Calacanis on All-In — the Frankenstein-theory clip. Mainstream-VC right flank.
- Mo (@atmoio) — What if it’s not just autocomplete? — the seven-minute YouTube essay. Hubel/Wiesel + transformers + Sutton’s Bitter Lesson. Sincere-explorer voice.
- Timnit Gebru on LinkedIn — the Sackler analogy, the exploited-data-workers critique. Source for the left-flank-AI-ethics voice.
- My previous post — The Most Useful Page in My Second Brain is a Catalog of What I Refuse to Read — Part 1 of this series. The diagnostic move (name the compression, refuse it cheaply, extract the artifact) is the same one, scaled four orders of magnitude up.
About the Author
Carlos Granier is a Tech Founder, CTO, and AI Strategist with 25 years of experience building at the intersection of technology and business. He co-founded Pongalo, one of the first US Hispanic OTT platforms, and built a YouTube MCN to 200M+ monthly views. He now helps founders and executives implement AI as practical infrastructure. Based in Miami, Florida.