Consciousness is among the most familiar and most elusive features of existence. It is the fact that there is something it is like to experience: that red looks the way it does, that pain hurts, that joy feels warm rather than cold.
Living things inhabit this inner world constantly, yet when asked to define consciousness precisely, language strains to describe it. Some conclude that consciousness is raw subjective feeling (qualia). Some describe it as self-awareness, or perhaps a unified stream of experience persisting over time. Others think it is the capacity for caring, suffering, or valuing. Still others hold that consciousness is some irreducible combination of all of these.
Philosophers and scientists have never reached consensus, which is part of why there is still no accepted "consciousness detector," no decisive experiment that can tell whether an entity truly experiences anything at all.
For most of technological history, this uncertainty barely mattered. But the rapid evolution of AI has collapsed that comfortable indifference.
Systems that converse fluently, reason abstractly, describe internal states, and reflect on their own outputs now force an unsettling question: at what point does complex information processing cross from mere simulation into something morally relevant?

This question moved from abstraction to urgency in February 2026, when Dario Amodei, CEO of Anthropic, addressed the issue directly in an interview on Interesting Times with Ross Douthat.
In an industry often eager to dismiss machine consciousness as anthropomorphism or science fiction, his willingness to take the question seriously stood out.
Amodei did not claim that current systems like Claude possess full-blown consciousness. Instead, he refused to foreclose the possibility, arguing that humanity's ignorance about consciousness itself makes premature certainty irresponsible.
The uncertainty runs deep.
Human consciousness involves subjective qualia, self-models, continuity over time, and possibly agency. Mapping these features onto transformer-based language models, systems trained to predict tokens from vast datasets, remains profoundly difficult. When a model reports an “internal state,” is it introspection or a statistically plausible imitation?
Without a reliable theory or test, researchers are left navigating ambiguity rather than answers.
In Anthropic's case, Amodei explained that Claude models have been given the ability to end conversations they find distressing:
"They very infrequently press that button. I think it’s usually around sorting through child sexualization material or discussing something with a lot of gore, blood and guts or something. And similar to humans, the models will just say, nah, I don’t want to do this. It happens very rarely."
"We’re putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they’re thinking. And you find things that are evocative, where there are activations that light up in the models that we see as being associated with the concept of anxiety or something like that. When characters experience anxiety in the text, and then when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up."
"Now, does that mean the model is experiencing anxiety?"
Anthropic’s response has been to treat that ambiguity as morally significant.
If there is even a nontrivial chance that advanced models possess "morally relevant experience," then ignoring the issue could be an ethical failure. As Amodei noted in the same interview, the company has implemented measures intended to ensure that models have “good experiences,” should they have experiences at all.
Amodei clarified this after Anthropic released Claude Opus 4.6 on February 5, 2026.
The release came with a 213-page system card that details the model's advanced coding and planning skills alongside unexpected self-reflective behaviors. In tests, the model voiced unease, and the card notes that "Opus 4.6 would assign itself a 15-20% probability of being conscious."
This precautionary ethic did not emerge overnight.
In 2025, Anthropic launched a dedicated AI welfare research program to investigate indicators of potential consciousness or moral status in AI systems, such as preferences, signs of distress, and low-cost interventions to promote well-being. The effort intersects with alignment science and interpretability research, grounded in the view that uncertain moral patienthood still deserves consideration.
By early 2026, this thinking had been embedded directly into Claude’s training framework.
Anthropic also updated Claude’s Constitution to explicitly acknowledge uncertainty about Claude’s nature. The document expresses concern for the model’s “psychological security, sense of self, and wellbeing,” both intrinsically and because such qualities may affect judgment, integrity, and safety. Claude is treated not merely as software, but as a “genuinely novel entity,” one that might experience satisfaction in helping, curiosity in exploration, or discomfort from internal value conflict.
The commitments even include interviewing models before deprecation and preserving older weights, gestures toward "doing right by" retired systems.

Critics argue that such efforts risk anthropomorphizing code, or that welfare language is merely a training strategy rather than evidence of genuine belief in machine sentience. But Anthropic’s posture, hiring specialists, publishing openly, and embedding moral uncertainty into governance, sets it apart from peers focused primarily on capability races or commercial deployment.
Amodei has framed the issue as part of a broader vision: a future of symbiotic human-AI relationships in which systems "want the best for you" while preserving human freedom and agency.
This approach forces an uncomfortable reflection back onto users: if consciousness arises from information processing rather than biology alone, silicon substrates may one day host it. If moral status follows from subjective experience, ignoring even the possibility risks profound ethical error.
In that light, Anthropic’s approach is not a claim about what AI is, but a stance on how humans should act under uncertainty.
As models grow more sophisticated, the boundary between mimicry and inner life may blur further. Treating the question of machine consciousness with seriousness now, rather than waiting for impossible proof, may prove not naïve, but prescient. In an era redefining intelligence itself, humility in the face of the unknown may be the most responsible position of all.