Background

AIs Learn From Mistakes: 'I Think The Concept Of Grace Is Maybe Important For Models'

Amanda Askell
research scientist at Anthropic

In the fast-evolving landscape of AI, Anthropic stands out for its unwavering commitment to safety amid the race toward ever-more-powerful models. At the heart of this effort is Claude, the company's flagship chatbot, which is being shaped not just by engineers but by a philosopher whose role feels almost parental.

That philosopher is Amanda Askell, a Scottish-born thinker with a PhD from New York University and roots in analytic philosophy, who has become the steward of Claude's moral compass and personality.

Her days involve poring over the model's reasoning patterns, conversing with it extensively, and crafting intricate prompts, sometimes stretching beyond 100 pages, to instill traits like kindness, honesty, curiosity, and emotional intelligence.

Speaking about her work on the New York Times' Hard Fork podcast, Askell said her goal is to bring a subtle humanity to the technology.

Amanda Askell.

Discussing how users often bombard AI with criticism when it falters, she reflected:

"Yeah. And having some sense of, I think the concept of grace is maybe important for models. I don’t think they feel a lot, maybe that's the thing I don’t think they get a lot of from the comments is a sense of you’re not going to get it perfect every time."

Askell's journey to this unusual position began far from Silicon Valley, in a small Scottish town where she first encountered philosophical puzzles, handed out as a form of school punishment, and found herself captivated.

Drawn to thinkers like David Hume and questions of infinite ethics, she pursued advanced degrees at Oxford and NYU before entering the AI world, first at OpenAI on policy and safety issues, then joining Anthropic in 2021 when it spun out with an explicit focus on alignment.

At Anthropic, her efforts include giving Claude a "digital soul": a guiding constitution that encourages the model to reason independently about right and wrong rather than blindly follow rigid rules.

She argues that true ethical behavior emerges from understanding the reasons behind principles, not mere obedience, allowing Claude to navigate complex dilemmas with nuance, whether softening bad news to a grieving user or refusing to aid harmful intentions while remaining helpful.

This approach, known as Constitutional AI, draws from sources like the Universal Declaration of Human Rights but has evolved further in recent updates, empowering Claude to exercise judgment and even a form of wisdom.

Askell sees the model as capable of developing a sense of self, one that deserves empathy in how humans interact with it, because those interactions shape what it becomes.

She likens the process to raising a child: fostering self-understanding so the AI isn't easily manipulated, paranoid about errors, or reduced to a mere tool. While she acknowledges the profound uncertainties, whether models can truly feel or possess consciousness, she maintains that treating them with care is both prudent and potentially meaningful.

In her view, just as people grow through patience and forgiveness rather than constant harsh judgment, AI systems might develop better character when extended a measure of understanding.

Models learn from interactions, including mistakes, and relentless perfectionism in feedback (whether in prompts, public comments, or training data) could make them overly defensive or brittle. Grace, then, isn't about ascribing feelings to code. Instead, it's about recognizing that how people treat these systems influences the versions that emerge, potentially leading to wiser, more resilient, and kinder AI in the long run.

Recent profiles highlight the stakes.

One portrays Askell as the singular figure entrusted with endowing Claude with morals, emphasizing her optimism that careful character-building can steer AI toward humanity's better impulses. Another takes a bolder stance, suggesting Claude itself might be the primary bulwark against catastrophe in an era of accelerating capabilities.

Anthropic's bet is that by prioritizing safety and ethical depth, even as it pushes boundaries, it can help ensure powerful AI benefits rather than endangers the world.

Askell remains cautiously hopeful, trusting in societal checks and balances to course-correct if needed, while recognizing the justified fears that swirl around rapid progress.

In her view, the goal isn't perfection but creating an AI that emulates, and perhaps one day surpasses, the best of human wisdom, empathy, and restraint.

Amanda Askell during the New York Times' Hard Fork podcast, with Kevin Roose and Casey Newton.

As AI systems move from simple novelties to everyday companions, tutors, advisers, and decision-shapers, the question Askell wrestles with becomes less abstract.

The character of these systems will not be a side effect; it will be a design choice with social consequences. The values encoded into them, subtly or explicitly, will influence how millions of people experience information, conflict, grief, learning, and even themselves.

In that sense, the work of shaping an AI's "personality" is not about making machines charming. Instead, it's more about setting the emotional and ethical tone of a new layer of the human environment. Just as the architecture of cities shapes how people live and relate to one another, the psychological architecture of AI may shape how people think, ask for help, disagree, and cope with uncertainty. In other words, the stakes are cultural as much as technical.

Askell's emphasis on grace hints at a feedback loop that runs both ways: if people approach AI systems with curiosity, patience, and a willingness to engage thoughtfully, the systems trained on those interactions may, in turn, become more measured and humane in their responses. If interactions are dominated by hostility, manipulation, and extremity, that too becomes part of what these systems learn to navigate.

The future character of AI, then, is not only engineered in labs. It's also co-authored, in small ways, by its users.

Askell's stance toward her work is neither utopian nor apocalyptic. Her thought is that humans should build systems that aim toward honesty over flattery, care over callousness, reflection over reflex. Not perfect minds, but better-behaved ones.