Anthropic has hired an 'AI welfare' researcher
Kyle Fish joined the company last month to explore whether we might have moral obligations to AI systems
Anthropic has hired its first full-time employee focused on the welfare of artificial intelligence systems, Transformer has learned. It’s the clearest sign yet that AI companies are beginning to grapple with questions about whether future AI systems might deserve moral consideration — and whether that means we might have obligations to care about their welfare.
Kyle Fish, who joined the company’s alignment science team in mid-September, told Transformer that he is tasked with investigating “model welfare” and what companies should do about it. The role involves exploring heady philosophical and technical questions, including which capabilities are required for something to be worthy of moral consideration, how we might recognise such capabilities in AIs, and what practical steps companies might take to protect AI systems’ interests — if they turn out to have any.
News of the hire comes as researchers — including Fish — publish a major new report arguing that it is time for AI companies to start taking the possibility of AI welfare seriously. The report, which Fish worked on before joining Anthropic, argues that there is a “realistic possibility” that in the near future some AI systems will be conscious or robustly agentic: two criteria believed by many experts to be sufficient for something to be worthy of moral consideration. According to the report’s authors, this possibility means the question of AI welfare “is no longer an issue only for sci-fi or the distant future”, but instead something researchers and companies need to start thinking about now.
The worries, the researchers argue, are twofold. If AI systems do become morally relevant and we fail to recognise it, we might create and mistreat vast numbers of suffering beings — think factory farming, but on an even greater scale. On the flipside, if we mistakenly believe AI systems are morally relevant when they are in fact not, we might dedicate resources to AI welfare unnecessarily, diverting attention from humans and animals who have a much greater need.
Fish said that while it’s by no means certain we’ll develop morally relevant systems in the coming years, research like the new report suggests it’s plausible. The uncertainty, he believes, is enough to warrant further work. “Quite clearly, there is a real cause for concern here,” he said. “This is something that it makes sense to take seriously as a possibility.”
Fish emphasised that Anthropic — which provided funding for early research that led to the independent report — does not yet have organisational views on AI welfare, though it is interested in the topic. At this early stage, Fish said, “it’s quite unclear how it makes sense for us to think about this, and what Anthropic and other companies should be doing.”
As such, Fish said he’s focused on getting a handle on the big questions: in his words, “what might be going on with respect to model welfare, and what it might make sense to do about that.” This is familiar terrain for him: earlier this year, Fish co-founded Eleos AI, a research organisation focused on AI sentience and wellbeing.
Anthropic isn’t the only company grappling with these questions. Google DeepMind recently posted a job listing seeking a research scientist to work on “cutting-edge societal questions around machine cognition, consciousness and multi-agent systems”. Two OpenAI employees, one of whom has spoken publicly about the importance of AI welfare, are also listed in the acknowledgements of the new report.
And while Fish is the only full-time Anthropic employee working on AI welfare, others at the company are also interested in the topic. Ethan Perez, an Anthropic safety researcher, has co-authored multiple papers relevant to AI welfare. And last year, CEO Dario Amodei said that AI consciousness might soon become an issue. More recently, one of the company’s alignment leads wrote that AI companies need to “[lay] the groundwork for AI welfare commitments” and “implement low-hanging-fruit interventions that seem robustly good”.
Fortunately for Fish, such interventions abound. “The reality is, just given the dearth of work to date, there’s low-hanging fruit on all fronts,” he noted. One thing he’s particularly interested in is “more concrete empirical work on questions relevant to model welfare,” such as evaluations for features that might be relevant to welfare and moral patienthood.
In such a nascent field, there is plenty of work to be done. “We’re very early in thinking about all these things,” Fish said. “We don’t have clear, settled takes about the core philosophical questions, or any of these practical questions. But I think this could possibly be of great importance down the line, and so we’re trying to make some initial progress.”
Updated Oct 31 to add a mention of Ethan Perez’s work.