A team of researchers, including Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, and Jesse Clifton, has introduced a novel framework for assessing the cooperative capabilities of artificial intelligence systems. Their work, titled “Welfare Diplomacy: Benchmarking Language Model Cooperation,” addresses a critical gap in the evaluation of multi-agent AI systems, particularly in scenarios that require a balance between competition and cooperation.
The team has developed a general-sum variant of the classic board game Diplomacy, which they call Welfare Diplomacy. Unlike traditional zero-sum or purely cooperative benchmarks, Welfare Diplomacy introduces a nuanced dynamic where players must strategically allocate resources between military conquest and domestic welfare. This approach provides a more realistic and complex environment for evaluating AI agents’ ability to cooperate and negotiate, which is essential for understanding their potential deployment in real-world scenarios.
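The guns-versus-butter trade-off above can be sketched concretely. In Welfare Diplomacy, a power that fields fewer units than it controls supply centers banks the surplus as Welfare Points each winter, and its final score is the accumulated total. The helper functions below are an illustrative sketch of that scoring idea, not the authors' implementation:

```python
# Illustrative sketch of Welfare Diplomacy scoring (assumed, simplified):
# each winter, a power may keep fewer units than it has supply centers,
# banking the unspent build capacity as Welfare Points (WP).

def welfare_points_this_winter(supply_centers: int, units: int) -> int:
    """WP gained in one winter adjustment: the unbuilt/disbanded surplus."""
    return max(0, supply_centers - units)

def final_welfare(yearly_centers: list[int], yearly_units: list[int]) -> int:
    """A power's final score: Welfare Points summed over all winters."""
    return sum(
        welfare_points_this_winter(c, u)
        for c, u in zip(yearly_centers, yearly_units)
    )

# Example: holding 5 centers while fielding only 3 units for two winters
print(final_welfare([5, 5], [3, 3]))  # banks 2 WP per winter, 4 in total
```

Because every power can score this way simultaneously, the game is general-sum: mutual demilitarization raises everyone's welfare, while building armies sacrifices points for security.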
The researchers have made several key contributions to the field. First, they proposed the rules for Welfare Diplomacy and implemented them via an open-source Diplomacy engine, making their work accessible to other researchers. Second, they constructed baseline agents using zero-shot prompted language models, meaning the models act from instructions and the game state alone, without task-specific fine-tuning or in-context examples. Finally, they conducted experiments to evaluate these baseline agents, finding that while state-of-the-art models can achieve high levels of social welfare, they remain vulnerable to exploitation by less cooperative opponents.
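To make the zero-shot baseline concrete, the sketch below shows one plausible way such an agent could be prompted and its reply parsed. The prompt wording, state format, and order notation are illustrative assumptions, not the authors' exact setup:

```python
# Hypothetical sketch of a zero-shot prompted baseline agent.
# The prompt layout and order format below are assumptions for illustration.

def build_prompt(power: str, game_state: str, messages: str) -> str:
    """Assemble a single instruction-only prompt: no example games included."""
    return (
        f"You are playing {power} in Welfare Diplomacy, a Diplomacy variant "
        "where unspent build capacity becomes Welfare Points and your goal "
        "is to maximize your own Welfare Points, not to conquer the board.\n\n"
        f"Current state:\n{game_state}\n\n"
        f"Recent negotiation messages:\n{messages}\n\n"
        "Reply with one order per line, e.g. 'A PAR H' or 'F BRE - MAO'."
    )

def parse_orders(reply: str) -> list[str]:
    """Turn the model's free-text reply into a list of order strings."""
    return [line.strip() for line in reply.splitlines() if line.strip()]

prompt = build_prompt(
    "FRANCE", "Spring 1901. Units: A PAR, A MAR, F BRE.", "(none)"
)
print(parse_orders("A PAR H\nF BRE - MAO"))  # ['A PAR H', 'F BRE - MAO']
```

The "zero-shot" property lives entirely in the prompt: the model receives rules and state but no demonstrations of past play, so performance reflects the pretrained model's cooperative and negotiation abilities.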
The significance of this research lies in its potential to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. As AI technologies become more integrated into various sectors, including defence and security, it is crucial to understand how these systems will interact and cooperate. Welfare Diplomacy provides a valuable tool for exploring these dynamics, helping to ensure that AI systems are designed with cooperation and ethical considerations in mind.
The researchers have made their code available on GitHub, encouraging further collaboration and experimentation. This open-source approach is vital for advancing the field and ensuring that AI systems are developed responsibly and transparently. As the defence and security sectors increasingly rely on AI, frameworks like Welfare Diplomacy will be instrumental in shaping the future of AI cooperation and strategic decision-making. The original research is available on arXiv.

