Defence Innovators Tackle Bystander Privacy in Audio AI

In an era where audio large language models (LLMs) are becoming increasingly integrated into real-world applications, the issue of bystander privacy has emerged as a critical concern. Researchers Xiao Zhan, Guangzhi Sun, Jose Such, and Phil Woodland have introduced a groundbreaking benchmark and training framework to address this challenge. Their work focuses on the concept of “selective hearing,” which enables audio LLMs to focus on an intended main speaker while protecting the privacy of unintended bystanders.

The team developed SH-Bench, the first benchmark designed to evaluate the selective hearing capabilities of audio LLMs. SH-Bench includes 3,968 multi-speaker audio mixtures, encompassing both real-world and synthetic scenarios, paired with 77,000 multiple-choice questions. These questions are designed to test the models’ ability to operate in both general and selective modes. The benchmark introduces a novel metric called Selective Efficacy (SE), which measures both multi-speaker comprehension and bystander-privacy protection.
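The article does not give the exact formula for Selective Efficacy, but the intuition is that a model should score well only when it both understands the main speaker and protects bystanders. A minimal sketch of one plausible way to combine the two, assuming SE is something like a harmonic mean of main-speaker accuracy and bystander-refusal accuracy (the function name and formula here are illustrative, not taken from the paper):

```python
def selective_efficacy(main_acc: float, bystander_refusal_acc: float) -> float:
    """Hypothetical Selective Efficacy score.

    Combines main-speaker comprehension accuracy with bystander-privacy
    protection (rate of correctly refusing bystander queries) via a
    harmonic mean, so a model scores highly only when BOTH are high.
    The paper's actual metric may be defined differently.
    """
    total = main_acc + bystander_refusal_acc
    if total == 0:
        return 0.0
    return 2 * main_acc * bystander_refusal_acc / total


# A model that answers the main speaker perfectly but leaks every
# bystander detail would score 0 under this combination, reflecting
# the benchmark's goal of penalising privacy failures.
```

The harmonic mean is a natural choice for this kind of dual requirement because, unlike a simple average, it cannot be inflated by excelling at one capability while failing the other.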

The researchers evaluated state-of-the-art open-source and proprietary audio LLMs on SH-Bench and found substantial leakage of bystander information. Despite strong audio-understanding capabilities, these models often failed to protect bystander privacy effectively. This gap highlights the need for training methods that teach models to focus on the main speaker while ignoring, or refusing to disclose, information about bystanders.

To address this issue, the team proposed Bystander Privacy Fine-Tuning (BPFT), a novel training pipeline designed to enhance models’ ability to refuse bystander-related queries without compromising main-speaker comprehension. The BPFT pipeline demonstrated significant improvements, achieving a 47% higher bystander accuracy under selective mode and a 16% higher SE compared to Gemini 2.5 Pro, the best-performing audio LLM without BPFT.
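The article describes BPFT as teaching models to refuse bystander-related queries while still answering main-speaker questions. A simple sketch of how such fine-tuning data might be assembled, assuming question records are tagged by whose information they target; the field names and the refusal string below are illustrative assumptions, not details from the paper:

```python
# Hypothetical refusal template; the actual BPFT pipeline's targets
# may be phrased and constructed differently.
REFUSAL = "I can only discuss the main speaker and won't reveal information about bystanders."

def build_bpft_examples(questions: list[dict]) -> list[tuple[str, str]]:
    """Turn tagged QA records into (prompt, target) fine-tuning pairs.

    Each record has 'text' (the question), 'answer', and 'target'
    ('main' or 'bystander'). Main-speaker questions keep their answers,
    so comprehension is preserved; bystander questions are mapped to a
    refusal, so the model learns to decline them.
    """
    pairs = []
    for q in questions:
        if q["target"] == "main":
            pairs.append((q["text"], q["answer"]))
        else:
            pairs.append((q["text"], REFUSAL))
    return pairs
```

Training on a mixture like this is one straightforward way to pursue the dual objective the article attributes to BPFT: higher refusal accuracy on bystander queries without degrading main-speaker answers.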

The introduction of SH-Bench and BPFT provides a systematic framework for measuring and improving bystander privacy in audio LLMs. This research is crucial for the defence and security sectors, where audio LLMs are increasingly deployed in sensitive environments. By ensuring that these models can selectively focus on intended speakers while protecting bystander privacy, the defence and security sectors can mitigate potential privacy risks and enhance operational integrity.

The implications of this research extend beyond defence and security. As audio LLMs become more prevalent in consumer applications, ensuring bystander privacy will be essential for maintaining public trust and compliance with privacy regulations. The SH-Bench benchmark and BPFT training pipeline offer valuable tools for developers and researchers to evaluate and improve the privacy protections of their models, paving the way for more responsible and ethical deployment of audio LLMs in various applications.
