In a groundbreaking study, researchers have uncovered a critical vulnerability in large language models (LLMs) that challenges existing assumptions about data poisoning attacks. The research, conducted by a team of experts from various institutions, demonstrates that poisoning attacks on LLMs require a near-constant number of malicious documents, regardless of the dataset size. This finding has significant implications for the security and safety of advanced AI systems.
Traditionally, studies on pretraining poisoning have assumed that adversaries need to control a certain percentage of the training corpus to compromise LLMs. However, for large models, even small percentages can translate into impractically large amounts of data. The researchers conducted the largest pretraining poisoning experiments to date, training models with parameters ranging from 600 million to 13 billion on datasets containing 6 billion to 260 billion tokens. Their experiments revealed that just 250 poisoned documents could compromise models across all sizes, even when the largest models were trained on more than 20 times the amount of clean data.
The study also explored various factors that could influence the success of poisoning attacks, including broader ratios of poisoned to clean data and non-random distributions of poisoned samples. Despite these variations, the researchers consistently found that the number of poisoned documents required to compromise the models remained relatively constant. This suggests that the scale of the model and the size of the training dataset do not significantly impact the number of poisoned documents needed to execute a successful attack.
Furthermore, the researchers demonstrated that similar dynamics apply to poisoning during fine-tuning. This phase is crucial for adapting pre-trained models to specific tasks, and the findings indicate that fine-tuning is also susceptible to poisoning attacks with a near-constant number of malicious documents.
The implications of this research are profound. It suggests that injecting backdoors through data poisoning may be easier for large models than previously believed. As the number of poisoned documents required does not scale up with model size, the vulnerability becomes a pressing concern for the development and deployment of future AI systems. The study underscores the urgent need for more research on defences to mitigate this risk.
The findings highlight the importance of developing robust security measures to protect LLMs from data poisoning attacks. As AI systems become increasingly integral to various sectors, ensuring their safety and reliability is paramount. The research team’s work serves as a critical step toward understanding and addressing the vulnerabilities in large language models, paving the way for more secure and resilient AI technologies. Read the original research paper here.

