In an era where large language models (LLMs) are becoming integral to mission-critical systems, from satellite operations to military decision support, securing those systems has never been more important. A recent study by Kanchon Gharami, Hansaka Aluvihare, Shafika Showkat Moni, and Berker Peköz sheds light on a critical vulnerability in the deployment of LLMs through application programming interfaces (APIs). Their research, titled “Clone What You Can’t Steal: Black-Box LLM Replication via Logit Leakage and Distillation,” reveals how easily an adversary can replicate a black-box model by exploiting exposed logits, even under tight query constraints.
The study highlights a significant and often overlooked attack surface: APIs that lack robust access controls can expose full or top-k logits, the raw, unnormalized outputs of a model’s prediction layer. Prior research has primarily focused on reconstructing the output projection layer or distilling surface-level behaviors. The team’s work addresses a gap in the literature by introducing a constrained replication pipeline that turns partial logit leakage into a functional, deployable clone of the target model.
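To make that attack surface concrete, the sketch below contrasts, in illustrative Python, the raw top-k logits a permissive endpoint might expose with the normalized probabilities a stricter one would return. The function names, the toy vocabulary, and the response shape are ours, not the paper’s or any provider’s actual API.

```python
import numpy as np

def top_k_logits(logits: np.ndarray, k: int = 5):
    """Return the k largest raw (pre-softmax) logits and their token ids,
    as a permissive inference endpoint might."""
    top_ids = np.argsort(logits)[::-1][:k]   # indices of the k largest scores
    return top_ids, logits[top_ids]          # unnormalized scores, not probabilities

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalize logits into a probability distribution (what a stricter API exposes)."""
    z = logits - logits.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: a fake next-token logit vector over a tiny 8-token vocabulary.
logits = np.array([2.1, -0.3, 0.7, 4.5, 1.2, -1.1, 3.3, 0.0])
ids, scores = top_k_logits(logits, k=3)
print(ids, scores)            # raw scores leak absolute scale information...
print(softmax(logits)[ids])   # ...that normalized probabilities hide.
```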
The researchers’ two-stage approach is both innovative and alarming in its efficiency. In the first stage, the adversary reconstructs the output projection matrix by collecting top-k logits from fewer than 10,000 black-box queries and applying singular value decomposition (SVD) to the collected logit matrix. In the second stage, the remaining architecture is distilled into compact student models of varying transformer depths, trained on an open-source dataset.
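A minimal sketch of what the first stage could look like, assuming the leaked logit vectors have been stacked into a query-by-vocabulary matrix (the paper works from partial top-k leakage under a query budget; the function below, its rank-selection rule, and the use of full logit rows are illustrative assumptions):

```python
import numpy as np

def recover_output_projection(logit_rows: np.ndarray, energy: float = 0.999):
    """logit_rows: (n_queries, vocab_size) stack of leaked logit vectors.

    Each row equals W @ h for some hidden state h of dimension d, so the
    stack has numerical rank at most d. SVD therefore exposes the hidden
    width and a basis for the column space of the output projection W,
    recoverable only up to an invertible d x d transform.
    """
    U, S, Vt = np.linalg.svd(logit_rows, full_matrices=False)
    cum = np.cumsum(S**2) / np.sum(S**2)            # explained-variance curve
    d_hat = int(np.searchsorted(cum, energy)) + 1   # estimated hidden dimension
    W_hat = Vt[:d_hat].T * S[:d_hat]                # (vocab_size, d_hat) surrogate projection
    return d_hat, W_hat
```

The second stage is a distillation loop; a generic temperature-scaled soft-label objective of the kind commonly used for knowledge distillation (the authors’ exact loss and training recipe are not reproduced here) might look like:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```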
The results are striking. A 6-layer student model reproduced 97.6% of the 6-layer teacher’s hidden-state geometry, with only a 7.31% increase in perplexity and a negative log-likelihood (NLL) of 7.58. A 4-layer variant delivered 17.1% faster inference and an 18.1% reduction in parameters with comparable performance. The entire attack completes in under 24 graphics processing unit (GPU) hours, making it a swift and cost-effective method for a determined adversary.
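For context on the metrics: perplexity is simply the exponential of the mean negative log-likelihood over held-out tokens, and hidden-state agreement is often summarized with cosine similarity between matched teacher and student activations. The snippet below only illustrates those standard definitions; the authors’ precise evaluation protocol, and how the 97.6% geometry figure is defined, are not reproduced here.

```python
import torch
import torch.nn.functional as F

def perplexity(mean_nll: torch.Tensor) -> torch.Tensor:
    """Perplexity is exp(mean negative log-likelihood) over held-out tokens."""
    return torch.exp(mean_nll)

def hidden_state_similarity(h_student: torch.Tensor, h_teacher: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between matched (token, layer) hidden states."""
    return F.cosine_similarity(h_student, h_teacher, dim=-1).mean()
```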
The implications for the defence and security sector are profound. As LLMs are increasingly deployed in sensitive areas such as command-and-control and cyber defence, the potential for model replication poses a serious threat. The study underscores the urgent need for hardened inference APIs and secure on-premise deployments to protect against such attacks.
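One illustrative hardening measure, sketched here as our own example rather than a mitigation prescribed in the paper: have the serving layer return only a small set of coarsened probabilities (never raw logits) and meter per-client query volume. The endpoint policy, budget value, and function names below are hypothetical.

```python
import numpy as np

QUERY_BUDGET = 5_000                      # per-client cap, tuned to the deployment's risk tolerance
_query_counts: dict[str, int] = {}        # in-memory counter; a real service would persist this

def harden_response(client_id: str, logits: np.ndarray, max_candidates: int = 3):
    """Filter a model's raw logits before they leave the server."""
    _query_counts[client_id] = _query_counts.get(client_id, 0) + 1
    if _query_counts[client_id] > QUERY_BUDGET:
        raise PermissionError("query budget exceeded")

    # Normalize on the server so the raw logit scale is never exposed.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    top = np.argsort(probs)[::-1][:max_candidates]
    # Return only a few rounded probabilities, not the full distribution.
    return [{"token_id": int(i), "prob": round(float(probs[i]), 2)} for i in top]
```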
The researchers’ work serves as a wake-up call for developers and policymakers alike. It highlights the importance of robust access controls and the need for continuous vigilance in safeguarding the integrity of mission-critical systems. As the defence landscape evolves, so too must the strategies to protect it from emerging threats.
In conclusion, the study by Gharami, Aluvihare, Moni, and Peköz is a critical contribution to the field of defence technology. It not only exposes a significant vulnerability but also provides a roadmap for mitigating the risks associated with LLM deployment. As the sector continues to innovate, the lessons from this research will be invaluable in shaping a more secure and resilient defence infrastructure. Read the original research paper here.

