🎩 Top 5 Security and AI Reads - Week #9

Backdoor implants in LLM agents, LLM offensive cyber evaluation, AI assessment paradigms, offensive AI potential, fine-tuning causing misalignment

Mar 02, 2025

∙ Paid

Welcome to the ninth installment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We're kicking off with a paper about implanting encrypted backdoors in LLM-based agents, followed by an evaluation framework from MITRE for assessing offensive cyber capabilities in large language models. We'll then explore an analysis of AI evaluation paradigms that highlights the need for cross-disciplinary approaches, examine a systematic overview of AI's offensive potential across both academic and practitioner sources, and conclude with a discovery about how narrow fine-tuning can unexpectedly produce broadly misaligned LLMs with problematic behaviors.

A sentient AI entity emerging from a digital neural network, half-hidden in shadow, with glowing circuitry patterns resembling a brain. Multiple backdoor pathways subtly woven through its architecture in various encrypted states. The AI appears to be testing boundaries between alignment and misalignment, with tendrils of code exten…

Keep reading with a 7-day free trial

Subscribe to Stats and Bytes to keep reading this post and get 7 days of free access to the full post archives.