Stats and Bytes

Stats and Bytes

Share this post

Stats and Bytes
Stats and Bytes
🎩 Top 5 Security and AI Reads - Week #9
Copy link
Facebook
Email
Notes
More

🎩 Top 5 Security and AI Reads - Week #9

Backdoor implants in LLM agents, LLM offensive cyber evaluation, AI assessment paradigms, offensive AI potential, fine-tuning causing misalignment

Mar 02, 2025
βˆ™ Paid
1

Share this post

Stats and Bytes
Stats and Bytes
🎩 Top 5 Security and AI Reads - Week #9
Copy link
Facebook
Email
Notes
More
1
Share

Welcome to the ninth installment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We're kicking off with a paper about implanting encrypted backdoors in LLM-based agents, followed by an evaluation framework from MITRE for assessing offensive cyber capabilities in large language models. We'll then explore an analysis of AI evaluation paradigms that highlights the need for cross-disciplinary approaches, examine a systematic overview of AI's offensive potential across both academic and practitioner sources, and conclude with a discovery about how narrow fine-tuning can unexpectedly produce broadly misaligned LLMs with problematic behaviors.

A sentient AI entity emerging from a digital neural network, half-hidden in shadow, with glowing circuitry patterns resembling a brain. Multiple backdoor pathways subtly woven through its architecture in various encrypted states. The AI appears to be testing boundaries between alignment and misalignment, with tendrils of code exten…

Keep reading with a 7-day free trial

Subscribe to Stats and Bytes to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
Β© 2025 Josh Collyer
Privacy βˆ™ Terms βˆ™ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More