Stats and Bytes

Stats and Bytes

🎩 Top 5 Security and AI Reads - Week #31

Counterfactual prompt injection detection, backdoored reasoning models, Blackwell GPU architecture deep dive, self-sabotaging AI defences, and autonomous research agent capabilities.

Aug 03, 2025
∙ Paid
Share

Welcome to the thirty-first installment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We're kicking off with a counterfactual approach to detecting "blind" prompt injection attacks against LLM evaluators, revealing how attackers can manipulate AI judges to accept any response regardless of correctness. Next, we examine a data poisoning technique that plants "overthinking" backdoors in reasoning models, paradoxically improving their accuracy while dramatically increasing their computational costs. We then briefly look at a comprehensive microbenchmark analysis of NVIDIA's Blackwell architecture, highlighting advances in ultra-low precision formats that signal the future of AI hardware. Following that, we explore a clever self-degradation defence mechanism that trains models to sabotage their own performance when subjected to malicious fine-tuning, effectively neutering bad actors' efforts. We wrap up with an assessment of AI scientists' current capabilities and …

Keep reading with a 7-day free trial

Subscribe to Stats and Bytes to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Josh Collyer
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture