🎩 Top 5 Security and AI Reads - Week #11

Repeated token vulnerabilities, LLM finetuning API attack vectors, effective VLM adversarial techniques, autonomous adversarial mitigation exploitation, and transformer robustness token defence

Mar 16, 2025

Welcome to the eleventh installment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We kick off with a deep dive into the fascinating "Repeated Token Phenomenon" in large language models, where researchers use mechanistic interpretability to identify and mitigate the underlying model vulnerability. Next, we discuss the core challenges of protecting LLM finetuning APIs, including how malicious users can exploit proxy outputs to produce backdoored models. We then examine a surprisingly effective attack that achieves over a 90% success rate against even the strongest black-box VLMs by cleverly transferring semantic concepts between images. We'll also look at AutoAdvExBench, a new benchmark for evaluating AI agents' ability to defeat adversarial example defences, and conclude with promising research on "Robustness Tokens" that strengthen transformer models' resilience against adversarial attacks.
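To make the repeated-token idea concrete, here is a minimal probe sketch, assuming GPT-2 via Hugging Face transformers as a stand-in model; the choice of token ("poem") and the repetition count are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a repeated-token probe: feed a model one token repeated
# many times and inspect what it generates next. Token and count are
# illustrative assumptions, not taken from the paper under discussion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build a prompt that is just one token repeated ~200 times.
prompt = "poem " * 200
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding; a robust model would simply keep repeating the token.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Print only the continuation, i.e. what the model adds after the prompt.
continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
print(continuation)
```

In a probe like this, the vulnerability shows up when the model drifts away from simple repetition into unrelated text rather than continuing the pattern, which is the behaviour the first read digs into with mechanistic interpretability.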

A psychedelic carnival of neon-colored language models juggling repeating tokens…
