Stats and Bytes

🎩 Top 5 Security and AI Reads - Week #12

Algebraic explainability attacks, benchmark contamination mitigations, LLM evaluation inconsistencies, efficient model inversion, and targeted image protection

Mar 23, 2025 ∙ Paid


Welcome to the twelfth installment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We're kicking off with an exploration of algebraic adversarial attacks on explainability models, where researchers have reframed the problem from constrained optimisation to an algebraic approach targeting model interpretability. Next, we examine a study questioning current LLM benchmark contamination mitigation strategies, revealing that none performs statistically better than applying no mitigation at all. We then dive into the inconsistencies of LLM evaluation on multiple-choice questions, comparing different answer extraction methods and the systematic errors they introduce (a small sketch of this follows below). Following that, we look at an efficient black-box model inversion attack that achieves impressive results with just 5% of the queries needed by current SOTA approaches. We conclude with TarPro, an innovative method for targeted protection against malicious image editing that prevents NSFW modifications while allowing normal edits to …
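To make the answer-extraction point concrete, here is a minimal sketch of my own (the helper names and regexes are illustrative, not taken from the paper's code) showing how two reasonable extraction rules can disagree on the very same model response. This divergence, applied across thousands of responses, is the kind of systematic error the study measures:

```python
import re

def extract_first_letter(response: str) -> str | None:
    """Take the first standalone option letter (A-D) found anywhere in the response."""
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None

def extract_stated_answer(response: str) -> str | None:
    """Only accept a letter that follows an explicit 'Answer:' / 'answer is' marker."""
    match = re.search(r"[Aa]nswer\s*(?:is)?[:\s]+\(?([A-D])\)?", response)
    return match.group(1) if match else None

response = "Both A and C look plausible, but the answer is C."
print(extract_first_letter(response))   # -> 'A' (grabs the first option mentioned)
print(extract_stated_answer(response))  # -> 'C' (keys on the model's final verdict)
```

Note that neither rule is "wrong" in isolation; the trouble is that benchmark scores shift depending on which one an evaluation harness happens to use.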
