Top 5 Security and AI Reads - Week #24
LLM judge robustness evaluation, N-gram jailbreak threat modeling, embedding sequence obfuscation, offensive security ethics, and data reconstruction attack systematization.
Welcome to the twenty-fourth instalment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. We're kicking off with a comprehensive assessment of LLM-as-a-judge systems, revealing vulnerabilities where heuristic attacks succeed nearly 100% of the time across models and defences. Next, we explore an innovative N-gram perplexity threat model that offers a fresh perspective on measuring jailbreak effectiveness while finding that safety tuning is more effective than previously reported. We then examine the fascinating "Stained Glass Transform" approach to obfuscating LLM embedding sequences, providing a novel method for protecting intellectual property in client-server interactions. Following that, we dive into crucial ethical considerations for offensive security research with LLMs, highlighting the need for consistent reasoning around tool publication and responsible disclosure. We conclude with a systematic overview of data reconstruction attacks against machine learning models.
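To give a flavour of the N-gram perplexity idea: a jailbreak string that is gibberish under a simple n-gram language model fit on natural text will score much higher perplexity than fluent prose, which makes perplexity a cheap filter for adversarial suffixes. The sketch below is my own minimal illustration with add-one smoothing, not the paper's implementation; the corpus, tokenisation, and smoothing choices are all assumptions.

```python
import math
from collections import Counter


def ngram_perplexity(text: str, corpus: str, n: int = 2) -> float:
    """Perplexity of `text` under an n-gram model fit on `corpus`.

    Uses whitespace tokenisation and add-one (Laplace) smoothing --
    a toy stand-in for the larger n-gram models used in practice.
    """
    def ngrams(tokens, k):
        return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

    corpus_tokens = corpus.split()
    counts_n = Counter(ngrams(corpus_tokens, n))        # n-gram counts
    counts_ctx = Counter(ngrams(corpus_tokens, n - 1))  # context counts
    vocab = len(set(corpus_tokens)) or 1

    grams = ngrams(text.split(), n)
    log_prob = 0.0
    for g in grams:
        num = counts_n[g] + 1            # add-one smoothing
        den = counts_ctx[g[:-1]] + vocab
        log_prob += math.log(num / den)
    # Perplexity = exp of the negative mean log-probability per n-gram.
    return math.exp(-log_prob / max(len(grams), 1))
```

Under this toy model, an in-distribution sentence scores noticeably lower perplexity than an adversarial-looking token soup, which is the signal the threat model builds on.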