🎩 Top 5 Security and AI Reads - Week #15
Model stealing optimization, hardware-locked ML models, LLM robot jailbreaks, black-box attack attribution, and diffusion-based steganography.
Welcome to the fifteenth instalment of the Stats and Bytes Top 5 Security and AI Reads weekly newsletter. This week's papers are all drawn from the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), which I recently attended. We're kicking off with a fascinating study on model stealing attacks that reveals counterintuitive factors affecting their success, showing that attackers benefit from targeting high-performing models and matching architectural choices. Next, we explore a novel approach to "locking" machine learning models into specific hardware through cryptographic transformations, providing both hard and soft protection mechanisms against unauthorised use. We then examine the crazy reality of jailbreaking LLM-controlled robots, highlighting practical attack vectors that extend beyond traditional prompt injection worries. Following that, we dive into a framework for shareable and explainable attribution for black-box image-based attacks, which brings traditiona…