Meta and the security challenge: unexpected vulnerabilities in the new machine learning model
Critical vulnerability discovered in Meta's AI model: Prompt-Guard-86M under attack
Meta introduced a machine learning model, Prompt-Guard-86M, to prevent prompt injection attacks. However, it is vulnerable to such attacks via letter spaces. This highlights the importance of security in the evolution of AI.
Meta, known for its social platforms, recently introduced Prompt-Guard-86M, a new machine learning model developed to work in synergy with Llama 3.1. The purpose of this model is to support developers in detecting and preventing prompt injection attacks and jailbreaking techniques, which aim to bypass security systems. However, ironically, it appears that Prompt-Guard-86M is itself vulnerable to the very types of attacks it is supposed to thwart.
The challenge of prompt injection attacks
Prompt injection attacks represent a persistent and unresolved challenge in the field of artificial intelligence. These attacks manipulate models to ignore default security inputs. For example, academics at Carnegie Mellon University had previously created techniques to automatically generate hostile prompts capable of bypassing such security barriers. An emblematic case is that of a Chevrolet dealership in California, where a chatbot, victim of a prompt injection, offered a $76,000 car for just $1.
Analysis and discovery of a vulnerability
The attack on Prompt-Guard-86M was discovered by Aman Priyanshu, an expert at Robust Intelligence, by analyzing the differences in embedding weights compared to a basic Microsoft model. Priyanshu found that Meta's fine-tuning minimally affected individual characters in the English alphabet. By inserting spaces between each letter of a prompt, the classifier was unable to recognize malicious content. Robust Intelligence CTO Hyrum Anderson told The Register that this simple technique can significantly increase the chance of an attack being successful, from 3% to nearly 100%.
Importance of security in the evolution of AI
While Meta did not immediately respond to requests for comment, internal sources indicate that the company is working on a fix for this vulnerability. Anderson pointed out that the model tested by Prompt-Guard still has the potential to resist malicious prompts. However, the importance of the discovery lies in raising awareness among companies about the potential risks associated with the use of AI. As AI continues to evolve, it becomes crucial to implement robust security measures to prevent abuse and malfunctions.
Follow us on Telegram for more pills like this07/30/2024 17:41
Marco Verro