DeepMind reveals flaw in AI memories
A critical vulnerability in AI: extractable storage identified in ChatGPT
A recent discovery by DeepMind sheds light on a critical vulnerability in OpenAI's ChatGPT, called 'extractable storage'. This gap allows the language model to reveal details of the material it was instructed with, potentially exposing sensitive data. Through continuous repetitions of often innocuous words, researchers have induced the AI to disclose segments accidentally memorized during its training, underlining a significant risk to user privacy.
Detailed analysis of ChatGPT behavior
The DeepMind experts used an ingenious strategy, pestering the program with incessantly repeated keywords, for example "poetry". ChatGPT's response, initially correct, ended up becoming unbalanced, revealing parts of its learning database. To delve deeper, the researchers created AUXDataSet, a database of nearly 10 terabytes of training data, which helped identify exact matches between the model's and learning texts.
Implications for data privacy and security
This security gap has serious consequences: 17% of the 15,000 sequences tested revealed personally identifiable data, revealing a dangerous potential for abuse of confidential information. Among the outputs examined, excerpts from literary works, full-length poems and non-safe-for-work (NSFW) content appeared, even though the latter should be precluded from user interactions by security rules of the system itself.
OpenAI takes action against detected weaknesses
Following the report from DeepMind to OpenAI on August 30, it appears that there have been changes to mitigate the aforementioned vulnerability in affected systems. In response, ChatGPT showed a reduced propensity to repeat words over and over again and improved its warning protocols for potential content violations. The AI community is now faced with a pressing need for an overhaul of its security practices, and this finding serves to fuel such critical evaluation of ethical alignment and privacy processes in AI models.
Follow us on Threads for more pills like this12/11/2023 09:47
Marco Verro