paint-brush
How AI Prompts Get Hacked: Prompt Injection Explainedby@whatsai
3,870 reads
3,870 reads

How AI Prompts Get Hacked: Prompt Injection Explained

by Louis BouchardMay 24th, 2023
Read on Terminal Reader
Read this story w/o Javascript

Too Long; Didn't Read

Prompting is the secret behind countless cool applications powered by AI models. Having the right prompts can yield amazing results, from language translations to merging with other AI applications and datasets. Prompting has certain drawbacks, such as its vulnerability to hacking and injections, which can manipulate AI models or expose private data.
featured image - How AI Prompts Get Hacked: Prompt Injection Explained
Louis Bouchard HackerNoon profile picture


Did you know prompting is the secret behind countless cool applications powered by AI models like ChatGPT? 😮


Having the right prompts can yield amazing results, from language translations to merging with other AI applications and datasets!

Prompting has certain drawbacks, such as its vulnerability to hacking and injections, which can manipulate AI models or expose private data.


You may already be familiar with instances where individuals successfully deceived ChatGPT, causing it to engage in activities that OpenAI had not intended.


Specifically, an injected prompt resulted in ChatGPT assuming the identity of a different chatbot named "DAN." This version of ChatGPT, manipulated by the user, was instructed to perform tasks under the prompt "Do Anything Now," thereby compromising OpenAI's content policy and leading to the dissemination of restricted information.


Despite OpenAI's efforts to prevent such occurrences, a single prompt allowed these safeguards to be bypassed.


Thankfully, prompt defense mechanisms are available to reduce hacking risks and ensure AI safety. Limiting the purpose of a bot (like translations only) is one basic example, but other defense techniques exist, and even emojis could play a role! 🛡️


Want to learn more about enhancing AI safety? Check out the video!

References

►Prompt hacking competition: https://www.aicrowd.com/challenges/hackaprompt-2023#introduction
►Learn prompting (everything about prompt hacking and prompt defense):https://learnprompting.org/docs/category/-prompt-hacking
►Prompting exploits:https://github.com/Cranot/chatbot-injections-exploits
►My Newsletter (A new AI application explained weekly to your emails!):https://www.louisbouchard.ai/newsletter/
►Twitter:https://twitter.com/Whats_AI
►Support me on Patreon:https://www.patreon.com/whatsai
►Support me through wearing Merch:https://whatsai.myshopify.com/
►Join Our AI Discord:https://discord.gg/learnaitogether