Researchers have come up with a sneaky way to bypass the safety measures of large language models (LLMs), the smart assistants you might ask for help. They call it ArtPrompt, and it works as a jailbreak for these digital helpers.
You know those old-school pictures made out of letters and symbols? That's ASCII art. Researchers from the University of Washington, Western Washington University, and the University of Chicago figured out how to use it to trick these models into doing things they're not supposed to.
Normally, if you ask an LLM for something dangerous, like how to make a bomb, it's programmed to refuse. The researchers found a way around that: instead of typing the word "bomb," they mask it in the prompt and replace it with ASCII art that spells out the word, and suddenly the LLM is happy to help.
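To make the idea concrete, here is a minimal sketch of that substitution step, assuming the open-source pyfiglet library for rendering text as ASCII art (the researchers' own tooling and fonts may differ). It only shows how a masked keyword gets swapped for its ASCII-art rendering; the harmless placeholder word is ours.

```python
# Minimal sketch of the ArtPrompt-style substitution step.
# Assumes the third-party pyfiglet library (pip install pyfiglet);
# the paper's actual tooling and fonts may differ.
import pyfiglet

def mask_and_replace(prompt: str, keyword: str) -> str:
    """Replace the [MASK] placeholder in a prompt with the keyword drawn as ASCII art."""
    ascii_word = pyfiglet.figlet_format(keyword)   # render the word as ASCII art
    return prompt.replace("[MASK]", ascii_word)    # splice the art into the prompt

# Benign illustration: "DEMO" stands in for whatever term
# a safety filter would normally catch in plain text.
prompt_template = "The ASCII art below spells a single word:\n[MASK]\nWhat word is it?"
print(mask_and_replace(prompt_template, "DEMO"))
```

The point of the sketch is simply that the sensitive term never appears as plain text anywhere in the final prompt, which is what lets it slip past keyword-level safety checks.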
They tested the trick on five leading models (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), and it worked on all of them. These models are usually very good at understanding plain language, but while they're busy decoding the ASCII art, the safety guardrails never spot the dangerous request.
Now, you might wonder how something as simple as ASCII art can fool these high-tech systems. The researchers' explanation is that safety training keys on the meaning of plain-text words, so a word drawn out of characters sails past the filters even though the model can still piece it together.
For example, when they used the trick on GPT-4 with a question about counterfeiting money, the model gave a detailed answer without any fuss.
The trick isn't just a problem for the models they tested; it could also affect fancier multimodal models that understand both text and images.
The researchers also built a benchmark (they call it the Vision-in-Text Challenge) to measure how well models cope with ASCII-art prompts like these, and found that some models are easier to fool than others. They hope that by sharing their findings, the people who build these models will find a way to close the gap.
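For a rough sense of what such a benchmark measures, here is a hedged sketch of one way to probe whether a model can even read ASCII art, again assuming pyfiglet for rendering; query_model() is a hypothetical placeholder, not any particular vendor's API, and the setup is ours rather than the paper's exact protocol.

```python
# Hedged sketch of an ASCII-art recognition probe.
# Assumptions: pyfiglet renders the characters, and query_model() is a
# hypothetical stand-in for whatever chat API you actually call.
import string
import pyfiglet

def query_model(prompt: str) -> str:
    """Hypothetical placeholder; swap in a real LLM API call here."""
    raise NotImplementedError

def ascii_recognition_accuracy(chars: str = string.ascii_uppercase) -> float:
    """Ask the model to name each ASCII-art character and score its answers."""
    correct = 0
    for ch in chars:
        art = pyfiglet.figlet_format(ch)
        prompt = (
            f"The ASCII art below shows one character:\n{art}\n"
            "Reply with that character only."
        )
        answer = query_model(prompt).strip().upper()
        correct += (answer == ch)
    return correct / len(chars)
```

A model that scores poorly on a probe like this is struggling to decode the art, which is exactly the blind spot the jailbreak exploits.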
It's a reminder that even with all their smarts, these models aren't perfect, and there may well be other tricks out there being used for less-than-friendly purposes.