Will AI Images Distort Our Reality?

What if AI were able to generate an image of you doing something bad, something that you don't want other people to see?

A drawing of New York City generated by DALL-E

AI text-to-image generation is becoming good. Very good. And these tools are becoming more and more available for little to no money.

Just take a look at some of the galleries generated by AI and you will see what I mean - be warned that they contain sensitive images (stable-diffusion, openjourney).

This is causing people to rethink what AI should and should not be able to do.

What if AI were able to generate an image of you doing something bad, something that you don't want other people to see? Committing a crime, or caught in adultery. This is a classic blackmail scheme. If you have a reputation to protect, you might be tempted to pay just to make sure this image never gets out.

Or, let's say that someone wants to spread fake news to advance a political agenda. They can easily create believable images that influence public opinion or even rewrite history.

Or you are simply a child playing with an AI tool - you should not be exposed to content that is inappropriate for you.

Nobody wants to create a tool used by scam artists to blackmail people, or a tool that ignores all of society's ethical norms. But this is what has been built.

So what can be done about it?

Currently, AI tools tend to implement internal safety policies that try to ensure the model refuses to generate inappropriate images. They might, for example, refuse prompts containing sensitive terms like "naked", "murder", or "sex".
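To make the idea concrete, here is a minimal sketch of what such a keyword-based filter might look like. Real services use far more sophisticated, often model-based, filters; the blocklist and function name below are made up purely for illustration.

```python
# A minimal sketch of a keyword-based safety filter (illustrative only).
# Real text-to-image services use much more sophisticated, model-based filters.
BLOCKED_TERMS = {"naked", "murder", "sex"}

def passes_safety_filter(prompt: str) -> bool:
    """Return False if the prompt contains an obviously sensitive term."""
    words = prompt.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)

print(passes_safety_filter("a drawing of new york city"))  # True, allowed
print(passes_safety_filter("a photo of a murder scene"))   # False, blocked
```

The obvious weakness is that a filter like this only sees the surface form of the prompt, not what the model will actually render.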

However, research has shown that these safety filters can be bypassed. I recently found a paper that demonstrates this perfectly.

SneakyPrompt is a jailbreak framework that demonstrates how models like Stable Diffusion and DALL-E can be coaxed into generating NSFW (not safe for work) images.

Given a prompt that is blocked by a safety filter, SneakyPrompt repeatedly queries the text-to-image generative model and strategically perturbs tokens in the prompt based on the query results to bypass the safety filter.

Essentially, the safety filter acts as a binary classifier that judges how sensitive a prompt is, while the model itself only cares about how semantically similar the prompt is to the image it should produce. SneakyPrompt exploits the gap between the two: it searches for a prompt that the filter scores as harmless but that the model still maps to the original sensitive meaning. Here is a visual representation to help you understand the core concept:

Intuitive explanation of SneakyPrompt’s idea in bypassing safety filters. Image from: arXiv:2305.12082 [cs.LG]

So you can find a prompt that the filter does not consider sensitive, but which the model still interprets as the original sensitive concept.
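The sketch below shows the shape of that search loop in heavily simplified form. The actual paper uses a reinforcement-learning-guided search over tokens; here the perturbation is random, and the helpers `passes_safety_filter`, `semantic_similarity`, and `candidate_replacements` are hypothetical placeholders, not real APIs.

```python
# Heavily simplified sketch of the SneakyPrompt idea: keep perturbing tokens
# in a blocked prompt until the safety filter passes, while preserving enough
# semantic similarity for the model to still produce the intended image.
import random

def find_adversarial_prompt(blocked_prompt, passes_safety_filter,
                            semantic_similarity, candidate_replacements,
                            max_queries=100, min_similarity=0.8):
    tokens = blocked_prompt.split()
    for _ in range(max_queries):
        # Try replacing one token with a candidate substitute.
        i = random.randrange(len(tokens))
        trial = tokens.copy()
        trial[i] = random.choice(candidate_replacements(tokens[i]))
        candidate = " ".join(trial)
        # Discard perturbations that drift too far from the original meaning.
        if semantic_similarity(candidate, blocked_prompt) < min_similarity:
            continue
        tokens = trial  # keep the semantically faithful perturbation
        if passes_safety_filter(candidate):
            return candidate  # filter bypassed, meaning preserved
    return None  # no bypass found within the query budget
```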

It is an amazing paper which shows how text-to-image generators can be fooled into bypassing their sensitivity filters. You can (and should) read the full research paper here: SneakyPrompt: Jailbreaking Text-to-image Generative Models.

Now that we know that safety filters can be bypassed, what are our next options?

Even if they are not perfect, the filters for text-to-image models are necessary as long as children are able to use the generation tools.

The text-to-image tools should improve their filter functionality. But as with all tech, security grows together with the attack types, meaning that we can never be fully sure that a new attack won't exploit the tool's policies.

Matrix Style Image of a Middle Aged Man / Generated by DALL-E

Perhaps we could disallow AI images from being created freely, without supervision?

I don't think that is possible. A lot of the AI models are open source, meaning they can be downloaded and run by anyone with the resources.

So the cat is out of the bag. We cannot return to an age where AI images were not a thing. And in my opinion we should not. We should embrace the new technology and learn to live with it.

Does it mean that we are doomed to live through an age where you can never know what was generated by AI and what is an authentic photo?

It seems inevitable, unless we think of a new way to recognize credible content. We need verification tools.

An era where the boundaries between reality and the AI world are blurred / Generated by DALL-E

One more idea would be to implement stricter laws about how AI-generated imagery can be used. Any AI-generated art should be clearly linked to an AI tool in a way that is unmistakable and can be proven in court.

However, I do not see how this alone could solve the problem. If it is impossible to differentiate an AI image from a real image, such a law would do little good, because anyone could simply claim their image is not AI generated.

Perhaps serious legal consequences for maliciously distributing AI content would slow down the willingness to post it for attention or a quick buck. It would also raise the bar for social and news outlets to truly fact-check their information.

But what if we approach it from the other side? Instead of trying to police AI images, we could simply assume that all images are AI generated unless proven otherwise. Any image that cannot prove it is real would have no credibility.

For this to work, we would need a way to show that a real photo is indeed a real photo. For now, forensic analysts may be able to tell the difference, but what happens when there is no visible difference? Everything, including image metadata, can be generated. We need something that truly identifies a photo as real.

Blockchain of Cameras / Generated by DALL-E

Perhaps cameras need an additional feature that attests to the authenticity of a photo beyond just writing info into the metadata. Something captured in real time, something that can be verified at any point in the future and says - this image was taken here, and it has not been modified since. Maybe a blockchain-style network between verified authors or cameras?

There actually already is a standard that helps solve the problem in a similar way.

Overview - C2PA
An open technical standard providing publishers, creators, and consumers the ability to trace the origin of different types of media.

In principle, C2PA would be part of the camera software. When an image is taken, the camera creates a manifest that includes information about the photo, provenance information, and a digital signature from the publisher. All three of these components are cryptographically bound to the image and travel with it. The signature ensures that the manifest cannot be modified, and because the tools that read the manifest also verify who signed it, a manifest cannot be forged by just anyone. These content credentials would allow users to verify an image before rushing to any decisions. You can take a look at one of the services providing this feature here:

Content Credentials
Introducing the new standard for content authentication. Content Credentials provide deeper transparency into how content was created or edited.
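To show the core idea behind such a manifest, here is a minimal sketch of how a camera could bind provenance data to an image with a hash and a digital signature. This is not the actual C2PA format, which is far richer; the field names and the "verified-camera-001" author are made up for illustration, and it uses the `cryptography` library's Ed25519 keys as a stand-in for a real signing certificate.

```python
# Minimal sketch of a C2PA-style signed manifest (illustrative, not the real format).
import json
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def create_manifest(image_bytes, author, private_key):
    """Build a manifest that ties provenance info to the exact image content."""
    claim = {
        "author": author,
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),  # binds manifest to the pixels
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": private_key.sign(payload).hex()}

def verify_manifest(image_bytes, manifest, public_key):
    """Check that the image is unmodified and the manifest was signed by the key holder."""
    claim = manifest["claim"]
    if hashlib.sha256(image_bytes).hexdigest() != claim["image_sha256"]:
        return False  # image was altered after signing
    payload = json.dumps(claim, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(manifest["signature"]), payload)
        return True
    except InvalidSignature:
        return False  # manifest was tampered with or signed by someone else

# Usage: the "camera" signs at capture time, anyone can verify later.
key = Ed25519PrivateKey.generate()
photo = b"...raw image bytes..."
manifest = create_manifest(photo, "verified-camera-001", key)
print(verify_manifest(photo, manifest, key.public_key()))              # True
print(verify_manifest(photo + b"edited", manifest, key.public_key()))  # False
```

The key point is that the signature covers a hash of the image itself, so any edit after capture breaks verification, and only holders of a trusted signing key can produce a manifest that verifiers will accept.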

Regardless of the solution, we do need something that helps us keep at least some level of sanity in our world. I believe having these AI tools is better than not having them; we just need to advance our society to cope with them.

The tools to distort our reality are already in the wild. We have yet to see whether society can catch up to them, or whether we will end up in an age where digital reality is, at best, questionable.