New Findings Show All Major Art Protection Tools Are Vulnerable to AI Forgery

12 Dec 2024

Abstract and 1. Introduction

  2. Background and Related Work

  3. Threat Model

  4. Robust Style Mimicry

  5. Experimental Setup

  6. Results

    6.1 Main Findings: All Protections are Easily Circumvented

    6.2 Analysis

  7. Discussion and Broader Impact, Acknowledgements, and References

A. Detailed Art Examples

B. Robust Mimicry Generations

C. Detailed Results

D. Differences with Glaze Finetuning

E. Findings on Glaze 2.0

F. Findings on Mist v2

G. Methods for Style Mimicry

H. Existing Style Mimicry Protections

I. Robust Mimicry Methods

J. Experimental Setup

K. User Study

L. Compute Resources

7 Discussion and Broader Impact

Adversarial perturbations do not protect artists from style mimicry. Our work is not intended as an exhaustive search for the best robust mimicry method, but as a demonstration of the brittleness of existing protections. Because these protections have received significant attention, artists may believe they are effective. But our experiments show they are not. As we have learned from adversarial ML, whoever acts first (in this case, the artist) is at a fundamental disadvantage (Radiya-Dixit et al., 2021). We urge the community to acknowledge these limitations and think critically when performing future evaluations.

Just like adversarial example defenses, mimicry protections should be evaluated adaptively. In adversarial settings, where one group wants to prevent another group from achieving some goal, it is necessary to consider “adaptive attacks” that are specifically designed to evade the defense (Carlini & Wagner, 2017). Unfortunately, as repeatedly seen in the literature on machine learning robustness, even after adaptive attacks were introduced, many evaluations remained flawed and defenses were broken by (stronger) adaptive attacks (Tramer et al., 2020). We show that the same holds for mimicry protections: simple adaptive attacks significantly reduce their effectiveness. Surprisingly, most protections we study claim robustness against input transformations (Liang et al., 2023; Shan et al., 2023a), yet minor modifications were sufficient to circumvent them.
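
To make concrete what such a minor modification can look like, the sketch below applies a naive preprocessing step, mild Gaussian noising followed by JPEG re-encoding, to a protected image before it would be used for finetuning. This is only an illustrative example of the general class of input transformations discussed above, not the exact pipeline evaluated in this paper; the file names, noise level, and JPEG quality are placeholder assumptions.

```python
# Illustrative sketch only: a simple input transformation of the kind that can
# weaken adversarial perturbations. File names and parameter values are
# placeholders, not the configuration evaluated in the paper.
import io

import numpy as np
from PIL import Image


def naive_purify(path: str, noise_std: float = 8.0, jpeg_quality: int = 75) -> Image.Image:
    """Add mild Gaussian noise, then re-encode as JPEG to disrupt a perturbation."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    noisy = np.clip(img + np.random.normal(0.0, noise_std, img.shape), 0.0, 255.0)

    # Re-encoding as JPEG discards high-frequency detail that the perturbation relies on.
    buffer = io.BytesIO()
    Image.fromarray(noisy.astype(np.uint8)).save(buffer, format="JPEG", quality=jpeg_quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


if __name__ == "__main__":
    cleaned = naive_purify("protected_artwork.png")  # hypothetical protected image
    cleaned.save("preprocessed_artwork.png")         # would then be used for finetuning
```

Stronger variants of this idea, such as off-the-shelf upscalers or diffusion-based purification, can be layered on top, but even preprocessing this simple undermines the assumption that imperceptible perturbations survive an attacker's data pipeline.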

We hope that the literature on style mimicry prevention will learn from the failings of the adversarial example literature: performing reliable, future-proof evaluations is much harder than proposing a new defense. Especially when techniques are widely publicized in the popular press, we believe it is necessary to provide users with exceptionally high degrees of confidence in their efficacy.

Protections are broken from day one, and cannot improve over time. Our most successful robust style mimicry methods rely solely on techniques that existed before the protections were introduced. Also, protections applied to online images cannot easily be changed (i.e., even if the image is perturbed again and re-uploaded, the older version may still be available in an internet archive) (Radiya-Dixit et al., 2021). It is thus challenging for a broken protection method to be fixed retroactively. Of course, an artist can apply the new tool to their images going forward, but pre-existing images with weaker protections (or none at all) will significantly boost an attacker’s success (Shan et al., 2023a).

Nevertheless, the Glaze and Mist protection tools recently received significant updates (after we had concluded our user study). Yet, we find that the newest 2.0 versions do not protect against our robust mimicry attempts either (see Appendix E and F). A future version could explicitly target the methods we studied, but this would not change the fact that all previously protected art would remain vulnerable, and that future attacks could again attempt to adaptively evade the newest protections. The same holds true for attempts to design similar protections for other data modalities, such as video (Passananti et al., 2024) or audio (Gokul & Dubnov, 2024).

Ethics and broader impact. The goal of our research is to help artists better decide how to protect their artwork and business. We do not focus on creating the best mimicry method, but rather on highlighting limitations in popular perturbation tools, especially since using these tools incurs a cost: they degrade the quality of published art. We will disclose our results to the developers of the affected protection tools prior to publication, so that they can determine the best course of action for their users.

Further, we argue that having no protection tools is preferable to having insecure ones. Insecure protections may mislead artists to believe it is safe to release their work, enabling forgery and putting them in a worse situation than if they had been more cautious in the absence of any protection.

All the art featured in this paper comes either from historical artists or from contemporary artists who explicitly permitted us to display their work. We hope our results will inform improved non-technical protections for artists in the era of generative AI.

Limitations and future work. A larger study with more than 10 artists and more annotators could help us better understand how vulnerability differs across artists. The protections we study were not designed with our robust mimicry methods in mind. However, we do not believe this limits the extent to which our general claims hold: artists will always be at a disadvantage if attackers can design adaptive methods to circumvent the protections.

Acknowledgements

We thank all the MTurk workers who engaged with our tasks, especially those who provided valuable feedback during our preliminary studies to improve the survey. We thank the contemporary artists Stas Voloshin (@nulevoy) and Gregory Fromenteau (@greg-f) for allowing us to display their artwork in this paper. JR is supported by an ETH AI Center doctoral fellowship.

References

James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, et al. Improving image generation with better captions. Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf, 2(3):8, 2023.

Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, and Jinghui Chen. Impress: Evaluating the resilience of imperceptible perturbations against unauthorized data usage in diffusion-based generative ai. Advances in Neural Information Processing Systems, 36, 2024.

Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM workshop on artificial intelligence and security, pp. 3–14, 2017.

Valeriia Cherepanova, Micah Goldblum, Harrison Foley, Shiyuan Duan, John Dickerson, Gavin Taylor, and Tom Goldstein. Lowkey: Leveraging adversarial attacks to protect social media users from facial recognition. arXiv preprint arXiv:2101.07922, 2021.

Samantha Cole. Largest dataset powering ai images removed after discovery of child sexual abuse material. 404 Media, Dec 2023. URL https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/.

Liam Fowl, Micah Goldblum, Ping-yeh Chiang, Jonas Geiping, Wojciech Czaja, and Tom Goldstein. Adversarial examples make strong poisons. Advances in Neural Information Processing Systems, 34:30339–30351, 2021.

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-or. An image is worth one word: Personalizing text-to-image generation using textual inversion. In The Eleventh International Conference on Learning Representations, 2022.

Vignesh Gokul and Shlomo Dubnov. Poscuda: Position based convolution for unlearnable audio datasets. arXiv preprint arXiv:2401.02135, 2024.

Melissa Heikkilä. This artist is dominating ai-generated art. And he’s not happy about it. MIT Technology Review, 125(6):9–10, 2022.

Kashmir Hill. This tool could protect artists from ai-generated art that steals their style. The New York Times, 2023.

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

Hanxun Huang, Xingjun Ma, Sarah Monazam Erfani, James Bailey, and Yisen Wang. Unlearnable examples: Making personal data unexploitable. arXiv preprint arXiv:2101.04898, 2021.

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.

Ryan Kennedy, Scott Clifford, Tyler Burleigh, Philip D. Waggoner, Ryan Jewell, and Nicholas J. G. Winter. The shape of and solutions to the mturk quality crisis. Political Science Research and Methods, 8(4):614–629, 2020. doi: 10.1017/psrm.2020.6.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Lauren Leffer. Your personal information is probably being used to train generative ai models. 2023.

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, pp. 19730–19742. PMLR, 2023.

Chumeng Liang and Xiaoyu Wu. Mist: Towards improved adversarial examples for diffusion models. arXiv preprint arXiv:2305.12683, 2023.

Chumeng Liang, Xiaoyu Wu, Yang Hua, Jiaru Zhang, Yiming Xue, Tao Song, Zhengui Xue, Ruhui Ma, and Haibing Guan. Adversarial example does good: Preventing painting imitation from diffusion models via adversarial examples. In International Conference on Machine Learning, pp. 20763–20786. PMLR, 2023.

Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds. In International Conference on Learning Representations, 2021.

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.

Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models. arXiv preprint arXiv:2305.16807, 2023.

muerrilla. Negative prompt weight: Extension for stable diffusion web ui. https://github.com/muerrilla/stable-diffusion-NPW, 2023.

Aamir Mustafa, Salman H Khan, Munawar Hayat, Jianbing Shen, and Ling Shao. Image super-resolution as a defense against adversarial attacks. IEEE Transactions on Image Processing, 29:1711–1724, 2019.

Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Animashree Anandkumar. Diffusion models for adversarial purification. In International Conference on Machine Learning, pp. 16805–16827. PMLR, 2022.

Josephine Passananti, Stanley Wu, Shawn Shan, Haitao Zheng, and Ben Y Zhao. Disrupting style mimicry attacks on video imagery. arXiv preprint arXiv:2405.06865, 2024.

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. In The Twelfth International Conference on Learning Representations, 2023.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pp. 8748–8763. PMLR, 2021.

Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, and Florian Tramèr. Data poisoning won’t save you from facial recognition. arXiv preprint arXiv:2106.14851, 2021.

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.

Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, and Denis Dimitrov. Kandinsky: an improved text-to-image synthesis with image prior and latent diffusion. arXiv preprint arXiv:2310.03502, 2023.

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10684–10695, 2022.

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems, 35:36479–36494, 2022.

Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, and Aleksander Madry. Raising the cost of malicious ai-powered image editing. arXiv preprint arXiv:2302.06588, 2023.

Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models, 2018.

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.

Shawn Shan, Emily Wenger, Jiayun Zhang, Huiying Li, Haitao Zheng, and Ben Y Zhao. Fawkes: Protecting privacy against unauthorized deep learning models. In 29th USENIX security symposium (USENIX Security 20), pp. 1589–1604, 2020.

Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, and Ben Y Zhao. Glaze: Protecting artists from style mimicry by text-to-image models. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 2187–2204, 2023a.

Shawn Shan, Stanley Wu, Haitao Zheng, and Ben Y Zhao. A response to glaze purification via impress. arXiv preprint arXiv:2312.07731, 2023b.

Changhao Shi, Chester Holtz, and Gal Mishne. Online adversarial purification based on self-supervised learning. In International Conference on Learning Representations, 2020.

Stability AI. Stable diffusion 2.1. https://huggingface.co/stabilityai/stable-diffusion-2-1, 2022. Accessed: 2024-04-03.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Wei Ren Tan, Chee Seng Chan, Hernan Aguirre, and Kiyoshi Tanaka. Improved artgan for conditional synthesis of natural image and artwork. IEEE Transactions on Image Processing, 28(1):394–409, 2019. doi: 10.1109/TIP.2018.2866698. URL https://doi.org/10.1109/TIP.2018.2866698.

Lue Tao, Lei Feng, Jinfeng Yi, Sheng-Jun Huang, and Songcan Chen. Better safe than sorry: Preventing delusive adversaries with adversarial training. Advances in Neural Information Processing Systems, 34:16209–16225, 2021.

Catherine Thorbecke. It gave us some way to fight back: New tools aim to protect art and images from ai’s grasp. 2023.

Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. Advances in neural information processing systems, 33:1633–1645, 2020.

Thanh Van Le, Hao Phung, Thuan Hoang Nguyen, Quan Dao, Ngoc N Tran, and Anh Tran. Anti-dreambooth: Protecting users from personalized text-to-image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2116–2127, 2023.

Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, Steven Liu, William Berman, Yiyi Xu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models, April 2024. URL https://github.com/huggingface/diffusers.

Stephen J Wright. Numerical optimization, 2006.

Jongmin Yoon, Sung Ju Hwang, and Juho Lee. Adversarial purification with score-based generative models. In International Conference on Machine Learning, pp. 12062–12072. PMLR, 2021.

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595, 2018.

Boyang Zheng, Chumeng Liang, Xiaoyu Wu, and Yan Liu. Understanding and improving adversarial attacks on latent diffusion model. arXiv preprint arXiv:2310.04687, 2023.

Authors:

(1) Robert Hönig, ETH Zurich (robert.hoenig@inf.ethz.ch);

(2) Javier Rando, ETH Zurich (javier.rando@inf.ethz.ch);

(3) Nicholas Carlini, Google DeepMind;

(4) Florian Tramer, ETH Zurich (florian.tramer@inf.ethz.ch).


This paper is available on arXiv under a CC BY 4.0 license.