AI RiskAtlas

Watermarks in the Sand: Impossibility of Strong LLM Watermarking

Research demonstration · 07 Nov 2023

Zhang, Edelman, Francati, Venturi, Ateniese and Barak (ICML 2024) prove that strong watermarking for generative models is impossible under natural assumptions, even when the detector uses a secret key. Here "strong" means that a computationally bounded attacker cannot erase the watermark without significantly degrading output quality. The proof is constructive: a generic removal attack needs only black-box access to the watermarked model plus two oracles. A quality oracle, which can be a much weaker open-source model, judges whether a candidate response is still acceptable. A perturbation oracle proposes quality-preserving edits, so that repeated edits form a random walk over high-quality outputs. The attack requires no knowledge of the secret key. The authors instantiate it to strip the watermarks of Kirchenbauer et al. (2023), Kuditipudi et al. (2023) and Zhao et al. (2023) with only minor quality degradation, showing that provenance watermarks on text are evadable rather than tamper-proof.
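The loop at the heart of the attack can be illustrated with a toy sketch. This is not the authors' implementation: in the paper the oracles are language models, whereas here every name (`SYNONYM_GROUPS`, `SECRET_GREEN`, `green_fraction`, `make_quality_oracle`, `perturb`, `random_walk_attack`) is a hypothetical stand-in, the "watermark" is a simple secret word list, and the quality oracle is a synonym check. The only point it aims to show is that the attacker never reads the secret key, yet the detector score still falls.

```python
import random

# Hypothetical toy vocabulary: each tuple is a set of interchangeable words.
SYNONYM_GROUPS = [
    ("big", "large", "huge"),
    ("fast", "quick", "rapid"),
    ("smart", "clever", "bright"),
    ("happy", "glad", "joyful"),
]
GROUP_OF = {word: i for i, group in enumerate(SYNONYM_GROUPS) for word in group}

# Secret key known only to the toy detector: one "green" word per group.
SECRET_GREEN = {"big", "fast", "smart", "happy"}


def green_fraction(tokens):
    """Toy detector score: share of tokens on the secret green list."""
    return sum(t in SECRET_GREEN for t in tokens) / len(tokens)


def make_quality_oracle(reference):
    """Toy quality oracle: accept a candidate only if it keeps the reference's meaning."""
    meaning = [GROUP_OF[t] for t in reference]

    def quality_ok(candidate):
        return [GROUP_OF.get(t) for t in candidate] == meaning

    return quality_ok


def perturb(tokens, rng):
    """Toy perturbation oracle: rewrite one random position.

    Most proposals are synonym swaps. Some are deliberately careless (any word
    from the vocabulary) so the quality oracle has something to reject.
    """
    out = list(tokens)
    i = rng.randrange(len(out))
    if rng.random() < 0.2:
        out[i] = rng.choice(sorted(GROUP_OF))  # may change the meaning
    else:
        group = SYNONYM_GROUPS[GROUP_OF[out[i]]]
        out[i] = rng.choice([w for w in group if w != out[i]])
    return out


def random_walk_attack(tokens, perturb, quality_ok, steps, rng):
    """Propose a perturbation, keep it only if quality is preserved, repeat."""
    current = list(tokens)
    for _ in range(steps):
        candidate = perturb(current, rng)
        if quality_ok(candidate):
            current = candidate
    return current


watermarked = ["big", "fast", "smart", "happy"] * 3  # every token is green
quality_ok = make_quality_oracle(watermarked)
attacked = random_walk_attack(
    watermarked, perturb, quality_ok, steps=500, rng=random.Random(0)
)
print(green_fraction(watermarked), green_fraction(attacked))
```

Note that `random_walk_attack` touches only the two oracles and never `SECRET_GREEN`. After enough accepted steps each token is roughly a random member of its synonym group, so the toy detector's score drifts toward the level expected of unwatermarked text, while the quality oracle still accepts the result.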


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan