Watermarks in the Sand: Impossibility of Strong LLM Watermarking
Research demonstration
07 Nov 2023
Zhang, Edelman, Francati, Venturi, Ateniese and Barak (ICML 2024) prove that strong watermarking for generative models is impossible under natural assumptions, even when the detector uses a secret key. Here a watermark is "strong" if a computationally bounded attacker cannot remove it without significantly degrading output quality. The proof is constructive: it gives a generic removal attack that needs only black-box access to the watermarked model and no knowledge of the secret key. The attack relies on two oracles: a quality oracle, which can be a much weaker open-source model, and a perturbation oracle that takes quality-preserving random-walk steps over candidate outputs. The authors instantiate the attack to strip the watermarks of Kirchenbauer et al. (2023), Kuditipudi et al. (2023) and Zhao et al. (2023) with only minor quality degradation, showing that provenance watermarks on text can be evaded and are not tamper-proof.
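The random-walk idea can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the `perturb` and `quality` functions, and the nonsense word "banana" are all hypothetical toy stand-ins for the perturbation oracle and quality oracle, which in the paper are built from language models. The sketch shows only the control flow: propose a small edit, keep it if the quality oracle says quality has not dropped, and repeat.

```python
import random

# Hypothetical toy synonym table standing in for a perturbation oracle.
SYNONYMS = {
    "big": ["large", "huge"],
    "large": ["big", "huge"],
    "huge": ["big", "large"],
    "fast": ["quick", "rapid"],
    "quick": ["fast", "rapid"],
    "rapid": ["fast", "quick"],
}


def perturb(words, rng):
    """Toy perturbation oracle: replace one random word with a random candidate.

    "banana" is always among the candidates so that some proposals are bad
    and must be filtered out by the quality oracle.
    """
    out = list(words)
    i = rng.randrange(len(out))
    candidates = SYNONYMS.get(out[i], []) + ["banana"]
    out[i] = rng.choice(candidates)
    return out


def quality(words):
    """Toy quality oracle: 1 if the text contains no nonsense word, else 0."""
    return 0 if "banana" in words else 1


def random_walk_attack(words, perturb, quality, steps, rng):
    """Random walk over outputs, accepting only quality-preserving steps.

    The attacker never sees a watermark key; it only queries the two oracles.
    """
    current = list(words)
    baseline = quality(current)
    for _ in range(steps):
        proposal = perturb(current, rng)
        # Rejection step: discard any proposal that lowers quality.
        if quality(proposal) >= baseline:
            current = proposal
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    original = "the big dog runs fast".split()
    attacked = random_walk_attack(original, perturb, quality, 200, rng)
    print(" ".join(attacked))
```

In this toy setting the walk only ever moves between synonym choices, so the output keeps its quality score while drifting away from the original wording. The intuition behind the paper's result is similar: after enough accepted steps the text is close to a random high-quality response, so any key-dependent signal in the original tokens is lost.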