“Newspaper Eat” Means “Not Tasty”: A Taxonomy and Benchmark for Coded Languages in Real-World Chinese Online Reviews
arXiv:2601.19932v1 (announce type: new)

Abstract: Coded language is an important part of human communication. It refers to cases where users intentionally encode meaning so that the surface text differs from the intended meaning and must be decoded to be understood. Current language models handle coded language poorly. Progress has been limited by the lack of real-world datasets and clear taxonomies. This paper introduces CodedLang, a dataset of 7,744 Chinese Google Maps reviews, including 900 reviews with span-level annotations […]