CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning
arXiv:2603.05518v1
Abstract: Recent advances in large multimodal models (LMMs) have enabled instruction-based image editing, allowing users to modify visual content via natural language descriptions. However, existing approaches often struggle with high-level semantic reasoning and visual consistency, particularly under ambiguous or complex instructions. To address these challenges, we propose CoEditor++, a cognitively structured, training-free framework that decomposes editing into “what to edit” and “how to edit” through two cognitive stages with a reflective self-selection mechanism, enabling […]
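As a rough illustration of the decomposition described in the abstract, the following sketch chains a "what to edit" stage and a "how to edit" stage, each followed by a selection step. It is not the authors' implementation: the abstract does not specify how candidates are generated or scored, so `self_select`, `edit`, `EditPlan`, and all callback parameters are hypothetical names, and picking the highest-scoring candidate is only an assumed stand-in for the reflective self-selection mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def self_select(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Assumed stand-in for self-selection: keep the highest-scoring candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


@dataclass
class EditPlan:
    target: str  # outcome of the "what to edit" stage
    result: str  # outcome of the "how to edit" stage


def edit(
    instruction: str,
    propose_targets: Callable[[str], Sequence[str]],
    propose_edits: Callable[[str, str], Sequence[str]],
    score_target: Callable[[str], float],
    score_edit: Callable[[str], float],
) -> EditPlan:
    """Two-stage pipeline; every callback is a hypothetical placeholder."""
    # Stage 1 ("what to edit"): propose candidate targets, then select one.
    target = self_select(propose_targets(instruction), score_target)
    # Stage 2 ("how to edit"): propose candidate edits for that target, then select one.
    result = self_select(propose_edits(instruction, target), score_edit)
    return EditPlan(target=target, result=result)


if __name__ == "__main__":
    # Toy callbacks standing in for model calls; strings stand in for images and masks.
    plan = edit(
        "make the sky red",
        propose_targets=lambda ins: ["sky", "grass"],
        propose_edits=lambda ins, t: [f"{t}: tint", f"{t}: repaint fully"],
        score_target=lambda t: 1.0 if t == "sky" else 0.0,
        score_edit=lambda e: float(len(e)),
    )
    print(plan)
```

In a training-free setting, the proposer and scorer callbacks would presumably wrap calls to pretrained models rather than toy functions. Keeping them as parameters leaves the control flow independent of any particular model, which is a design choice of this sketch and not something the abstract states.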