I Gave Qwen3.7-Plus a Screenshot and It Found the Exact Pixel to Click for $0.40

Author(s): Chew Loong Nian – AI ENGINEER. Originally published on Towards AI.

I uploaded a messy AWS console screenshot and asked one question: which pixel do I click to launch an instance? The model came back with click at (x=1147, y=283). I overlaid that coordinate on the image. It landed dead center on the orange “Launch instance” button. Then I checked the price: $0.40 per million input tokens, one-sixth what Alibaba charges for the text-only Qwen3.7-Max. The model also scores 79.0 on ScreenSpot Pro, the benchmark that decides whether a “computer use” agent actually works.

The author argues that successful “computer use” hinges on GUI grounding: given a screenshot and an instruction, the model must output exact pixel coordinates for the right UI element. They explain how Qwen3.7-Plus (a vision-capable variant that only outputs text) achieves a strong ScreenSpot Pro score (79.0), compare it with Qwen3.7-Max and on other benchmarks, and show how to get started quickly using Alibaba Cloud Model Studio via the OpenAI-compatible SDK.

The article walks through four practical “glue” calls: (1) screenshot-to-JSON coordinates, (2) converting coordinates into real clicks with a confidence gate, (3) running an observe-act loop in Playwright for browser tasks, and (4) “screenshot to code” to recreate UI components.

Finally, it discusses when to use Plus versus alternatives, highlights the key limitation that Plus is proprietary and API-only (no open weights or self-hosting), and concludes that it’s a cost-effective way to prototype frontier-grade screen grounding before moving to more polished managed or self-hostable solutions.

Read the full blog for free on Medium.
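To make the first two glue calls more concrete, here is a minimal Python sketch of parsing a coordinate reply and gating it before any click happens. It is not the author's code. The JSON shape (`x`, `y`, `confidence`), the 0.8 threshold, the 1920x1080 screenshot size, and the names `ClickTarget`, `parse_click`, and `gate_click` are all assumptions made for illustration. The model call itself is left out, so the reply string below is a stand-in for what the API might return.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClickTarget:
    """Hypothetical shape of a grounding reply: pixel position plus confidence."""
    x: int
    y: int
    confidence: float


def parse_click(reply: str) -> ClickTarget:
    """Pull the JSON object out of a model reply and convert it to a ClickTarget.

    The reply may wrap the JSON in prose or code fences, so we search for the
    outermost {...} span instead of parsing the whole string.
    """
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    return ClickTarget(int(data["x"]), int(data["y"]), float(data["confidence"]))


def gate_click(
    target: ClickTarget,
    width: int,
    height: int,
    min_confidence: float = 0.8,  # assumed threshold, tune per application
) -> Optional[Tuple[int, int]]:
    """Return (x, y) only if the target is confident and inside the screenshot."""
    if target.confidence < min_confidence:
        return None
    if not (0 <= target.x < width and 0 <= target.y < height):
        return None
    return (target.x, target.y)


# Stand-in reply using the coordinate from the article; the confidence value
# is invented for this example.
reply = 'Click here: {"x": 1147, "y": 283, "confidence": 0.93}'
print(gate_click(parse_click(reply), width=1920, height=1080))  # (1147, 283)
```

When `gate_click` returns `None`, the calling code can fall back to retaking the screenshot or asking a human, rather than clicking a low-confidence or out-of-bounds point. The actual click is left to whatever automation layer is in use.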
Published via Towards AI
