[P] I made Screen Vision, turn any confusing UI into a step-by-step guide via screen sharing (open source)

I built Screen Vision, an open source website that guides you through any task by sharing your screen with AI.

  • Privacy Focused: Your screen data is never stored or used to train models.
  • Local LLM Support: If you don’t trust cloud APIs, the app has a “Local Mode” that connects to AI models running on your own machine, so your data never leaves your computer (see the sketch after this list).
  • Web-Native: No desktop app or extension required. Works directly in your browser.
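
For Local Mode, here’s a minimal sketch of what a request to a local model can look like, assuming an OpenAI-compatible server (e.g. Ollama) listening on localhost:11434. The endpoint, port, and model tag are assumptions for illustration, not the repo’s actual code:

```typescript
// Minimal sketch of a "Local Mode" request. Assumptions: a local,
// OpenAI-compatible server (e.g. Ollama) on localhost:11434, and a
// vision-capable model tag ("qwen2.5vl" here is a placeholder).
// The screenshot travels as a base64 data URL; nothing leaves the machine.
async function askLocalModel(screenshotDataUrl: string, goal: string): Promise<string> {
  const res = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5vl", // placeholder model tag
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: `Goal: ${goal}. What is the next step?` },
            { type: "image_url", image_url: { url: screenshotDataUrl } },
          ],
        },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Local model error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```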

How it works:

  1. Instruction & Grounding: The system uses GPT-5.2 to determine the next logical step based on your goal and current screen state. These instructions are then passed to Qwen 3VL (30B), which identifies the exact screen coordinates for the action.
  2. Visual Verification: The app monitors your screen for changes every 200ms using a pixel-comparison loop. Once a change is detected, it compares before and after snapshots using Gemini 3 Flash to confirm the step was completed successfully before automatically moving to the next task.
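
Here’s a rough TypeScript sketch of the plan-then-ground split in step 1. `callPlanner` and `callGrounder` are hypothetical stand-ins for the GPT-5.2 and Qwen 3VL calls; the names, prompts, and types are illustrative:

```typescript
// Illustrative shape of step 1: a planner model proposes the next UI
// action in plain text, then a grounding VLM maps it to coordinates.
// Both helper functions below are hypothetical stand-ins.
declare function callPlanner(prompt: string, screenshot: string): Promise<string>;
declare function callGrounder(
  instruction: string,
  screenshot: string,
): Promise<{ x: number; y: number }>;

interface GroundedAction {
  instruction: string; // e.g. "Click the blue 'Share' button"
  x: number;           // pixel coordinates on the captured frame
  y: number;
}

async function nextAction(goal: string, screenshot: string): Promise<GroundedAction> {
  // Stage 1: the planner reasons over the goal + current screen state.
  const instruction = await callPlanner(
    `Goal: ${goal}. State the single next UI action to take.`,
    screenshot,
  );
  // Stage 2: the grounder locates that action on the screenshot.
  const { x, y } = await callGrounder(instruction, screenshot);
  return { instruction, x, y };
}
```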
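And a browser-only sketch of the step 2 change-detection loop, using the standard getDisplayMedia and canvas APIs. The diff metric, the 1% threshold, and the `onChange` callback (which would ship the before/after pair to the Gemini 3 Flash verifier) are assumptions:

```typescript
// Sketch of the 200ms pixel-comparison loop from step 2.
async function watchScreen(
  onChange: (before: string, after: string) => void,
): Promise<void> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement("video");
  video.srcObject = stream;
  await video.play(); // videoWidth/videoHeight are available once playing

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d")!;

  let prevPixels: Uint8ClampedArray | null = null;
  let prevSnapshot = "";

  setInterval(() => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
    const snapshot = canvas.toDataURL("image/png");

    if (prevPixels) {
      // Cheap diff: count pixels whose red channel moved noticeably.
      let changed = 0;
      for (let i = 0; i < pixels.length; i += 4) {
        if (Math.abs(pixels[i] - prevPixels[i]) > 16) changed++;
      }
      // Assumed threshold: fire when more than 1% of pixels changed.
      if (changed / (pixels.length / 4) > 0.01) {
        onChange(prevSnapshot, snapshot); // before/after pair for the verifier
      }
    }
    prevPixels = pixels; // getImageData returns a fresh copy each call
    prevSnapshot = snapshot;
  }, 200); // poll every 200ms, as described above
}
```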

Source Code: https://github.com/bullmeza/screen.vision
Demo: https://screen.vision

I’m looking for feedback, so please let me know what you think!

submitted by /u/bullmeza