The Video Frontier: When AI Stopped Watching and Started Understanding
Author(s): Ampatishan Sivalingam Originally published on Towards AI. Part IV of the Multimodal Intelligence Series · The model learned to see. Then it learned to remember what it saw. This stack did not exist in 2023. The U-Net diffusion models that produced Stable Diffusion’s images could not, in any architecturally coherent sense, be extended to handle the temporal dimension of video. The entire infrastructure had to be rebuilt from first principles. This article is the story of that […]