Google Maps can now write captions for your photos using AI
Gemini can now create captions when users are looking to share a photo or video.
How a 180-year-old diagnosis explains what AI is doing to work, meaning, and identity in 2026
Part 5 of a six-part series using science fiction as a lens for understanding AI, work, and power in 2026. "Work is about a search for daily meaning as well as daily bread." (Studs Terkel, Working, 1974) Studs Terkel, an American writer and historian among other things, spent three years interviewing Americans about their jobs. What he found was a story about […]
arXiv:2603.20270v1 Announce Type: new Abstract: Generating executable simulations from natural language specifications remains a challenging problem due to the limited reasoning capacity of large language models (LLMs) when confronted with large, interconnected codebases. This paper presents FactorSmith, a framework that synthesizes playable game simulations in code from textual descriptions by combining two complementary ideas: factored POMDP decomposition for principled context reduction and a hierarchical planner-designer-critic agentic workflow for iterative quality refinement at every generation step. Drawing on the […]
arXiv:2602.21406v1 Announce Type: new Abstract: Temporal Action Segmentation (TAS) requires dividing videos into action segments, yet the vast space of activities and alternative breakdowns makes collecting comprehensive datasets infeasible. Existing methods remain limited to closed vocabularies and fixed label sets. In this work, we explore the largely unexplored problem of Open-Vocabulary Zero-Shot Temporal Action Segmentation (OVTAS) by leveraging the strong zero-shot capabilities of Vision-Language Models (VLMs). We introduce a training-free pipeline that follows a segmentation-by-classification design: Frame-Action Embedding […]
arXiv:2602.05029v1 Announce Type: new Abstract: Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using large amounts of training data and test-time samples to build black-box scene representations. In this work, we introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping without relying on any additional 3D data or test-time […]
Large language models (LLMs) have been championed as tools that could democratize access to information worldwide, offering knowledge in a user-friendly interface regardless of a person’s background or location. However, new research from MIT’s Center for Constructive Communication (CCC) suggests these artificial intelligence systems may actually perform worse for the very users who could most benefit from them. A study conducted by researchers at CCC, which is based at the MIT Media Lab, found that state-of-the-art AI chatbots […]
arXiv:2603.22304v1 Announce Type: new Abstract: Vector Quantization (VQ) has become the cornerstone of tokenization for many multimodal Large Language Models and diffusion synthesis. However, existing VQ paradigms suffer from a fundamental conflict: they enforce discretization before the encoder has captured the underlying data manifold. We term this phenomenon Premature Discretization. To resolve this, we propose Progressive Quantization (ProVQ), which incorporates the dynamics of quantization hardness as a fundamental yet previously overlooked axis in VQ training. By treating quantization […]
Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. Although extensive research has focused on developing efficient machine unlearning strategies, we argue that these methods mainly aim at removing samples rather than removing samples’ influence on the model, thus overlooking the fundamental definition of machine unlearning. In this paper, we first conduct a comprehensive study to evaluate the effectiveness of existing unlearning schemes when the […]
Universal and Existential Quantification in Haskell In logic, there are two common quantifiers: the universal quantifier and the existential quantifier. You might recognize them as ∀ (for all) and ∃ (there exists). They are relevant to Haskellers as well, since both universal and existential quantification are possible in Haskell. In this article, we’ll cover both types of quantification. You’ll learn how to: Make universal quantification explicit with ExplicitForAll. Create a heterogeneous list with existential data types. Use existentially […]
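The two techniques the article names can be sketched briefly. This is a minimal illustration of the standard GHC extensions (ExplicitForAll and ExistentialQuantification), not code from the article itself; the names `identity`, `Showable`, and `heteroList` are invented for the example.

```haskell
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ExistentialQuantification #-}

-- Universal quantification made explicit: identity works for every type a.
identity :: forall a. a -> a
identity x = x

-- Existential quantification: Showable wraps *some* type that has a Show
-- instance; the concrete type is hidden from users of the wrapper.
data Showable = forall a. Show a => MkShowable a

instance Show Showable where
  show (MkShowable a) = show a

-- A heterogeneous list: each element may have a different underlying type,
-- as long as that type can be shown.
heteroList :: [Showable]
heteroList = [MkShowable (1 :: Int), MkShowable "hello", MkShowable True]

main :: IO ()
main = mapM_ print heteroList
```

Because the existential hides the element's concrete type, the only thing you can do with an element of `heteroList` is what the `Show` constraint licenses: call `show` on it.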
arXiv:2602.00994v1 Announce Type: new Abstract: Agentic Reinforcement Learning (ARL) focuses on training large language models (LLMs) to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods train a single set of shared model parameters to support both reasoning and tool-use behaviors, implicitly assuming that joint training leads to improved overall agent performance. Despite its widespread adoption, this assumption has rarely been examined empirically. In this paper, we systematically investigate this assumption by introducing a […]