From Cloud to On-Device: What Gemma 4 Means for the Voice AI Pipeline
Google just dropped its most capable open model family, and it might be the missing piece for on-device voice AI. Continue reading on Towards AI »
Robert Goddard, a Massachusetts-born physicist, launched the world’s first liquid-fueled rocket on this date 100 years ago. It was not an overly impressive flight. The rocket, fueled by gasoline and liquid oxygen, rose just 41 feet into the air, and the flight lasted 2.5 seconds before it struck ice and snow. Nevertheless, this rocket, named “Nell,” represented a historic achievement that would help launch the modern age of spaceflight. Three decades later, the first objects would begin to […]
A second fueling test on NASA’s Space Launch System rocket ended Thursday night, giving senior managers enough confidence to move forward with plans to launch four astronauts around the Moon as soon as March 6. Unlike the first attempt to load propellants into the SLS rocket on February 2, there were no major leaks during Thursday’s practice countdown at Kennedy Space Center in Florida. Technicians swapped seals at the launch pad after hydrogen gas leaked from the rocket’s […]
Hey there, we’re sharing KidGym, an interactive 2D grid-based benchmark for evaluating MLLMs in continuous, trajectory-based interaction, accepted to ICLR 2026. Motivation: Many existing MLLM benchmarks are static and focus on isolated skills, which makes them less faithful proxies for model capabilities in continuous interactive settings. Inspired by the Wechsler Intelligence Scale for Children (WISC), we organize evaluation into five cognitive dimensions and design tasks to probe both single abilities and compositional abilities. Previews of 12 tasks in […]
arXiv:2602.04982v1 Announce Type: new Abstract: With the increasing use of large language models (LLMs) for generating answers to biomedical questions, it is crucial to evaluate both the quality of the generated answers and the references provided to support their factual claims. Evaluating LLM-generated text remains a challenge for question answering, retrieval-augmented generation (RAG), summarization, and many other natural language processing tasks in the biomedical domain, due to the requirements of expert assessment to […]
Forgetting a subset in machine unlearning is rarely an isolated task. Often, retained samples that are closely related to the forget set can be unintentionally affected, particularly when they share correlated features from pretraining or exhibit strong semantic similarities. To address this challenge, we propose a novel two-phase optimization framework specifically designed to handle such retain-forget entanglements. In the first phase, an augmented Lagrangian method increases the loss on the forget set while preserving accuracy on less-related retained […]
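As a rough sketch of what the first phase could look like, "raise the forget loss while keeping retain accuracy" fits the generic augmented-Lagrangian template below. The losses ℓ_f, ℓ_r, slack ε, multiplier λ, and penalty ρ are illustrative notation, not taken from the paper:

```latex
\min_\theta \; -\,\ell_f(\theta)
\quad \text{s.t.} \quad \ell_r(\theta) \le \varepsilon
\;\;\Longrightarrow\;\;
\mathcal{L}_\rho(\theta,\lambda)
= -\,\ell_f(\theta)
+ \lambda \bigl(\ell_r(\theta) - \varepsilon\bigr)
+ \tfrac{\rho}{2}\,\max\!\bigl(0,\; \ell_r(\theta) - \varepsilon\bigr)^2 .
```

Maximizing the forget loss ℓ_f is written as minimizing its negation; the multiplier and quadratic penalty terms activate only when the retain loss ℓ_r exceeds the allowed slack.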
arXiv:2412.16765v3 Announce Type: replace-cross Abstract: Gradient-based methods successfully train highly overparameterized models in practice, even though the associated optimization problems are markedly nonconvex. Understanding the mechanisms that make such methods effective has become a central problem in modern optimization. To investigate this question in a tractable setting, we study Deep Diagonal Linear Networks. These are multilayer architectures with a reparameterization that preserves convexity in the effective parameter, while inducing a nontrivial geometry in the optimization landscape. Under mild […]
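For readers unfamiliar with the architecture, a depth-L diagonal linear network is usually written as an elementwise product reparameterization of a linear predictor (standard formulation from the literature; the paper's exact notation may differ):

```latex
f_\theta(x) = \langle w, x \rangle,
\qquad
w \;=\; u_1 \odot u_2 \odot \cdots \odot u_L .
```

The training loss is convex as a function of the effective parameter w, yet gradient descent on the layer parameters (u_1, ..., u_L) traces a non-Euclidean trajectory in w-space, which is the "nontrivial geometry" the abstract refers to.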
arXiv:2603.06664v1 Announce Type: new Abstract: Diffusion Transformer (DiT)-based video generation models inherently suffer from bottlenecks in long video synthesis and real-time inference, which can be attributed to the use of full spatiotemporal attention. Specifically, this mechanism leads to explosive O(N^2) memory consumption and high first-frame latency. To address these issues, we implement system-level inference optimizations for a causal autoregressive video generation pipeline. We adapt the Self-Forcing causal autoregressive framework to sequence-parallel inference and implement a sequence-parallel variant […]
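To make the O(N^2) claim concrete, here is a back-of-the-envelope sketch (hypothetical head count and fp16 activations, not figures from the paper) of how the per-head attention score matrix alone scales with token count:

```python
def attn_score_bytes(n_tokens: int, n_heads: int, dtype_bytes: int = 2) -> int:
    """Memory needed to materialize the N x N attention score matrix across heads."""
    return n_heads * n_tokens * n_tokens * dtype_bytes

# Doubling the sequence length quadruples score-matrix memory: the O(N^2) wall
# that long-video DiT inference runs into.
for n in (4096, 8192, 16384):
    print(f"{n:6d} tokens -> {attn_score_bytes(n, n_heads=16) / 2**30:.1f} GiB")
```

Video tokens grow with both resolution and duration, so N reaches these ranges quickly; causal, chunked attention avoids materializing the full matrix.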
:::info Design, implementation, and benchmarks of a native BM25 index for Postgres. Now generally available to all Tiger Cloud customers and freely available via open source. ::: If you have used Postgres’s built-in ts_rank for full-text search at any meaningful scale, you already know the limitations. Ranking quality degrades as your corpus grows. There is no inverse document frequency, so common words carry the same weight as rare ones. There is no term frequency saturation, so a document that mentions […]
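The two missing ingredients named above (inverse document frequency and term-frequency saturation) are exactly what BM25 adds. A minimal sketch of the classic scoring function, with the usual k1 and b defaults (toy tokenized documents; this is illustrative, not Tiger Cloud's implementation):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the query with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # IDF: rare terms carry more weight than common ones
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # TF saturation: repeated mentions give diminishing returns,
            # normalized by document length relative to the corpus average
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the cat sat on the mat".split(),
    "postgres full text search ranking".split(),
    "postgres postgres postgres index".split(),
]
print(bm25_scores(["postgres", "ranking"], docs))
```

Note how the third document mentions "postgres" three times yet scores below the second, which matches both query terms once: saturation and IDF together prevent keyword stuffing from dominating the ranking, which flat ts_rank-style scoring cannot do.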
arXiv:2603.01047v1 Announce Type: cross Abstract: Generative Flow Networks (GFlowNets) were developed to learn policies for efficiently sampling combinatorial candidates by interpreting their generative processes as trajectories in directed acyclic graphs. In the value-based training workflow, the objective is to enforce the balance over partial episodes between the flows of the learned policy and the estimated flows of the desired policy, implicitly encouraging policy divergence minimization. The policy-based strategy alternates between estimating the policy divergence and updating the policy, […]
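The "balance" conditions the abstract mentions are, in the standard GFlowNet literature, flow-matching constraints; detailed balance, for example, requires (standard form from prior work, not necessarily this paper's exact objective):

```latex
F(s)\, P_F(s' \mid s; \theta) \;=\; F(s')\, P_B(s \mid s'; \theta),
```

where F is a learned state flow, P_F the forward policy, and P_B the backward policy over edges of the DAG; enforcing such conditions along partial trajectories implicitly drives the forward policy toward sampling terminal states with probability proportional to their reward.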