March 2026

Accelerating Video Generation Inference with Sequential-Parallel 3D Positional Encoding Using a Global Time Index

digitado ⋅ March 10, 2026

arXiv:2603.06664v1 Announce Type: new Abstract: Diffusion Transformer (DiT)-based video generation models inherently suffer from bottlenecks in long video synthesis and real-time inference, which can be attributed to the use of full spatiotemporal attention. Specifically, this mechanism leads to explosive O(N^2) memory consumption and high first-frame latency. To address these issues, we implement system-level inference optimizations for a causal autoregressive video generation pipeline. We adapt the Self-Forcing causal autoregressive framework to sequence parallel inference and implement a sequence-parallel variant […]
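
To make the O(N^2) memory claim concrete, the following back-of-the-envelope sketch (not taken from the paper) shows how one full spatiotemporal attention score matrix grows with the token count N. The latent grid sizes, the 2-byte element size, and the function name `attention_scores_bytes` are illustrative assumptions; it considers a single head in a single layer and ignores fused kernels that avoid materializing the matrix.

```python
def attention_scores_bytes(frames: int, height: int, width: int,
                           bytes_per_element: int = 2) -> int:
    """Bytes needed to materialize one N x N attention score matrix.

    N is the number of spatiotemporal tokens (frames * height * width).
    All sizes are hypothetical: one head, one layer, no fused kernels.
    """
    n_tokens = frames * height * width
    return n_tokens * n_tokens * bytes_per_element


# Doubling the number of frames doubles N and quadruples the score matrix.
short_clip = attention_scores_bytes(frames=16, height=30, width=52)
long_clip = attention_scores_bytes(frames=32, height=30, width=52)
print(long_clip // short_clip)  # prints 4
```

The quadratic term is why longer clips become expensive so quickly, and why approaches that bound the attended context are attractive for long video synthesis.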

technocracy

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

digitado ⋅ March 10, 2026

arXiv:2603.06663v1 Announce Type: new Abstract: Recent advances in training-free visual prompting, such as Set-of-Mark, have emerged as a promising direction for enhancing the grounding capabilities of multimodal language models (MLMs). These techniques operate by partitioning the input image into object regions and annotating them with marks, predominantly boxes with numeric identifiers, before feeding the augmented image to the MLM. However, these approaches treat marked objects as isolated entities, failing to capture the relationships between them. On these premises, […]

technocracy

HyperTokens: Controlling Token Dynamics for Continual Video-Language Understanding

digitado ⋅ March 10, 2026

arXiv:2603.06662v1 Announce Type: new Abstract: Continual VideoQA with multimodal LLMs is hindered by interference between tasks and the prohibitive cost of storing task-specific prompts. We introduce HyperTokens, a transformer-based token generator that produces fine-tuning tokens on demand, giving explicit control over prompt updates while keeping memory fixed. To suppress forgetting, we propose meta-inspired regularisers that look ahead to avoid task-specific sharp directions and anchor the evolving generator to prior tasks. We further connect our objective to sharpness-aware optimisation, […]

technocracy

EnsAug: Augmentation-Driven Ensembles for Human Motion Sequence Analysis

digitado ⋅ March 10, 2026

arXiv:2603.06661v1 Announce Type: new Abstract: Data augmentation is a crucial technique for training robust deep learning models for human motion, where annotated datasets are often scarce. However, generic augmentation methods often ignore the underlying geometric and kinematic constraints of the human body, risking the generation of unrealistic motion patterns that can degrade model performance. Furthermore, the conventional approach of training a single generalist model on a dataset expanded with a mixture of all available transformations does not fully […]

technocracy

Approximate Nearest Neighbor Search for Modern AI: A Projection-Augmented Graph Approach

digitado ⋅ March 10, 2026

arXiv:2603.06660v1 Announce Type: new Abstract: Approximate Nearest Neighbor Search (ANNS) is fundamental to modern AI applications. Most existing solutions optimize query efficiency but fail to align with the practical requirements of modern workloads. In this paper, we outline six critical demands of modern AI applications: high query efficiency, fast indexing, low memory footprint, scalability to high dimensionality, robustness across varying retrieval sizes, and support for online insertions. To satisfy all these demands, we introduce Projection-Augmented Graph (PAG), a […]
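
For context, approximate methods are usually judged against an exact full-scan baseline. The sketch below is not the PAG method from the abstract; it is only a brute-force nearest neighbor search plus a recall measure, using hypothetical toy points and a made-up approximate result, to show what "approximate" is being compared with.

```python
import heapq
import math


def exact_knn(query, points, k):
    """Return indices of the k points closest to query (Euclidean), by full scan."""
    distances = [(math.dist(query, p), i) for i, p in enumerate(points)]
    return [i for _, i in heapq.nsmallest(k, distances)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data, purely illustrative.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
truth = exact_knn((0.1, 0.1), points, k=2)  # ground truth from the full scan
approx = [0, 2]                             # hypothetical output of an ANN index
print(truth, recall_at_k(approx, truth))    # prints [0, 1] 0.5
```

The full scan costs one distance computation per stored point for every query, which is the cost that graph-based and other approximate indexes trade a little recall to avoid.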

technocracy

Science Literacy: Generative AI as Enabler of Coherence in the Teaching, Learning, and Assessment of Scientific Knowledge and Reasoning

digitado ⋅ March 10, 2026

arXiv:2603.06659v1 Announce Type: new Abstract: This chapter examines the potential of generative AI in enhancing science literacy across the K-16+ grade span, including its benefits as well as the conceptual and practical challenges that doing so presents. It begins with a discussion of what defines science literacy in the era of AI, including how AI has changed science and the demand for future citizens to be scientifically literate when AI is applied in their careers and lives. The […]

technocracy

ASMIL: Attention-Stabilized Multiple Instance Learning for Whole Slide Imaging

digitado ⋅ March 10, 2026

arXiv:2603.06658v1 Announce Type: new Abstract: Attention-based multiple instance learning (MIL) has emerged as a powerful framework for whole slide image (WSI) diagnosis, leveraging attention to aggregate instance-level features into bag-level predictions. Despite this success, we find that such methods exhibit a new failure mode: unstable attention dynamics. Across four representative attention-based MIL methods and two public WSI datasets, we observe that attention distributions oscillate across epochs rather than converging to a consistent pattern, degrading performance. This instability adds […]

technocracy

GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

digitado ⋅ March 10, 2026

arXiv:2603.06656v1 Announce Type: new Abstract: Human gameplay is a visually grounded interaction loop in which players act, reflect on failures, and watch tutorials to refine strategies. Can Vision-Language Models (VLMs) also learn from video-based reflection? We present GameVerse, a comprehensive video game benchmark that enables a reflective visual interaction loop. Moving beyond traditional fire-and-forget evaluations, it uses a novel reflect-and-retry paradigm to assess how VLMs internalize visual experience and improve policies. To facilitate systematic and scalable evaluation, we […]

technocracy

A Parameter-efficient Convolutional Approach for Weed Detection in Multispectral Aerial Imagery

digitado ⋅ March 10, 2026

arXiv:2603.06655v1 Announce Type: new Abstract: We introduce FCBNet, an efficient model designed for weed segmentation. The architecture is based on a fully frozen ConvNeXt backbone, the proposed Feature Correction Block (FCB), which leverages efficient convolutions for feature refinement, and a lightweight decoder. FCBNet is evaluated on the WeedBananaCOD and WeedMap datasets under both RGB and multispectral modalities, showing that FCBNet outperforms models such as U-Net, DeepLabV3+, SK-U-Net, SegFormer, and WeedSense in terms of mIoU, exceeding 85%, while also […]

technocracy

How the Graph Construction Technique Shapes Performance in IoT Botnet Detection

digitado ⋅ March 10, 2026

arXiv:2603.06654v1 Announce Type: new Abstract: The increasing incidence of IoT-based botnet attacks has driven interest in advanced learning models for detection. Recent efforts have focused on leveraging attention mechanisms to model long-range feature dependencies and Graph Neural Networks (GNNs) to capture relationships between data instances. Since GNNs require graph-structured input, tabular NetFlow data must be transformed accordingly. This study evaluates how the choice of the method for constructing the graph-structured dataset impacts the classification performance of a GNN […]
