digitado

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

digitado ⋅ 24 March 2026

arXiv:2603.22276v1 Announce Type: cross Abstract: Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a single module’s norm requires about 512 MB of transient working memory in bf16, making high-rank DoRA costly and often infeasible on common […]
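The row-wise norm of W + sBA mentioned in the abstract can, in principle, be computed without forming the dense [d_out, d_in] product BA, by expanding the squared norm into three terms that only need [d_out, r] and [r, r] intermediates. The following is a minimal NumPy sketch of that algebraic identity, assuming shapes W: [d_out, d_in], B: [d_out, r], A: [r, d_in] and a scalar s. The function names are hypothetical, and this is not claimed to be the paper's actual factorization or kernel.

```python
import numpy as np

def rowwise_norm_dense(W, B, A, s):
    """Reference: materializes the dense [d_out, d_in] product B @ A."""
    return np.linalg.norm(W + s * (B @ A), axis=1)

def rowwise_norm_factored(W, B, A, s):
    """Row norms of W + s*B@A using only [d_out, r] and [r, r] temporaries.

    For row i: ||w_i + s*b_i A||^2
             = ||w_i||^2 + 2s * (W A^T)_i . b_i + s^2 * b_i (A A^T) b_i^T
    """
    w_sq = np.einsum("ij,ij->i", W, W)          # ||w_i||^2, shape [d_out]
    WA = W @ A.T                                # shape [d_out, r]
    cross = np.einsum("ir,ir->i", WA, B)        # <w_i, b_i A>, shape [d_out]
    G = A @ A.T                                 # Gram matrix, shape [r, r]
    ba_sq = np.einsum("ir,ir->i", B @ G, B)     # ||b_i A||^2, shape [d_out]
    sq = w_sq + 2.0 * s * cross + (s * s) * ba_sq
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.sqrt(np.maximum(sq, 0.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, r = 64, 128, 8                 # small illustrative sizes
    W = rng.standard_normal((d_out, d_in))
    B = rng.standard_normal((d_out, r))
    A = rng.standard_normal((r, d_in))
    dense = rowwise_norm_dense(W, B, A, 0.5)
    factored = rowwise_norm_factored(W, B, A, 0.5)
    print(np.allclose(dense, factored))
```

The sketch uses float64 for clarity. In bf16 the three-term expansion can lose precision through cancellation, so a practical version would likely accumulate in higher precision.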


technocracy

Machine Learning Models for the Early Detection of Burnout in Software Engineering: a Systematic Literature Review

digitado ⋅ 24 March 2026

Burnout is an occupational syndrome that, as in many other professions, affects the majority of software engineers. Past studies have shown important trends, including the increasing use of machine learning techniques for the early detection of burnout. This paper is a systematic literature review (SLR) of research papers that propose machine learning (ML) approaches for detecting burnout in software developers and IT professionals. Our objective is to review the accuracy and precision of the […]


technocracy

Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

digitado ⋅ 3 March 2026

arXiv:2410.04264v3 Announce Type: replace Abstract: Dynamic feature transformation (the rich regime) does not always align with predictive performance (better representation), yet accuracy is often used as a proxy for richness, limiting analysis of their relationship. We propose a computationally efficient, performance-independent metric of richness grounded in the low-rank bias of rich dynamics, which recovers neural collapse as a special case. The metric is empirically more stable than existing alternatives and captures known lazy-to-rich transitions (e.g., grokking) without relying […]


technocracy

ServiceNow Research Introduces EnterpriseOps-Gym: A High-Fidelity Benchmark Designed to Evaluate Agentic Planning in Realistic Enterprise Settings

digitado ⋅ 19 March 2026

Large language models (LLMs) are transitioning from conversational assistants to autonomous agents capable of executing complex professional workflows. However, their deployment in enterprise environments remains limited by the lack of benchmarks that capture the specific challenges of professional settings: long-horizon planning, persistent state changes, and strict access protocols. To address this, researchers from ServiceNow Research, Mila and Universite de Montreal have introduced EnterpriseOps-Gym, a high-fidelity sandbox designed to evaluate agentic planning in realistic enterprise scenarios. https://arxiv.org/pdf/2603.13594 The Evaluation Environment […]


technocracy

Leveraging Second-Order Curvature for Efficient Learned Image Compression: Theory and Empirical Evidence

digitado ⋅ 28 January 2026

Training learned image compression (LIC) models entails navigating a challenging optimization landscape defined by the fundamental trade-off between rate and distortion. Standard first-order optimizers, such as SGD and Adam, struggle with gradient conflicts arising from competing objectives, leading to slow convergence and suboptimal rate-distortion performance. In this work, we demonstrate that simply using a second-order quasi-Newton optimizer, SOAP, dramatically improves both training efficiency and final performance across diverse LICs. Our theoretical and empirical analyses reveal that […]


technocracy

Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation

digitado ⋅ 5 March 2026

arXiv:2603.03306v1 Announce Type: new Abstract: The recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While TOON shows solid accuracy in LLM comprehension, tests of TOON generation against JSON generation are lacking. Though never present in training data, TOON syntax is simple enough to suggest that one-shot in-context learning could support accurate generation. The inevitable prompt overhead can be an acceptable trade-off for shorter completions. […]


technocracy

Towards a Physics Foundation Model

digitado ⋅ 27 January 2026

arXiv:2509.13805v3 Announce Type: replace-cross Abstract: Foundation models have revolutionized natural language processing through a “train once, deploy anywhere” paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative – democratizing access to high-fidelity simulations, accelerating scientific discovery, and eliminating the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamentally limited to single, narrow domains and require retraining for each new system. […]


technocracy

FAIRFORMER: A transformer architecture for discrete fair division

digitado ⋅ 2 February 2026

arXiv:2601.22346v1 Announce Type: new Abstract: We propose a deep neural network-based solution to the problem of allocating indivisible goods under additive subjective valuations without monetary transfers, trading off economic efficiency with envy-based fairness. We introduce FairFormer, an amortized, permutation-equivariant two-tower transformer that encodes items and agents as unordered token sets, applies self-attention within each set, and uses item-to-agent cross-attention to produce per-item assignment distributions in a single forward pass. FairFormer is trained end-to-end to maximize expected log-Nash welfare […]


technocracy

Flow-based Conformal Prediction for Multi-dimensional Time Series

digitado ⋅ 23 March 2026

arXiv:2502.05709v3 Announce Type: replace-cross Abstract: Time series prediction underpins a broad range of downstream tasks across many scientific domains. Recent advances and increasing adoption of black-box machine learning models for time series prediction highlight the critical need for uncertainty quantification. While conformal prediction has gained attention as a reliable uncertainty quantification method, conformal prediction for time series faces two key challenges: (1) leveraging correlations in observations and non-conformity scores to overcome the exchangeability assumption, and (2) constructing prediction […]


technocracy

The SJTU X-LANCE Lab System for MSR Challenge 2025

digitado ⋅ 11 February 2026

arXiv:2602.09042v1 Announce Type: new Abstract: This report describes the system submitted to the music source restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each dealing with a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments given in the task, we utilize pretrained checkpoints from the MSS community and fine-tune the MSS model with several training schemes, including (1) mixing and cleaning of datasets; (2) random mixture of music pieces for […]
