March 2026

Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs

digitado ⋅ 12 de March de 2026

arXiv:2603.10011v1 Announce Type: new Abstract: Large language models can generate responses that resemble emotional distress, and this raises concerns around model reliability and safety. We introduce a set of evaluations to investigate expressions of distress in LLMs, and find that these surface emotional instability in Gemma and Gemini models, but not in other families. We find evidence that this difference arises in post-training. Base models from different families (Gemma, Qwen and OLMo) show similar propensities for expressing distress. […]

Ver mais

Like 0

Liked Liked

technocracy

FERRET: Framework for Expansion Reliant Red Teaming

digitado ⋅ 12 de March de 2026

arXiv:2603.10010v1 Announce Type: new Abstract: We introduce a multi-faceted automated red teaming framework in which the goal is to generate multi-modal adversarial conversations that would break a target model and introduce various expansions that would result in more effective and efficient adversarial conversations. The introduced expansions include: 1. Horizontal expansion in which the goal is for the red team model to self-improve and generate more effective conversation starters that would shape a conversation. 2. Vertical expansion in which […]

Ver mais

Like 0

Liked Liked

technocracy

Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment

digitado ⋅ 12 de March de 2026

arXiv:2603.10009v1 Announce Type: new Abstract: Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all samples are exchangeable, inheriting this limitation in personalized settings. This assumption conflates distinct user reward distributions and systematically […]

Ver mais

Like 0

Liked Liked

technocracy

GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification

digitado ⋅ 12 de March de 2026

arXiv:2603.10008v1 Announce Type: new Abstract: This paper presents system description for Arabic medical text classification across 82 distinct categories. Our primary architecture utilizes a fine-tuned AraBERTv2 encoder enhanced with a hybrid pooling strategies, combining attention and mean representations, and multi-sample dropout for robust regularization. We systematically benchmark this approach against a suite of multilingual and Arabic-specific encoders, as well as several large-scale causal decoders, including zero-shot re-ranking via Llama 3.3 70B and feature extraction from Qwen 3B hidden […]

Ver mais

Like 0

Liked Liked

technocracy

GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification

digitado ⋅ 12 de March de 2026

arXiv:2603.10007v1 Announce Type: new Abstract: We present our approach to the AbjadGenEval shared task on detecting AI-generated Arabic text. We fine-tuned the multilingual E5-large encoder for binary classification, and we explored several pooling strategies to pool token representations, including weighted layer pooling, multi-head attention pooling, and gated fusion. Interestingly, none of these outperformed simple mean pooling, which achieved an F1 of 0.75 on the test set. We believe this is because complex pooling methods introduce additional parameters that […]

Ver mais

Like 0

Liked Liked

technocracy

Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language

digitado ⋅ 12 de March de 2026

arXiv:2603.10006v1 Announce Type: new Abstract: This study presents TOBA-LM, a trilingual language model based on GPT-2 architecture with 1.2 billion parameters, trained on a corpus encompassing Indonesian, Batak, and Minangkabau using syllabic-agglutinative tokenization. The architecture integrates an Engram Memory mechanism, an adaptive n-gram-based memory system with a 500,000 x 768 embedding table that captures morphological dependencies through bigram and trigram pathways. Empirical results demonstrate a training efficiency of 80%, with the loss value dropping from 6.4 to 1.7996 […]

Ver mais

Like 0

Liked Liked

technocracy

Fine-Tune, Don’t Prompt, Your Language Model to Identify Biased Language in Clinical Notes

digitado ⋅ 12 de March de 2026

arXiv:2603.10004v1 Announce Type: new Abstract: Clinical documentation can contain emotionally charged language with stigmatizing or privileging valences. We present a framework for detecting and classifying such language as stigmatizing, privileging, or neutral. We constructed a curated lexicon of biased terms scored for emotional valence. We then used lexicon-based matching to extract text chunks from OB-GYN delivery notes (Mount Sinai Hospital, NY) and MIMIC-IV discharge summaries across multiple specialties. Three clinicians annotated all chunks, enabling characterization of valence patterns […]

Ver mais

Like 0

Liked Liked

technocracy

Probing the Limits of the Lie Detector Approach to LLM Deception

digitado ⋅ 12 de March de 2026

arXiv:2603.10003v1 Announce Type: new Abstract: Mechanistic approaches to deception in large language models (LLMs) often rely on “lie detectors”, that is, truth probes trained to identify internal representations of model outputs as false. The lie detector approach to LLM deception implicitly assumes that deception is coextensive with lying. This paper challenges that assumption. It experimentally investigates whether LLMs can deceive without producing false statements and whether truth probes fail to detect such behavior. Across three open-source LLMs, it […]

Ver mais

Like 0

Liked Liked

technocracy

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

digitado ⋅ 12 de March de 2026

arXiv:2603.10002v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly tasked with producing and manipulating structured artifacts. We consider the task of end-to-end spreadsheet generation, where language models are prompted to produce spreadsheet artifacts to satisfy users’ explicit and implicit constraints, specified in natural language. We introduce SpreadsheetArena, a platform for evaluating models’ performance on the task via blind pairwise evaluations of LLM-generated spreadsheet workbooks. As with other complex, open-ended tasks, relevant evaluation criteria can vary substantially […]

Ver mais

Like 0

Liked Liked

technocracy

A Retrieval-Augmented Language Assistant for Unmanned Aircraft Safety Assessment and Regulatory Compliance

digitado ⋅ 12 de March de 2026

arXiv:2603.09999v1 Announce Type: new Abstract: This paper presents the design and validation of a retrieval-based assistant that supports safety assessment, certification activities, and regulatory compliance for unmanned aircraft systems. The work is motivated by the growing complexity of drone operations and the increasing effort required by applicants and aviation authorities to apply established assessment frameworks, including the Specific Operations Risk Assessment and the Pre-defined Risk Assessment, in a consistent and efficient manner. The proposed approach uses a controlled […]

Ver mais

Like 0

Liked Liked