digitado

Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models

digitado ⋅ March 2, 2026

arXiv:2602.23589v1 Announce Type: new Abstract: Recent multimodal models such as Contrastive Language-Image Pre-training (CLIP) have shown remarkable ability to align visual and linguistic representations. However, domains where small visual differences carry large semantic significance, such as diagram understanding, remain challenging due to the models’ limited sensitivity to fine-grained structural variations. We propose a new training paradigm designed to enhance diagram comprehension in vision-language models. Our approach introduces pseudo contrastive samples generated by a diagram renderer that creates synthetic […]


technocracy

Neocities founder stuck in chatbot hell after Bing blocked 1.5 million sites

digitado ⋅ February 5, 2026

One of the weirdest corners of the Internet is suddenly hard to find on Bing, after the search engine inexplicably started blocking approximately 1.5 million independent websites hosted on Neocities. Founded in 2013 to archive the “aesthetic awesomeness” of GeoCities websites, Neocities keeps the spirit of the 1990s Internet alive. It lets users design free websites without relying on standardized templates devoid of personality. For hundreds of thousands of people building websites around art, niche fandoms, and special […]


technocracy

FedHB: Hierarchical Bayesian Federated Learning

digitado ⋅ March 3, 2026

arXiv:2305.04979v2 Announce Type: replace-cross Abstract: We propose a novel hierarchical Bayesian approach to Federated Learning (FL), where our model reasonably describes the generative process of clients’ local data via hierarchical Bayesian modeling: constituting random variables of local models for clients that are governed by a higher-level global variate. Interestingly, the variational inference in our Bayesian model leads to an optimisation problem whose block-coordinate descent solution becomes a distributed algorithm that is separable over clients and allows them not […]


technocracy

Federated Learning Under Temporal Drift — Mitigating Catastrophic Forgetting via Experience Replay

digitado ⋅ January 20, 2026

Federated Learning struggles under temporal concept drift where client data distributions shift over time. We demonstrate that standard FedAvg suffers catastrophic forgetting under seasonal drift on Fashion-MNIST, with accuracy dropping from 74% to 28%. We propose client-side experience replay, where each client maintains a small buffer of past samples mixed with current data during local training. This simple approach requires no changes to server aggregation. Experiments show that a 50-sample-per-class buffer restores performance to 78-82%, effectively preventing forgetting. […]
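
The client-side replay idea in the excerpt above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `ClassReplayBuffer` and `local_training_set` are hypothetical, and the first-in-first-out eviction policy is an assumption, since the excerpt only says that each client keeps a small per-class buffer of past samples and mixes it with current data.

```python
import random
from collections import defaultdict, deque


class ClassReplayBuffer:
    """Hypothetical per-client buffer holding at most `per_class` samples per label."""

    def __init__(self, per_class=50):
        self.per_class = per_class
        # Assumption: oldest samples are evicted first (FIFO) once a class is full.
        self._store = defaultdict(lambda: deque(maxlen=per_class))

    def add(self, samples):
        """Store (x, y) pairs, grouped by label y."""
        for x, y in samples:
            self._store[y].append((x, y))

    def all(self):
        """Return every buffered (x, y) pair."""
        return [s for bucket in self._store.values() for s in bucket]


def local_training_set(current, buffer, rng):
    """Mix the current round's samples with replayed ones, then update the buffer."""
    mixed = list(current) + buffer.all()
    rng.shuffle(mixed)
    buffer.add(current)
    return mixed
```

The mixed list would feed the client's usual local training step. Because the change is confined to how each client assembles its local data, server aggregation stays untouched, which matches the excerpt's claim.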


technocracy

E3VA: Enhancing Emotional Expressiveness in Virtual Conversational Agents

digitado ⋅ February 27, 2026

arXiv:2602.22362v1 Announce Type: new Abstract: With the advent of generative AI and large language models, embodied conversational agents are becoming synonymous with online interactions. These agents possess vast amounts of knowledge but suffer from exhibiting limited emotional expressiveness. Without adequate expressions, agents might fail to adapt to users’ emotions, which may result in a sub-optimal user experience and engagement. Most current systems prioritize content-based responses, neglecting the emotional context of conversations. Research in this space is currently […]


technocracy

Meta’s Avocado AI Model Reportedly Outperforming Rivals Even Before Launch

digitado ⋅ February 5, 2026

Ever since Meta announced its quarterly earnings, the upcoming foundational model, codenamed “Avocado,” has been getting all the hype. Now, according to an exclusive report by The Information, Avocado has completed its pre-training phase and is already outperforming leading open-source pre-trained models in testing. That’s according to an internal memo shared by Megan Fu, Product Manager of Meta’s Super Intelligence Lab.

Why Avocado’s early tests have everyone buzzing inside Meta

The memo, sent out last month, on January 20, […]


technocracy

Annealing in variational inference mitigates mode collapse: A theoretical study on Gaussian mixtures

digitado ⋅ February 16, 2026

arXiv:2602.12923v1 Announce Type: new Abstract: Mode collapse, the failure to capture one or more modes when targeting a multimodal distribution, is a central challenge in modern variational inference. In this work, we provide a mathematical analysis of annealing-based strategies for mitigating mode collapse in a tractable setting: learning a Gaussian mixture, where mode collapse is known to arise. Leveraging a low-dimensional summary statistics description, we precisely characterize the interplay between the initial temperature and the annealing […]


technocracy

[D] DeepDanbooru v3 PyTorch Port: Constant 0.5 or 0 output after loading weights

digitado ⋅ January 25, 2026

I’m porting DeepDanbooru v3 (Janouch port) to PyTorch. After mapping 209 layers from Safetensors, the model outputs exactly 0.5 for all tags. I’ve tracked it back to the Batch Normalization layers. It seems like the `running_var` values are causing a collapse. Is this a known issue when converting Keras/TensorFlow weights to PyTorch for ResNet architectures? Should I manually initialize the BN stats? submitted by /u/RevolutionaryAge70
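
One way to narrow down a symptom like this is to check the batch-norm statistics directly. The sketch below is plain Python and does not touch the poster's model; it only illustrates the inference-time batch-norm formula, the usual Keras-to-PyTorch parameter name mapping, and why an implausibly large variance pushes a sigmoid output toward 0.5. The helper `suspicious_bn_stats` and its `high=1e6` threshold are hypothetical choices for illustration. Note also that Keras `BatchNormalization` defaults to `epsilon=1e-3` while PyTorch `BatchNorm2d` defaults to `eps=1e-5`, so the epsilon may need to be set explicitly when porting.

```python
import math

# Usual correspondence between Keras BatchNormalization weights
# and PyTorch BatchNorm parameters/buffers.
KERAS_TO_TORCH_BN = {
    "gamma": "weight",
    "beta": "bias",
    "moving_mean": "running_mean",
    "moving_variance": "running_var",
}


def bn_inference(x, gamma, beta, mean, var, eps):
    """Inference-time batch norm for a single scalar activation."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def suspicious_bn_stats(running_var, low=0.0, high=1e6):
    """Hypothetical check: indices of channels whose variance is
    non-finite, negative, or larger than `high` (an arbitrary threshold)."""
    return [
        i for i, v in enumerate(running_var)
        if not math.isfinite(v) or v < low or v > high
    ]


# With a huge variance the normalized activation is squashed to ~0,
# so with beta = 0 the sigmoid lands almost exactly on 0.5.
collapsed = sigmoid(bn_inference(3.0, gamma=1.0, beta=0.0, mean=0.0, var=1e12, eps=1e-3))
```

If a check like this flags many channels, a mis-mapped tensor (for example, a different weight loaded into `running_var`) is one possible explanation worth ruling out before re-initializing the statistics by hand.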


technocracy

Deriving Neural Scaling Laws from the statistics of natural language

digitado ⋅ February 13, 2026

arXiv:2602.07488v2 Announce Type: replace-cross Abstract: Despite the fact that experimental neural scaling laws have substantially guided empirical progress in large-scale machine learning, no existing theory can quantitatively predict the exponents of these important laws for any modern LLM trained on any natural language dataset. We provide the first such theory in the case of data-limited scaling laws. We isolate two key statistical properties of language that alone can predict neural scaling exponents: (i) the decay of pairwise token […]


technocracy

Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study

digitado ⋅ February 12, 2026

arXiv:2602.10140v1 Announce Type: new Abstract: Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference. Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against […]
