Stop Using Self-Joins: How Using GroupBy and Filters Instead Can Save Massive Time and Cost in PySpark

When working with large datasets in PySpark, it is common to see a notebook join a table to itself with an inner join. While this approach is straightforward and intuitive, it often comes with a steep price: long runtimes, excessive shuffles, and inflated compute costs. In many real-world scenarios, you can replace a self inner join with a groupBy + aggregation + filter pattern that produces the same result far more cheaply.