July 2025

How LLMs Work: A Beginner’s Guide to Decoder-Only Transformers

digitado ⋅ 15 July 2025

A language model like GPT (which stands for Generative Pretrained Transformer) takes text, breaks it into tokens (words or subwords), converts those tokens into numbers, processes those numbers through layers of Transformer decoders, and finally outputs a probability distribution over all possible tokens in its vocabulary. It then selects the next token, in the simplest (greedy) case the one with the highest probability. This process repeats until a full response is generated. If you’re new to the Transformer architecture, this might sound like too much, but stick […]
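To make the loop described above concrete, here is a minimal sketch of greedy decoding in plain Python. It is not how GPT is implemented: the five-word `VOCAB` and the `toy_logits` function are hypothetical stand-ins for a real tokenizer and the Transformer decoder stack, chosen only so the example runs on its own.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]


def toy_logits(token_ids):
    """Stand-in for the decoder stack: one score per vocabulary entry.

    This toy rule simply favours the vocabulary entry after the last token,
    wrapping around to <eos>. A real model computes these scores from all
    previous tokens.
    """
    next_id = (token_ids[-1] + 1) % len(VOCAB)
    return [2.0 if i == next_id else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_generate(prompt_ids, max_new_tokens=10):
    """Repeatedly append the most probable next token until <eos>."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids))
        next_id = max(range(len(probs)), key=probs.__getitem__)
        ids.append(next_id)
        if VOCAB[next_id] == "<eos>":
            break
    return ids


prompt = [VOCAB.index("the")]
generated = greedy_generate(prompt)
print(" ".join(VOCAB[i] for i in generated))
```

Picking the single most probable token at every step is only one decoding strategy; sampling from the distribution is a common alternative that the same loop could accommodate by changing the line that computes `next_id`.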


technocracy

Hierarchical Clustering: A Tree-Based Approach to Data Grouping

digitado ⋅ 10 July 2025

In this blog, you will explore hierarchical clustering in Python, understand its application in machine learning, and review a practical hierarchical clustering example. We will delve into the hierarchical clustering algorithm, compare its implementation in R, and discuss its significance in data mining.

What is Hierarchical Clustering? ⋅ Types of Hierarchical Clustering ⋅ Agglomerative Hierarchical Clustering ⋅ Divisive Hierarchical Clustering ⋅ Supervised vs Unsupervised Learning ⋅ Why Hierarchical Clustering? ⋅ No Need to Pre-specify Number of Clusters ⋅ Captures Nested Clusters ⋅ Flexibility with Cluster Shapes ⋅ Distance Metrics and Linkage Criteria ⋅ Handling Outliers ⋅ Robustness to Initialization ⋅ Visual Interpretation ⋅ Practical Example ⋅ Steps […]
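As a rough illustration of the agglomerative (bottom-up) variant listed above, the sketch below merges one-dimensional points using single linkage, written in plain Python so it runs without extra libraries. The function names and the sample `points` are assumptions made for this example and are not taken from the post.

```python
def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance of their closest pair."""
    return min(abs(x - y) for x in cluster_a for y in cluster_b)


def agglomerative(points, n_clusters):
    """Start with one cluster per point and merge the closest pair
    until only n_clusters remain. Returns the clusters and merge history."""
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance): the rows of a dendrogram
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [1, 3, 10, 11, 20]  # illustrative data only
clusters, merges = agglomerative(points, n_clusters=2)
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d}")
print("final clusters:", clusters)
```

The merge history is what a dendrogram draws: cutting it at a chosen distance gives a clustering without fixing the number of clusters in advance. This brute-force search is fine for a handful of points; library implementations use faster algorithms.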


technocracy

How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

digitado ⋅ 4 July 2025

TL;DR: Vanishing or exploding gradients are common training instabilities observed in foundation models. Real-time gradient-norm monitoring using experiment trackers like neptune.ai enables early detection and mitigation. Implementing stabilization techniques such as gradient clipping and optimizing weight initialization and learning rate schedules improves training convergence and stability.

As foundation models scale to billions or even trillions of parameters, they often exhibit training instabilities, particularly vanishing and exploding gradients. During the initial training phase (pre-training), it is common to […]
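The two ideas in the summary, watching the gradient norm and clipping it, can be sketched in a few lines of plain Python. This is an illustration under assumptions rather than the article's code: gradients are represented as nested lists instead of framework tensors, and the `low` and `high` thresholds in `diagnose` are arbitrary placeholders that would need tuning for a real model.

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient values of all layers."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))


def diagnose(norm, low=1e-6, high=1e3):
    """Label a gradient norm; the thresholds are assumed, not standard."""
    if norm < low:
        return "vanishing"
    if norm > high:
        return "exploding"
    return "ok"


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their global norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping,
    which is the value one would log to an experiment tracker each step.
    """
    norm = global_grad_norm(grads)
    if norm <= max_norm:
        return grads, norm
    scale = max_norm / norm
    return [[g * scale for g in layer] for layer in grads], norm


# Two hypothetical layers with hand-picked gradient values.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print("norm before clipping:", norm_before, diagnose(norm_before))
print("norm after clipping:", global_grad_norm(clipped))
```

Clipping by the global norm keeps the direction of the update and only shrinks its length, which is why it is a common first remedy for exploding gradients; it does nothing for vanishing ones, where initialization and learning rate schedules matter more.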


technocracy

Whole-Body Conditioned Egocentric Video Prediction

digitado ⋅ 1 July 2025

Predicting Ego-centric Video from human Actions (PEVA). Given past video frames and an action specifying a desired change in 3D pose, PEVA predicts the next video frame. Our results show that, given the first frame and a sequence of actions, our model can generate videos of atomic actions (a), simulate counterfactuals (b), and support long video generation (c).

Recent years have brought significant advances in world models that learn to simulate future outcomes for planning and control. […]
