Tags | YuanPang Blog

Blog

Oct 11, 2025 Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (II)
Oct 10, 2025 Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (I)
Sep 10, 2025 Triton Introduction 💻
Mar 01, 2025 DeepSeek Reasoning Models Series
Mar 01, 2025 DeepSeek Base Models Series
Sep 01, 2024 Regression vs. Survival Analysis 🚀
Jun 01, 2024 Introduction of Quantization
May 01, 2024 How to evaluate NLP tasks

Blogs

Jul 01, 2026 Agent Harness Engineering Part 5: Real Agent Systems
May 27, 2026 Agent Harness Engineering Part 4: Context and Memory
May 25, 2026 Agent Harness Engineering Part 3: Lifecycle and Orchestration
May 23, 2026 Agent Harness Engineering Part 2: Tools and Protocols
May 23, 2026 Agent Harness Engineering Part 1: Execution Layer and Sandboxes
May 10, 2026 π0 Architecture Anatomy

CMU-Deep-Learning-Systems-2022

Dec 08, 2022 Generative Adversarial Networks
Dec 02, 2022 Transformer Implementation with Naive Numpy and Pytorch
Dec 02, 2022 Transformers and Autoregressive Models
Nov 24, 2022 LSTM Implementation
Nov 23, 2022 Sequence Modeling and Recurrent Networks
Nov 04, 2022 Convolutional Networks Implementation and Im2col
Oct 30, 2022 DLSys GPU Acceleration
Oct 25, 2022 DLSys Hardware Acceleration
Oct 22, 2022 Differentiating CNN
Oct 21, 2022 Implement Your Own Deep Learning Library using Automatic Differentiation II
Oct 20, 2022 Normalization and Regularization
Oct 18, 2022 Modularity in Deep Learning Package
Oct 15, 2022 Fully Connected Networks, Optimization, Initialization and Activations
Oct 10, 2022 Implement Your Own Deep Learning Library using Automatic Differentiation I
Oct 09, 2022 Introduction of Automatic Differentiation
Oct 03, 2022 Simple Neural Networks with Codes
Oct 02, 2022 Softmax Regression with Codes
Oct 01, 2022 DLSys Introduction

CMU-Robot-Learning-2024

May 03, 2025 Imitation Learning via Privileged Teachers and Generative Models like Diffusion
May 02, 2025 Markov Decision Processes (MDP) Basics and Imitation Learning
May 01, 2025 Robot Learning Overview
May 01, 2025 What is Robot Learning

MIT-Distributed-Systems-2021

Mar 01, 2023 Distributed Systems Introduction and MapReduce

MIT-Robotic-Manipulation-2023

Oct 12, 2023 Robot Basic Pick and Place III - Differential kinematics via optimization
Oct 11, 2023 Robot Basic Pick and Place II - Differential kinematics
Oct 10, 2023 Robot Basic Pick and Place I - kinematics and trajectories
Oct 04, 2023 Robot Hardware
Oct 01, 2023 Anatomy of a Manipulation System

MIT-TinyML-and-Efficient-Deep-Learning-2024

Nov 30, 2024 Quantum Machine Learning Introduction
Nov 28, 2024 On-device Training Introduction
Nov 21, 2024 Distributed Training Part 2
Nov 20, 2024 Distributed Training Part 1
Nov 15, 2024 Diffusion Models
Nov 10, 2024 GAN, Video, Point Cloud
Nov 03, 2024 Vision Transformer
Nov 02, 2024 Long-Context LLM
Oct 29, 2024 LLM Post-Training
Oct 27, 2024 LLM Deployment Techniques
Oct 25, 2024 Transformer and LLM
Oct 21, 2024 TinyML TinyEngine
Oct 20, 2024 TinyML MCUNet
Oct 09, 2024 Distillation Introduction
Oct 02, 2024 Neural Architecture Search
Sep 30, 2024 Model Quantization II
Sep 25, 2024 Model Quantization I
Sep 17, 2024 Pruning and Sparsity
Sep 10, 2024 TinyML Basics of Neural Networks
Sep 05, 2024 TinyML Introduction

Stanford-LLM-From-Scratch-2025

Oct 20, 2025 LLM Alignment - GRPO Implementation
Oct 15, 2025 LLM Alignment - Reinforcement Learning
Oct 03, 2025 LLM Alignment - SFT/RLHF
Sep 29, 2025 Filtering and Deduplication Algorithms for LLM Data Processing
Sep 29, 2025 The Crucial Role of Data in Training Language Models 💻
Sep 24, 2025 Evaluating Language Models — Beyond the Numbers 💻
Sep 23, 2025 Modern LLM Inference 💻
Sep 20, 2025 Scaling Laws Details with Examples 💻
Sep 20, 2025 Scaling laws 💻
Sep 06, 2025 LLM Training Parallelism Basics
Aug 06, 2025 GPU Kernels & Triton Programming 💻
Aug 05, 2025 GPUs for Deep Learning 🚀
Aug 03, 2025 Mixture of Experts 🤖
Aug 02, 2025 LLM Architectures and Hyperparameters 🧠
May 04, 2025 Language Modeling Resource Accounting
May 04, 2025 Language Modeling from Scratch Overview and Tokenization

UCB-Deep-Reinforcement-Learning-2023

Nov 23, 2023 Q-Functions in Reinforcement Learning
Nov 23, 2023 Value Function Methods in Reinforcement Learning
Nov 22, 2023 Actor-Critic Algorithms in Reinforcement Learning
Nov 21, 2023 Policy Gradients in Reinforcement Learning
Nov 20, 2023 Reinforcement Learning Introduction
Sep 01, 2023 Imitation Learning
Sep 01, 2023 Reinforcement Learning Introduction

UCB-LLM-Agents-2024

Nov 01, 2024 LLM Agents Introduction

YouTube

May 06, 2026 The Three Eras of Robot Learning
Feb 10, 2024 Recommender System 3 -- Ranking
Feb 05, 2024 Recommender System 2 -- Retrieval
Feb 01, 2024 Recommender System 1 -- Introduction
Jun 07, 2023 AlphaGo, AlphaGo Zero, and AlphaZero - Deep Reinforcement Learning Meets Search
Jun 05, 2023 The Actor–Critic Method
Jun 05, 2023 Policy-Based Reinforcement Learning
Jun 05, 2023 Value-Based Reinforcement Learning Foundations
Jun 05, 2023 Reinforcement Learning Basics