-
LLM Alignment - GRPO Implementation
LLM Alignment - GRPO Implementation The blog transitions to a practical walkthrough illustrating policy gradient mechanics through the lens of GRPO (Group Relative Policy Optimization). GRPO simplifies PPO by removing the Value function (critic) and instead leveraging the group structure of LLM rollouts — i.e., multiple responses per prompt — t... Read More
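The group-relative baseline that lets GRPO drop PPO's critic can be sketched in a few lines (a minimal illustration with made-up reward scores, not the post's actual code):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one prompt's group of sampled responses:
    normalize each response's reward by the group mean and std,
    so no learned value function (critic) is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled responses to the same prompt, scored by a reward model
# (fabricated scores for illustration):
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Responses above the group mean get a positive advantage,
# responses below it a negative one.
```

Each prompt's rollouts form one group, so the baseline is recomputed per prompt rather than predicted by a separate network.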
-
LLM Alignment - Reinforcement Learning
LLM Alignment - Reinforcement Learning This post continues the exploration of Reinforcement Learning (RL) techniques for aligning Large Language Models (LLMs) — tracing the evolution from Direct Preference Optimization (DPO) to Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), culminating... Read More
-
Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (II)
Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (II) This blog continues with advanced policy optimization techniques such as TRPO and PPO, concluding with the application of RL in Large Language Models (LLMs) via PPO and GRPO. 🚀 Section 2: Policy Gradient and Trust Region Optimization … In the last blog, ... Read More
-
Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (I)
Reinforcement Learning (RL) — From Fundamentals to PPO & GRPO in LLMs (I) This blog provides a detailed study of Reinforcement Learning (RL), starting from fundamental concepts and algorithms, moving through advanced policy optimization techniques like PPO and TRPO, and concluding with the application of RL in Large Language Models (LLMs) v... Read More
-
LLM Alignment - SFT/RLHF
🎓 RLHF & Alignment: Making LLMs Useful and Safe This lecture, CS336 Lecture 15, dives into Reinforcement Learning from Human Feedback (RLHF) and alignment — the crucial post-training step that turns large pre-trained models like GPT-3 into helpful and safe assistants (like InstructGPT and ChatGPT). It follows the classic three-step proces... Read More
-
Filtering and Deduplication Algorithms for LLM Data Processing
Filtering and Deduplication Algorithms for LLM Data Processing This lecture dives deeply into how raw web data is transformed into clean, usable training data for large language models (LLMs), focusing specifically on filtering and deduplication algorithms. Raw data moves from live services to dumps or crawls and must undergo HTML-to-text conve... Read More
-
The Crucial Role of Data in Training Language Models 💻
The Crucial Role of Data in Training Language Models This lecture highlights the central role of data in the development of language models, following previous discussions about architectures and training strategies. It dissects the data pipeline, explores historical datasets, and addresses legal and ethical issues surrounding data use. Course... Read More
-
Evaluating Language Models — Beyond the Numbers 💻
Evaluating Language Models — Beyond the Numbers 💻 This lecture provides a deep dive into the evaluation of language models, showing that while it seems simple, it’s actually a complex and profound discipline that shapes AI’s progress. It’s structured around key concepts and modern benchmark categories that define how we measure and compare inte... Read More
-
Modern LLM Inference 💻
🚀 Modern LLM Inference — Workloads, Bottlenecks, and Optimization Techniques 🧠 From defining inference to lossy/lossless acceleration and dynamic serving — a full-stack view of how modern LLM inference actually works. Course link 1. 🏁 Introduction: What Is Inference and Why It’s Hard ✅ What Is Inference? Inference answers one fundamental que... Read More
-
Scaling Laws Details with Examples 💻
⚖️ Scaling — Case Study and Details This lecture, “Scaling – Case Study and Details,” dives into best practices for scaling and hyperparameter tuning in large language models (LLMs). It revisits whether the Chinchilla-derived scaling methodologies still hold in modern model development and explores recent case studies (CerebrasGPT, MiniCPM, Dee... Read More
-
Scaling laws 💻
⚙️ The Predictable World of Scaling Laws in Language Models Scaling laws provide simple, predictive rules 📈 that govern the performance of Language Models (LMs), offering a pathway to optimize large-scale model design without relying on expensive, full-scale experimentation. They enable developers to tune hyperparameters on small models and con... Read More
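Because scaling laws are power laws, they appear as straight lines in log-log space and can be fit from small-scale runs; a toy sketch with synthetic losses and made-up constants, just to show the fitting recipe:

```python
import numpy as np

# Scaling laws take the form loss(N) ≈ a * N**(-alpha),
# i.e. a straight line in log-log coordinates.
N = np.array([1e6, 1e7, 1e8, 1e9])   # model sizes (assumed)
loss = 5.0 * N ** (-0.076)           # fabricated "measured" losses

# Fit slope and intercept in log space; the slope is -alpha.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha = -slope
a = np.exp(intercept)
# On clean synthetic data the fit recovers alpha ≈ 0.076 and a ≈ 5.0.
```

In practice the fit is done on noisy small-model runs and then extrapolated to much larger N — that extrapolation is exactly what makes scaling laws useful.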
-
Triton Introduction 💻
Triton Introduction Here we use Triton to implement the weighted sum kernel (both forward and backward pass) as an example. The implementation is taken from assignment 2 of the Stanford CS336 course. import triton import triton.language as tl import torch from einops import rearrange from triton import cdiv import time @triton.jit def ... Read More
-
LLM Training Parallelism Basics
LLM Training Parallelism Basics “Parallelism Basics” focuses on the system complexities behind training massive language models (LMs) that exceed a single GPU’s capacity. Goals: Understand different parallelization paradigms. Learn why multiple methods are combined. See how large-scale training is organized. Course link Code Link in th... Read More
-
GPU Kernels & Triton Programming 💻
GPU Kernels & Triton Programming 💻 This lecture dives into writing high-performance GPU code, which is essential for accelerating language models. The Challenge: Bridging the gap between high-level frameworks like PyTorch and the underlying GPU hardware, which often leads to “performance mysteries.” The Goal: To effectively optimize c... Read More
-
GPUs for Deep Learning 🚀
GPUs for Deep Learning 🚀 This lecture synthesizes key insights on GPUs, focusing on their architecture, performance bottlenecks, and advanced optimization techniques crucial for scaling large language models (LLMs). 🔥 Core Message: While GPU computational power (especially for matrix multiplications) has scaled exponentially, memory access ... Read More
-
Mixture of Experts 🤖
Mixture of Experts 🤖 Mixture of Experts (MoE) architectures have rapidly become a cornerstone in developing high-performance, large-scale language models (LLMs). Once a “bonus lecture” topic, MoEs are now fundamental to state-of-the-art systems. 🚀 High Performance: MoEs offer significant advantages over traditional dense models in terms of ... Read More
-
LLM Architectures and Hyperparameters 🧠
LLM Architectures and Hyperparameters 🧠 This lecture summarizes key architectural trends, hyperparameter choices, and stability tricks observed in modern Large Language Models (LLMs). 📈 Architectural Trends: While the field is rapidly evolving, a “convergent evolution” towards “LLaMA-like” architectures is evident. 🔑 Key Consensus: Widesp... Read More
-
Language Modeling Resource Accounting
Language Model Training - PyTorch Primitives & Resource Accounting This blog summarizes key concepts from Stanford CS336 Lecture 2, focusing on PyTorch primitives, efficient resource accounting (memory and compute), and foundational elements of training deep learning models from scratch. Memory # Parameters num_parameters = (D * D *... Read More
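In the same spirit as the snippet above, a hypothetical accounting for a single D x D linear layer (the 4-bytes-per-fp32 figure and the ~4x Adam training-memory rule of thumb are standard assumptions, not the post's exact numbers):

```python
# Hypothetical resource accounting for one D x D linear layer (no bias):
D = 4096
num_parameters = D * D                   # 16,777,216 weights
bytes_fp32 = num_parameters * 4          # fp32 stores 4 bytes per parameter

# Rule of thumb for fp32 Adam training: parameters + gradients
# + two optimizer states ≈ 4x the parameter memory.
train_bytes = 4 * bytes_fp32
print(train_bytes / 2**20, "MiB")        # 256 MiB for this one layer
```

The same bookkeeping, repeated per layer and per dtype, is how full-model memory budgets are estimated before training.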
-
Language Modeling from Scratch Overview and Tokenization
Language Modeling from Scratch Overview and Tokenization Course link 1. Course Philosophy and Motivation: The “Build It from Scratch” Ethos Philosophy: “To understand it, you have to build it.” Problem: Increasing abstraction in AI research disconnects researchers from the underlying systems. Goal: Combat abstraction crisis by re-engag... Read More
-
Imitation Learning via Privileged Teachers and Generative Models like Diffusion
🧠 Imitation Learning via Privileged Teachers and Generative Models like Diffusion This lecture builds upon previous discussions on Imitation Learning (IL) and delves into advanced techniques and current research areas. Course link 🔁 1. Recap of Previous Lecture (Imitation Learning Part 1) Lecture 6 begins with a brief review of key concepts... Read More
-
Markov Decision Processes (MDP) Basics and Imitation Learning
🧠 Markov Decision Processes (MDP) Basics and Imitation Learning This lecture provides a review of key themes and concepts related to Imitation Learning (IL) in the context of Robot Learning Course link 🔑 Key Themes and Concepts 1. Markov Decision Processes (MDPs) and Partially Observed MDPs (POMDPs) Sequential decision-making in robotics is... Read More
-
Robot Learning Overview
🧠 Robot Learning Overview This lecture provides a comprehensive introduction to Robot Learning, a field focused on achieving embodied intelligence in the physical world. The key challenge lies in the integration of algorithms, data, computation, and hardware to allow robots to perform tasks that are easy for humans but hard for machines — a phe... Read More
-
What is Robot Learning
🤖 Introduction to Robot Learning (CMU 16-831) I. 📘 Course Overview and Core Concepts The “16-831: Introduction to Robot Learning” course, taught by Professor Guanya Shi at Carnegie Mellon University (CMU), focuses on the fundamental principles and applications of robot learning. Theme: “Learning to make sequential decisions in the physical... Read More
-
DeepSeek Reasoning Models Series
DeepSeek Reasoning Models Series 📌 In Part Two, we focus on DeepSeek’s Reasoning Models. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence MATH-SHEPHERD: VERIFY AND REINFORCE LLMS STEP-BY-STEP WITHOUT HUMAN A... Read More
-
DeepSeek Base Models Series
🧠 DeepSeek Base Models Series The blog is structured around two primary categories of DeepSeek’s work: Base Models and Reasoning Models. 📌 In Part One, we focus on DeepSeek’s Base Models. We’ll walk through the core ideas of four foundational DeepSeek papers, along with example PyTorch code to illustrate key components like Multi-head Latent A... Read More
-
Quantum Machine Learning Introduction
Quantum Machine Learning Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption... Read More
-
On-device Training Introduction
TinyML On-device Training Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumptio... Read More
-
Distributed Training Part 2
TinyML Distributed Training Part 2 Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy c... Read More
-
Distributed Training Part 1
TinyML Distributed Training Part 1 Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy c... Read More
-
Diffusion Models
TinyML Diffusion Models Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. ... Read More
-
GAN, Video, Point Cloud
TinyML GAN, Video, Point Cloud Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consu... Read More
-
Vision Transformer
Vision Transformer Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Cou... Read More
-
Long-Context LLM
Long-Context LLM Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Cours... Read More
-
LLM Agents Introduction
LLM agents: brief history and overview Great talk from Shunyu Yao. Course Link What is an Agent? 🤖 In the realm of Artificial Intelligence, an “agent” is defined as an intelligent system capable of interacting with its environment. This interaction involves perceiving the environment through observations and acting upon it through actions. The... Read More
-
LLM Post-Training
TinyML LLM Post-Training Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption... Read More
-
LLM Deployment Techniques
TinyML LLM Deployment Techniques Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy con... Read More
-
Transformer and LLM
Transformer and LLM Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Co... Read More
-
TinyML TinyEngine
TinyML TinyEngine Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Cour... Read More
-
TinyML MCUNet
TinyML MCUNet Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Course l... Read More
-
Distillation Introduction
TinyML Distillation Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. Co... Read More
-
Neural Architecture Search
Neural Architecture Search Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumpti... Read More
-
Model Quantization II
TinyML Quantization II Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. ... Read More
-
Model Quantization I
TinyML Quantization I Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. ... Read More
-
Pruning and Sparsity
Pruning and Sparsity Modern AI models are becoming increasingly large, demanding substantial computational resources and memory. This creates a gap between the computational demands of these models and the available hardware capabilities. Pruning addresses this gap by reducing model size, memory footprint, and ultimately, energy consumption. C... Read More
-
TinyML Basics of Neural Networks
Basics of Neural Networks Course link 🌟 Main Themes The growing computational demand of deep learning models is outpacing hardware advancements, creating a strong need for efficient deep learning techniques. Understanding the basic building blocks of neural networks and their associated efficiency metrics is essential for designing optim... Read More
-
TinyML Introduction
TinyML 📚 Introduction MIT’s TinyML and Efficient Deep Learning Computing course, taught by Professor Song Han, kicks off with an introduction to optimizing and speeding up deep learning models. As models grow in complexity, hardware constraints create a gap between model needs and deployment capabilities, driving up costs and emphasizing the ne... Read More
-
Regression vs. Survival Analysis 🚀
Predicting Customer Churn: Regression vs. Survival Analysis 🚀 When it comes to predicting customer churn, the choice between regression and survival analysis depends on your data and objectives. While regression models may seem simpler, survival analysis is often better suited for time-to-event problems, especially when dealing with censored da... Read More
-
Introduction of Quantization
Overview of Quantization: Motivation, Data Types, Quantization Basics, Quantization Target, Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), Quantization Introduction, K-means-based Weight Quantization, Linear Quantization, Binary and Ternary Quantization, Automatic... Read More
-
How to evaluate NLP tasks
Traditional NLP Metrics The direct use of metrics such as perplexity and BLEU score has declined in popularity, largely due to their inherent flaws in many contexts. However, it remains crucial to comprehend these metrics and discern their appropriate applications. BLEU Paper: BLEU: a Method for Automatic Evaluation of Machine Translation Ori... Read More
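The count clipping at the core of BLEU is the part most worth internalizing; a toy unigram-only sketch (no brevity penalty, no higher-order n-grams, so not the full metric):

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Core of BLEU: each candidate n-gram count is clipped by the
    maximum count of that n-gram in the reference, so repeating a
    reference word cannot inflate the score."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(sum(cand.values()), 1)

# A degenerate candidate that just repeats "the":
p = clipped_unigram_precision("the the the the", "the cat sat on the mat")
# Unclipped precision would be 1.0; clipping caps it at 2/4 = 0.5.
```

The full metric combines clipped precisions for n = 1..4 geometrically and multiplies by a brevity penalty for short candidates.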
-
Recommender System 3 -- Ranking
This guide dives into the essentials of recommendation systems, with a focus on key metrics, experimentation methods, and the underlying architecture. Whether you’re a beginner or looking to deepen your understanding of recommendation strategies, this will give you a structured breakdown of critical components. Follow this awesome tutorial by ... Read More
-
Recommender System 2 -- Retrieval
This guide dives into the essentials of recommendation systems, with a focus on key metrics, experimentation methods, and the underlying architecture. Whether you’re a beginner or looking to deepen your understanding of recommendation strategies, this will give you a structured breakdown of critical components. Follow this awesome tutorial by ... Read More
-
Recommender System 1 -- Introduction
📊 Recommendation System Overview This guide dives into the essentials of recommendation systems, with a focus on key metrics, experimentation methods, and the underlying architecture. Whether you’re a beginner or looking to deepen your understanding of recommendation strategies, this will give you a structured breakdown of critical components. ... Read More
-
Q-Functions in Reinforcement Learning
Q-Functions in Reinforcement Learning Lecture 8 is about making Q-learning work in practice. Lecture 7 gave the value-function theory. Lecture 8 focuses on practical stability tricks used in deep RL systems: replay buffers, target networks, double Q-learning, multi-step targets, and handling continuous actions. Course Link 1. Recap: Fitt... Read More
-
Value Function Methods in Reinforcement Learning
Value Function Methods in Reinforcement Learning This lecture is about one core idea: Learn a value function that scores actions. Act by choosing the action with the highest score. If the lecture feels heavy, use this reading order: Section 2 (tiny tabular example), Sections 6 and 7 (FQI and online Q-learning), Section 8 (why conv... Read More
-
Actor-Critic Algorithms in Reinforcement Learning
Actor-Critic Algorithms in Reinforcement Learning This lecture focuses on Actor-Critic algorithms in Deep Reinforcement Learning. It covers the evolution from basic policy gradients, the role of value functions, various policy evaluation techniques, practical implementation considerations, and advanced variance reduction methods. Course Link ... Read More
-
Policy Gradients in Reinforcement Learning
Policy Gradients in Reinforcement Learning This lecture summarizes the core concepts, derivations, practical considerations, and advanced topics related to Policy Gradients, a fundamental algorithm in reinforcement learning (RL). Course Link 1. Introduction to Policy Gradients Policy gradients are a foundational RL algorithm that directly opt... Read More
-
Reinforcement Learning Introduction
Introduction to Reinforcement Learning This lecture covers fundamental definitions, the objective of RL, the anatomy of RL algorithms, and a categorization of various algorithm types along with their trade-offs and assumptions. Course Link 1. Core Terminology and Concepts The fundamental difference between reinforcement learning (RL) and imit... Read More
-
Robot Basic Pick and Place III - Differential kinematics via optimization
🤖 Basic Pick and Place - Differential kinematics via optimization Course Link Optimization-Based Differential Inverse Kinematics (DIK-QP) Problem with Pseudo-Inverse Limitation: Does not handle joint limits or other real-world constraints. Consequence: Can lead to clipped velocities and off-course end-effector trajectories. Pseudo-Inve... Read More
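For contrast with the QP formulation, the plain pseudo-inverse step it improves upon can be sketched as follows (hypothetical 2-joint Jacobian, illustrative only — no joint limits, which is precisely the limitation noted above):

```python
import numpy as np

# Differential inverse kinematics via the pseudo-inverse:
# find joint velocities qdot that best achieve a desired
# end-effector velocity v, given the Jacobian J at the current pose.
J = np.array([[1.0, 0.5],
              [0.0, 1.0]])    # made-up Jacobian for a 2-joint arm
v = np.array([0.2, 0.0])      # desired end-effector velocity

qdot = np.linalg.pinv(J) @ v
# Since this J is invertible, J @ qdot reproduces v exactly;
# near singularities or joint limits, this is where DIK-QP takes over.
```

The QP version keeps the same least-squares objective but adds velocity and joint-limit constraints, which the pseudo-inverse cannot express.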
-
Robot Basic Pick and Place II - Differential kinematics
🤖 Basic Pick and Place - Differential kinematics Course Link The fundamental goal is to move a “red brick from one bin to the second bin” using a robot. This requires: Defining target poses: Ideal gripper positions/orientations for picking and placing Generating trajectories: Compose keyframes into smooth paths Robot control: Convert d... Read More
-
Robot Basic Pick and Place I - kinematics and trajectories
🤖 Basic Pick and Place - Kinematics and Spatial Algebra The lecture covers foundational techniques for robot manipulation, especially pick-and-place tasks, with a focus on kinematics and spatial algebra. Course Link I. Introduction to Robotic Manipulation and the Pick and Place Problem A core challenge in robotic manipulation is commanding a... Read More
-
Robot Hardware
🤖 Robot Hardware This blog summarizes key concepts from 6.4210 Fall 2023 Lecture 2: Let’s get you a robot! and supporting materials, focusing on robot arm hardware, the intricacies of simulation, and the evolving landscape of robot hands. Course Link 1. Robot Arm Hardware: Evolution and Key Characteristics 🏭 Industrial Robots (Traditional) ... Read More
-
Anatomy of a Manipulation System
1. Defining Robotic Manipulation The lecture delves into a nuanced definition of robotic manipulation, moving beyond simple object handling to encompass complex interactions within dynamic and unstructured environments. Course Link A. Core Definition Matt Mason’s Definition: “Manipulation refers to the agent’s control of the en... Read More
-
Imitation Learning
Imitation Learning: Challenges and Solutions This lecture reviews supervised Learning of Behaviors / Imitation Learning Course Link 1. Introduction to Imitation Learning and Behavioral Cloning Imitation Learning (IL), specifically Behavioral Cloning (BC), involves training a policy (a model that maps observations to actions) using a dataset o... Read More
-
Reinforcement Learning Introduction
🤖 Deep Reinforcement Learning This briefing document reviews the main themes and key takeaways from a collection of sources focused on deep reinforcement learning (Deep RL), including insights from CS 285 lectures and supplementary materials. Course Link 📉 The Limitations of Data-Driven AI Data-driven AI has achieved impressive results, p... Read More
-
AlphaGo, AlphaGo Zero, and AlphaZero - Deep Reinforcement Learning Meets Search
♟️ AlphaGo, AlphaGo Zero, and AlphaZero - Deep Reinforcement Learning Meets Search This blog explores the structure and training process of AlphaGo and its successors, AlphaGo Zero and AlphaZero, illustrating how deep reinforcement learning and search are combined to achieve superhuman performance in complex board games. 🧩 1. Introduction and ... Read More
-
The Actor–Critic Method
🎭 The Actor–Critic Method: Bridging Policy-Based and Value-Based Reinforcement Learning The Actor–Critic (AC) method is a foundational algorithm in Reinforcement Learning that elegantly combines the strengths of Policy-Based and Value-Based methods. It uses two neural networks — the Policy Network (Actor) and the Value Network (Critic) — traine... Read More
-
Policy-Based Reinforcement Learning
🧭 Policy-Based Reinforcement Learning — Directly Learning to Act Policy-Based Reinforcement Learning (RL), also known as Policy Learning, focuses on directly modeling and optimizing the agent’s policy $\pi$, i.e., the agent’s behavior function. This contrasts with Value-Based RL (like DQN), which indirectly learns the policy by estimating t... Read More
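A minimal REINFORCE run on a two-armed bandit shows the "directly optimize the policy" idea in isolation (an assumed toy setup, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                      # logits of a softmax policy

def policy(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

# REINFORCE on a two-armed bandit: action 1 pays reward 1, action 0 pays 0.
for _ in range(500):
    p = policy(theta)
    a = rng.choice(2, p=p)
    reward = float(a == 1)
    grad_log_pi = np.eye(2)[a] - p       # grad of log softmax at action a
    theta += 0.1 * reward * grad_log_pi  # ascend E[reward * grad log pi]

# The policy concentrates probability mass on the rewarded action.
```

No value function appears anywhere: the update pushes up the log-probability of actions in proportion to the reward they earned, which is the policy gradient theorem in its simplest form.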
-
Value-Based Reinforcement Learning Foundations
Value-Based Reinforcement Learning Foundations Value-based reinforcement learning (RL) focuses on estimating how valuable it is to take a particular action in a given state — quantified as the expected discounted future reward. This approach underpins algorithms like Q-learning and Deep Q-Networks (DQN). 1️. Value-Based Reinforcement Learning ... Read More
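The tabular Q-learning update underlying DQN fits in a few lines (toy 2-state, 2-action MDP with a made-up transition):

```python
import numpy as np

# Tabular Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
Q = np.zeros((2, 2))
alpha, gamma = 0.5, 0.9

def q_update(Q, s, a, r, s_next):
    td_target = r + gamma * Q[s_next].max()   # expected discounted return
    Q[s, a] += alpha * (td_target - Q[s, a])

# One observed transition: state 0, action 1, reward 1.0, next state 1.
q_update(Q, 0, 1, 1.0, 1)
# Q[0, 1] moves halfway (alpha = 0.5) toward the TD target of 1.0.
```

DQN replaces the table with a neural network and adds stability machinery, but this update rule is the estimate of "expected discounted future reward" the summary refers to.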
-
Reinforcement Learning Basics
Reinforcement Learning Basics Follow this awesome tutorial by Shusen Wang, which provides a foundational overview of reinforcement learning (RL) — starting from probability theory and building up to key RL concepts such as states, actions, policies, and value functions. 1. 🎲 A Little Bit of Probability Theory The lecture begins with essentia... Read More
-
Distributed Systems Introduction and MapReduce
Course Link 🌐 What is a Distributed System? A distributed system is a collection of computers that communicate over a network to perform a task together. Examples 📱 Popular app backends (e.g., for messaging) 🌐 Large websites 🖧 Domain Name System (DNS) 📞 Phone systems These systems often use services that are themselves ... Read More
-
Generative Adversarial Networks
📚 Generative Adversarial Networks Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🧾 Briefing Document: Generative Adversarial Networks (GANs) 🚀 Introduction: This document provides a detai... Read More
-
Transformer Implementation with Naive Numpy and Pytorch
📚 Transformer Implementation Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. This document details the implementation of a Transformer model using NumPy, comparing it to PyTorch’s implement... Read More
-
Transformers and Autoregressive Models
📚 Transformers and Autoregressive Models Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. This document summarizes key concepts from the lecture on Transformers and attention mechanisms. The... Read More
-
LSTM Implementation
📚 LSTM Implementation Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. Lecture Overview The lecture focuses on implementing a Long Short-Term Memory (LSTM) network, starting with a single c... Read More
-
Sequence Modeling and Recurrent Networks
📚 Sequence Modeling and Recurrent Networks Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. This lecture covers sequence modeling, RNNs, LSTMs, and their applications in complex prediction t... Read More
-
Convolutional Networks Implementation and Im2col
📚 Convolutional Networks Implementation Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🖥️ Implementing Convolutions in Code The lecture focuses on how to implement convolutions in code, m... Read More
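The im2col trick this lecture builds toward can be sketched for the 1-channel, stride-1 case (a simplified illustration, not the course's implementation):

```python
import numpy as np

def im2col(x, K):
    """Flatten every K x K patch of a 2-D image (stride 1) into a row,
    so convolution becomes a single matrix multiply."""
    H, W = x.shape
    cols = [x[i:i + K, j:j + K].ravel()
            for i in range(H - K + 1)
            for j in range(W - K + 1)]
    return np.stack(cols)

x = np.arange(16.0).reshape(4, 4)
cols = im2col(x, 3)                       # 4 patches, 9 values each
w = np.ones((3, 3))                       # a toy 3 x 3 filter
out = (cols @ w.ravel()).reshape(2, 2)    # convolution as one matmul
```

Real implementations handle channels, batches, strides, and padding, but the payoff is the same: the conv reduces to a dense matrix multiply that maps onto highly optimized GEMM kernels.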
-
DLSys GPU Acceleration
📚 GPU Acceleration Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. This lecture summarizes key concepts and techniques related to GPU (Graphics Processing Unit) acceleration, particularly w... Read More
-
DLSys Hardware Acceleration
📚 Hardware Acceleration Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. I. Introduction and Motivation ⚡ Necessity of Acceleration The increasing computational demands of large models and... Read More
-
Differentiating CNN
📚 Convolutional Networks Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. This document summarizes key concepts and practical considerations related to Convolutional Networks (CNNs) based on... Read More
-
Implement Your Own Deep Learning Library using Automatic Differentiation II
📚 Neural Network Library Implementation Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🛠️ I. Introduction & Setup 🎉 Welcoming participants back to the Deep Learning Systems: Algorit... Read More
-
Normalization and Regularization
📚 Normalization and Regularization Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🏁 Initialization and Optimization Weight Initialization Initializing weights is critical for... Read More
-
Modularity in Deep Learning Package
📚 Common Abstractions for Neural Network Computations Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 📚 Introduction to Neural Network Library Abstractions I. Introduction to Neural Networ... Read More
-
Fully Connected Networks, Optimization, Initialization and Activations
📚 Fully Connected Networks, Optimization, Initialization and Activations Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🧠 Fully Connected Networks In a fully connected network, each neuro... Read More
-
Implement Your Own Deep Learning Library using Automatic Differentiation I
📚 Automatic Differentiation Lab Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 🔍 The “Needle” Package: A Deep Dive into Automatic Differentiation Implementation The “Needle” package is a co... Read More
-
Introduction of Automatic Differentiation
📚 Introduction of Automatic Differentiation Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 1. Machine Learning Components Every machine learning algorithm has three fundamental elements: ... Read More
-
Simple Neural Networks with Codes
📚 Manual Neural Networks Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 1. Limitations of Linear Classifiers 🧮 Linear classifiers divide the input space into linear regions, limiting thei... Read More
-
Softmax Regression with Codes
📚 Softmax Regression Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 1. 🛠️ Ingredients of a Machine Learning Algorithm: Hypothesis Function: Maps input features to output predictions. Lo... Read More
-
DLSys Introduction
📚 Deep Learning Systems Introduction Course Link This document reviews the main themes and key takeaways from Deep Learning Systems: Algorithms and Implementation at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen. 1. 🚀 Why Study Deep Learning (DL) and DL Systems? The lecture emphasizes the powerful capabilities of mod... Read More
