📚 Generative Adversarial Networks
This document reviews the main themes and key takeaways from **Deep Learning Systems: Algorithms and Implementation** at Carnegie Mellon University, taught by J. Zico Kolter and Tianqi Chen.
🧾 Briefing Document: Generative Adversarial Networks (GANs)
🚀 Introduction:
This document provides a detailed overview of Generative Adversarial Networks (GANs), a powerful unsupervised learning technique for generative modeling. The briefing synthesizes information from a lecture series, a Jupyter Notebook implementation, slides, and code excerpts to present both theoretical concepts and practical details. GANs provide a framework for learning to generate data that resembles a given target distribution.
🔑 Key Themes and Concepts:
🌐 Unsupervised Learning and Generative Modeling:
- Traditional supervised learning relies on labeled data to train models for classification or prediction.
- Unsupervised learning, in contrast, aims to learn from unlabeled data to discover patterns or generate new samples from the learned distribution.
- GANs fall under this paradigm of unsupervised generative modeling:
  - The goal is to learn the underlying probability distribution of a dataset and generate new samples resembling it.
  - Defining a “distance” between generated and real datasets is crucial for training.
  - The desired result is that generated samples “look real”.
🥊 Adversarial Training:
- GANs involve two neural networks trained simultaneously:
  - Generator (G): Takes a random noise vector `z` and generates samples resembling real data. The generator’s goal is to fool the discriminator.
  - Discriminator (D): Takes samples and determines whether they are real (from the dataset) or fake (generated by G). The discriminator’s goal is to distinguish between real and fake data.
- Training involves a “minimax” game:
  - Discriminator: Maximize the ability to distinguish real from fake data.
  - Generator: Minimize the discriminator’s ability to distinguish fake from real.
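Written out, the two goals combine into the standard GAN minimax value function (stated here for reference; it is consistent with the per-network objectives given below):

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
$$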
🔮 Discriminator as an Oracle:
- Initial concept: An “oracle” discriminator perfectly distinguishes between real and fake data.
“Assume that we have an oracle discriminator that can tell the difference between real and fake data. Then we need to train the generator to ‘fool’ the oracle.”
🤖 Learning the Discriminator:
- In reality, no oracle exists. The discriminator is trained using real data and data generated by the generator.
“We do not have an oracle discriminator, but we can learn it using the real and generated fake data.”
📐 Objective Functions:
- Discriminator’s Objective:
  - Maximize the log-likelihood of classifying real data as real and fake data as fake; equivalently, minimize the negative log-likelihood:

$$
\min_D \; \Big\{ -\mathbb{E}_{x \sim \text{Data}}[\log D(x)] - \mathbb{E}_{z \sim \text{Noise}}[\log(1 - D(G(z)))] \Big\}
$$

- Generator’s Objective:
  - Maximize the log-likelihood of fooling the discriminator:

$$
\min_G \; \Big\{ -\mathbb{E}_{z \sim \text{Noise}}[\log D(G(z))] \Big\}
$$

  - The original minimax formulation instead minimizes $\mathbb{E}_{z \sim \text{Noise}}[\log(1 - D(G(z)))]$, but that loss saturates early in training when $D(G(z)) \approx 0$. Practical implementations therefore maximize the probability of the discriminator predicting generated images as real, as written above.
🔄 Iterative Training Process:
- GAN training alternates between updating the discriminator and the generator:
  - Discriminator update: Minimize its loss using real and fake samples.
  - Generator update: Minimize its loss to fool the discriminator.
- Key concepts:
  - Discriminator: Classify real data as 1 and fake data as 0.
  - Generator: Fool the discriminator into classifying fake data as 1.
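Putting the two updates together, here is a minimal, self-contained sketch of one alternating training step. It is written in PyTorch purely as an illustration (the course uses the Needle library; its implementation appears in the code section below), and the toy 2D layer sizes are assumptions chosen to mirror the later example:

```python
import torch
import torch.nn as nn

# Toy 2D generator and discriminator (sizes are illustrative assumptions).
G = nn.Sequential(nn.Linear(2, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-2)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

def train_step(real_X):
    batch = real_X.shape[0]
    Z = torch.randn(batch, 2)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1. Discriminator update: real -> 1, fake -> 0.
    #    detach() blocks gradients from flowing into the generator.
    d_loss = bce(D(real_X), ones) + bce(D(G(Z).detach()), zeros)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # 2. Generator update: make D classify fresh fake samples as real.
    g_loss = bce(D(G(Z)), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

train_step(torch.randn(32, 2))  # one alternating update on dummy data
```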
🧱 Modularization of GANs:
- GANs are highly modular, like “Lego bricks”:
  - They can be used as a loss module in other models (e.g., CycleGANs).
  - By treating GANs as a loss function, more complex training methods are possible.
🎨 DCGAN (Deep Convolutional GAN):
- Uses convolutional layers for the generator and discriminator.
- The generator maps low-dimensional input to high-dimensional output using transposed convolutions.
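To make the shape arithmetic concrete, here is a DCGAN-style generator sketch in PyTorch (an illustration, not the course code; the channel counts and kernel sizes are assumptions). Each transposed convolution doubles the spatial resolution, mapping a low-dimensional latent vector up to an image:

```python
import torch
import torch.nn as nn

# Illustrative DCGAN-style generator: latent vector (100,) -> 1x32x32 image.
G = nn.Sequential(
    # (100, 1, 1) -> (128, 4, 4)
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),
    nn.BatchNorm2d(128), nn.ReLU(),
    # (128, 4, 4) -> (64, 8, 8)
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64), nn.ReLU(),
    # (64, 8, 8) -> (32, 16, 16)
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(32), nn.ReLU(),
    # (32, 16, 16) -> (1, 32, 32)
    nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)  # batch of random noise vectors
imgs = G(z)                     # -> torch.Size([16, 1, 32, 32])
```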
🔄 CycleGAN:
- Unpaired image-to-image translation:
  - Two generators: one for domain `X → Y` and another for `Y → X`.
  - Incorporates a cycle consistency loss:
“When an image goes from domain X to Y and back to X, it should look similar to the original image in X.”
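In symbols (the loss from the original CycleGAN paper, shown here for completeness), with generators $G: X \to Y$ and $F: Y \to X$, the cycle consistency term penalizes the round trip:

$$
\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]
$$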
🌫️ Diffusion Models:
- Another generative modeling technique:
  - Forward process: Gradually add Gaussian noise to an image.
  - Reverse process: Remove noise to generate a sample.
  - Modeled using stochastic differential equations.
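For reference (standard notation from the score-based diffusion literature, not taken from the lecture excerpts), the forward and reverse processes can be written as the pair of SDEs

$$
dx = f(x, t)\,dt + g(t)\,dw \quad \text{(forward)}, \qquad
dx = \big[f(x, t) - g(t)^2 \nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w} \quad \text{(reverse)},
$$

where the score $\nabla_x \log p_t(x)$ is the quantity the denoising network learns to approximate.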
💻 Code Implementation (from Lecture 17 and Notebook):
- Library: Needle, using numpy arrays and gradient descent.
- Example Setup:
  - Target data: 2D Gaussian distribution.
  - Generator: Single-layer linear neural network with bias.
  - Discriminator: Three-layer neural network.
- Functions:
  - `updateG`: Optimizes the generator to fool the discriminator.
  - `updateD`: Optimizes the discriminator to distinguish real from fake data.
  - `train_GAN`: Iteratively updates both networks.
- Modularized GAN Loss: Implements the GAN loss as a module, hiding the discriminator update inside the forward pass:
```python
class GANLoss:
    """GAN loss module: updates the discriminator internally, then
    returns the generator loss for the caller to backprop through."""

    def __init__(self, model_D, opt_D):
        self.model_D = model_D
        self.opt_D = opt_D
        self.loss_D = nn.SoftmaxLoss()

    def _update_D(self, real_X, fake_X):
        # Train D to classify real samples as 1 and generated samples as 0.
        real_Y = self.model_D(real_X)
        fake_Y = self.model_D(fake_X.detach())  # detach: no grads into G
        batch_size = real_X.shape[0]
        ones = ndl.ones(batch_size, dtype="int32")
        zeros = ndl.zeros(batch_size, dtype="int32")
        loss = self.loss_D(real_Y, ones) + self.loss_D(fake_Y, zeros)
        loss.backward()
        self.opt_D.step()

    def forward(self, fake_X, real_X):
        # First update D, then compute the generator loss: the generator
        # wants the updated D to classify its fake samples as real (label 1).
        self._update_D(real_X, fake_X)
        fake_Y = self.model_D(fake_X)
        batch_size = real_X.shape[0]
        ones = ndl.ones(batch_size, dtype="int32")
        loss = self.loss_D(fake_Y, ones)
        return loss
```
```python
# Generator: a single linear layer (2D noise -> 2D sample).
model_G = nn.Sequential(nn.Linear(2, 2))
opt_G = ndl.optim.Adam(model_G.parameters(), lr=0.01)

# Discriminator: a three-layer MLP with two output logits (real / fake).
model_D = nn.Sequential(
    nn.Linear(2, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Linear(10, 2)
)
opt_D = ndl.optim.Adam(model_D.parameters(), lr=0.01)

gan_loss = GANLoss(model_D, opt_D)
```
```python
def train_gan(data, batch_size, num_epochs):
    assert data.shape[0] % batch_size == 0
    for epoch in range(num_epochs):
        # Cycle through the dataset one minibatch per iteration.
        begin = (batch_size * epoch) % data.shape[0]
        X = data[begin: begin+batch_size, :]
        Z = np.random.normal(0, 1, (batch_size, 2))
        X = ndl.Tensor(X, dtype="float32")
        Z = ndl.Tensor(Z, dtype="float32")
        fake_X = model_G(Z)
        # gan_loss.forward updates D internally, then returns the G loss.
        loss = gan_loss.forward(fake_X, X)
        loss.backward()
        opt_G.step()
```
```python
# Target distribution: a 2D Gaussian with mean mu and covariance A^T A.
A = np.array([[1, 2], [-0.2, 0.5]])
mu = np.array([2, 1])
# total number of samples to generate
num_sample = 3200
data = np.random.normal(0, 1, (num_sample, 2)) @ A + mu

train_gan(data, 32, 2000)
```
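As a sanity check (a hypothetical addition, not part of the lecture code, assuming Needle’s standard `Tensor.numpy()` accessor), one can compare the sample statistics of the trained generator’s output against the target Gaussian. Since the generator is a single affine layer, it is in principle able to match the target’s mean and covariance exactly:

```python
# Hypothetical verification: compare generated statistics to the target.
Z = ndl.Tensor(np.random.normal(0, 1, (num_sample, 2)), dtype="float32")
fake = model_G(Z).numpy()
print("generated mean:", fake.mean(axis=0))            # target: mu = [2, 1]
print("generated cov:\n", np.cov(fake, rowvar=False))  # target: A.T @ A
```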
