Lab 4A: Image Processing

Download the files for Lab 4A from the following links:


Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.



Lab 4 Image processing

Images are mathematically modeled as functions of two variables. These variables represent the vertical coordinate $m$ and the horizontal coordinate $n$. These variables can be discrete or continuous but we will consider them here to be discrete. In this case the coordinate pair $(m, n)$ is called a pixel. We restrict each pixel coordinate to be between 0 and $N-1$. We use $\mathbf{X}$ to denote the image and $x(m, n)$ to represent the value at pixel $(m, n)$. Pixels of an image are arranged in a matrix, $$ \mathbf{X}=\left(\begin{array}{cccccc} x(0,0) & x(0,1) & \cdots & x(0, n) & \cdots & x(0, N-1) \\ x(1,0) & x(1,1) & \cdots & x(1, n) & \cdots & x(1, N-1) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(m, 0) & x(m, 1) & \cdots & x(m, n) & \cdots & x(m, N-1) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(N-1, 0) & x(N-1, 1) & \cdots & x(N-1, n) & \cdots & x(N-1, N-1) \end{array}\right) $$ Thus, an image $\mathbf{X}$ is a matrix in which entries $x(m, n)$ represent pixel values.

The interpretation of this matrix is that pixel values $x(m, n)$ represent the luminance of the pixel. The luminance is how much light is reflected by the pixel.
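As a concrete example (a small sketch with made-up pixel values), an image can be stored as a torch tensor whose entry at row $m$ and column $n$ is the pixel value $x(m, n)$:

import torch

# A tiny 3x3 "image": rows index the vertical coordinate m, columns the horizontal coordinate n
X = torch.tensor([[0.0, 0.5, 1.0],
                  [0.2, 0.7, 0.9],
                  [0.0, 0.3, 0.6]])
print(X[1, 2])  # pixel value x(1, 2): row m = 1, column n = 2 -> tensor(0.9000)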

Handwritten Digits

In this lab we work with black and white images that represent handwritten digits. In Figure 1 we show an image of a handwritten number 1 and a handwritten number 3. The dataset we are given contains pairs $\left(\mathbf{X}_q, y_q\right)$ of images $\mathbf{X}_q$ and human annotations $y_q$ that identify the correct digits.

Task 1:

Load the data and visualize three images.

Setup

In [ ]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
import time

Loading the Data

We will use torchvision's MNIST dataset class to download the data.

In [ ]:
# Image normalization
# Converts the image into a tensor and normalizes it to achieve a mean of 0 and variance of 1
transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])

# Download MNIST train and test set and normalize images
train_set = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_set = datasets.MNIST('./data', train=False, download=True, transform=transform)

# Define train and test dataloaders, used to get batches sequentially
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=1024)

Visualize Images

In [ ]:
# Sample Visualization
img_index = 20
image = train_set[img_index][0].squeeze()
plt.imshow(image, cmap = 'binary_r')
plt.title(f"Label: {train_set[img_index][1]}")
print(f"Image size {image.shape} \n")
Image size torch.Size([28, 28]) 

Shifts in Space

Before defining convolutions to process images we have to introduce vertical and horizontal shift operators. We name these operators $\mathcal{S}_{\mathrm{v}}$ and $\mathcal{S}_{\mathrm{H}}$ and define them as the operators whose action on an image $\mathbf{X}$ results in a shifting of vertical and horizontal coordinates, respectively. Thus, if applying the vertical shift $\mathcal{S}_{\mathrm{v}}$ to the image $\mathbf{X}$ yields the image $\mathbf{U}_{10}=\mathcal{S}_{\mathrm{v}} \mathbf{X}$, the entries $u_{10}(m, n)$ of the vertically shifted image $\mathbf{U}_{10}$ are $$ u_{10}(m, n)=x(m-1, n), \quad \text { when } \mathbf{U}_{10}=\mathcal{S}_{\mathrm{v}} \mathbf{X} . $$ Likewise, if applying the horizontal shift $\mathcal{S}_{\mathrm{H}}$ to the image $\mathbf{X}$ yields the image $\mathbf{U}_{01}=\mathcal{S}_{\mathrm{H}} \mathbf{X}$, the entries $u_{01}(m, n)$ of the horizontally shifted image $\mathbf{U}_{01}$ are $$ u_{01}(m, n)=x(m, n-1), \quad \text { when } \mathbf{U}_{01}=\mathcal{S}_{\mathrm{H}} \mathbf{X} . $$
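A minimal sketch of these two operators acting on a torch tensor, assuming that rows or columns with no source pixels are filled with zeros (the helper names shift_v and shift_h are ours, not part of the lab code):

def shift_v(X):
    """Vertical shift S_v: u(m, n) = x(m-1, n), zero-filling the first row."""
    U = torch.zeros_like(X)
    U[1:, :] = X[:-1, :]
    return U

def shift_h(X):
    """Horizontal shift S_H: u(m, n) = x(m, n-1), zero-filling the first column."""
    U = torch.zeros_like(X)
    U[:, 1:] = X[:, :-1]
    return U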

Shift Compositions and Spatial Shift Sequences

As in the case of time signals, image shifts can be composed. We can compose any number of vertical shifts with any number of horizontal shifts. This results in the definition of the spatial shift sequence composed of images $$ \mathbf{U}_{k l}=\mathcal{S}_{\mathrm{v}}^k \mathcal{S}_{\mathrm{H}}^l \mathbf{X} . $$

This is an image in which the entries are $u_{k l}(m, n)=x(m-k, n-l)$. That is, the image $\mathbf{U}_{k l}$ is one in which the pixels are shifted $k$ pixels down and $l$ pixels to the right.

It is important to note that the spatial shift sequence above can be computed recursively. To do that we need to define the vertical and horizontal recursions $$ \mathbf{U}_{k l}=\mathcal{S}_{\mathrm{v}} \mathbf{U}_{(k-1) l}, \quad \mathbf{U}_{k l}=\mathcal{S}_{\mathrm{H}} \mathbf{U}_{k(l-1)}, $$ with the initial condition $\mathbf{U}_{00}=\mathbf{X}$. Recursive computation of the spatial shift sequence is important in practical implementations.
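As a small sketch, the recursion can be implemented with the hypothetical shift_v and shift_h helpers from the previous sketch:

def shift_sequence(X, k_max, l_max):
    """Compute U_{kl} = S_v^k S_H^l X recursively for 0 <= k <= k_max and 0 <= l <= l_max."""
    U = {(0, 0): X}
    # Horizontal recursion along the first row of the sequence: U_{0l} = S_H U_{0(l-1)}
    for l in range(1, l_max + 1):
        U[(0, l)] = shift_h(U[(0, l - 1)])
    # Vertical recursion for every column of the sequence: U_{kl} = S_v U_{(k-1)l}
    for k in range(1, k_max + 1):
        for l in range(0, l_max + 1):
            U[(k, l)] = shift_v(U[(k - 1, l)])
    return U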

Negative shifts

A negative vertical shift is a shift that moves the pixels up. We denote this shift by $\mathcal{S}_{\mathrm{v}}^{-1}$ and define it as the shift that when acting on the image $\mathbf{X}$ yields the image $\mathbf{U}_{(-1) 0}=\mathcal{S}_{\mathrm{v}}^{-1} \mathbf{X}$ whose entries are given by $$ u_{(-1) 0}(m, n)=x(m+1, n), \quad \text { when } \mathbf{U}_{(-1) 0}=\mathcal{S}_{\mathrm{v}}^{-1} \mathbf{X} . $$ In this definition we adopt the convention that $x(N, n)=0$. This means that when applying the negative shift $\mathcal{S}_{\mathrm{v}}^{-1}$ we fill the last row of $\mathbf{U}_{(-1) 0}$ with zeros. This is because there are no entries of $\mathbf{X}$ that can be shifted into this row.

Likewise a negative horizontal shift is a shift that moves the pixels left. We denote this shift by $\mathcal{S}_{\mathrm{H}}^{-1}$ and define it as the shift that when acting on the image $\mathbf{X}$ yields the image $\mathbf{U}_{0(-1)}=\mathcal{S}_{\mathrm{H}}^{-1} \mathbf{X}$ whose entries are given by $$ u_{0(-1)}(m, n)=x(m, n+1), \quad \text { when } \mathbf{U}_{0(-1)}=\mathcal{S}_{\mathrm{H}}^{-1} \mathbf{X} . $$ In this definition we adopt the convention that $x(m, N)=0$. This means that when applying the negative shift $\mathcal{S}_{\mathrm{H}}^{-1}$ we fill the last column of $\mathbf{U}_{0(-1)}$ with zeros. This is because there are no entries of $\mathbf{X}$ that can be shifted into this column.

As is the case for the positive shifts $\mathcal{S}_{\mathrm{v}}$ and $\mathcal{S}_{\mathrm{H}}$, the negative shifts $\mathcal{S}_{\mathrm{v}}^{-1}$ and $\mathcal{S}_{\mathrm{H}}^{-1}$ can be composed.
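For illustration, a single hypothetical helper can implement compositions of positive and negative shifts in both directions, zero-filling whichever rows or columns have no source pixels:

def shift(X, k, l):
    """Apply S_v^k S_H^l to X, i.e. u(m, n) = x(m-k, n-l), for positive or negative k and l."""
    M, N = X.shape
    U = torch.zeros_like(X)
    # Rows and columns that receive pixels, and the rows and columns they come from
    dst_rows, src_rows = slice(max(k, 0), M + min(k, 0)), slice(max(-k, 0), M + min(-k, 0))
    dst_cols, src_cols = slice(max(l, 0), N + min(l, 0)), slice(max(-l, 0), N + min(-l, 0))
    U[dst_rows, dst_cols] = X[src_rows, src_cols]
    return U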

Convolutions in space

A spatial convolution is a linear combination of the components of the spatial shift sequence defined above. For a formal definition we consider a filter range $K$ and define filter coefficients $h_{k l}$ for $k$ and $l$ ranging from $-K$ to $K$. The outcome of applying the filter with coefficients $h_{k l}$ to the image $\mathbf{X}$ is the image $$ \mathbf{Y}=\sum_{k=-K}^K \sum_{l=-K}^K h_{k l} \mathcal{S}_{\mathrm{v}}^k \mathcal{S}_{\mathrm{H}}^l \mathbf{X} . $$ Observe that in this definition we allow for positive and negative shifts. For simplicity we assume that the maximum number of shifts in either direction is the same. They can be made different, but it is standard practice to keep them equal.

We note that the total number of filter coefficients of a two dimensional convolution is $(2 K+1)^2$. These coefficients are arranged in a matrix $\mathbf{H}$.
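For instance, one common arrangement places $h_{(-K)(-K)}$ in the top left corner and $h_{K K}$ in the bottom right, $$ \mathbf{H}=\left(\begin{array}{ccc} h_{(-K)(-K)} & \cdots & h_{(-K) K} \\ \vdots & & \vdots \\ h_{K(-K)} & \cdots & h_{K K} \end{array}\right) . $$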

Task 2A

Write a function that takes an image $\mathbf{X}$ and a filter $\mathbf{H}$ as inputs and returns as an output the result of convolving the image with the filter.

In [ ]:
def conv2D(image, filters):
    """
    Performs 2D convolution on the given image with the given filters.
    Args:
        image (torch.Tensor): input image of shape (height, width).
        filters (torch.Tensor): filters of shape (num_filters, filter_size, filter_size).
                            where filter_size is 2*K+1

    Returns:
        torch.Tensor: convolved image of shape (num_kernels, height, width).
    """
    
    # We are assuming square filters. The true size is filter_size^2
    num_filters, filter_size, _ = filters.shape
    y, x = image.shape
    
    K = (filter_size-1)//2

    # Add zero padding around the border of the image.
    # This is needed to prevent the filter from "hanging off the edge" of the image
    # when we are performing convolutions along the edge pixels.
    # Adding this padding allows us to write simpler code without having to explicitly handle this edge case.
    y_pad = y + filter_size - 1
    x_pad = x + filter_size - 1

    # Initialize padded image
    result = torch.zeros((num_filters, y_pad, x_pad))
    result[:, K:y_pad-K, K:x_pad-K] = image

    # Initialize output
    output = torch.zeros((num_filters, y, x))
    
    # Iterate over the kernels (filters are often referred to as "kernels")
    for idx in range(num_filters):
        filt = filters[idx]
        # Iterate over the kernel entries (these are the filter taps AKA the parameters we are learning)
        for k in range(-K, K+1):
            for l in range(-K, K+1):
                # Shift the image vertically and then horizontally
                rolled_image = torch.roll(torch.roll(result[idx], -k, dims=0), -l, dims=1)
                # Multiply the kernel with the rolled image and sum the result
                output[idx] += rolled_image[K: K+y, K: K+x] * filt[k+K, l+K]

    return output
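As a quick sanity check of conv2D (a usage sketch of our own), a filter whose only nonzero tap is a 1 at $k = l = 0$ should return the image unchanged:

# Identity filter: a single unit tap at the center (k = l = 0) of a 3x3 filter (K = 1)
img = train_set[0][0].squeeze()      # shape (28, 28)
identity = torch.zeros((1, 3, 3))
identity[0, 1, 1] = 1.0
out = conv2D(img, identity)
print(torch.allclose(out[0], img))   # expected: True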

Task 2B

Define a FilterBank class that inherits from nn.Module. It stores the filter parameters as an attribute and its forward method computes the convolution of a batch of input images.

In [ ]:
class FilterBank(nn.Module):
    """
    A filter bank
    Args:
        num_filters (Int): The number of filters in the bank (these filters are in parallel!)
        K (Int): Defines the size of each filter in the bank. The true size is (2K+1)^2. 
                    Recall that we sweep from -K to K. So if K is 1, each filter in our bank is a 3x3 grid.
                    The filter is assumed to be square.
    """
    
    def __init__(self,  num_filters, K):
        super(FilterBank, self).__init__()

        # L is the number of filter taps in each dimension (for the example above L = 2*1 + 1 = 3 and each filter is LxL = 3x3)
        L = K*2+1

        # The number of filters in the bank determines the number of output channels AKA output features
        self.num_filters = num_filters
        self.filters = torch.zeros((num_filters, L, L))

    def forward(self, x):

        # b is the batch size, h is the image height (y) and w is the image width (x)
        b, h, w = x.shape

        # For each image in the batch, we run our filter bank over the image and each filter in the bank produces 1 image (output feature).
        # These images at the output are all in parallel and have the same dimensions as the input image.
        out = torch.zeros((b, self.num_filters, h, w))

        # Compute the convolution for each image in the batch and fill up the output tensor.
        for i, img in enumerate(x):
            out[i] = conv2D(img, self.filters)

        return out
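As a quick shape check (a small usage sketch), a batch of images of shape (batch, height, width) should come out with one output channel per filter:

# Two 3x3 filters (K = 1) applied to a batch of 5 images
bank = FilterBank(num_filters=2, K=1)
bank.filters = torch.rand((2, 3, 3))   # fill the bank with arbitrary filter taps
out = bank(torch.rand((5, 28, 28)))
print(out.shape)                       # expected: torch.Size([5, 2, 28, 28])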

Task 3

Define filters $\mathbf{H}$ with entries $h_{kl} = 1/(2K + 1)^2$. Use the class in Task 2 to instantiate these filters with K = 2, K = 4, and K = 8. Apply these filters to the images you visualized in Task 1.

What is the effect of applying these filters?

In [ ]:
# Take any sample image
img = train_set[0][0]

# Plot an original image (no convolution) to compare against
fig, ax = plt.subplots(1,4, figsize=(14,3.5))
ax[0].imshow(img.squeeze(), cmap = 'binary_r')
ax[0].set_title("Original image")

K_vals = [2,4,8]

for i, K in enumerate(K_vals):
    # Instantiate Filter Bank with one filter (single output channel)
    filt = FilterBank(num_filters = 1, K = K)
    
    # Instantiate each filter in the bank with a set of uniform values
    # (Later on, these filter values will be learned)
    filt.filters = torch.ones((1, 2*K+1, 2*K+1))*(1/(2*K+1)**2)
    
    # Perform 2D convolution
    out = filt(img)
    
    # Plot the convolved image for each K
    ax[i+1].imshow(out.squeeze(), cmap = 'binary_r')
    ax[i+1].set_title(f"Covolved image with K={K}")

Task 4

PyTorch comes with its own function for computing two dimensional convolutions. Consider the same filters of Task 3 with K = 8. Execute your own convolution code and the convolution function built into PyTorch for a batch of B = 100 images. Compare the execution times.

In [ ]:
# Grab batch of 100 images from the dataset
imgs = next(iter(train_loader))[0][:100]

# Perform convolution with function from torch
K = 8
conv = nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 2*K+1, padding = 'same', bias = False)
# Time convolution
start = time.time()
out = conv(imgs)
end = time.time()

print(f"PyTorch's Convolution took: {end-start:.3f} seconds")

# Perform convolution with your function
filterbank = FilterBank(num_filters = 1, K = 8)
# Initialize filters with the torch initialization
filterbank.filters = conv.weight.squeeze(0)

start = time.time()
out_manual = filterbank(imgs.squeeze())
end = time.time()
print(f"Manual Convolution took: {end-start:.3f} seconds")

# Check that outputs match
print(f"Manual convolution and torch convolution output the same tensor: {torch.allclose(out_manual, out, atol = 1e-5)}")
PyTorch's Convolution took: 0.005 seconds
Manual Convolution took: 1.355 seconds
Manual convolution and torch convolution output the same tensor: True

In solving Task 4 you must have noticed that the PyTorch code is much faster. This is because the PyTorch convolution function is just a wrapper that calls a compiled subroutine written in C. If this were the 1990s we would say that "true men code in C," which was indeed a quip that was common in the 1990s. Lucky for us all, we live in a time when we do not require men to be tough, nor women to stay away from engineering, nor persons to define themselves as either men or women. So let us just remember that industry-level data science still relies on low-level languages. Python and PyTorch are prototyping languages. We will use the PyTorch convolution function in the remainder of Lab 4.

Task 5

Create a filterbank class in which the filter coefficients $\mathbf{H}_g$ and the matrix $\mathbf{A}$ are attributes. The class has a method that takes an image $\mathbf{X}$ as input and implements the filterbank equations: it convolves the image with each filter, computes the energy of each resulting feature, and applies the linear map $\mathbf{A}$ to the vector of feature energies. The method returns the vector of scores $\mathbf{x}_2$ as output.

In [ ]:
class FilterBank(nn.Module):
    """
    A filter bank. This class is very similar to the one in the previous task, however now we are using torch's built in convolution functions.
    Args:
        num_filters (Int): The number of filters in the bank (these filters are in parallel!)
        K (Int): Defines the size of each filter in the bank. The true size is (2K+1)^2. 
                    Recall that we sweep from -K to K. So if K is 1, each filter in our bank is a 3x3 grid.
                    The filter is assumed to be square.
    """
    def __init__(self, num_filters, K):
        super(FilterBank, self).__init__()

        # In the MNIST dataset each class corresponds to a digit
        num_classes = 10
        
        # Filter Bank with num_filter filters and num_taps taps
        self.conv1 = nn.Conv2d(in_channels = 1, out_channels = num_filters, kernel_size = 2*K+1, stride = 1, padding = 'same')
        
        # Linear layer. This layer takes our num_filters number of output channels and transforms it into an output with dimension num_classes
        self.linear = nn.Linear(num_filters, num_classes)

    def forward(self, x):
        # Pass through filter bank
        x = self.conv1(x)
        
        # Compute energy of each output feature
        # Notice that this condenses filter bank output into a vector of shape (b, num_filters)
        x = torch.sum( x**2, dim=(2,3) )
        
        # Linear map between feature energies and class scores
        x = self.linear(x)
        return x
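As a quick shape check (a small usage sketch assuming MNIST-sized inputs), a batch of shape (batch, 1, 28, 28) should map to one 10-dimensional score vector per image:

# Untrained filter bank applied to a random batch of 8 images
fb = FilterBank(num_filters=4, K=1)
scores = fb(torch.rand((8, 1, 28, 28)))
print(scores.shape)                    # expected: torch.Size([8, 10])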

Task 6

Split the handwritten digit dataset into train and test sets. Instantiate the class of Task 5 to train a filterbank that classifies images into the corresponding digit class. Use the cross-entropy loss as the optimization objective.

In [ ]:
# Instantiate model
model = FilterBank(num_filters = 30, K = 1)

# Instantiate SGD optimizer
optimizer = optim.SGD(model.parameters(), lr=0.0001)

# Instantiate cross entropy loss
cross_entropy = nn.CrossEntropyLoss()

Train

In [ ]:
# Set model in training mode
model.train()

loss_evol = []

# Number of passes through the entire training set
num_epochs = 10

for epoch in range(num_epochs):
    loss_epoch = 0.
    acc_epoch = 0.

    for data, target in train_loader:
        # Clear gradients of previous batch
        optimizer.zero_grad()
        
        # Compute batch predictions
        output = model(data)
        
        # Compute loss
        loss = cross_entropy(output, target)
        
        # Compute gradients of loss w respect to current params
        loss.backward()
        
        # Update params using SGD
        optimizer.step()

        # Accumulate batch losses
        loss_epoch += loss.item()

        # Accumulate correct predictions, highest score is predicted class.
        pred = output.argmax(dim=-1)
        acc_epoch += torch.sum(pred == target)

    loss_evol.append(loss_epoch/len(train_loader))
    # Print metrics 
    print(f"Epoch: {epoch} \t Loss: {loss_epoch/len(train_loader):.3f} \t Acc: {acc_epoch*100/len(train_loader.dataset):.2f}")
Epoch: 0 	 Loss: 15.723 	 Acc: 29.94
Epoch: 1 	 Loss: 3.251 	 Acc: 39.87
Epoch: 2 	 Loss: 2.301 	 Acc: 43.16
Epoch: 3 	 Loss: 1.909 	 Acc: 45.64
Epoch: 4 	 Loss: 1.714 	 Acc: 47.05
Epoch: 5 	 Loss: 1.583 	 Acc: 48.52
Epoch: 6 	 Loss: 1.495 	 Acc: 49.60
Epoch: 7 	 Loss: 1.434 	 Acc: 50.52
Epoch: 8 	 Loss: 1.392 	 Acc: 51.30
Epoch: 9 	 Loss: 1.362 	 Acc: 51.71
In [ ]:
plt.plot(loss_evol)
plt.title("Cross entropy evolution")
plt.xlabel("Epoch")
Out[ ]:
Text(0.5, 0, 'Epoch')
In [ ]: