Download the files for Lab 4A from the following links:
Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.
Lab 4 Image processing¶
0. Environment setup¶
# Use this cell for Google Colab integration with Google Drive
# from google.colab import drive
# import os
# # This will prompt for authorization.
# drive.mount('/content/drive')
# # If you want to work within a specific folder, specify the path
# folder_path = '/content/drive/My Drive/ese2000 ailab/Lab3B'
# # You can then change the directory to this folder to perform operations within it
# os.chdir(folder_path)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt
import numpy as np
import time
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device("cpu")
#device = torch.device("mps")
print(f"Using device: {device}")
Using device: cpu
1. Images¶
$$ \def\bbX{{{\mathbf X}}} $$
Images are mathematically modeled as functions of two variables. These variables represent the vertical coordinate $m$ and the horizontal coordinate $n$. They can be discrete or continuous, but we consider them here to be discrete. In this case the coordinate pair $(m, n)$ is called a pixel. We restrict each pixel coordinate to be between $0$ and $N$. We use $\mathbf{X}$ to denote the image and $x(m, n)$ to represent the value at pixel $(m, n)$. Pixels of an image are arranged in a matrix, $$ \tag{1} \mathbf{X}=\left(\begin{array}{cccccc} x(0,0) & x(0,1) & \cdots & x(0, n) & \cdots & x(0, N) \\ x(1,0) & x(1,1) & \cdots & x(1, n) & \cdots & x(1, N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(m, 0) & x(m, 1) & \cdots & x(m, n) & \cdots & x(m, N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(N, 0) & x(N, 1) & \cdots & x(N, n) & \cdots & x(N, N) \end{array}\right) $$ Thus, an image $\mathbf{X}$ is a matrix in which entries $x(m, n)$ represent pixel values.
The interpretation of this matrix is that pixel values $x(m,n)$ represent the luminance of the pixel. The luminance is how much light is reflected by the pixel, and it determines how bright the pixel appears in a black and white image. For that reason, although images are modeled as functions of two variables, we do not visualize them as plots of a function; we display them as luminance colormaps.
To represent color images we just consider several matrices with different color contents. For instance, in a red, green, and blue (RGB) representation we have matrices $\bbX_{\text{R}}$, $\bbX_{\text{G}}$, and $\bbX_{\text{B}}$ that represent the luminance restricted to the red, green, and blue colors. This information can be represented with a tensor,
$$ \tag{2} \bbX = [\bbX_{\text{R}}, \bbX_{\text{G}}, \bbX_{\text{B}}] $$
In this tensor, each of the color matrices is called a channel.
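As a quick illustration of the channel convention (not part of the lab's dataset), the following sketch builds a small RGB image as a PyTorch tensor of shape (3, M, N); the pixel values are random and the variable names are our own.
import torch
# A hypothetical 4x4 color image stored as a tensor with three channels (R, G, B).
M, N = 4, 4
X_R = torch.rand(M, N)  # red-channel luminances
X_G = torch.rand(M, N)  # green-channel luminances
X_B = torch.rand(M, N)  # blue-channel luminances
# Stack the channel matrices into a single tensor of shape (3, M, N),
# which is the channel-first convention used by PyTorch.
X = torch.stack([X_R, X_G, X_B], dim=0)
print(X.shape)  # torch.Size([3, 4, 4])
print(X[1])     # the green channel X_G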
1.1 Handwritten Digits¶
In this lab we work with black and white images that represent handwritten digits, such as the images of a handwritten 1 and a handwritten 3 shown below. The dataset we are given contains pairs $(\bbX_q, y_q)$ of images $\bbX_q$ and human annotations $y_q$ that identify the correct digits.
Task 1¶
Load the data and visualize three images.
Loading the Data¶
We will use torchvision’s MNIST dataset class to download the data.
# Image normalization
# Converts the image into a tensor and normalizes to achieve a mean of 0 and variance of 1
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
# Download MNIST train and test set and normalize images
train_set = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_set = datasets.MNIST('./data', train=False, download=True, transform=transform)
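If you want to sanity-check the download, the following optional snippet prints the size of each split and the shape of one normalized image; for MNIST these are 60,000 training images, 10,000 test images, and 1x28x28 tensors.
# Optional sanity check on the downloaded splits.
print(f"Training images: {len(train_set)}")  # 60000
print(f"Test images: {len(test_set)}")       # 10000
image0, label0 = train_set[0]
print(f"Image tensor shape: {image0.shape}, label: {label0}")  # torch.Size([1, 28, 28])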
Visualize Images¶
# Creating a matplotlib grid to plot three images.
fig, axes = plt.subplots(1,3, figsize=(10, 10))
# axes[i].imshow takes a matrix and plots its pixel values as an image. The cmap argument specifies to plot
# the image in black and white.
# .squeeze() removes the extra dimension that is added by default when the image is loaded
# Image 1
img_index1 = 20
image1, label1 = train_set[img_index1]
axes[0].imshow(image1.squeeze(), cmap='binary_r')
axes[0].set_title(f"Label: {label1}")
axes[0].axis('off')
print(f"Image 1 size: {image1.squeeze().shape}")
# Image 2
img_index2 = 100
image2, label2 = train_set[img_index2]
axes[1].imshow(image2.squeeze(), cmap='binary_r')
axes[1].set_title(f"Label: {label2}")
axes[1].axis('off')
print(f"Image 2 size: {image2.squeeze().shape}")
# Image 3
img_index3 = 1000
image3, label3 = train_set[img_index3]
axes[2].imshow(image3.squeeze(), cmap='binary_r')
axes[2].set_title(f"Label: {label3}")
axes[2].axis('off')
print(f"Image 3 size: {image3.squeeze().shape}")
plt.tight_layout()
plt.show()
2. Shifts in Space¶
Before defining convolutions to process images we have to introduce vertical and horizontal shift operators. We name these operators $\mathcal{S}_{\mathrm{v}}$ and $\mathcal{S}_{\mathrm{H}}$ and define them as the operators whose action on an image $\mathbf{X}$ results in a shifting of vertical and horizontal coordinates, respectively. Thus, if applying the vertical shift $\mathcal{S}_{\mathrm{v}}$ to the image $\mathbf{X}$ yields the image $\mathbf{U}_{10}=\mathcal{S}_{\mathrm{v}} \mathbf{X}$, the entries $u_{10}(m, n)$ of the vertically shifted image $\mathbf{U}_{10}$ are
$$ \tag{4} u_{10}(m, n)=x(m-1, n), \text { when } \mathbf{U}_{10}=\mathcal{S}_{\mathrm{v}} \mathbf{X} $$
Likewise, if applying the horizontal shift $\mathcal{S}_{\mathrm{H}}$ to the image $\mathbf{X}$ yields the image $\mathbf{U}_{01}=\mathcal{S}_{\mathrm{H}} \mathbf{X}$, the entries $u_{01}(m, n)$ of the horizontally shifted image $\mathbf{U}_{01}$ are
$$ \tag{5} u_{01}(m, n)=x(m, n-1), \quad \text { when } \mathbf{U}_{01}=\mathcal{S}_{\mathrm{H}} \mathbf{X} $$
In (4) and (5) we adopt the convention that $x(m,n)=0$ when either of the arguments $m$ or $n$ are negative.
To understand (4) and (5) it is convenient to represent them in matrix form. Applying the vertical shift to the image (1) yields the matrix
$$ \def\Sv{\mathcal{S}_{\mathrm{V}}} \def\Sh{\mathcal{S}_{\mathrm{H}}} $$
$$ \begin{equation}\tag{6} \Sv\bbX = \left( \begin{array}{cccccc} 0 & 0 & \cdots & 0 & \cdots & 0 \\ x(0,0) & x(0,1) & \cdots & x(0,n) & \cdots & x(0,N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(m,0) & x(m,1) & \cdots & x(m,n) & \cdots & x(m,N) \\ \vdots & \vdots & & \vdots & & \vdots \\ x(N-1,0) & x(N-1,1) & \cdots & x(N-1,n) & \cdots & x(N-1,N) \\ \end{array} \right) . \end{equation} $$
$$ \def\bbU{{{\mathbf U}}} \def\bbX{{{\mathbf X}}} $$
In the vertically shifted image $\bbU_{10}=\Sv\bbX$ all of the rows of the matrix $\bbX$ are shifted down. We fill the first row with zeros because there are no pixels that can be shifted into the first row.
When we apply the horizontal shift $\Sh$ to the image $\bbX$ in (1) we obtain the image $$ \def \Nmo {N\!-\!1} \begin{equation}\tag{7} \Sh\bbX = \left( \begin{array}{cccccc} 0 & x(0,0) & \cdots & x(0,n) & \cdots & x(0,\Nmo) \\ 0 & x(1,0) & \cdots & x(1,n) & \cdots & x(1,\Nmo) \\ \vdots & \vdots & & \vdots & & \vdots \\ 0 & x(m,0) & \cdots & x(m,n) & \cdots & x(m,\Nmo) \\ \vdots & \vdots & & \vdots & & \vdots \\ 0 & x(N,0) & \cdots & x(N,n) & \cdots & x(N,\Nmo) \\ \end{array} \right) . \end{equation} $$
In the horizontally shifted image $\bbU_{01}=\Sh\bbX$ all of the columns of the matrix $\bbX$ are shifted right. We fill the first column with zeros because there are no pixels that can be shifted into the first column.
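The shift operators in (6) and (7) are straightforward to implement by slicing and zero-filling. The sketch below is one possible implementation (the helper names shift_v and shift_h are ours, not part of the lab code): it shifts an image one pixel down or one pixel to the right and fills the vacated row or column with zeros.
import torch

def shift_v(X):
    """Vertical shift S_V: move every row down by one and zero-fill the first row."""
    U = torch.zeros_like(X)
    U[1:, :] = X[:-1, :]
    return U

def shift_h(X):
    """Horizontal shift S_H: move every column right by one and zero-fill the first column."""
    U = torch.zeros_like(X)
    U[:, 1:] = X[:, :-1]
    return U

# Quick check on a small 3x3 image.
X = torch.arange(9.0).reshape(3, 3)
print(shift_v(X))  # first row is zero; the remaining rows are the first two rows of X
print(shift_h(X))  # first column is zero; the remaining columns are the first two columns of X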
2.1 Shift Compositions and Spatial Shift Sequences¶
As in the case of time signals, image shifts can be composed. For instance, applying the vertical shift twice yields the signal
$$ \begin{equation}\tag{8} \bbU_{20} = \Sv \,\bbU_{10} = \Sv^2 \,\bbX. \end{equation} $$
This is a signal in which the entries are $u_{20}(m, n) = x(m-2, n)$. That is, the image $\bbU_{20}$ is one in which the pixels are shifted two pixels down.
As another example consider the application of $l$ shifts in the horizontal direction. We can define this recursively as
$$ \begin{equation}\tag{9} \bbU_{0l} = \Sh \,\bbU_{0(l-1)} = \Sh^2 \,\bbU_{0(l-2)} = \ldots = \Sh^{l-1} \,\bbU_{01} = \Sh^l \,\bbX. \end{equation} $$
This is a signal in which the entries are $u_{0l}(m, n) = x(m, n-l)$. That is, the image $\bbU_{0l}$ is one in which the pixels are shifted $l$ pixels to the right.
In general, we can compose any number of vertical shifts with any number of horizontal shifts. This results in the definition of the spatial shift sequence composed of images $$ \begin{equation}\tag{10} \bbU_{kl} = \Sv^k \Sh^l \bbX. \end{equation} $$ This is an image in which the entries are $u_{kl}(m, n) = x(m-k, n-l)$. That is, the image $\bbU_{kl}$ is one in which the pixels are shifted $k$ pixels down and $l$ pixels to the right.
It is important to note that the spatial shift sequence in (10) can be computed recursively. To do that we need to define the vertical and horizontal recursions
$$ \begin{equation}\tag{11} \bbU_{kl} = \Sv \bbU_{(k-1)l}, \qquad \bbU_{kl} = \Sh \bbU_{k(l-1)}, \end{equation} $$
with the initial condition $\bbU_{00} = \bbX$. Recursive computation of the spatial shift sequence is important in practical implementations.
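The recursion in (11) is easy to turn into code. A minimal sketch (the helper names are ours): starting from $\bbU_{00} = \bbX$, every element of the sequence is obtained from an already computed element with a single shift.
import torch

def shift_v(X):  # vertical shift: rows move down by one, first row zero-filled
    U = torch.zeros_like(X); U[1:, :] = X[:-1, :]; return U

def shift_h(X):  # horizontal shift: columns move right by one, first column zero-filled
    U = torch.zeros_like(X); U[:, 1:] = X[:, :-1]; return U

def shift_sequence(X, K):
    """Compute U_kl = S_V^k S_H^l X for 0 <= k, l <= K using the recursions in (11)."""
    U = {(0, 0): X}
    for k in range(K + 1):
        for l in range(K + 1):
            if (k, l) == (0, 0):
                continue
            if k > 0:
                U[(k, l)] = shift_v(U[(k - 1, l)])  # U_kl = S_V U_(k-1)l
            else:
                U[(k, l)] = shift_h(U[(k, l - 1)])  # U_0l = S_H U_0(l-1)
    return U

U = shift_sequence(torch.arange(16.0).reshape(4, 4), K=2)
print(U[(2, 1)])  # the input shifted two pixels down and one pixel to the right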
2.2 Negative shifts¶
A negative vertical shift is a shift that moves the pixels up. We denote this shift by $\mathcal{S}_{\mathrm{v}}^{-1}$ and define it as the shift that when acting on the image $\mathbf{X}$ yields the image $\mathbf{U}_{(-1) 0}=\mathcal{S}_{\mathrm{v}}^{-1} \mathbf{X}$ whose entries are given by $$ \tag{12} u_{(-1) 0}(m, n)=x(m+1, n), \quad \text { when } \mathbf{U}_{(-1) 0}=\mathcal{S}_{\mathrm{v}}^{-1} \mathbf{X} . $$ In this definition we adopt the convention that $x(N+1, n)=0$. This means that when applying the negative shift $\mathcal{S}_{\mathrm{v}}^{-1}$ we fill the last row of $\mathbf{U}_{(-1) 0}$ with zeros. This is because there are no entries of $\mathbf{X}$ that can be shifted into this row.
Likewise, a negative horizontal shift is a shift that moves the pixels left. We denote this shift by $\mathcal{S}_{\mathrm{H}}^{-1}$ and define it as the shift that when acting on the image $\mathbf{X}$ yields the image $\mathbf{U}_{0(-1)}=\mathcal{S}_{\mathrm{H}}^{-1} \mathbf{X}$ whose entries are given by $$ \tag{13} u_{0(-1)}(m, n)=x(m, n+1), \quad \text { when } \mathbf{U}_{0(-1)}=\mathcal{S}_{\mathrm{H}}^{-1} \mathbf{X} . $$ In this definition we adopt the convention that $x(m, N+1)=0$. This means that when applying the negative shift $\mathcal{S}_{\mathrm{H}}^{-1}$ we fill the last column of $\mathbf{U}_{0(-1)}$ with zeros. This is because there are no entries of $\mathbf{X}$ that can be shifted into this column.
As in the case of the positive shifts $\Sv$ and $\Sh$, the negative shifts $\Sv^{-1}$ and $\Sh^{-1}$ can be composed. It is also possible to compose the positive vertical shift $\Sv$ with the negative horizontal shift $\Sh^{-1}$. The converse composition of the positive horizontal shift $\Sh$ with the negative vertical shift $\Sv^{-1}$ is also possible. We can therefore generalize (10) into $$ \begin{equation}\tag{14} \bbU_{kl} = \Sv^k \Sh^l \bbX. \end{equation} $$ This is, in fact, the exact same equation as (10). The difference is that we now allow $k$ and $l$ to be positive or negative numbers. The entries of $\bbU_{kl}$ are $ u_{kl}(m,n)=x(m-k,n-l)$. When $k$ is positive this entails shifting pixels down and when $k$ is negative this entails shifting pixels up. When $l$ is positive this entails shifting pixels to the right and when $l$ is negative this entails shifting pixels to the left.
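The same slicing idea extends to negative shifts. The sketch below (again with a helper name of our own choosing) implements the general shift $\bbU_{kl} = \Sv^k \Sh^l \bbX$ of (14) for positive or negative $k$ and $l$, always filling the vacated rows and columns with zeros.
import torch

def shift2d(X, k, l):
    """Return U_kl = S_V^k S_H^l X with zero filling, for positive or negative k and l."""
    M, N = X.shape
    U = torch.zeros_like(X)
    # Destination and source row ranges for a vertical shift by k pixels.
    r_dst = slice(max(k, 0), M + min(k, 0))
    r_src = slice(max(-k, 0), M + min(-k, 0))
    # Destination and source column ranges for a horizontal shift by l pixels.
    c_dst = slice(max(l, 0), N + min(l, 0))
    c_src = slice(max(-l, 0), N + min(-l, 0))
    U[r_dst, c_dst] = X[r_src, c_src]
    return U

X = torch.arange(16.0).reshape(4, 4)
print(shift2d(X, 1, 0))   # S_V X: rows shifted down, first row zero
print(shift2d(X, -1, 0))  # S_V^{-1} X: rows shifted up, last row zero
print(shift2d(X, 2, -1))  # two pixels down and one pixel to the left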
3. Convolutions in space¶
A spatial convolution is a linear combination of the components of the spatial shift sequence in (14). For a formal definition we consider a filter range $K$ and define filter coefficients $h_{k l}$ for $k$ and $l$ ranging from $-K$ to $K$. The outcome of applying the filter with coefficients $h_{k l}$ to the image $\mathbf{X}$ is the image $$ \tag{15} \mathbf{Y}=\sum_{k=-K}^K \sum_{l=-K}^K h_{k l} \mathcal{S}_{\mathrm{v}}^k \mathcal{S}_{\mathrm{H}}^l \mathbf{X} . $$ Observe that in this definition we allow for positive and negative shifts. For simplicity we assume that the maximum number of shifts in either direction is the same. They can be made different, but it is standard practice to keep them equal.
We note that the total number of filter coefficients of a two dimensional convolution is $(2 K+1)^2$. These coefficients are arranged in a matrix $\mathbf{H}$.
Task 2¶
$$ \def\bbH{{{\mathbf H}}} $$
Write a filter object and endow it with a function that takes an image $\bbX$ as an input and returns as an output the result of convolving the image with the filter $\bbH$ that is stored as an attribute of the class. Make sure that this function can operate on a batch of input images $\blacksquare$
class ConvolutionalFilter2D(nn.Module):
"""
A very simple implementation of a convolutional filter. Given an image, it returns the result of convolving the image with the filter.
It is very simplified and processes each filter sequentially, to make the operation easier to understand.
It also assumes F=1 because MNIST has only one input channel.
A more sophisticated version of this filter would process all the filters and images in parallel, to take advantage of the batch processing.
Args:
num_filters (Int): The number of filters in the bank.
K (Int): Defines the size of each filter in the bank. The true size is (2K+1)^2.
Recall that we sweep from -K to K. So if K is 1, each filter in our bank is a 3x3 grid.
The filter is assumed to be square.
"""
def __init__(self, num_filters, K):
super(ConvolutionalFilter2D, self).__init__()
# L is the number of pixels in each dimension (For the example above L = (1) * 2 + 1 = 3 and each filter is LxL = 3x3)
L = K*2+1
# The number of filters in the bank determines the number of output channels AKA output features
self.num_filters = num_filters
self.filters = torch.zeros((num_filters, L, L))
def forward(self, X):
"""
Performs a 2D convolution of the input images with the filters stored in self.filters (shape (G, 2K+1, 2K+1)).
Args:
X (torch.Tensor): batch of single-channel input images of shape (B, M, N).
Returns:
torch.Tensor: Convolution of the images with every filter, with shape (B, G, M, N).
"""
# We are assuming square filters. The true size is filter_size^2
G, num_coefficients, _ = self.filters.shape
B, M, N = X.shape # Here F=1
# By definition num_coefficients = 2*K+1
K = (num_coefficients-1)//2
# First we will create a padded version of our image, to follow the convention that SX_ij = 0 for i,j out of bounds
# Create the padded dimensions. There are 2K elements (K on each side) added to each dimension of the image.
V_pad = M + 2*K
H_pad = N + 2*K
# Initialize padded image with the new dimensions
X_padded = torch.zeros((B, G, V_pad, H_pad))
# Place the original images in the center of the X_padded tensor.
X_padded[:, :, K:(V_pad-K), K:(H_pad-K)] = X.unsqueeze(1)
# Initialize the convolution output tensor
Y = torch.zeros((B, G, M, N))
# Iterate over the filters
for g in range(G):
# Get the current filter H^(g)
H = self.filters[g]
# Iterate over the filter taps (these are the parameters we are learning)
for u in range(2*K+1):
for v in range(2*K+1):
# Shift the image vertically and horizontally by slicing the padded image.
# Since X sits at offset K inside X_padded, this slice is equivalent to
# Z_uv = S_V^(K-u) S_H^(K-v) X, so the shifts sweep from -K to K as u and v go from 0 to 2K.
# The coefficient H[u, v] therefore multiplies the copy of X shifted by (K-u, K-v) pixels,
# which matches the (cross-correlation) convention used by PyTorch's Conv2d.
Z_uv = X_padded[:, g, u:u + M, v:v + N]
# (ALTERNATIVE IMPLEMENTATION)
# Another way to obtain Z_uv is with torch.roll: roll u rows to the top and v columns to the left,
# then keep the first MxN elements to match the dimensions of Y.
# Z_uv = torch.roll(X_padded[:, g], shifts=(-u, -v), dims=(1, 2))[:, :M, :N]
# Multiply the kernel with the rolled image and sum the result:
# conceptually this operation is Y[g] = Z_uv * H
Y[:,g] += Z_uv * H[u, v]
return Y
# An example to test our implementation, with a (4x4) image with all ones, and a filter with values:
# H = [[1,2,3],[4,5,6],[7,8,9]]
X = torch.ones(1,1,4,4)
convolution = ConvolutionalFilter2D(num_filters=1, K=1)
# Manually set the filters to the values (just to test)
convolution.filters[:,:,:] = torch.arange(1,10).reshape(1,3,3)
print("Input image:")
display(X[0,0,:,:])
print("Filter:")
display(convolution.filters[0])
print("Convolution output:")
convolution(X[:,0,:,:])
Input image:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
Filter:
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
Convolution output:
tensor([[[[28., 39., 39., 24.],
          [33., 45., 45., 27.],
          [33., 45., 45., 27.],
          [16., 21., 21., 12.]]]])
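As a sanity check, we can reproduce one entry of this output by hand. Because the input image is all ones, each output pixel is simply the sum of the filter coefficients whose shifted copy of the image covers that pixel. With the indexing convention used in the code above (the same cross-correlation convention as PyTorch's Conv2d), only the bottom-right $2 \times 2$ block of the filter contributes at the top-left corner, while all nine coefficients contribute at an interior pixel: $$ y(0,0) = 5 + 6 + 8 + 9 = 28, \qquad y(1,1) = 1 + 2 + \cdots + 9 = 45, $$ which matches the tensor above.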
Task 3¶
Define filters $\bbH$ with entries $h_{kl} = 1/((2K+1)^2)$. Use the class in Task (2) to instantiate these filters with $K=2$, $K=4$, and $K=8$. Apply these filters to the images you visualized in Task (1).
What is the effect of applying these filters? $\blacksquare$
# Take any sample image
img = train_set[0][0]
# Plot an original image (no convolution) to compare against
fig, ax = plt.subplots(1,4, figsize=(14,3.5))
ax[0].imshow(img.squeeze(), cmap = 'binary_r')
ax[0].set_title("Original image")
K_vals = [2,4,8]
for i, K in enumerate(K_vals):
# Instantiate Filter Bank with one filter (single output channel)
H = ConvolutionalFilter2D(num_filters = 1, K = K)
# Instantiate each filter in the bank with a set of uniform values
# (Later on, these filter values will be learned)
H.filters = torch.ones((1, 2*K+1, 2*K+1))*(1/(2*K+1)**2)
# Perform 2D convolution
out = H(img)
# Plot the convolved image for each K
ax[i+1].imshow(out.squeeze(), cmap = 'binary_r')
ax[i+1].set_title(f"Convolved image with K={K}")
Task 4¶
Pytorch comes with its own function for computing two dimensional convolutions. Consider the same filters of Task 3 with $K=8$. Execute your own convolution code and the code that comes built in in Pytorch to compute convolutions for a batch of $B=100$ images. Compare the execution times. $\blacksquare$
# We can create a DataLoader from the train set
loader = DataLoader(train_set, batch_size=1024, shuffle=True)
imgs = next(iter(loader))[0][:100]
# Perform convolution with function from torch
K = 8
G = 40 # 40 output filters to really see the effect of Pytorch's implementation
conv = nn.Conv2d(in_channels = 1, out_channels = G, kernel_size = 2*K+1, padding = 'same', bias = False)
# Time convolution
start = time.time()
out = conv(imgs)
end = time.time()
print(f"PyTorch's Convolution took: {end-start:.3f} seconds")
# Perform convolution with your function
filterbank = ConvolutionalFilter2D(num_filters = G, K = 8)
# Initialize filters with the torch initialization
filterbank.filters = conv.weight.squeeze(1)
# Time our implementation
start = time.time()
out_manual = filterbank(imgs.squeeze())
end = time.time()
print(f"Manual Convolution took: {end-start:.3f} seconds")
# Check that outputs match
print(f"Manual convolution and torch convolution output the same tensor: {torch.allclose(out_manual, out, atol = 1e-5)}")
PyTorch's Convolution took: 0.027 seconds
Manual Convolution took: 0.403 seconds
Manual convolution and torch convolution output the same tensor: True
In solving Task 4 you must have noticed that the Pytorch code is much faster. This is because the Pytorch convolution function is just a wrapper that calls a compiled subroutine written in C. If this were the 1990’s we would say that “true men code in C,” which was indeed a common quip at the time. Luckily for us all, we live in a time when we do not require men to be tough, nor women to stay away from engineering, nor persons to define themselves as either men or women. So let us just remember that industry-level data science still relies on low-level languages. Python and Pytorch are prototyping languages.
We will use the Pytorch convolution function in the remainder of Lab 4.
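A side note on the timing methodology: time.time() over a single run is noisy. A slightly more careful (but still simple) sketch, intended only for relative comparisons, disables gradient tracking, adds a warm-up call, and averages over a few repetitions; on a GPU you would also call torch.cuda.synchronize() before reading the clock. The helper name time_fn is our own.
import time
import torch

def time_fn(fn, *args, repeats=5):
    """Average wall-clock time of fn(*args) over several runs, without gradient tracking."""
    with torch.no_grad():
        fn(*args)  # warm-up run; the first call may include one-time setup costs
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

# Example usage with the objects defined above:
# print(f"PyTorch conv: {time_fn(conv, imgs):.4f} s per call")
# print(f"Manual conv:  {time_fn(filterbank, imgs.squeeze()):.4f} s per call")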
4. Filterbanks¶
$$ \def\bbZ{{{\mathbf Z}}} \def\bbA{{{\mathbf A}}} \def\bbx{{{\mathbf x}}} $$
To use convolutional filters in classification tasks we will build convolutional neural networks (CNNs). An intermediate step is to use the energy content of the outputs of a convolutional filterbank.
Begin then by defining the energy of an image as the sum of the squares of the values of each pixel, $$ \begin{equation}\tag{16} \|\bbX\|^2 = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x^2(m,n). \end{equation} $$
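In code, the energy in (16) is a single reduction. For a batch of filter outputs of shape (B, G, M, N) we sum squared values over the two pixel dimensions only, which is exactly what the readout in Task 5 will do. A small sketch with made-up tensors:
import torch

# Energy of a single image: sum of squared pixel values, as in (16).
X = torch.randn(28, 28)
energy = torch.sum(X ** 2)

# Energies of a batch of filter outputs of shape (B, G, M, N): one number per image and per filter.
Z = torch.randn(100, 8, 28, 28)
energies = torch.sum(Z ** 2, dim=(2, 3))
print(energy.shape, energies.shape)  # torch.Size([]) torch.Size([100, 8])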
Continue by defining a collection of $G$ filters $\bbH^g$ each of which contains filter coefficients $h^g_{kl}$. Processing the input image $\bbX$ with each of these filters produces the output feature, $$ \begin{equation}\tag{17} \bbZ^g = \sum_{k=-K}^{K} \sum_{l=-K}^{K} h^g_{kl} \, \Sv^k \, \Sh^l \, \bbX . \end{equation} $$
Each of the $\bbZ^g$ features is an image. This set of images has to be converted into a set of class scores as we did in Lab 2C. We will do that in two steps. The first step is to create a vector that groups the energies of the outputs of each filter in the bank
$$ \begin{equation}\tag{18} \bbx_1 = \Big[~ \big\| \bbZ^1 \big\|^2;\, \big\| \bbZ^2 \big\|^2;\, \ldots;\, \big\| \bbZ^G \big\|^2 ~\Big]. \end{equation} $$ The second step is to process this vector with a readout layer. This is just a multiplication with a matrix $\bbA$ that matches the dimension $G$ of the filterbank with the number of classes $C$,
$$ \begin{equation}\tag{19} \bbx_2 = \ \bbA \bbx_1. \end{equation} $$ Each of the entries of $\bbx_2$ is a class score. We will train this filterbank to correctly predict class labels.
Task 5¶
Create a filterbank class. In this class the filter coefficients $\bbH^g$ and the readout matrix $\bbA$ are attributes. The class has a method that takes an image $\bbX$ as an input and implements the filterbank equations in (17) – (19). The method returns the vector of scores $\bbx_2$ as an output. $\blacksquare$
class ConvolutionalFilterWithPytorch(nn.Module):
"""
A filter bank. This class is very similar to the one in the previous task,
however now we are using torch's built in convolution functions.
Args:
num_filters (Int): The number of filters in the bank (these filters are in parallel!)
K (Int): Defines the size of each filter in the bank. The true size is (2K+1)^2.
Recall that we sweep from -K to K. So if K is 1, each filter in our bank is a 3x3 grid.
The filter is assumed to be square.
"""
def __init__(self, num_filters, K):
super(ConvolutionalFilterWithPytorch, self).__init__()
# In the MNIST dataset each class corresponds to a digit
num_classes = 10
# Filter bank with num_filters filters, each with a (2K+1)x(2K+1) kernel
self.conv1 = nn.Conv2d(in_channels = 1,
out_channels = num_filters,
kernel_size = 2*K+1,
stride = 1,
padding = 'same')
# Linear layer. This layer takes our num_filters number of output channels and transforms it into an output with dimension num_classes
self.linear = nn.Linear(num_filters, num_classes)
def forward(self, x):
# Pass through filter bank
x = self.conv1(x)
# Compute energy of each output feature
# Notice that this condenses filter bank output into a vector of shape (b, num_filters)
x = torch.sum( x**2, dim=(2,3) )
# Linear map between feature energies and class scores
x = self.linear(x)
return x
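As a quick optional check that the forward pass produces one score per class, we can push a dummy batch of MNIST-sized images through an instance of the class; the sizes below are arbitrary.
# Optional shape check with a random batch of single-channel 28x28 images.
model = ConvolutionalFilterWithPytorch(num_filters=8, K=2)
dummy = torch.randn(16, 1, 28, 28)
scores = model(dummy)
print(scores.shape)  # torch.Size([16, 10]): one score per class for each image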
Task 6¶
Split the handwritten digit dataset into train and test sets. Instantiate the class of Task 5 to train a filterbank that classifies images into the corresponding digit class. Use the cross-entropy loss as the optimization objective. $\blacksquare$
Train¶
We already have train and test datasets from Pytorch’s MNIST loaders. All we need to do is reuse our usual training and evaluation code.
# We define a function to evaluate the loss. This is an auxiliary function. We do not need it to
# implement or train the neural network. We use it for evaluation after training and/or at some
# intermediate training checkpoints. This is just for visualization.
def evaluate(test_dataloader, estimator):
"""
Evaluate the performance of the estimator on the given dataloader.
Args:
test_dataloader (torch.utils.data.DataLoader): Data on which to evaluate performance.
estimator (torch.nn.Module): Learning parameterization to evaluate.
Returns:
float: Accuracy of the estimator evaluated on the dataloader.
"""
correct = 0 # Initialize counter for correct predictions
with torch.no_grad(): # Disable gradient computations (not needed for evaluation)
for data, target in test_dataloader: # Sweep over all batches
data = data.to(device) # Move the data tensor to the device (e.g. from CPU memory to GPU memory)
target = target.to(device) # Move the target tensor to the device (e.g. from CPU memory to GPU memory)
output = estimator(data) # Get model predictions for the batch
# Select the most likely index (class) from the output tensor
pred = output.argmax(dim=-1)
# Count the number of correct predictions in the batch
correct += torch.sum(pred == target)
# Calculate overall accuracy: percentage of correct predictions
test_accuracy = correct / len(test_dataloader.dataset)
return test_accuracy
# Instantiate model
estimator = ConvolutionalFilterWithPytorch(num_filters = 30, K = 1).to(device)
# Set the parameters of stochastic gradient descent (SGD). These include the learning rate,
# batch size, and number of epochs. Here we use a learning rate of 0.0001, a batch size of 64, and 10 epochs.
lr = 0.0001
batch_size = 64
n_epochs = 10
optimizer = optim.SGD(estimator.parameters(), lr=lr)
# Specify the loss function. For the CNN, we use cross-entropy loss since this is a classification task.
loss = nn.CrossEntropyLoss()
# The train and test dataloaders handle randomized batches in the training set and non-shuffled
# batches in the test set. We keep the same structure as before, where we use these dataloaders
# to handle loading data into memory.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=1024, shuffle=False)
# Initialize null structures for storing the evolution of the training loss and test accuracy
# at the end of each epoch. This is not needed for training, just for displaying results.
losses = []
test_acc = []
print('\n')
# Begin the training loop. This is similar to the stochastic gradient descent (SGD) implementation
# used for the convolutional filter. In the inner loop, we sweep over batches of size batch_size.
# In the outer loop, we sweep over epochs. One epoch is a complete pass through the shuffled dataset.
#
# We follow the three steps required to run SGD: (i) Load the data, (ii) Evaluate Gradients,
# (iii) Take a gradient descent step. The only difference is the estimator, which is now a CNN.
for epoch in range(n_epochs): # Iterate over n_epochs epochs
for batch_idx, (x_batch, y_batch) in enumerate(train_loader): # Iterate over all batches in the dataset
# (Step i) Load the data. These commands send the data to the GPU memory.
x_batch = x_batch.to(device)
y_batch = y_batch.to(device)
# (Step ii) Compute the gradients. Use automatic differentiation.
estimator.zero_grad() # Reset the gradients to zero
y_hat = estimator(x_batch).squeeze() # Forward pass through the CNN and squeeze the output.
cross_entropy_value = loss(y_hat, y_batch.type(torch.LongTensor).to(device)) # Compute the loss.
cross_entropy_value.backward() # Compute gradients by moving backwards to the gradient reset.
# (Step iii) Update parameters by taking an SGD (or other optimizer) step.
optimizer.step()
# Print training stats at specified intervals to track progress.
if batch_idx % 50 == 0:
    print(f"Train Epoch: {epoch+1} \tLoss: {cross_entropy_value.item():.3f}")
# Record the loss at each iteration for visualization.
losses.append(cross_entropy_value.item())
# End of batch loop.
# Evaluate the performance of the CNN on the test set at the end of each epoch.
test_accuracy = evaluate(test_loader, estimator)
test_acc.append(test_accuracy.cpu())
# Print the test accuracy at the end of each epoch to track performance.
print(f'Epoch {epoch+1} / {n_epochs}: Test Accuracy: {test_accuracy*100:.2f}%')
# End of epoch loop.
print('\n\nDone\n')
# Plot the training loss versus the number of iterations.
plt.plot(losses)
plt.title("training loss")
Train Epoch: 1 	Loss: 386.865
Train Epoch: 1 	Loss: 52.115
Train Epoch: 1 	Loss: 22.191
...
Epoch 1 / 10: Test Accuracy: 43.90%
Epoch 2 / 10: Test Accuracy: 44.87%
Epoch 3 / 10: Test Accuracy: 44.33%
Epoch 4 / 10: Test Accuracy: 53.62%
Epoch 5 / 10: Test Accuracy: 43.91%
Epoch 6 / 10: Test Accuracy: 51.52%
Epoch 7 / 10: Test Accuracy: 55.58%
Epoch 8 / 10: Test Accuracy: 52.39%
Epoch 9 / 10: Test Accuracy: 52.31%
Epoch 10 / 10: Test Accuracy: 50.09%

Done
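The training loop also recorded the test accuracy at the end of every epoch in test_acc. If you want to visualize it alongside the training loss, a minimal sketch:
# Plot the test accuracy recorded at the end of each epoch.
plt.figure()
plt.plot(range(1, n_epochs + 1), [float(a) for a in test_acc], marker='o')
plt.xlabel("epoch")
plt.ylabel("test accuracy")
plt.title("test accuracy per epoch")
plt.show()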