Download the assignment PDF in the following link.

Download the Jupyter Notebook for this lab in the following link.

Download the data for the lab in the following link .

Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.

Lab3B

Lab 3 Collaborative Filtering with Graph Neural Networks¶

The objective of this lab is to design a recommendation system that predicts the ratings that customers would give to a certain product. Say, the rating that a moviegoer would give to a specific movie, or the rating that an online shopper would give to a particular offering. To make these predictions we leverage the ratings that customers have given to this or similar products in the past. This approach to ratings predictions is called collaborative filtering.

We model collaborative filtering as a problem that involves a graph $\mathbf{S}$ and graph signals $\mathbf{x}_u$ supported on the nodes of the graph. Nodes $p$ of the graph $\mathbf{S}$ represent different products and the weighted edges $S_{pq}$ denote simlarities between products $p$ and $q$. The entries of the graph signal $\mathbf{x}_u$ represent the ratings that user $u$ has given to each product.

The motivation for developing a recommendation system is that customers have rated a subset of prodcuts. Each of us has seen a small number of movies or bought a small number of offerings. Thus, the ratings vector $\mathbf{x}_u$ contains only some entries that correspond to rated products. For the remaining entries we adopt the convention that $x_{ui}=0$. Our goal is to learn a function $\Phi(\mathbf{x}_{u}; \mathcal{H})$ that takes as input the vector $\mathbf{x}_u$ of available ratings and fills in predictions for the ratings that are not available.

We will develop and evaluate a graph neural network (GNN) for making these rating predictions.

Graph Neural Networks¶

Graph Neural Networks (GNNs) are information processing architectures made up of a composition of layers, each of which is itself the composition of a linear graph filter with a pointwise nonlinearity.

For a network with a given number of layers $L$ we define the input output relationship through the recursion

\begin{equation} \mathbf{X}_\ell = \sigma \Big(\, \mathbf{Z}_\ell \,\Big) = \sigma \left(\, \sum_{k=0}^{K_\ell} \mathbf{S}^k \mathbf{X}_{l-1} \mathbf{H}_{\ell k} \, \right) ,\quad \end{equation}

In this recursion the output of Layer $\ell-1$ is $\mathbf{X}_{l-1}$ and it is recast as an input to Layer $\ell$. In this layer, the input $\mathbf{X}_{l-1}$ is processed with a graph filter to produce the intermediate output $\mathbf{Z}_\ell$. The coefficients of this graph filter are the matrices $\mathbf{H}_{\ell k}$. This intermediate output is processed with a pointwise nonlinearity $\sigma$ to produce the output $\mathbf{X}_\ell $ of Layer $\ell$. That the nonlinear operation is pointwise means that it is acting separately on each entry of $\mathbf{Z}_l$.

To complete the recursion we redefine the input $\mathbf{X}$ as the output of Layer 0, $\mathbf{X}_0 = \mathbf{X}$. The output of the neural network is the output of layer $L$, $\mathbf{X}_L = \mathbf{\Phi}(\mathbf{X}; \mathcal{H})$. In this notation $\mathbb{H}$ is the tensor $\mathcal{H} := [\mathbf{H}_{11},\ldots,\mathbf{H}_{LK{\ell}}]$ that groups all of the filters that are used at each of the $L$ layers.

To specify a GNN we need to specify the number of layers $L$ and the characteristics of the filters that are used at each layer. The latter are the number of filter taps $K_\ell$ and the number of features $F_\ell$ at the output of the layer. The number of features $F_0$ must match the number of features at the input and the number of features $F_L$ must match the number of features at the output. Observe that the number of features at the output of Layer $(\ell-1)$ determines the number of features at the input of Layer $\ell$. Then, the filter coefficients at Layer $\ell$ are of dimension $F_{\ell-1} \times F_\ell$.

Environment setup¶

Before we beging we need to import the necessary Python Packages. We use Numpy to load data, pytorch for training, and matplotlib to plot and visualize results.

In [4]:

from google.colab import drive
import os

# This will prompt for authorization.
drive.mount('/content/drive')

# If you want to work within a specific folder, specify the path
folder_path = '/content/drive/My Drive/ese2000 ailab/Lab3B'

# You can then change the directory to this folder to perform operations within it
os.chdir(folder_path)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [5]:

import numpy as np
import torch; torch.set_default_dtype(torch.float64)
import torch.nn as nn
import torch.optim as optim
import copy
import pickle as pkl
import math
import matplotlib.pyplot as plt
from tqdm import tqdm

device = 'cuda' if torch.cuda.is_available() else 'cpu'

Code from Lab 3A: data loading, loss, and Graph Filters (Run Cells)¶

In [9]:

# The following command loads the data. For this to work the file "movie_data.p"
# must be uploaded to this notebook. To do that, navigate to the Jupyter Notebook
# interface home page and use the “Upload” option.
contents = pkl.load(open("movie_data.p","rb"))
data = contents['data']

# The data file contains a graph of movie similarities. The graph has P nodes.
# This is the graph we will use to run the GNN.
S = data['S'].to(device)
P = S.shape[0]

# The data file contains a training set with entries (x_n, y_n, p_n).
# These entries are stored in the arrays xTrain, yTrain, and pTrain,
# respectively. All of these arrays have nTrain columns, with each
# column corresponding to a different entry in the dataset.
xTrain = data['train']['x'].to(device) # Available ratings for making predictions.
yTrain = data['train']['y'].to(device) # Rating to be predicted
pTrain = data['train']['p'].to(device) # Index of the movie whose rating is being predicted
nTrain = xTrain.shape[0]

# Same thing with test data
xTest = data['test']['x'].to(device)
yTest = data['test']['y'].to(device)
pTest = data['test']['p'].to(device)
nTest = xTest.shape[0]

In [27]:

def movie_mse_loss(yHat, y, idxMovie):
    """
    evaluates the square of comparing yHat(p) with y. The function
    accepts as an input tensors with multiple movie ratings.
    Inputs:
      yHat: torch.Tensor (NumUsers x 1 x NumMovies)
        A set of rating estimates. It includes estimates of all movies.
      y: torch.Tensor (NumUsers x 1)
        A specific rating of a specific movie
      idxMovie: torch.Tensor
        The index of the movie whose rating is given by y

    Outputs:
      mse: jnp.ndarray
        Computed mean squared error.

    """
    # the .squeeze() method is needed for dimension match between y and yHat
    yHat = yHat.squeeze()
    # Isolate the predictions in yHat that we will use for comparison
    prediction = yHat[:,idxMovie.cpu().numpy()]
    return torch.mean((prediction-y)**2)

In [28]:

def filter_function(x, h, S, b):
    '''
    The following function defines a Graph Filter.
    Inputs:
      x: Input to Graph Filter
      h: Filter
      S: Graph Shift Operator
      b: Bias term

      Outpus:
      y: Output of Graph Filter
    '''
    # X is  B x G x N
    # S is N x N
    B, G, N = x.shape
    K, _, F = h.shape
    y = torch.zeros((B, N, F), device=device)
    # The following for-loop is utilized to perform the graph diffusions
    for k in range(K):
        # sum S^k x * h_k
        # We permute x dimensions it to get dimensions B x N x G
        # The filter has dimensions G x F
        y += torch.matmul(x.permute((0,2,1)), h[k])
        # diffuse signal
        x = torch.matmul(x, S)
    # y has dims B x N x F,
    # add bias of shape F
    y = y + b
    # permute dimensions to get the desired dimensions B x F x N
    y = y.permute(0, 2, 1)
    return  y

In [37]:

class FilterModule(nn.Module):
    #First we initialize the graph filter class and define the attributes
    def __init__(self, k, f_in, f_out):
        """
        initialize graph filter
        :param k: order of the filter
        :param f_in: filter input dimension
        :param f_out:  filter output dimension
        """
        super().__init__()
        self.k     = k
        self.f_in  = f_in
        self.f_out = f_out
        # initialize weights and bias
        self.bias = nn.parameter.Parameter(torch.Tensor(self.f_out))
        self.h = nn.Parameter(torch.randn(self.k, self.f_in, self.f_out))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.f_in * self.k)
        self.h.data.uniform_(-stdv, stdv)
        self.bias.data.uniform_(-stdv, stdv)

    def forward(self, x, S):
        """
        PyTorch module interface implementation; just apply the filter
        """
        return filter_function(x, self.h, S, self.bias)

Task 1¶

Program a class that implements a GNN with $L$ layers. This class receives as initialization parameters a GNN specification consisting of the number of layers $L$ and vectors $[K_1, \ldots, K_L]$ and $[F_0, F_1, \ldots, F_L]$ containing the number of taps and the number of features of each layer.

Endow the class with a method that takes an input feature $\mathbf{X}$ and produces the corresponding output feature $\mathbf{\Phi}(\mathbf{X}; \mathcal{H})$.

In [46]:

class GNNModule(nn.Module):
    # setup method gets called on init
    def __init__(self, l, k,  f, sigma):
      """
      param:l: (int) number of layers
      param:k: (list of length n) number of taps for each layer
      param:f: (list of length n+1) input dimension to each layer
      sigma: (nn.Module) nonlinear activation function (same for all layers)
      """
      super().__init__()
      # non linearity
      self.sigma = sigma
      gml = []
      # append layers
      for layer in range(l-1):
          gml.append(FilterModule(k[layer], f[layer],f[layer+1]).to(device))
      self.gml = gml
      # add last linear layer
      self.readout = FilterModule(1,f[layer+1], f[layer+2])

    # Define the forward pass
    def forward(self, x, S):
        for layer in self.gml:
            # graph filter
            x = layer(x, S)
            # non-linearity
            x = self.sigma(x)
        # linear readout
        x = self.readout(x, S)
        return x

Task 2¶

Train a GNN to predict movie ratings. Plot the evolution of the training loss and evaluate the loss in the test dataset. To obtain a good loss we need to experiment with the number of layers and the number of filter taps per layer.

First, we instantiate a GNN with the following characteristics:

Number of layers: 3
Filter order at each layer: 5, 5, 1
Input dimension at each layer: 1, 64, 32, 1
Non-linearity: ReLU(x) = max(0, x)

In [59]:

l = 3
k = [5,5,1]
f = [1, 64, 32, 1]
sigma = nn.ReLU()

model = GNNModule(l, k, f, sigma).to(device)

Below we define the training parameters and specify the optimizer. The training procedure has the following charasteristics:

Number of epochs (full passes over the dataset): 8
Initial Learning Rate: 0.01
Optimizer: ADAM
Batch Size: 20

In [60]:

# learning parameters
nEpochs = 8
learningRate = 0.05
batchSize = 20
optimizer = optim.Adam(model.parameters(), lr=learningRate)

Training loop¶

Below is the main trainig loop implementation. This is a very straightforward implementation based on a fixed number of epochs.

In [61]:

# Helper variables for data loading during training.
batchIndex = np.append(np.arange(0, nTrain, batchSize), nTrain)
nBatches = int(np.ceil(nTrain/batchSize))

In [62]:

lossTrain = []
for epoch in range(nEpochs):
    randomPermutation = np.random.permutation(nTrain)
    idxEpoch = [int(i) for i in randomPermutation]
    epoch_loss=0
    for batch in tqdm(range(nBatches)):
        # Determine batch indices
        thisBatchIndices = idxEpoch[batchIndex[batch]
                                    : batchIndex[batch+1]]

        # Get the samples in this batch
        xTrainBatch = xTrain[thisBatchIndices,:,:]
        yTrainBatch = yTrain[thisBatchIndices]
        pTrainBatch = pTrain[thisBatchIndices]

        model.zero_grad()
        # Obtain the output of the architectures
        yHatTrainBatch = model(xTrainBatch, S)
        # Compute loss
        lossValueTrain = movie_mse_loss(yHatTrainBatch, yTrainBatch, pTrainBatch)
        # Compute gradients
        lossValueTrain.backward()

        # Update parameters
        optimizer.step()

        #accumulate loss
        epoch_loss+=lossValueTrain.item()
        lossTrain.append(lossValueTrain.item())
    print(f"Epoch: {epoch+1} \t Loss {epoch_loss/nBatches:.2f} \n")

100%|██████████| 138/138 [00:01<00:00, 80.08it/s]

Epoch: 1 	 Loss 2.89

100%|██████████| 138/138 [00:01<00:00, 88.27it/s]

Epoch: 2 	 Loss 1.19

100%|██████████| 138/138 [00:01<00:00, 93.09it/s]

Epoch: 3 	 Loss 1.10

100%|██████████| 138/138 [00:01<00:00, 93.57it/s]

Epoch: 4 	 Loss 1.08

100%|██████████| 138/138 [00:01<00:00, 93.82it/s]

Epoch: 5 	 Loss 1.06

100%|██████████| 138/138 [00:01<00:00, 93.17it/s]

Epoch: 6 	 Loss 1.05

100%|██████████| 138/138 [00:01<00:00, 93.11it/s]

Epoch: 7 	 Loss 1.05

100%|██████████| 138/138 [00:01<00:00, 93.19it/s]

Epoch: 8 	 Loss 1.04

Training error evolution¶

We use matplotlib to visualize the evolution of the mean squared error during training.

In [63]:

plt.figure(figsize = (8,8))
plt.plot(lossTrain, label = 'Train MSE')
plt.ylabel('Mean Squared Error')
plt.xlabel('Batches')
plt.title('GNN Training Loss Evolution')

Out[63]:

Text(0.5, 1.0, 'GNN Training Loss Evolution')

No description has been provided for this image

Evaluation¶

We evaluate the testing accuracy for the GNN.

In [64]:

yHatTest = model(xTest, S)
testMSE = movie_mse_loss(yHatTest, yTest, pTest)
print(f"GNN Test Mean Squared Error: {testMSE:.5f}")

GNN Test Mean Squared Error: 0.93896