Lab 0: Software Requirements
We use the Python programming language and the PyTorch machine learning library. If you have not used either of these tools before, you need to install them. Please refer to the software installation instructions on this page. If you have never installed software packages before, do not give up just yet. This is straightforward.
To aid in learning, we provide annotated solutions to each assignment as Jupyter Notebooks. If you haven't used this tool before, you need to install it as well. Installation instructions are available on the same page. This is straightforward too.
• Access Software Installation Instructions
Lab 1: Admissions to the University of Pennsylvania
We are tasked with designing a system to make admission decisions at the University of Pennsylvania (Penn). To design this system we need to acquire data and create a model to substantiate our decisions. For a collection of former students we have access to their high school (HS) grade point average (GPA), their Scholastic Assessment Test (SAT) scores, their gender, and their Penn GPA. We decide to build an AI that mimics the Penn GPA of admitted students. This AI takes as inputs high school GPA, SAT scores, and gender, and produces as an output a prediction of the student's Penn GPA.
Lab 1A: Data, Models, and Decisions (Day 1)
In this first part of the lab we visualize the data we are given and build a simple linear model. That is, we postulate that the Penn GPA of a student is a linear combination (a weighted sum) of their high school GPA and SAT scores. This AI makes predictions that are not very accurate, but it helps us introduce the three basic components of an AI system: data, models, and decisions.
• Download Jupyter Notebook and Data.
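To make the setup concrete, here is a minimal sketch of the kind of linear model we fit in this lab. The data values are made-up placeholders standing in for the real dataset linked above; the weights solve the least-squares problem in closed form.

```python
import torch

# Hypothetical placeholder data: one row per former student.
hs_gpa   = torch.tensor([3.9, 3.5, 3.7, 3.2])
sat      = torch.tensor([1450., 1300., 1390., 1250.])
penn_gpa = torch.tensor([3.8, 3.1, 3.5, 2.9])   # outputs to imitate

# Features: HS GPA and SAT (rescaled so both inputs have comparable range).
X = torch.stack([hs_gpa, sat / 1600], dim=1)

# Closed-form least-squares fit of the weighted sum X @ w ≈ penn_gpa.
w = torch.linalg.lstsq(X, penn_gpa.unsqueeze(1)).solution.squeeze()

print(X @ w)   # predicted Penn GPAs for the same students
```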
Lab 1B: Gradient Descent and Stochastic Gradient Descent (Day 2)
In Lab 1A we postulated that Penn GPAs are linear combinations of high school GPA and SAT scores. We also utilized an optimization formulation to define the optimal coefficients of the corresponding weighted sum. The formulated problems were solved analytically by taking derivatives and setting them to zero. This is not possible in general; we need recursive numerical methods to find optimal coefficients. In this lab we introduce gradient descent and stochastic gradient descent. These are the workhorse algorithms of machine learning, the two algorithms most widely used to find the optimal parameters of AI models.
• Download Jupyter Notebook and Data.
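The sketch below, on random stand-in data, shows the shape of both loops: plain gradient descent uses the full dataset at every step, while the stochastic variant estimates the gradient from a random minibatch.

```python
import torch

# Stand-in data for the least-squares cost (1/2N) * ||X w - y||^2.
X, y = torch.randn(100, 2), torch.randn(100)
step = 0.1

# Gradient descent: follow the exact negative gradient of the cost.
w = torch.zeros(2)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)
    w = w - step * grad

# Stochastic gradient descent: a cheap gradient estimate from 10 samples.
w = torch.zeros(2)
for _ in range(500):
    idx = torch.randint(0, len(y), (10,))
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    w = w - step * grad
```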
Lab 1C: Neural Networks (Days 3 and 4)
Neural Networks (NNs) are information processing architectures made up of a composition of layers, each of which is itself the composition of a linear transformation with a pointwise nonlinearity. NNs are the most widespread technology used in AI models. This lab uses NNs to predict Penn GPAs. We introduce the concepts of training and test sets.
• Download Jupyter Notebook and Data.
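A minimal sketch of such a network on random stand-in data, assuming the two features (HS GPA and SAT) from Lab 1A. Note the split into a training set used to fit the parameters and a test set used only for evaluation.

```python
import torch
import torch.nn as nn

# Two layers, each a linear transformation composed with a pointwise ReLU.
model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

X, y = torch.randn(200, 2), torch.randn(200, 1)   # stand-in features/targets
X_train, y_train = X[:150], y[:150]                # training set
X_test,  y_test  = X[150:], y[150:]                # test set (held out)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X_train), y_train)
    loss.backward()
    opt.step()

print(nn.functional.mse_loss(model(X_test), y_test))  # generalization error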
Lab 1X: Training Models with PyTorch
Gradient descent and stochastic gradient descent are finicky algorithms that require the computation of gradients (derivatives). Both are painful activities: it is painful to compute derivatives and it is painful to make a finicky algorithm work. The growing popularity of machine learning has led to packages that automate the computation of derivatives and the implementation of stochastic gradient descent. In ESE 2000 we use the PyTorch package. PyTorch is a commercial product that I do not endorse.
Learning to use a package is not among the goals of this course; it is just a means to an end. This tutorial is an appendix. We will not cover it in class, but I encourage you to read it.
• Download Jupyter Notebook and Data.
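The essence of what PyTorch automates is in the sketch below: we write the loss as ordinary code and autograd computes its derivatives for us.

```python
import torch

# Autograd: PyTorch records the operations that produce the loss and
# differentiates through them, so we never take derivatives by hand.
w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

loss = ((w * x).sum() - 1.0) ** 2
loss.backward()    # fills w.grad with d(loss)/dw
print(w.grad)      # tensor([-36., -48.])
```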
Lab 2: Signals in Time: Audio Processing
Audio is mathematically modeled as a function x(t) in which t represents time and x(t) is an electric signal that is generated by transforming pressure waves with a microphone. The same pressure waves can be reconstructed from the electrical signal using a speaker. In this lab we want to use what we learned in Lab 1 to process audio. In particular, we want to clean up an audio signal (Labs 2A and 2B) and we want to recognize spoken words (Lab 2C).
Lab 2A: Convolutions in Time (Day 5)
Our goal is to remove an unwanted sound from a wanted sound. In the trade, we say that we want to remove noise from a signal. Given what we learned in Lab 1, we know how to do this. We gather data in the form of examples of contaminated and clean audio and we model the input/output relationship between the signal with noise and the signal without noise. Our decisions (the clean data) are the output of this model after we find optimal parameters with gradient descent.
This is what we will do, indeed. Except that the model here is not as easy as a matrix multiplication or a neural network. The reason for this is the large dimension of the input signal. We overcome the challenge of dimensionality using convolutions. Besides learning convolutions, we will learn in this lab that building models is not straightforward. There are, in fact, a handful of models known to humanity that work in a handful of situations. These models are designed by exploiting our knowledge of the input and output signals. Convolutions, as we will see, leverage the locality of time.
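As a preview, here is a minimal sketch of a convolution in time using PyTorch, with a made-up 3-tap filter. The dimensionality argument is visible in the code: the filter has 3 parameters no matter how long the signal is.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 64)                # (batch, channel, time): a noisy signal
h = torch.tensor([[[0.25, 0.5, 0.25]]])  # 3-tap filter: 3 parameters, any length

# Each output sample is a weighted sum of nearby input samples (locality).
y = F.conv1d(x, h, padding=1)            # padding keeps the output length at 64
print(y.shape)                           # torch.Size([1, 1, 64])
```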
Lab 2B: Convolutional Neural Networks in Time (Day 6)
A convolutional neural network (CNN) in time is a layered architecture in which layers are compositions of convolutional filters and pointwise nonlinearities. We can think of a CNN as a nonlinear generalization of a convolution: the bulk of the operations in a CNN are convolutions, and the nonlinearities are a minor modification. We can equivalently think of CNNs as particular cases of fully connected neural networks (FCNNs) in which the linear map is particularized to a convolutional map.
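A minimal sketch of a two-layer CNN in time; the channel counts and filter length are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Each layer composes a bank of convolutional filters with a pointwise ReLU.
cnn = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(8, 1, kernel_size=5, padding=2),
)

y = cnn(torch.randn(1, 1, 64))   # same length as the input signal
print(y.shape)                   # torch.Size([1, 1, 64])
```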
Lab 2C: Classification of Spoken Digits (Days 7 and 8)
We are given data consisting of recordings of spoken digits, i.e., people saying digits out loud. We leverage our knowledge of CNNs to classify these spoken digits. To solve this classification problem we need to modify CNNs to include pooling and readout layers.
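A sketch of what such a classifier might look like; the layer sizes are illustrative, and the 8000-sample input stands in for a short audio clip.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(4),                 # pooling: shrink the time axis
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),         # pool the whole sequence into one vector
    nn.Flatten(),
    nn.Linear(32, 10),               # readout: one logit per spoken digit
)

logits = classifier(torch.randn(8, 1, 8000))  # a batch of 8 audio clips
print(logits.shape)                           # torch.Size([8, 10])
```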
Lab 3: Signals on Graphs: Recommendation Systems
The objective of this lab is to design a recommendation system that predicts the ratings that customers would give to a certain product. Say, the rating that a moviegoer would give to a specific movie, or the rating that an online shopper would give to a particular offering. A possible approach to making these predictions is to leverage the ratings that customers have given to this or similar products in the past. This is called collaborative filtering.
Lab 3A: Graphs and Graph Convolutions (Day 9)
To build the collaborative filtering system, we use the rating history of all movies and all users to compute a graph of product similarities. The ratings of a particular user are then interpreted as a signal supported on this graph. A graph convolutional filter is a linear information processing architecture inspired by convolutional filters in time. It is an architecture that leverages the locality properties of a graph.
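A minimal sketch of a graph convolutional filter: a polynomial in a graph shift operator S applied to a signal x, with made-up similarity weights and filter coefficients.

```python
import torch

S = torch.rand(5, 5)
S = (S + S.T) / 2                      # hypothetical symmetric similarity graph
x = torch.randn(5)                     # one user's ratings as a graph signal
h = torch.tensor([1.0, 0.5, 0.25])     # filter coefficients h_0, h_1, h_2

# y = h_0 x + h_1 S x + h_2 S^2 x: each term mixes ratings one hop further.
y, z = torch.zeros(5), x.clone()
for hk in h:
    y = y + hk * z
    z = S @ z                          # shift: diffuse the signal over the graph
```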
Lab 3B: Graph Neural Networks (Day 10)
A graph neural network (GNN) is a layered architecture in which layers are compositions of graph filters and pointwise nonlinearities. We can think of a GNN as a nonlinear generalization of a graph filter: the bulk of the operations in a GNN are graph convolutions, and the nonlinearities are a minor modification. We can equivalently think of GNNs as particular cases of fully connected neural networks (FCNNs) in which the linear map is particularized to a graph convolutional map.
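A sketch of a two-layer GNN built from the graph filter above, again with a made-up graph and coefficients; the pointwise nonlinearity is the only addition per layer.

```python
import torch

def gnn_layer(S, x, h):
    """One GNN layer: a graph filter followed by a pointwise ReLU."""
    y, z = torch.zeros_like(x), x.clone()
    for hk in h:
        y, z = y + hk * z, S @ z
    return torch.relu(y)

S = torch.rand(5, 5); S = (S + S.T) / 2   # hypothetical similarity graph
x0 = torch.randn(5)                        # input graph signal

x1 = gnn_layer(S, x0, torch.tensor([1.0, 0.5]))   # layer 1
x2 = gnn_layer(S, x1, torch.tensor([0.8, 0.3]))   # layer 2: a composition
```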
Lab 4: Images
Lab 4A: Image Classification (Day 11)
Lab 4B: Convolutional Neural Networks in Space (Day 12)
Midterm (Day 13)
We are halfway through and it is a good time to have a midterm. In this midterm we will discuss the idea of AI as the imitation of a natural system. We will review how to formulate empirical risk minimization problems and how to train models with stochastic gradient descent.
Most of the topics covered in the midterm relate to convolutional information processing architectures. The goal is to emphasize the similarity between convolutions in time, on graphs, and in images, and the corresponding similarities between convolutional neural networks (CNNs) in 1D, graph neural networks (GNNs), and convolutional neural networks in 2D. We will also review CNNs and GNNs with multiple features and spend some time reviewing pooling and readout.
Lab 5: Time Series and Transformers (Days 14 and 15)
In this lab we consider time series and their processing with transformers.
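A minimal sketch of the central operation, self-attention, on a made-up series of 10 steps with 16 features: the output at every time step is a weighted mixture of the whole series.

```python
import torch
import torch.nn.functional as F

x = torch.randn(10, 16)                       # 10 time steps, 16 features each
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv              # queries, keys, values
att = F.softmax(Q @ K.T / 16 ** 0.5, dim=-1)  # how much each step attends to others
y = att @ V                                   # attention-weighted combination
print(y.shape)                                # torch.Size([10, 16])
```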
Lab 6: Natural Language Processing (Days 16 and 17)
We can think of language as a time series. Each word in a sentence plays the role of a different point in time, and the words themselves represent the vectors of the time series. To represent language in this manner we need to codify words as vectors. We call this a word embedding. After designing word embeddings, we process language with a transformer.
• Download Assignment Notebook.
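A minimal sketch of a word embedding with a toy three-word vocabulary; the resulting sequence of vectors is exactly the kind of time series a transformer processes.

```python
import torch
import torch.nn as nn

vocab = {"the": 0, "cat": 1, "sat": 2}             # hypothetical toy vocabulary
embed = nn.Embedding(num_embeddings=3, embedding_dim=8)

# Codify the sentence "the cat sat" as a sequence of 8-dimensional vectors.
tokens = torch.tensor([vocab["the"], vocab["cat"], vocab["sat"]])
vectors = embed(tokens)
print(vectors.shape)                               # torch.Size([3, 8])
```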
Lab 7: Generative Diffusion Models (Days 18 and 19)
We want to train an AI system that replicates the probability distribution itself. We are not just after classifying digits pulled from the bag of all digits that have been, are being, or will ever be written. We want to create an artificial bag from which we can pull digits with a distribution of likelihoods equivalent to that of the natural system.
This is the most ambitious AI task we can conceive. If we succeed, we can prescind from reality altogether.
• Download Assignment Notebook.
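A sketch of the forward half of a diffusion model, with made-up noise schedule values: a sample is gradually corrupted with Gaussian noise, and the generative model is trained to undo these steps starting from pure noise.

```python
import torch

x0 = torch.randn(28 * 28)                 # stand-in for a handwritten digit
alphas = torch.linspace(0.999, 0.95, 100) # hypothetical noise schedule
abar = torch.cumprod(alphas, dim=0)       # cumulative signal retention

# Noisy sample at an intermediate diffusion time t.
t = 50
noise = torch.randn_like(x0)
xt = abar[t].sqrt() * x0 + (1 - abar[t]).sqrt() * noise
```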
Lab 8: Dynamical Systems (Days 20 and 21)
A dynamical system is a system that evolves over time. It is characterized by a state that changes with time and actions that we can choose to control its evolution. Our goal is to select actions that generate desirable state trajectories.
Lab 8A: Models and Control (Day 20)
An example of a dynamical system is a car driving around a circuit. The state of the system is the position and velocity of the car. The control action is the acceleration. A desirable trajectory is defined here as one that follows the path of an expert.
We will see in this lab that the fundamental challenge in dynamical systems is that the actions we take at a particular point in time impact the whole future evolution of the system.
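A minimal simulation of the simplest such system, a double integrator, with a hypothetical bang-bang acceleration policy; note how an action at step k propagates through every later state.

```python
import torch

# State: position and velocity. Action: acceleration. Discrete time step dt.
dt = 0.1
pos, vel = torch.tensor(0.0), torch.tensor(0.0)

trajectory = []
for k in range(50):
    accel = 1.0 if k < 25 else -1.0   # hypothetical policy: speed up, then brake
    vel = vel + dt * accel            # the action changes the velocity...
    pos = pos + dt * vel              # ...and through it, all future positions
    trajectory.append(pos.item())
```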
Lab 8B: Model Predictive Control (Day 21)
Lab 9: Reinforcement Learning (Days 22 and 23)
• Data.
Midterm (Day 23)
In this midterm we will review the topics that we covered in the second half of the course: generative language models, generative diffusion processes, and dynamical systems. This second part of the course is less coherent than the first. The first half deals with different convolutional information processing architectures, which provide a unifying substrate for processing time, graphs, and images. We do not have such a unifying theme in the second part of the class, but there is some method to the madness. These three topics are different approaches to formulating risk minimization problems to train artificial intelligences. We will review generative language models, generative diffusion processes, and dynamical systems from this perspective. We will emphasize how they are all good formulations of difficult learning goals.
Final (Day 24)
At the end of any course, and of this course in particular, it is good to take time to reflect on the lessons learned.
In our first lecture we introduced the idea of AI as the imitation of a natural system and posited that there are two components that we must choose: the learning parameterization and the loss function. The first half of the course was about architectures; the second part was about loss functions.
Our definition of AI is worth revisiting. I stated at the beginning of the course that imitation of natural systems is a flawed definition of artificial intelligence. I also promised that, however flawed it may be, it was going to carry us far. I stated that it was a good operational definition that captures well the current practice of AI. I must be held accountable for this statement. I am ready to stand trial and confident that I will prevail. I am hoping that you will agree with me that a significant part of the current practice of AI is indeed based on the imitation of natural systems and that, for the most part, the design of an AI system is about the choice of a parameterization and a loss function.