This course is an introduction to the basic concepts of artificial intelligence (AI) as it is practiced in the 2020s. At the end of this course you will be able to build AI systems that you can train to solve a variety of problems. In particular, we will cover time series, images, networks, and dynamical systems.
In more detail, we define an AI as an artificial system that mimics some other natural system. To design AIs we need to use appropriate models. The choice of model is influenced by our knowledge of the system that is imitated and also depends on the data that we have available and on the decisions that we want the AI to make. Most often, our models have some parameters that we tune to improve the fit between the AI and the natural system. This is the process of learning.
For the above paragraph to make more sense, keep reading.
Systems
A system is an entity that processes some inputs to produce some outputs. In engineering practice, inputs and outputs are often quantified and for this reason we denote inputs with a variable $x$, outputs with a variable $y$, and the system itself is represented by the function $f(\cdot)$ that codifies the relationship $y=f(x)$. A schema is shown in Figure 1.
This definition is vague on purpose because it is intended to maintain generality. Pretty much any task can be codified as the action of a system. For instance, when we hear a word and understand its meaning we are acting as a system that takes a time varying sequence of air pressures as inputs and produces a categorical representation as an output. When we read a digit, we are a system that takes light patterns as inputs and produces numbers as outputs — a number being the common property of finite sets that can be related with a bijection. When we watch and rate a movie, we are a system that takes movies as inputs and produces ratings as outputs. Driving a car around a circuit requires a driver that takes as input the desired trajectory and produces a sequence of acceleration inputs that results in the car following the circuit. The driver is a system.
Just as important, this definition of a system is vague because we can keep it vague and yet make it useful. Despite their significant differences all of the systems that we describe above can be represented by the schema in Figure 1. They all process some input — audio, light, movies, or target trajectories — to produce some output — concepts, numbers, ratings, or acceleration sequences. They are, in fact, four systems that we will study in this course. We consider audio in Lab 2, we study images in Lab 4, we look at movie recommendations in Lab 3, and we drive cars around a circuit in Lab 5.
Artificial Intelligence
A first definition of an artificial intelligence (AI) is that of a system that mimics the input-output relationship of a natural system; see Figure 2. When the natural system is presented with the input $x$ it responds by producing the output $y$. When the AI is presented with the same input $x$ it responds by producing the output $\hat{y}$. If the AI is a good AI, the outputs it produces in response to a given input are similar to the outputs produced by the natural system.
Having an AI is useful because it can be used in lieu of the natural system. For instance, suppose that the input $x$ represents an image of a digit. The natural system is a standard human that looks at this image and reads the number $y$ that it represents. If the AI can successfully mimic a human by producing numbers $\hat{y}$ that are equal to the numbers recognized by a human reader, we can use it in lieu of humans to recognize digits. This is good because the AI frees humans from the drudgery of reading digits. AIs have been in use since the 1990s to recognize digits in checks.
We must point out that the schema in Figure 2 is a flawed definition of AI. A 100 gram tungsten ball dropped from 1 meter mimics quite well any other tungsten ball dropped from 1 meter. Some would say that the definition is less flawed if we add the restriction that the emulated natural system is intelligent but this begs the question of what it means for a natural system to be intelligent. Besides, an artificial intelligence can still be useful if the system that it imitates is not intelligent.
Flawed it might be but the most important fact is that Figure 2 is a good operational definition which captures well the current practice of AI.
Data, Models, and Decisions
To design an AI system the first step is to acquire data. This is typically in the form of a set of $N$ input-output pairs $(x_i,y_i)$. We call this collection of examples the training set.
The next step is the selection of a model. This is usually in the form of a postulated relationship between inputs and outputs. This is written in Figure 2 as the function $\Phi(\cdot, w)$. The choice of model follows from our knowledge and understanding of the system. For example, convolutional neural networks (CNNs) have invariance and stability properties that make them well suited to processing time series and images, as we will see in Labs 2 and 4. The choice of model also follows from accumulated empirical evidence of which models are known to work for specific kinds of systems.
As per Figure 2, when the AI is presented with the input $x$ it produces the output $\hat{y} = \Phi(x, w)$. This output is the AI’s estimate or prediction of the actual output $y$ that the system produces when presented with input $x$. We also say that $\hat{y}$ is the AI’s decision.
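As a minimal sketch of data, model, and decisions, assume a scalar input and a hypothetical linear model $\Phi(x, w) = w x$. The data values and the names `training_set` and `phi` are illustrative assumptions and are not taken from the course labs.

```python
# Training set: N input-output pairs (x_i, y_i) observed from the natural system.
# Illustrative values only, generated here by the rule y = 2x.
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def phi(x, w):
    """Hypothetical linear model Phi(x, w) = w * x with a scalar parameter w."""
    return w * x


# Decisions of the AI for an arbitrary, untrained parameter value.
w = 1.5
predictions = [phi(x, w) for x, _ in training_set]
print(predictions)  # [1.5, 3.0, 4.5]
```

With this untrained value of $w$ the decisions differ from the observed outputs 2, 4, and 6, which is why the parameter has to be chosen with some care.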
Machine Learning
In the AI’s decisions $\hat{y} = \Phi(x, w)$ the variable $w$ is a parameter that has to be chosen. To choose this parameter we introduce a loss function $\ell(y, \hat{y})$ to compare predicted outputs $\hat{y}$ with actual outputs $y$. We then search for the parameter $w$ that minimizes this loss over the given set of input-output pairs,
\begin{equation}\label{eqn_training}
w^* = \text{argmin}_w \sum_{i=1}^{N} \ell \Big(\, y_i , \Phi(x_i, w) \,\Big) .
\end{equation}
We call this formulation a supervised machine learning (ML) problem. The process of finding the parameter $w^*$ that minimizes the loss summed over the available data is called training.
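To make the training problem concrete, here is a small sketch that solves it for a toy case. It assumes a hypothetical linear model $\Phi(x, w) = w x$, a squared loss $\ell(y, \hat{y}) = (y - \hat{y})^2$, and plain gradient descent as the solver. All three are illustrative choices, as are the data values, the step size, and the function names; this section does not prescribe any of them.

```python
# Illustrative training set generated by the rule y = 2x.
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def phi(x, w):
    """Hypothetical linear model Phi(x, w) = w * x."""
    return w * x


def loss(y, y_hat):
    """Assumed squared loss between actual output y and prediction y_hat."""
    return (y - y_hat) ** 2


def total_loss(w, data):
    """Objective of the training problem: loss summed over all pairs."""
    return sum(loss(y, phi(x, w)) for x, y in data)


def train(data, w=0.0, step=0.01, iterations=500):
    """Gradient descent on the summed loss (one possible solver)."""
    for _ in range(iterations):
        # Derivative of (y - w*x)^2 with respect to w is -2*x*(y - w*x).
        grad = sum(-2.0 * x * (y - phi(x, w)) for x, y in data)
        w -= step * grad
    return w


w_star = train(training_set)
print(round(w_star, 6))  # 2.0
```

Because the toy data follow $y = 2x$ exactly, the trained parameter recovers the value 2 and the summed loss goes to zero. Real data are noisy, so the minimum loss is usually positive.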
We call (1) a supervised learning problem because the AI is given examples of inputs $x_i$ and their corresponding outputs $y_i$. Alternatively, we may be given example inputs and a cost function $c(\cdot)$ that assesses the merit of the AI decision $\Phi(x_i, w)$. In this case we formulate the unsupervised ML problem,
\begin{equation}\label{eqn_training_unsupervised}
w^* = \text{argmin}_w \sum_{i=1}^{N} c \Big(\, \Phi(x_i, w) \,\Big) .
\end{equation}
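A toy instance of (2) may help here. The sketch below assumes a hypothetical model $\Phi(x, w) = x - w$ and a hypothetical cost $c(\hat{y}) = \hat{y}^2$, so the AI is rewarded for decisions that are close to zero. No outputs $y_i$ are given. The data values, step size, and gradient descent solver are illustrative assumptions as well.

```python
# Example inputs only; there are no outputs y_i in this setting.
inputs = [1.0, 2.0, 4.0, 5.0]


def phi(x, w):
    """Hypothetical model Phi(x, w) = x - w (shift the input by w)."""
    return x - w


def cost(y_hat):
    """Assumed cost c(y_hat) = y_hat^2 that assesses a decision on its own."""
    return y_hat ** 2


def total_cost(w, data):
    """Objective of the unsupervised problem: cost summed over all inputs."""
    return sum(cost(phi(x, w)) for x in data)


def train(data, w=0.0, step=0.1, iterations=200):
    """Gradient descent on the summed cost (one possible solver)."""
    for _ in range(iterations):
        # Derivative of (x - w)^2 with respect to w is -2*(x - w).
        grad = sum(-2.0 * phi(x, w) for x in data)
        w -= step * grad
    return w


w_star = train(inputs)
print(round(w_star, 6))  # 3.0
```

For this particular model and cost, the minimizer is the sample mean of the inputs, so training amounts to learning how to center the data.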
We must point out that (1) and (2) are controversial definitions of learning. There is nothing in the specifications to represent understanding, although some people argue that sufficiently complex imitation is understanding. As we did with the definition of AI, we remain agnostic to this discussion. Equations (1) and (2) are good operational definitions of learning which capture well the current practice of ML.
The Goal of This Course
We teach how to choose the functions $\Phi(x_i, w)$ in (1) and (2) and how to solve the optimization problems in (1) and (2).
That’s the short version. The longer version is that this course is operational in spirit. You will learn how to choose an appropriate AI model and how to train it. You will not learn much about why a particular model is chosen, why we train models the way we train them, or why any of this makes any sense and is expected to work at all. These three questions are respectively answered in your courses on Signal and Information Processing (ESE 2240), Optimization (ESE 2040), and Statistics for Data Science (ESE 4020).
This course is also an introduction. We will work with some AI models but not all. Since they are the most popular we will study different variations of neural networks but we will not work with other AI models. You can learn about these in the Machine Learning course (CIS 5200). We will also keep our models simple. To learn more about neural networks, you can take either or both of our Deep Learning courses (ESE 3060 and ESE 5460). And we will also keep our problems simple. We will barely touch dynamical systems, which you can learn about in the Dynamical Systems course (ESE 2100), and we will not get to Natural Language Processing (CIS 5300), Computer Vision (CIS 5810), or Generative Models (ESE 6450).
Should I Take This Course?
Yes! It has been prepared and it is being delivered with a lot of love.
If romanticism is not for you, consider that this course has been designed to teach basic concepts of AI with minimal prerequisites. If you have graduated high school and you take some time to learn a little bit of math and a little bit of programming you should be able to follow. The math we will use is not more complicated than equations (1) and (2). The most advanced concepts we will use are vectors, matrices, and gradients (derivatives). The programming we will use never goes beyond some 20 lines of code. Programming will be used at the level of this tutorial. This is in contrast to most other introductory courses which presuppose knowledge of statistics, which in turn requires calculus, linear algebra, and probability, along with programming competency.
A brief word of warning is that I want you to reread the sentence “If you have graduated high school… you should be able to follow.” This sentence does not say that it is easy. The ideal preparation for this course is one college level course in math and one college level course in programming — which can be taken concurrently. If you don’t have these ideal prerequisites you can still follow this course but it will take extra effort to learn some math and programming along the way. We will help. As I said above, this course is being delivered with a lot of love.
If not this course, do take some other course in AI. AI is already changing the practice of engineering, businesses, and science. It will soon change the practice of everything. This has little to do with ChatGPT and the other things you read about in the press. AI is transformative because data can make any system better and AI automates the exploitation of data to some extent. Combine this fact with the continuously decreasing cost of acquiring and processing data and the likelihood of making it through your career without encountering AI becomes negligible.