Download the files for Lab 5A from the following links:
Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.
Lab 8A: Dynamical Systems¶
!git clone https://github.com/Damowerko/ese2000-dynamical-systems.git
import sys
sys.path.append('./ese2000-dynamical-systems/')
fatal: destination path 'ese2000-dynamical-systems' already exists and is not an empty directory.
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
# Imports for the dynamical system code
from ese2000_dynamical.config import Config
from ese2000_dynamical.track import load_track
data_path = Path("./ese2000-dynamical-systems/data")
# device for pytorch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
1 Dynamical Systems¶
A dynamical system is a system that evolves in time. We will consider here a state vector $x(t)$ and a control action $u(t)$. In a dynamical system the state at time $t+1$ is given by
\begin{equation}\tag{1} x(t+1) = f ( x(t), u(t) ). \end{equation}The interpretation of the previous equation is that the state of the system $x(t)$ and the control action $u(t)$ determine the state of the system at time $t+1$. Our goal is to choose control actions $u(t)$ to manipulate the state trajectory $x(t)$ to satisfy some stated goal.
We point out that the system is time invariant, deterministic, and memoryless. Dynamical systems may also have memory, be stochastic, or time varying.
1.1 Circuit Navigation¶
A simple characterization of a car is to model it as an ideal point mass whose acceleration we control. Thus, the state of the system at time $t$ includes the position $ p(t)$ and the velocity v(t) of the car. Positions and velocities are both vectors with horizontal and vertical coordinates. We therefore write the state of the car as,
\begin{equation}\tag{2} x(t) ~=~ [ \, p(t) ; \, v(t) \, ] ~=~ [ \, p_{H(t)} ; \, p_{V(t)} ; \, v_{H(t)} ; \, v_{V(t)} \, ]. \end{equation}The control action of the car is the acceleration $ a(t)$. As is the case of positions and velocities, the acceleration contains horizontal and vertical components. We therefore write the control action as
\begin{equation}\tag{3} u(t) ~=~ a(t) ~=~ [\, a_{H(t)} ; \, a_{V(t)} \, ]. \end{equation}As we have assumed that the car is an ideal point mass, we can readily write the dynamical system using the laws of motion. To do so we just need to specify the sampling time $T_s$ that elapses between subsequent time instances and write,
\begin{align} & p(t+1) ~=~ p(t) + T_s v(t) + \frac{T_s^2}{2} a(t), \\ & v(t+1) ~=~ v(t) + T_s a(t) \tag{4}. \end{align}This is true if the acceleration $ a(t)$ is maintained between times $t T_s $ and $(t+1)T_s$. It just follows from the definitions of velocity and acceleration.
In reality, the dynamics of a car are more complex than the dynamics of an ideal point mass. The laws of motion in previous equation are therefore a model of the real dynamical system.
Task 1¶
Load the data. This data contains $Q=10$ trajectories, which correspond to different laps of the circuit shown in Figure below. For each of these trajectories we have positions $ p_q(t)$, velocities $ v_q(t)$, and accelerations $ a_q(t)$ for times $t$ ranging from $t=0$ to $t=T=10$. Positions are measured in meters ($m$), velocities in meters per second ($m/s$), and accelerations in meters per second squared ($m/s^2$). The sampling time is $T_s = 0.1s$.
Plot the positions $ p_q(t)$ for some chosen trajectories.
# Load data
x_expert = np.load(data_path / "states.npy")
u_expert = np.load(data_path / "inputs.npy")
track = load_track(data_path / "track.npz")
plt.figure(figsize=(10, 5))
plt.gca().set_aspect("equal")
track.plot(plt.gca())
plt.title("Expert Trajectories")
# Loop through each trajectory in the expert trajectories
for x_ in x_expert:
plt.plot(x_[:, 0], x_[:, 1], "-", linewidth=1.0)
1.2 Model Predicted States¶
We expect the previous model to be a reasonable approximation of the actual car dynamics. Let us test this expectation.
As a first figure of merit we consider the state predictions that the model makes. For a given trajectory $q$ and a given point in time $t$ we consider the observed state of the car $ x_q(t)$ and the control action $ u_q(t)$. We then use the ideal point mass model to predict the state at time $t+1$,
\begin{align}\label{eqn_circuit_navigation_laws_of_motion_as_predictions_one_step} &\hat{p}_q(t+1) ~=~ p_q(t) + T_s v_q(t) + \frac{T_s^2}{2} a_q(t), \\ &\hat{v}_q(t+1) ~=~ v_q(t) + T_s a_q(t) \tag{5}. \end{align}We compare predicted positions $\hat{p}_q(t+1)$ and predicted velocities $\hat{v}_q(t+1)$ with observed positions $ p_q(t+1)$ and observed velocities $ v_q(t+1)$ to test the accuracy of the model.
Task 2¶
For a trajectory $ x_q(t)$ define the state estimation errors as
\begin{align} e_q(t) = \| p_q(t) – \hat{p}_q(t)\| + \| v_q(t) – \hat{v}_q(t)\| \tag{6}. \end{align}Plot trajectory errors $e_q(t)$ for some trajectories. Compute and plot the average error $ \bar{e}_q(t)=(1/Q)\sum_{q=1}^Q e_q(t)$. Compute the average error over time $\bar{\bar{e}}_q(t)=(1/T)\sum_{t=1}^T \bar{e}_q(t)$.
Report Deliverables:
- Plot of trajectory errors $e_q(t)$
- Average error
# plot error for some trajectories
trajectories = [1, 3, 5] # choose some trajectories
T = Config.duration
Ts = Config.time_step
num_steps = int(T/Ts)
# Loop through trajectories
for trajectory_idx in trajectories:
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
error = []
for t in range(1, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = v[t] + Ts * a[t]
# predict position
hat_p = p[t] + Ts * v[t] + (Ts**2/2)*a[t]
# compute error
e_t = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
error.append(e_t)
# plot error evolution for the current trajectory
plt.plot(error, label=f"Trajectory {trajectory_idx}")
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.legend()
plt.show()
# Mean error
num_trajectories = x_expert.shape[0]
avg_error = []
for t in range(1, num_steps-1):# we start predicting the for t=1
error_t = 0
for trajectory_idx in range(num_trajectories):
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# Predict velocity v[t] + a[t]
hat_v = v[t] + Ts * a[t]
# predict position
hat_p = p[t] + Ts * v[t] + (Ts**2/2)*a[t]
# compute error and add it to total error
error_t += np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
avg_error.append(error_t/num_trajectories)
# plot error evolution
plt.plot(avg_error)
plt.xlabel("t")
plt.ylabel("Average $e_q(t)$")
plt.show()
avg_error_time = np.mean(avg_error)
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 0.06329752727784835
1.3 Model Predicted Trajectories¶
The errors in Task 2 are small. This is an indication that the ideal point mass model is accurate. To further test the accuracy of this model we compare observed trajectories with model predicted trajectories. This entails selecting the actions $ a_q(t)$ of a particular trajectory to generate the model predicted trajectory as
\begin{align} &\hat{p}_q(t+1) ~=~ \hat{p}_q(t) + T_s \hat{v}_q(t) + \frac{T_s^2}{2} a_q(t), \nonumber \\ &\hat{v}_q(t+1) ~=~ \hat{v}_q(t) + T_s a_q(t) \tag{7}. \end{align}Observe how in this expression we use the observed accelerations $ a_q(t)$ as control actions to generate the trajectory $\hat{x}(t) = [ \hat{p}(t) ; \hat{v}(t) ]$ predicted by the model in (4). This differs from the previous method in (5) in which state predictions at time $t+1$ depend on the state that is observed at time $t$. In this model (7) we compound predictions.
Comparing $\hat{x}(t)$ with $ x(t)$ gives an alternative measurement of the accuracy of the ideal point mass model in (4).
Task 3¶
To compare trajectories $x_q(t)$ with model predicted trajectories $\hat{x}(t)$ we define the position errors
\begin{align} e_q(t) = \| p_q(t) – \hat{p}_q(t+1)\| \tag{8}. \end{align}Plot trajectory errors $e_q(t)$ for some trajectories. Compute and plot the average error $\bar{e}_q(t)=(1/Q)\sum_{q=1}^Q e_q(t)$.
Plot the positions $p_q(t)$ along with the predicted positions $\hat{p}(t)$ for some chosen trajectories.
These plots are likely unexpected. Comment.
Report Deliverables:
- Plot of trajectory errors $e_q(t)$
- Plot of an observed trajectory and the corresponding model predicted trajectory
- Paragraph explaining differences between observed and model predicted trajectories.
# plot error for some trajectories
trajectories = [1, 3, 5] # choose some trajectories
T = Config.duration
Ts = Config.time_step
num_steps = int(T/Ts)
for trajectory_idx in trajectories:
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
error = []
# initial values
hat_v = v[0]
hat_p = p[0]
for t in range(1, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p = hat_p + Ts * hat_v + (Ts**2/2)*a[t]
# compute error
e_t = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
error.append(e_t)
# plot error evolution
plt.plot(error, label=f"Trajectory {trajectory_idx}")
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.legend()
plt.show()
# initialize array that stores error for all trajectories and timesteps
error = np.empty((num_trajectories, num_steps-1))
for trajectory_idx in range(num_trajectories):
# compute errors for each trajectory
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# initial values
hat_v = v[0]
hat_p = p[0]
for t in range(0, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p = hat_p + Ts * hat_v + (Ts**2/2)*a[t]
# compute and store error
error[trajectory_idx, t] = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
# Average over trajectories
avg_error = np.mean(error, axis=0)
# plot error evolution
plt.plot(avg_error)
plt.xlabel("t")
plt.ylabel("Average $e_q(t)$")
plt.show()
avg_error_time = np.mean(avg_error)
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 6.69229001081394
hat_p = np.empty((num_steps, 2))
trajectory_idx = 5
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# initial values
# use the same starting point as the expert
hat_v = v[0]
hat_p[0] = p[0]
for t in range(0, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p[t+1] = hat_p[t] + Ts * hat_v + (Ts**2/2)*a[t]
plt.figure(figsize=(10, 10))
track.plot()
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
# Plot positions
plt.plot(p[:, 0], p[:, 1], "-", label="true")
# plot predictions
plt.plot(hat_p[:, 0], hat_p[:, 1], "-", label="estimated")
plt.legend(loc="upper left", framealpha=1.0)
plt.show()
2 Model Learning¶
To mitigate the limitations of the ideal point mass model we can use observed trajectories to learn a more accurate model. To that end we consider a linear parameterization to predict the state at time $t+1$ given observations of the state and control input at time $t$,
\begin{equation}\tag{9} \hat{x\;} (t+1)= A x(t) + B u(t) \end{equation}In this equation the matrices $A$ and $B$ are parameters that we will learn. These parameters map the observed state $x(t)$ and the observed control input $u(t)$ into a prediction $\hat{x\;}(t+1)$ of the next state.
The prediction made by {9} $x(t+1)$ is, in general, different from the actual state $x(t+1)$ observed at time $t+1$. We therefore seek parameters $A$ and $B$ that make predictions and reality as close as possible. For a loss function $\ell(x(t+1),\hat{x\;}(t+1))$ this gives the empirical risk minimization (ERM) problem
\begin{align} (A^*, B^*) &~=~ argmin_{A,B}~ \frac{1}{Q}\sum_{q=1}^Q ~ \frac{1}{T}\sum_{t=1}^T ~ \ell\, \Big( \, x(t+1) , \, \hat{x\;} (t+1) \, \Big) \nonumber \\ &~=~ argmin_{A, B}~ \frac{1}{Q}\sum_{q=1}^Q ~ \frac{1}{T}\sum_{t=1}^T ~ \ell\, \Big( \, x(t+1) , \, A x(t) + B u(t) \, \Big) \tag{10} . \end{align}This problem differs from standard ERM in that we sum over trajectories and points in time. This is a minimal difference and the same techniques we use for solving standard ERM can be used to solve (10).
Task 4¶
Create an object to represent the linear dynamical system model in (9). Instantiate this object to represent a car navigating a circuit. This requires a state x(t) with the car’s position and velocity (2) and a control action u(t) representing the car’s acceleration (3).
Use the trajectories loaded in Task 1 to train a linear model for a car navigating a circuit. Use a MSE loss and remember to save some trajectories for validation. We recommend that you save a single trajectory for this purpose.
Report Deliverable: Report the training loss and the test loss.
class LinearModel(nn.Module):
def __init__(self, A, B):
"""
Linear model
param:A: torch tensor with dimensions dim_states x dim_states
param:B: torch tensor with dimensions dim_inputs x dim_states
"""
super().__init__()
# initialize weights and bias
self.A = nn.parameter.Parameter(A)
self.B = nn.parameter.Parameter(B)
def forward(self, x, u):
"""
Recieves a batch num_samples current states and inputs
x(t) has dims num_samples x dim_states, u(t) has dims num_samples x dim_inputs
returns predicted states
x[t+1] = Ax[t] + B
"""
return x @ self.A + u@ self.B
n_steps = 10000
lr = 5e-4
# convert to tensors
x = torch.from_numpy(x_expert).float()
u = torch.from_numpy(u_expert).float()
# save fith trajectory for validation
x_train = torch.cat((x[:4], x[5:]))
u_train = torch.cat((u[:4], u[5:]))
x_test = x[5]
u_test = u[5]
n_states = 4 # p and v
n_inputs = 2# acc
A_init = torch.zeros(n_states, n_states, device=device, requires_grad=True)
B_init = torch.zeros(n_inputs, n_states, device=device, requires_grad=True)
model = LinearModel(A_init, B_init)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
for i in range(n_steps):
optimizer.zero_grad()
# compute predictions for t+1 = 1..T
#we dont need to compute predictions for last state
# and thus use state[:, :-1]
x_pred = model(x_train[:, :-1], u_train[:, :-1])
loss = (x_train[:, 1:] - x_pred).pow(2).mean()
loss.backward()
optimizer.step()
if iprint(f"Step: {i} \t Train Loss: {loss.item()}")
Step: 0 Train Loss: 57.14374542236328 Step: 1000 Train Loss: 0.02006038837134838 Step: 2000 Train Loss: 0.0004425234510563314 Step: 3000 Train Loss: 0.000254583457717672 Step: 4000 Train Loss: 0.00025277776876464486 Step: 5000 Train Loss: 0.0002527607139199972 Step: 6000 Train Loss: 0.00025276021915487945 Step: 7000 Train Loss: 0.0002527602482587099 Step: 8000 Train Loss: 0.00025276027736254036 Step: 9000 Train Loss: 0.00025276027736254036
# Compute test loss
x_pred_test = model(x_test[:-1], u_test[:-1])
test_loss = (x_test[1:] - x_pred_test).pow(2).mean()
print(f"Test Loss: {test_loss}")
Test Loss: 0.00024410879996139556
The test loss that we observe in Task 4 is smaller than the average error that we observed in Task 2. This indicates that the learned model is better than the ideal point mass model.
Observe that the loss computed in Task 4 compares the model’s state predictions in (9) with observed states. As we did in Section 1.3 we may also consider predicted trajectories. This are defined as
\begin{equation}\tag{11} x(t+1) = A\hat x(t) + Bu(t). \end{equation}In both, (9) and (11), we apply observed actions u(t). The difference is that in (9) we start from the observed state x(t) whereas in (11) we start from the predicted state xˆ(t). Concatenating state predictions results in the accumulation of errors. We evaluate this next.
Task 5¶
As in Task 3 we compare trajectories $x_q(t)$ with model predicted trajectories $\hat{x\;}_q(t)$. We do that by evaluating the position error $e_q(t)$. Plot trajectory errors $e_q(t)$ for test trajectories. If you have more than one test trajectory, compute and plot the average error.
Plot the positions $p_q(t)$ along with the predicted positions $\hat{p\;}(t)$ for a test trajectory.
Compared the learned model with the ideal point mass model.
Report Deliverables:
- Plot of test trajectory errors $e_q(t)$
- Plot of an observed trajectory and the corresponding learned model predicted trajectory
- Paragraph comparing ideal point mass model with learned model.
x_pred_test = torch.empty((num_steps, 4))
x_pred_test[0] = x_test[0]
for t in range(num_steps-1):
x_pred_test[t+1] = model(x_pred_test[t], u_test[t])
x_pred_test = x_pred_test[1:]
# array that stores error for all trajectories and timesteps
error = np.empty(num_steps-1)
error = x_test[1:] - x_pred_test
error_p = error[:, :2]
error_v = error[:, :2]
# compute error for each timestep
error = torch.linalg.norm(error_v, axis=1)+torch.linalg.norm(error_v, axis=1)
# plot error evolution
plt.plot(error.detach().numpy())
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.show()
avg_error_time = torch.mean(error).detach().numpy()
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 1.3796671628952026
x_test_np = x_test.detach().numpy()
x_pred_test_np = x_pred_test.detach().numpy()
plt.figure(figsize=(10, 10))
track.plot()
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.plot(x_test[:, 0], x_test[:, 1], "-", label="true")
plt.plot(x_pred_test_np[:, 0], x_pred_test_np[:, 1], ".", label="estimated")
plt.legend(loc="upper left", framealpha=1.0)
plt.show()
The main outcome of Task 5 is Figure 3. We see that the trajectories predicted by the learned model are more accurate. We still observe some deviation between the actual trajectory and the trajectory predicted by the model. As we said, this is a fundamental feature of dynamical systems. A more accurate model mitigates but does not eliminate the accumulation of errors. But in this case the deviation is much smaller. This learned model is a good starting point for controlling the car. We explain how to do this in the next section.
A_learnt = model.A.detach().numpy()
B_learnt = model.B.detach().numpy()
print("A =")
print(np.array2string(A_learnt, precision=2, suppress_small=True))
print("B =")
print(np.array2string(B_learnt, precision=2, suppress_small=True))
A = [[ 1. -0. -0. 0. ] [-0. 1. 0. -0. ] [ 0.05 -0. 0.99 0. ] [-0. 0.05 0. 1. ]] B = [[ 0. -0. 0.05 -0. ] [ 0. 0. -0. 0.05]]
This time around our final position is much closer to the expert trajectory but still not perfect. We would deviate further if we did more than 1 lap.
3 Dynamical Systems Control¶
We are now given control of the car. This entails finding a sequence of actions $ u(t)$ that navigates the circuit. This means that our goal is to find a policy $\pi$ that maps states $ x(t)$ to control actions $ u(t)$,
\begin{equation}\label{eqn_control_policy}\tag{12} \pi : u(t) = \pi( x(t)), \end{equation}such that the state trajectory $ x(t)$ of the dynamical system satisfies some given specifications.
The particular control problem we consider is to follow a reference trajectory $ x_{R}(t)$. To formulate this problem mathematically consider a dynamical system in which the state at time $t=0$ is given and set to $ x(0)= x_0$. We say that $ x_0$ is the initial condition of the dynamical system. Starting from this initial condition we generate a trajectory $ x_\pi(t)$ by continuous execution of the policy $\pi$. In this trajectory the control action at time $t$ is $ u(t)=\pi( x_\pi(t))$. The trajectory $ x_\pi(t)$ is generated by the recursion,
\begin{equation}\label{eqn_control_time_t}\tag{13} x_\pi(t+1) ~=~ f \Big( x_\pi(t), \pi \big( x_\pi(t) \big) \Big). \end{equation}To follow the reference trajectory $ x_{R}(t)$ we introduce a loss $\ell$ to compare the reference trajectory $ x_{R}(t)$ with trajectories $ x_\pi(t)$ generated by different policies $\pi$. We then search of the optimal policy,
\begin{equation}\label{eqn_control_definition}\tag{14} \pi^* = \text{argmin}_\pi \frac{1}{T} \sum_{t=1}^{T} \ell\, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) . \end{equation}This optimization problem appears similar to empirical risk minimization problems we encountered in earlier labs. It appears in particular, similar to the linear dynamical system equations. This similarity is misleading. The problems are, in fact, fundamentally different.
The fundamental difference between (14) and standard risk minimization is that the goal in the control formulation is to find policies $\pi( x(t))$ that generate suitable trajectories and in a dynamical system the choice of action at time $t$ influences the trajectory of the system all future times $s>t$. Conversely, the state of the system at time $t$ depends on all of the actions that were chosen at times $s<t$.
To explain this point better, start at time $t=0$ and recall that the initial condition $ x_\pi(0)= x_0$ is known. For this initial conidition and choosing actions under policy $\pi$, the control action at time $t=0$ is $ u(0) = \pi( x_\pi(0)) = \pi( x_0)$. We now have the state $ x_\pi(0)$ given and the control policy $ u(0)$ chosen. Since the evolution of the dynamical system follows the dynamical system equations, the state $ x_\pi(1)$ at time $t=1$ is
\begin{equation}\label{eqn_control_time_1}\tag{15} x_\pi(1) ~=~ f \Big( x(0), \pi \big( x(0) \big) \Big) ~=~ f \Big( x_0, \pi \big( x_0 \big) \Big). \end{equation}We emphasize that in (15) the state as $ x_\pi(1)$ depends on the policy. Different policy choices $\pi$ result in different states $ x_\pi(1)$.
We now consider the next control action which is to be taken at time $t=1$. Since the state of the system is $ x_\pi(1)$, the choice of action under policy $\pi$ is $ u(1) = \pi( x_\pi(1))$. With state given and policy chosen, the dynamical system now moves to the state
\begin{equation}\label{eqn_control_time_2}\tag{16} x_\pi(2) ~=~ f \Big( x_\pi(1), \pi \big( x_\pi(1) \big) \Big). \end{equation}This state depends on the control action $ u(1) = \pi( x_\pi(1))$. It also depends on the choice of control action $ u(0) = \pi( x(0))$ because the state $ x_\pi(1)$ depends on this choice.
We repeat the procedure at time $t=2$. The system is in state $ x_\pi(2)$, we choose action $ u(2) = \pi( x_\pi(2))$ and the dynamical system transitions into state
\begin{equation}\label{eqn_control_time_3}\tag{17} x_\pi(3) ~=~ f \Big( x_\pi(2), \pi \big( x_\pi(2) \big) \Big). \end{equation}We see here that the state of the system at time $t=3$ depends on all of the three actions chosen by the policy $\pi$ at times $s<3$. Conversely, we see that the action chosen at time $t=0$ affects all of the states observed at times $s>0$. This propagation of dependencies between states and actions continues as we keep making control choices $ u(t)=\pi( x_\pi(t))$ that depend on states $ x_\pi(t)$ that depend on all of the control choices $ u(s)=\pi( x_\pi(s))$ taken at previous times $s<t$.
To emphasize that control problem (14) is a problem in which policy choices $\pi( x_\pi(t))$ are coupled across time we rewrite it to make the evolution of the dynamical system explicit,
$$ \def \bsa {\hspace{1.99pt}} \def \bsb {\hspace{11pt}} $$\begin{align}\label{eqn_optimal_control} \pi^* ~=~ & \text{argmin}_\pi ~ \frac{1}{T} \sum_{t=1}^{T} \ell, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) , \\ & \text{\ with \bsb} ~ x_\pi (t+1) = f \Big( x_\pi(t), \pi \big( x_\pi(t) \big) \Big) , \\ & \phantom{\text{with}} ~ x_\pi(0) = x_0 ~. \phantom{\Big(}\tag{18} \end{align}3.1 Model Based Optimization¶
To find the optimal control policy $ \pi^*$ we need to solve the optimization problem in (18). In this problem, control actions $\pi( x_\pi(t_1))$ and $\pi( x_\pi(t_2))$ chosen at different times are coupled. They both influence the state of the dynamical system at times $s>t_1$ and $s>t_2$. This coupling notwithstanding, we can still compute gradients of the loss to implement a descent algorithm to find the optimal control policy $\pi^*$.
Here we encounter a roadblock: The true behavior of the dynamical system is unknown. We circumvent this roadblock with the used of a model. This could be the model based on our understanding of the laws of motion or the model we learned. We use the latter which turned out to be more accurate.
Consider then the learned dynamical system when controlled by the policy $\pi$. This dynamical system evolves according to
\begin{equation}\label{eqn_mode_based_control}\tag{19} \hat{x}_\pi(t+1) = A \hat{x}_\pi(t) + B \pi \big( \hat{x}_\pi(t) \big) . \end{equation}In (19) we use $\hat{x}_\pi(t)$ to denote the state of the system. This notation emphasizes that the trajectory results from the execution of actions $\pi(\hat{x}_\pi(t))$ on the model in (11).
We search now for policies that are optimal at controlling the model. They are given by the solution of the optimization problem,
\begin{align}\label{eqn_model_based_optimization} \pi^* ~=~ & \text{argmin}_\pi ~ \frac{1}{T} \sum_{t=1}^{T} \ell\, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) , \\ & \text{\bsa with \bsb} ~ \hat{x}_\pi (t+1) = A \hat{x}_\pi(t) + B \pi \big( \hat{x}_\pi(t) \big) , \\ & \phantom{\text{\bsa with \bsb}} ~ \hat{x}_\pi(0) = x_0 ~. \phantom{\Big(}\tag{20} \end{align}The problems in (18) and (20) are optimal control problems for different dynamical systems. The formulation in the equations of optimal control (18) is on the actual dynamical system that determines the car’s motion. We can’t solve this problem because we do not know the function $f$. The model based optimization problem is an optimal control problem on the learned model of the dynamical system. We can solve this problem because we have access to $ A$ and $ B$.
Task 6¶
Solve the model based optimization problem in (20) for the dynamical system defined by the parameters $ A$ and $ B$ learned in Task 5. Use as a reference trajectory one of the trajectories that you used for validations in Task 5. Use the $L2$ loss as figure of merit.
The solution of this optimization problem is a sequence of actions $ u(0:T-1)$ with $ u(t) = \pi^*(\hat{x}_\pi(t))$. $\blacksquare$
Report Deliverable: Optimal loss
def rollout(A, B, u, x0):
"""
Integrate the system dynamics to predict the trajectory.
Args:
A: (4, 4) The state transition matrix.
B: (2, 4) The input matrix.
u: (T, 2) The control inputs.
x0: (4,) The initial state.
"""
x = torch.zeros((len(u) + 1, A.shape[0]), device=A.device)
x[0] = x0
for i in range(len(u)):
x[i + 1] = x[i] @ A + u[i] @ B
# return the predicted trajectory
return x[1:]
expert_idx = 5
x_expert = x_expert[expert_idx]
A = model.A.detach()
B = model.B.detach()
x0 = torch.tensor(x_expert[0]).float()
x_ref = torch.tensor(x_expert[1:]).float()
u_naive = torch.zeros(len(x_ref), 2, requires_grad=True).float()
optimizer = torch.optim.Adam([u_naive], lr=1.0)
for i in range(1000):
optimizer.zero_grad()
x_pred = rollout(A, B, u_naive, x0)
# penalize the difference between the predicted and reference trajectory
loss = (x_pred - x_ref).pow(2).mean()
# add loss to progress bar
if iprint(f"Step: {i} \tLoss: {loss.item()}")
loss.backward()
optimizer.step()
Step: 0 Loss: 514.6283569335938 Step: 100 Loss: 8.27013111114502 Step: 200 Loss: 3.4032528400421143 Step: 300 Loss: 1.4440983533859253 Step: 400 Loss: 0.7261592745780945 Step: 500 Loss: 0.43223798274993896 Step: 600 Loss: 0.28022682666778564 Step: 700 Loss: 0.18678425252437592 Step: 800 Loss: 0.12498848140239716 Step: 900 Loss: 0.0833144560456276
3.2 Evaluation of Control Policies¶
Solving optimal control on a model of a real system is not the same as solving optimal control on the real system. In the case of navigating a circuit, we saw in Task 5. that the learned model is accurate. We therefore expect that the solution of (20) is a good approximation to solving the optimal control problem (18). We test this expectation here.
To test this expectation we execute the control actions $u(0:T-1)$ that we found as solutions of (20) in Task 6 in the actual dynamical system $f$. Since we do not have access to an actual car, we will test actions $u(0:T-1)$ in a car simulator.
Task 7¶
Execute the control actions $u(0:T-1)$ found as solutions in Task 6. Evaluate and plot the $L2$ loss between the generated trajectory and the reference trajectory. $\blacksquare$
The main conclusion is that model based optimization fails at controlling the car. The challenge is, as usual, that errors accumulate over time.
A successful controller requires the introduction of a mechanism to correct the accumulation of errors. This is closed loop control, which we study in Lab 8B.
To test your controller we implement a class called Simulator
for you. By iteratively calling step
you can simulate on the ground truth system.
step
takes two arguments:
- an array of length 4 with the current state of the system,
- and an array of length 2 with the control inputs to apply.
Report Deliverable: Plot of L1 loss as a function of time
simulator = Simulator()
t = np.arange(0, Config.duration, Config.time_step)
x_estimate = np.zeros_like(x_expert)
# Use the starting point as the initial estimate
x_estimate[0] = x_expert[0]
for i in range(len(t) - 1):
x_estimate[i + 1] = simulator.step(x_estimate[i], u_naive[i].detach().cpu().numpy())
# Plot estimated trajectory
plt.figure(figsize=(10, 10))
track.plot(plt.gca())
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.plot(x_expert[:, 0], x_expert[:, 1], "-", label="true")
plt.plot(
x_estimate[:, 0],
x_estimate[:, 1],
"-",
label="estimated",
)
plt.legend(loc="upper left", framealpha=1.0)
plt.show()