Download the files for Lab 5A from the following links:
Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.
Lab 5A: Dynamical Systems¶
Setup¶
!git clone https://github.com/Damowerko/ese2000-dynamical-systems.git
import sys
sys.path.append('./ese2000-dynamical-systems/')
Cloning into 'ese2000-dynamical-systems'... remote: Enumerating objects: 123, done. remote: Counting objects: 100% (123/123), done. remote: Compressing objects: 100% (99/99), done. remote: Total 123 (delta 37), reused 108 (delta 22), pack-reused 0 Receiving objects: 100% (123/123), 5.23 MiB | 3.20 MiB/s, done. Resolving deltas: 100% (37/37), done.
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import numpy.typing as npt
import torch
from tqdm.notebook import trange
from ese2000_dynamical.config import Config
from ese2000_dynamical.track import load_track
from ese2000_dynamical.simulator import Simulator, dynamics_ca
from torch import nn
data_path = Path("./ese2000-dynamical-systems/data")
# device for pytorch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
/Users/juanelenter/miniconda3/envs/ese2000/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm
Dynamical Systems¶
A dynamical system is a system that evolves in time. will consider here a state vector $x(t)$ and a control action $u(t)$. In a dynamical system the state at time $t+1$ is given by
\begin{equation} x(t+1) = f ( x(t), u(t) ). \end{equation}The interpretation of the previous equation is that the state of the system $x(t)$ and the control action $u(t)$ determine the state of the system at time $t+1$. Our goal is to choose control actions $u(t)$ to manipulate the state trajectory $x(t)$ to satisfy some stated goal.
We point out that the system is time invariant, deterministic, and memoryless. Dynamical systems may also have memory, be stochastic, or time varying.
A simple characterization of a car is to model it as an ideal point mass whose acceleration we control. Thus, the state of the system at time $t$ includes the position $ p(t)$ and the velocity v(t) of the car. Positions and velocities are both vectors with horizontal and vertical coordinates. We therefore write the state of the car as,
\begin{equation} x(t) ~=~ [ \, p(t) ; \, v(t) \, ] ~=~ [ \, p_{H(t)} ; \, p_{V(t)} ; \, v_{H(t)} ; \, v_{V(t)} \, ]. \end{equation}The control action of the car is the acceleration $ a(t)$. As is the case of positions and velocities, the acceleration contains horizontal and vertical components. We therefore write the control action as
\begin{equation} u(t) ~=~ a(t) ~=~ [\, a_{H(t)} ; \, a_{V(t)} \, ]. \end{equation}As we have assumed that the car is an ideal point mass, we can readily write the dynamical system using the laws of motion. To do so we just need to specify the sampling time $T_s$ that elapses between subsequent time instances and write,
\begin{align} & p(t+1) ~=~ p(t) + T_s v(t) + \frac{T_s^2}{2} a(t) ,\nonumber \\ & v(t+1) ~=~ v(t) + T_s a(t). \end{align}This is true if the acceleration $ a(t)$ is maintained between times $t T_s $ and $(t+1)T_s$. It just follows from the definitions of velocity and acceleration.
In reality, the dynamics of a car are more complex than the dynamics of an ideal point mass. The laws of motion in previous equation are therefore a model of the real dynamical system.
Task 1¶
Load the data. This data contains $Q=10$ trajectories, which correspond to different laps of the circuit shown in Figure below. For each of these trajectories we have positions $ p_q(t)$, velocities $ v_q(t)$, and accelerations $ a_q(t)$ for times $t$ ranging from $t=0$ to $t=T=10$. Positions are measured in meters ($m$), velocities in meters per second ($m/s$), and accelerations in meters per second squared ($m/s^2$). The sampling time is $T_s = 0.1s$.
Plot the positions $ p_q(t)$ for some chosen trajectories.
x_expert = np.load(data_path / "states.npy")
u_expert = np.load(data_path / "inputs.npy")
track = load_track(data_path / "track.npz")
plt.figure(figsize=(10, 5))
plt.gca().set_aspect("equal")
track.plot(plt.gca())
plt.title("Expert Trajectories")
for x_ in x_expert:
plt.plot(x_[:, 0], x_[:, 1], "-", linewidth=1.0)
We expect the previous model to be a reasonable approximation of the actual car dynamics. Let us test this expectation.
As a first figure of merit we consider the state predictions that the model makes. For a given trajectory $q$ and a given point in time $t$ we consider the observed state of the car $ x_q(t)$ and the control action $ u_q(t)$. We then use the ideal point mass model to predict the state at time $t+1$,
\begin{align} &\hat{p}_q(t+1) ~=~ p_q(t) + T_s v_q(t) + \frac{T_s^2}{2} a_q(t), \nonumber \\ &\hat{v}_q(t+1) ~=~ v_q(t) + T_s a_q(t). \end{align}We compare predicted positions $\hat{p}_q(t+1)$ and predicted velocities $\hat{v}_q(t+1)$ with observed positions $ p_q(t+1)$ and observed velocities $ v_q(t+1)$ to test the accuracy of the model.
Constant Acceleration Model¶
Let’s try see how well the constant acceleration model reflects the behaviour above.
Task 2¶
For a trajectory $ x_q(t)$ define the state estimation errors as
\begin{align} e_q(t) = \| p_q(t) – \hat{p}_q(t)\| + \| v_q(t) – \hat{v}_q(t)\|. \end{align}Plot trajectory errors $e_q(t)$ for some trajectories. Compute and plot the average error $ \bar{e}_q(t)=(1/Q)\sum_{q=1}^Q e_q(t)$. Compute the average error over time $\bar{\bar{e}}_q(t)=(1/T)\sum_{t=1}^T \bar{e}_q(t)$.
# plot error for some trajectories
trajectories = [1, 3, 5] # choose some trajectories
T = Config.duration
Ts = Config.time_step
num_steps = int(T/Ts)
for trajectory_idx in trajectories:
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
error = []
for t in range(1, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = v[t] + Ts * a[t]
# predict position
hat_p = p[t] + Ts * v[t] + (Ts**2/2)*a[t]
# compute error
e_t = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
error.append(e_t)
# plot error evolution
plt.plot(error, label=f"Trajectory {trajectory_idx}")
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.legend()
plt.show()
# Mean error
num_trajectories = x_expert.shape[0]
avg_error = []
for t in range(1, num_steps-1):# we start predicting the for t=1
error_t = 0
for trajectory_idx in range(num_trajectories):
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# Predict velocity v[t] + a[t]
hat_v = v[t] + Ts * a[t]
# predict position
hat_p = p[t] + Ts * v[t] + (Ts**2/2)*a[t]
# compute error and add it to total error
error_t += np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
avg_error.append(error_t/num_trajectories)
# plot error evolution
plt.plot(avg_error)
plt.xlabel("t")
plt.ylabel("Average $e_q(t)$")
plt.show()
avg_error_time = np.mean(avg_error)
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 0.06329752727784835
Model Predicted Trajectories¶
The errors in the previous Task are small. This is an indication that the ideal point mass model is accurate. To further test the accuracy of this model we compare observed trajectories with model predicted trajectories. This entails selecting the actions $ a_q(t)$ of a particular trajectory to generate the model predicted trajectory as
\begin{align} &\hat{p}_q(t+1) ~=~ \hat{p}_q(t) + T_s \hat{v}_q(t) + \frac{T_s^2}{2} a_q(t), \nonumber \\ &\hat{v}_q(t+1) ~=~ \hat{v}_q(t) + T_s a_q(t). \end{align}Observe how in this expression we use the observed accelerations $ a_q(t)$ as control actions to generate the trajectory $\hat{x}(t) = [ \hat{p}(t) ; \hat{v}(t) ]$ predicted by the model. This differs from the previous method in which state predictions at time $t+1$ depend on the state that is observed at time $t$. In this model we compound predictions.
Comparing $\hat{x}(t)$ with $ x(t)$ gives an alternative measurement of the accuracy of the ideal point mass model.
Task 3¶
To compare trajectories $x_q(t)$ with model predicted trajectories $\hat{x}(t)$ we define the position errors
\begin{align} e_q(t) = \| p_q(t) – \hat{p}_q(t+1)\| . \end{align}Plot trajectory errors $e_q(t)$ for some trajectories. Compute and plot the average error $\bar{e}_q(t)=(1/Q)\sum_{q=1}^Q e_q(t)$.
Plot the positions $p_q(t)$ along with the predicted positions $\hat{p}(t)$ for some chosen trajectories.
These plots are likely unexpected. Comment.
# plot error for some trajectories
trajectories = [1, 3, 5] # choose some trajectories
T = Config.duration
Ts = Config.time_step
num_steps = int(T/Ts)
for trajectory_idx in trajectories:
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
error = []
# initial values
hat_v = v[0]
hat_p = p[0]
for t in range(1, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p = hat_p + Ts * hat_v + (Ts**2/2)*a[t]
# compute error
e_t = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
error.append(e_t)
# plot error evolution
plt.plot(error, label=f"Trajectory {trajectory_idx}")
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.legend()
plt.show()
# array that stores error for all trajectories and timesteps
error = np.empty((num_trajectories, num_steps-1))
for trajectory_idx in range(num_trajectories):
#compute errors for each trajectory
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# initial values
hat_v = v[0]
hat_p = p[0]
for t in range(0, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p = hat_p + Ts * hat_v + (Ts**2/2)*a[t]
# compute and store error
error[trajectory_idx, t] = np.linalg.norm(p[t+1] - hat_p)+ np.linalg.norm(v[t+1] - hat_v)
# Average over trajectories
avg_error = np.mean(error, axis=0)
# plot error evolution
plt.plot(avg_error)
plt.xlabel("t")
plt.ylabel("Average $e_q(t)$")
plt.show()
avg_error_time = np.mean(avg_error)
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 6.69229001081394
hat_p = np.empty((num_steps, 2))
trajectory_idx = 5
x_trajectory = x_expert[trajectory_idx]
v = x_trajectory[:, 2:]
p = x_trajectory[:, :2]
a = u_expert[trajectory_idx]
# initial values
# use the same starting point as the expert
hat_v = v[0]
hat_p[0] = p[0]
for t in range(0, num_steps-1):# we start predicting the for t=1
# Predict velocity v[t] + a[t]
hat_v = hat_v + Ts * a[t]
# predict position
hat_p[t+1] = hat_p[t] + Ts * hat_v + (Ts**2/2)*a[t]
plt.figure(figsize=(10, 10))
track.plot()
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
# Plot positions
plt.plot(p[:, 0], p[:, 1], "-", label="true")
# plot predictions
plt.plot(hat_p[:, 0], hat_p[:, 1], "-", label="estimated")
plt.legend(loc="upper left", framealpha=1.0)
plt.show()
Model Learning¶
To mitigate the limitations of the ideal point mass model we can use observed trajectories to learn a more accurate model. To that end we consider a linear parameterization to predict the state at time $t+1$ given observations of the state and control input at time $t$,
\begin{equation} \hat{x\;} (t+1)= A x(t) + B u(t) \end{equation}In this equation the matrices $A$ and $B$ are parameters that we will learn. These parameters map the observed state $x(t)$ and the observed control input $u(t)$ into a prediction $\hat{x\;}(t+1)$ of the next state.
The prediction $x(t+1)$ is, in general, different from the actual state $x(t+1)$ observed at time $t+1$. We therefore seek parameters $A$ and $B$ that make predictions and reality as close as possible. For a loss function $\ell(x(t+1),\hat{x\;}(t+1))$ this gives the empirical risk minimization (ERM) problem
\begin{align} (A^*, B^*) &~=~ argmin_{A,B}~ \frac{1}{Q}\sum_{q=1}^Q ~ \frac{1}{T}\sum_{t=1}^T ~ \ell\, \Big( \, x(t+1) , \, \hat{x\;} (t+1) \, \Big) \nonumber \\ &~=~ argmin_{A, B}~ \frac{1}{Q}\sum_{q=1}^Q ~ \frac{1}{T}\sum_{t=1}^T ~ \ell\, \Big( \, x(t+1) , \, A x(t) + B u(t) \, \Big) . \end{align}This problem differs from standard ERM in that we sum over trajectories and points in time. This is a minimal difference and the same techniques we use for solving standard ERM can be used to solve it.
Task 4¶
Create an object to represent the linear dynamical system model. Instantiate this object to represent a car navigating a circuit. Use the trajectories loaded in Task 1 to train a linear model for a car navigating a circuit. Use a MSE loss and remember to save some trajectories for validation. We recommend that you save a single trajectory for this purpose.
Report the training loss and the test loss.
class LinearModel(nn.Module):
def __init__(self, A, B):
"""
Linear model
param:A: torch tensor with dimensions dim_states x dim_states
param:B: torch tensor with dimensions dim_inputs x dim_states
"""
super().__init__()
# initialize weights and bias
self.A = nn.parameter.Parameter(A)
self.B = nn.parameter.Parameter(B)
def forward(self, x, u):
"""
Recieves a batch num_samples current states and inputs
x(t) has dims num_samples x dim_states, u(t) has dims num_samples x dim_inputs
returns predicted states
x[t+1] = Ax[t] + B
"""
return x @ self.A + u@ self.B
n_steps = 10000
lr = 5e-4
# convert to tensors
x = torch.from_numpy(x_expert).float()
u = torch.from_numpy(u_expert).float()
# save fith trajectory for validation
x_train = torch.cat((x[:4], x[5:]))
u_train = torch.cat((u[:4], u[5:]))
x_test = x[5]
u_test = u[5]
n_states = 4 # p and v
n_inputs = 2# acc
A_init = torch.zeros(n_states, n_states, device=device, requires_grad=True)
B_init = torch.zeros(n_inputs, n_states, device=device, requires_grad=True)
model = LinearModel(A_init, B_init)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
for i in range(n_steps):
optimizer.zero_grad()
# compute predictions for t+1 = 1..T
#we dont need to compute predictions for last state
# and thus use state[:, :-1]
x_pred = model(x_train[:, :-1], u_train[:, :-1])
loss = (x_train[:, 1:] - x_pred).pow(2).mean()
loss.backward()
optimizer.step()
if iprint(f"Step: {i} \t Train Loss: {loss.item()}")
Step: 0 Train Loss: 57.14374542236328 Step: 1000 Train Loss: 0.02006038837134838 Step: 2000 Train Loss: 0.0004425234510563314 Step: 3000 Train Loss: 0.00025458348682150245 Step: 4000 Train Loss: 0.00025277776876464486 Step: 5000 Train Loss: 0.0002527607139199972 Step: 6000 Train Loss: 0.00025276021915487945 Step: 7000 Train Loss: 0.0002527602482587099 Step: 8000 Train Loss: 0.00025276027736254036 Step: 9000 Train Loss: 0.00025276027736254036
# Compute test loss
x_pred_test = model(x_test[:-1], u_test[:-1])
test_loss = (x_test[1:] - x_pred_test).pow(2).mean()
print(f"Test Loss: {test_loss}")
Test Loss: 0.00024410879996139556
Task 5¶
As in Task 3 we compare trajectories $x_q(t)$ with model predicted trajectories $\hat{x\;}_q(t)$. We do that by evaluating the position error $e_q(t)$. Plot trajectory errors $e_q(t)$ for test trajectories. If you have more than one test trajectory, compute and plot the average error.
Plot the positions $p_q(t)$ along with the predicted positions $\hat{p\;}(t)$ for a test trajectory.
Compared the learned model with the ideal point mass model.
x_pred_test = torch.empty((num_steps, 4))
x_pred_test[0] = x_test[0]
for t in range(num_steps-1):
x_pred_test[t+1] = model(x_pred_test[t], u_test[t])
x_pred_test = x_pred_test[1:]
# array that stores error for all trajectories and timesteps
error = np.empty(num_steps-1)
error = x_test[1:] - x_pred_test
error_p = error[:, :2]
error_v = error[:, :2]
# compute error for each timestep
error = torch.linalg.norm(error_v, axis=1)+torch.linalg.norm(error_v, axis=1)
# plot error evolution
plt.plot(error.detach().numpy())
plt.xlabel("t")
plt.ylabel("$e_q(t)$")
plt.show()
avg_error_time = torch.mean(error).detach().numpy()
print(f"Average error over time is: {avg_error_time }" )
Average error over time is: 1.3796671628952026
x_test_np = x_test.detach().numpy()
x_pred_test_np = x_pred_test.detach().numpy()
plt.figure(figsize=(10, 10))
track.plot()
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.plot(x_test[:, 0], x_test[:, 1], "-", label="true")
plt.plot(x_pred_test_np[:, 0], x_pred_test_np[:, 1], ".", label="estimated")
plt.legend(loc="upper left", framealpha=1.0)
plt.show()
The main outcome of Task 5 is Figure 3. We see that the trajectories predicted by the learned model are more accurate. We still observe some deviation between the actual trajectory and the trajectory predicted by the model. As we said, this is a fundamental feature of dynamical systems. A more accurate model mitigates but does not eliminate the accumulation of errors. But in this case the deviation is much smaller. This learned model is a good starting point for controlling the car. We explain how to do this in the next section.
A_learnt = model.A.detach().numpy()
B_learnt = model.B.detach().numpy()
print("A =")
print(np.array2string(A_learnt, precision=2, suppress_small=True))
print("B =")
print(np.array2string(B_learnt, precision=2, suppress_small=True))
A = [[ 1. -0. -0. 0. ] [-0. 1. 0. -0. ] [ 0.05 -0. 0.99 0. ] [-0. 0.05 0. 1. ]] B = [[ 0. -0. 0.05 -0. ] [ 0. 0. -0. 0.05]]
This time around our final position is much closer to the expert trajectory but still not perfect. We would deviate further if we did more than 1 lap.
Dynamical Systems Control¶
We are now given control of the car. This entails finding a sequence of actions $ u(t)$ that navigates the circuit. This means that our goal is to find a policy $\pi$ that maps states $ x(t)$ to control actions $ u(t)$,
\begin{equation} \pi : u(t) = \pi( x(t)), \end{equation}such that the state trajectory $ x(t)$ of the dynamical system satisfies some given specifications.
The particular control problem we consider is to follow a reference trajectory $ x_{R}(t)$. To formulate this problem mathematically consider a dynamical system in which the state at time $t=0$ is given and set to $ x(0)= x_0$. We say that $ x_0$ is the initial condition of the dynamical system. Starting from this initial condition we generate a trajectory $ x_\pi(t)$ by continuous execution of the policy $\pi$. In this trajectory the control action at time $t$ is $ u(t)=\pi( x_\pi(t))$. The trajectory $ x_\pi(t)$ is generated by the recursion,
\begin{equation} x_\pi(t+1) ~=~ f \Big( x_\pi(t), \pi \big( x_\pi(t) \big) \Big). \end{equation}To follow the reference trajectory $ x_{R}(t)$ we introduce a loss $\ell$ to compare the reference trajectory $ x_{R}(t)$ with trajectories $ x_\pi(t)$ generated by different policies $\pi$. We then search of the optimal policy,
\begin{equation} \pi^* = \text{argmin}_\pi \frac{1}{T} \sum_{t=1}^{T} \ell\, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) . \end{equation}This optimization problem appears similar to empirical risk minimization problems we encountered in earlier labs. It appears in particular, similar to the linear dynamical system equations. This similarity is misleading. The problems are, in fact, fundamentally different.
The fundamental differenceis that the goal in the control formulation is to find policies $\pi( x(t))$ that generate suitable trajectories and in a dynamical system the choice of action at time $t$ influences the trajectory of the system all future times $s>t$. Conversely, the state of the system at time $t$ depends on all of the actions that were chosen at times $s<t$.
To explain this point better, start at time $t=0$ and recall that the initial condition $ x_\pi(0)= x_0$ is known. For this initial conidition and choosing actions under policy $\pi$, the control action at time $t=0$ is $ u(0) = \pi( x_\pi(0)) = \pi( x_0)$. We now have the state $ x_\pi(0)$ given and the control policy $ u(0)$ chosen. Since the evolution of the dynamical system follows the dynamical system equations, the state $ x_\pi(1)$ at time $t=1$ is
\begin{equation} x_\pi(1) ~=~ f \Big( x(0), \pi \big( x(0) \big) \Big) ~=~ f \Big( x_0, \pi \big( x_0 \big) \Big). \end{equation}We emphasize that in the previous equations the state as $ x_\pi(1)$ depends on the policy. Different policy choices $\pi$ result in different states $ x_\pi(1)$.
We now consider the next control action which is to be taken at time $t=1$. Since the state of the system is $ x_\pi(1)$, the choice of action under policy $\pi$ is $ u(1) = \pi( x_\pi(1))$. With state given and policy chosen, the dynamical system now moves to the state
\begin{equation} x_\pi(2) ~=~ f \Big( x_\pi(1), \pi \big( x_\pi(1) \big) \Big). \end{equation}This state depends on the control action $ u(1) = \pi( x_\pi(1))$. It also depends on the choice of control action $ u(0) = \pi( x(0))$ because the state $ x_\pi(1)$ depends on this choice.
We repeat the procedure at time $t=2$. The system is in state $ x_\pi(2)$, we choose action $ u(2) = \pi( x_\pi(2))$ and the dynamical system transitions into state
\begin{equation} x_\pi(3) ~=~ f \Big( x_\pi(2), \pi \big( x_\pi(2) \big) \Big). \end{equation}We see here that the state of the system at time $t=3$ depends on all of the three actions chosen by the policy $\pi$ at times $s<3$. Conversely, we see that the action chosen at time $t=0$ affects all of the states observed at times $s>0$. This propagation of dependencies between states and actions continues as we keep making control choices $ u(t)=\pi( x_\pi(t))$ that depend on states $ x_\pi(t)$ that depend on all of the control choices $ u(s)=\pi( x_\pi(s))$ taken at previous times $s<t$.
To emphasize that control problem is a problem in which policy choices $\pi( x_\pi(t))$ are coupled across time we rewrite it to make the evolution of the dynamical system explicit,
\begin{align} \pi^* ~=~ & \text{argmin}_\pi ~ \frac{1}{T} \sum_{t=1}^{T} \ell\, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) , \nonumber \\ & \text{\bsa with \bsb} ~ x_\pi (t+1) = f \Big( x_\pi(t), \pi \big( x_\pi(t) \big) \Big) , \nonumber \\ & \phantom{\text{\bsa with \bsb}} ~ x_\pi(0) = x_0 ~. \phantom{\Big(} \end{align}Model Based Optimization¶
To find the optimal control policy $ \pi^*$ we need to solve the optimization problem}. In this problem, control actions $\pi( x_\pi(t_1))$ and $\pi( x_\pi(t_2))$ chosen at different times are coupled. They both influence the state of the dynamical system at times $s>t_1$ and $s>t_2$. This coupling notwithstanding, we can still compute gradients of the loss to implement a descent algorithm to find the optimal control policy $\pi^*$.
Here we encounter a roadblock: The true behavior of the dynamical system is unknown. We circumvent this roadblock with the used of a model. This could be the model based on our understanding of the laws of motion or the model we learned. We use the latter which turned out to be more accurate.
Consider then the learned dynamical system when controlled by the policy $\pi$. This dynamical system evolves according to
\begin{equation} \hat{x}_\pi(t+1) = A \hat{x}_\pi(t) + B \pi \big( \hat{x}_\pi(t) \big) . \end{equation}We use $\hat{x}_\pi(t)$ to denote the state of the system. This notation emphasizes that the trajectory results from the execution of actions $\pi(\hat{x}_\pi(t))$ on the model.
We search now for policies that are optimal at controlling the model. They are given by the solution of the optimization problem,
\begin{align} \pi^* ~=~ & \text{argmin}_\pi ~ \frac{1}{T} \sum_{t=1}^{T} \ell\, \Big( \, x_{R}(t) , \, x_\pi(t) \, \Big) , \nonumber \\ & \text{\bsa with \bsb} ~ \hat{x}_\pi (t+1) = A \hat{x}_\pi(t) + B \pi \big( \hat{x}_\pi(t) \big) , \nonumber \\ & \phantom{\text{\bsa with \bsb}} ~ \hat{x}_\pi(0) = x_0 ~. \phantom{\Big(} \end{align}The problemsare optimal control problems for different dynamical systems. The formulation in the equations of optimal control is on the actual dynamical system that determines the car’s motion. We can’t solve this problem because we do not know the function $f$. The model based optimization problem is an optimal control problem on the learned model of the dynamical system. We can solve this problem because we have access to $ A$ and $ B$.
Task 6¶
Solve the model based optimization problem for the dynamical system defined by the parameters $ A$ and $ B$ learned in Task 5. Use as a reference trajectory one of the trajectories that you used for validations in Task 5. Use the $L2$ loss as figure of merit.
The solution of this optimization problem is a sequence of actions $ u(0:T-1)$ with $ u(t) = \pi^*(\hat{x}_\pi(t))$. $\blacksquare$
import torch
torch.set_float32_matmul_precision("medium")
def rollout(A, B, u, x0):
"""
Integrate the system dynamics to predict the trajectory.
Args:
A: (4, 4) The state transition matrix.
B: (2, 4) The input matrix.
u: (T, 2) The control inputs.
x0: (4,) The initial state.
"""
x = torch.zeros((len(u) + 1, A.shape[0]), device=A.device)
x[0] = x0
for i in range(len(u)):
x[i + 1] = x[i] @ A + u[i] @ B
# return the predicted trajectory
return x[1:]
expert_idx = 5
x_expert = x_expert[expert_idx]
A = model.A.detach()
B = model.B.detach()
x0 = torch.tensor(x_expert[0]).float()
x_ref = torch.tensor(x_expert[1:]).float()
u_naive = torch.zeros(len(x_ref), 2, requires_grad=True).float()
optimizer = torch.optim.Adam([u_naive], lr=1.0)
for i in range(1000):
optimizer.zero_grad()
x_pred = rollout(A, B, u_naive, x0)
# penalize the difference between the predicted and reference trajectory
loss = (x_pred - x_ref).pow(2).mean()
# add loss to progress bar
if iprint(f"Step: {i} \tLoss: {loss.item()}")
loss.backward()
optimizer.step()
Step: 0 Loss: 514.6283569335938 Step: 100 Loss: 8.270132064819336 Step: 200 Loss: 3.4032530784606934 Step: 300 Loss: 1.444098711013794 Step: 400 Loss: 0.7261595129966736 Step: 500 Loss: 0.4322381317615509 Step: 600 Loss: 0.28022676706314087 Step: 700 Loss: 0.1867842972278595 Step: 800 Loss: 0.12498853355646133 Step: 900 Loss: 0.08331455290317535
Evaluation of Control Policies¶
Solving optimal control on a model of a real system is not the same as solving optimal control on the real system. In the case of navigating a circuit, we saw in Task 5. that the learned model is accurate. We therefore expect that the solution is a good approximation to solving the optimal control problem. We test this expectation here.
To test this expectation we execute the control actions $u(0:T-1)$ that we found in Task 6 in the actual dynamical system $f$. Since we do not have access to an actual car, we will test actions $u(0:T-1)$ in a car simulator.
Task 7¶
Execute the control actions $u(0:T-1)$ found as solutions in Task 6. Evaluate and plot the $L2$ loss between the generated trajectory and the reference trajectory.
The main conclusion is that model based optimization fails at controlling the car. The challenge is, as usual, that errors accumulate over time.
A successful controller requires the introduction of a mechanism to correct the accumulation of errors. This is closed loop control, which we study in Lab 5B.
To test your controller we implement a class called Simulator
for you. By iteratively calling step
you can simulate on the ground truth system.
step
takes two arguments:
- an array of length 4 with the current state of the system,
- and an array of length 2 with the control inputs to apply.
simulator = Simulator()
t = np.arange(0, Config.duration, Config.time_step)
x_estimate = np.zeros_like(x_expert)
# use the starting point as the initial estimate
x_estimate[0] = x_expert[0]
for i in range(len(t) - 1):
x_estimate[i + 1] = simulator.step(x_estimate[i], u_naive[i].detach().cpu().numpy())
plt.figure(figsize=(10, 10))
track.plot(plt.gca())
plt.grid(True)
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.plot(x_expert[:, 0], x_expert[:, 1], "-", label="true")
plt.plot(
x_estimate[:, 0],
x_estimate[:, 1],
"-",
label="estimated",
)
plt.legend(loc="upper left", framealpha=1.0)
plt.show()