Pendulum swingup
Standard 1-DOF pendulum swingup problem (2 states, 1 control input).
clear MDP
System parameters
MDP.xdom = [-pi pi;            % state-space domain limits - angle
            -20 20];           % state-space domain limits - angular velocity
MDP.udom = [-2 2];             % control input domain limits
MDP.Ts = 0.02;                 % sampling period
MDP.model = 'swingupModel';    % state transition function name
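The state-transition function is only referenced by name above. Below is a minimal sketch of what such a function might look like, assuming textbook pendulum dynamics (angle measured from the downward position), forward-Euler integration over Ts, and illustrative physical constants; the name, signature, and parameter values are assumptions, not the toolbox's actual swingupModel.

function xnext = pendulumModelSketch(x, u, Ts)
% Hypothetical state-transition function (NOT the toolbox's swingupModel).
% State x = [angle; angular velocity], angle = 0 at the downward position;
% input u = applied torque; Ts = sampling period.
m = 1; l = 1; b = 0.1; g = 9.81;                    % illustrative physical constants
xdot = [x(2);
        (u - b*x(2) - m*g*l*sin(x(1))) / (m*l^2)];  % pendulum equation of motion
xnext = x + Ts*xdot;                                % forward-Euler integration step
xnext(1) = mod(xnext(1) + pi, 2*pi) - pi;           % one common choice: wrap angle to [-pi, pi)
end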
MDP parameters
MDP.gamma = 0.99;              % discount factor
MDP.reward = 'swingupReward';  % reward function name
MDP.xr = [pi 0];               % desired state (pendulum pointing upward)
MDP.xrTol = [0.1 0.1];         % tolerance of desired state (hyperbox reward)
MDP.intmethod = 'linear';      % interpolation method for interpn ('linear', 'nearest', 'pchip', 'cubic', 'makima', or 'spline')
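The reward function is likewise referenced only by name. A minimal sketch of a hyperbox reward consistent with the xr/xrTol settings above is shown below, assuming a signature of the form r = reward(x, u, MDP); the name, signature, and the binary 0/1 reward are assumptions, not the toolbox's swingupReward.

function r = swingupRewardSketch(x, u, MDP)
% Hypothetical hyperbox reward (NOT necessarily the toolbox's swingupReward):
% reward 1 inside the tolerance box around the desired state, 0 elsewhere.
if all(abs(x(:)' - MDP.xr) <= MDP.xrTol)
    r = 1;      % goal region reached (pendulum up, near-zero velocity)
else
    r = 0;      % outside the goal hyperbox
end
end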
Value iteration and approximator parameters
MDP.n = [65 65];               % number of BFs for x (per state dimension)
MDP.m = 11;                    % number of discrete actions
MDP.maxiter = 1e4;             % maximum number of iterations
MDP.epstheta = 1e-3;           % stopping criterion threshold
% MDP.savememory = true;       % save memory by normalizing states (only works well for uniform grids)
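With n = [65 65] basis functions and m = 11 discrete actions, the approximator presumably operates on uniform grids over the state and input domains. A short sketch of how such grids could be built from the settings above (illustrative only; the toolbox constructs them internally):

% Illustrative construction of the interpolation grid and action set
% (assumption: uniform grids derived from MDP.n, MDP.m, and the domains).
xgrid = cell(1, numel(MDP.n));
for i = 1:numel(MDP.n)
    xgrid{i} = linspace(MDP.xdom(i,1), MDP.xdom(i,2), MDP.n(i));  % grid per state dimension
end
ugrid = linspace(MDP.udom(1), MDP.udom(2), MDP.m);                % discrete action set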
Simulation parameters
MDP.Tsim = 1.5;                % simulation time
MDP.rstop = [0 10];            % terminate rollout when r >= rstop(1) for nr. samples >= rstop(2)
MDP.xi = [0 0];                % initial state
MDP.polmethod = 'hillclimbing';   % policy derivation {'interpolated', 'hillclimbing'}
MDP.movie = 'swingup';         % save rollout as a movie
Hyperparameter tuning
[R,nGrid] = hpTune(MDP,{11:101,11:101},[-pi/4 pi/4; -1 1],10);
[R,nGrid] = hpTune(MDP,{11:101,11:101},[0 0]);
[R,ng,nGrid] = hpTuneParallel(MDP,{1+2.^[2:7],1+2.^[2:7]},[-pi/2 pi/2; -1 1],100);
Value iteration
MDP = vi(MDP); % value iteration
Precomputing state transitions ...
Iterating ...
Elapsed time is 0.595774 seconds.
Number of iterations: 276
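Conceptually, each iteration performs an interpolated Bellman backup over all grid points and discrete actions. The sketch below shows one way this could be written, reusing the hypothetical helpers pendulumModelSketch, swingupRewardSketch, xgrid, and ugrid introduced above; it is a naive loop-based illustration, not the toolbox's vi implementation, which precomputes the state transitions and is far faster.

% Illustrative value iteration (assumption: NOT the toolbox's vi code).
V = zeros(MDP.n);                                   % value table on the state grid
[T1, T2] = ndgrid(xgrid{1}, xgrid{2});              % all grid points
X = [T1(:) T2(:)];
for iter = 1:MDP.maxiter
    Q = zeros(size(X,1), MDP.m);
    for j = 1:MDP.m
        for k = 1:size(X,1)                         % one-step lookahead per grid point
            xn = pendulumModelSketch(X(k,:)', ugrid(j), MDP.Ts);
            xn = min(max(xn, MDP.xdom(:,1)), MDP.xdom(:,2));   % clip to the state domain
            Q(k,j) = swingupRewardSketch(xn, ugrid(j), MDP) + MDP.gamma * ...
                     interpn(xgrid{1}, xgrid{2}, V, xn(1), xn(2), MDP.intmethod);
        end
    end
    Vnew = reshape(max(Q, [], 2), MDP.n);           % greedy maximization over actions
    if max(abs(Vnew(:) - V(:))) < MDP.epstheta      % stopping criterion
        V = Vnew; break;
    end
    V = Vnew;
end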
Simulation rollout
R = rollout(MDP); % simulation
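For intuition, a greedy rollout can be sketched as repeated one-step lookahead over the discrete actions using the converged value table. The snippet below reuses the hypothetical names from the earlier sketches (V, xgrid, ugrid, pendulumModelSketch, swingupRewardSketch) and is not the toolbox's rollout; the 'hillclimbing' policy derivation configured above presumably refines the action choice locally instead of this simple lookahead.

% Illustrative greedy rollout (assumption: NOT the toolbox's rollout function).
x = MDP.xi(:);                                      % initial state
for k = 1:round(MDP.Tsim/MDP.Ts)
    q = zeros(MDP.m, 1);
    for j = 1:MDP.m                                 % one-step lookahead over discrete actions
        xn = pendulumModelSketch(x, ugrid(j), MDP.Ts);
        xn = min(max(xn, MDP.xdom(:,1)), MDP.xdom(:,2));   % clip to the state domain
        q(j) = swingupRewardSketch(xn, ugrid(j), MDP) + MDP.gamma * ...
               interpn(xgrid{1}, xgrid{2}, V, xn(1), xn(2), MDP.intmethod);
    end
    [~, jbest] = max(q);
    x = pendulumModelSketch(x, ugrid(jbest), MDP.Ts);      % apply the greedy action
end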
Plot V-function
plotV(MDP);
Plot policy
plotpi(MDP);