Pendulum swingup
Standard 1-DOF pendulum swingup problem (2 states, 1 control input).
clear MDP
System parameters
MDP.xdom = [-pi pi;            % state-space domain limits - angle
            -20 20];           % state-space domain limits - angular velocity
MDP.udom = [-2 2];             % control input domain limits
MDP.Ts = 0.02;                 % sampling period
MDP.model = 'swingupModel';    % state transition function name
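The state-transition function is only referenced by name above. Below is a minimal sketch of what such a function might look like, assuming textbook pendulum dynamics (angle measured from the downward position), forward-Euler integration over Ts, and illustrative physical constants; the name, signature, and parameter values are assumptions, not the toolbox's actual swingupModel.

function xnext = pendulumModelSketch(x, u, Ts)
% Hypothetical state-transition function (NOT the toolbox's swingupModel).
% State x = [angle; angular velocity], angle = 0 at the downward position;
% input u = applied torque; Ts = sampling period.
m = 1; l = 1; b = 0.1; g = 9.81;                    % illustrative physical constants
xdot = [x(2);
        (u - b*x(2) - m*g*l*sin(x(1))) / (m*l^2)];  % pendulum equation of motion
xnext = x + Ts*xdot;                                % forward-Euler integration step
xnext(1) = mod(xnext(1) + pi, 2*pi) - pi;           % one common choice: wrap angle to [-pi, pi)
end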
MDP parameters
MDP.gamma = 0.99;              % discount factor
MDP.reward = 'swingupReward';  % reward function name
MDP.xr = [pi 0];               % desired state (pendulum pointing upward)
MDP.xrTol = [0.1 0.1];         % tolerance of desired state (hyperbox reward)
MDP.intmethod = 'linear';      % interpolation method for interpn ('linear', 'nearest', 'pchip', 'cubic', 'makima', or 'spline')
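The reward function is likewise referenced only by name. A minimal sketch of a hyperbox reward consistent with the xr/xrTol settings above is shown below, assuming a signature of the form r = reward(x, u, MDP); the name, signature, and the binary 0/1 reward are assumptions, not the toolbox's swingupReward.

function r = swingupRewardSketch(x, u, MDP)
% Hypothetical hyperbox reward (NOT necessarily the toolbox's swingupReward):
% reward 1 inside the tolerance box around the desired state, 0 elsewhere.
if all(abs(x(:)' - MDP.xr) <= MDP.xrTol)
    r = 1;      % goal region reached (pendulum up, near-zero velocity)
else
    r = 0;      % outside the goal hyperbox
end
end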
Value iteration and approximator parameters
MDP.n = [65 65];               % number of BFs for x (per state dimension)
MDP.m = 11;                    % number of discrete actions
MDP.maxiter = 1e4;             % maximum number of iterations
MDP.epstheta = 1e-3;           % stopping criterion threshold
% MDP.savememory = true;       % save memory by normalizing states (only works well for uniform grids)
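With n = [65 65] basis functions and m = 11 discrete actions, the approximator presumably operates on uniform grids over the state and input domains. A short sketch of how such grids could be built from the settings above (illustrative only; the toolbox constructs them internally):

% Illustrative construction of the interpolation grid and action set
% (assumption: uniform grids derived from MDP.n, MDP.m, and the domains).
xgrid = cell(1, numel(MDP.n));
for i = 1:numel(MDP.n)
    xgrid{i} = linspace(MDP.xdom(i,1), MDP.xdom(i,2), MDP.n(i));  % grid per state dimension
end
ugrid = linspace(MDP.udom(1), MDP.udom(2), MDP.m);                % discrete action set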
Simulation parameters
MDP.Tsim = 1.5;                % simulation time
MDP.rstop = [0 10];            % terminate rollout when r >= rstop(1) for nr. samples >= rstop(2)
MDP.xi = [0 0];                % initial state
MDP.polmethod = 'hillclimbing';   % policy derivation {'interpolated', 'hillclimbing'}
MDP.movie = 'swingup';         % save rollout as a movie
Hyperparameter tuning
[R,nGrid] = hpTune(MDP,{11:101,11:101},[-pi/4 pi/4; -1 1],10);
[R,nGrid] = hpTune(MDP,{11:101,11:101},[0 0]);
[R,ng,nGrid] = hpTuneParallel(MDP,{1+2.^[2:7],1+2.^[2:7]},[-pi/2 pi/2; -1 1],100);
Value iteration
MDP = vi(MDP); % value iteration
Precomputing state transitions ...
Iterating ...
Elapsed time is 0.595774 seconds.
Number of iterations: 276
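Conceptually, each iteration performs an interpolated Bellman backup over all grid points and discrete actions. The sketch below shows one way this could be written, reusing the hypothetical helpers pendulumModelSketch, swingupRewardSketch, xgrid, and ugrid introduced above; it is a naive loop-based illustration, not the toolbox's vi implementation, which precomputes the state transitions and is far faster.

% Illustrative value iteration (assumption: NOT the toolbox's vi code).
V = zeros(MDP.n);                                   % value table on the state grid
[T1, T2] = ndgrid(xgrid{1}, xgrid{2});              % all grid points
X = [T1(:) T2(:)];
for iter = 1:MDP.maxiter
    Q = zeros(size(X,1), MDP.m);
    for j = 1:MDP.m
        for k = 1:size(X,1)                         % one-step lookahead per grid point
            xn = pendulumModelSketch(X(k,:)', ugrid(j), MDP.Ts);
            xn = min(max(xn, MDP.xdom(:,1)), MDP.xdom(:,2));   % clip to the state domain
            Q(k,j) = swingupRewardSketch(xn, ugrid(j), MDP) + MDP.gamma * ...
                     interpn(xgrid{1}, xgrid{2}, V, xn(1), xn(2), MDP.intmethod);
        end
    end
    Vnew = reshape(max(Q, [], 2), MDP.n);           % greedy maximization over actions
    if max(abs(Vnew(:) - V(:))) < MDP.epstheta      % stopping criterion
        V = Vnew; break;
    end
    V = Vnew;
end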
Simulation rollout
R = rollout(MDP); % simulation
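For intuition, a greedy rollout can be sketched as repeated one-step lookahead over the discrete actions using the converged value table. The snippet below reuses the hypothetical names from the earlier sketches (V, xgrid, ugrid, pendulumModelSketch, swingupRewardSketch) and is not the toolbox's rollout; the 'hillclimbing' policy derivation configured above presumably refines the action choice locally instead of this simple lookahead.

% Illustrative greedy rollout (assumption: NOT the toolbox's rollout function).
x = MDP.xi(:);                                      % initial state
for k = 1:round(MDP.Tsim/MDP.Ts)
    q = zeros(MDP.m, 1);
    for j = 1:MDP.m                                 % one-step lookahead over discrete actions
        xn = pendulumModelSketch(x, ugrid(j), MDP.Ts);
        xn = min(max(xn, MDP.xdom(:,1)), MDP.xdom(:,2));   % clip to the state domain
        q(j) = swingupRewardSketch(xn, ugrid(j), MDP) + MDP.gamma * ...
               interpn(xgrid{1}, xgrid{2}, V, xn(1), xn(2), MDP.intmethod);
    end
    [~, jbest] = max(q);
    x = pendulumModelSketch(x, ugrid(jbest), MDP.Ts);      % apply the greedy action
end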
Plot V-function
plotV(MDP);
Plot policy
plotpi(MDP);