Getting a pendulum to stand bolt upright by just giving it a swing here and there takes a lot of dexterity … or reinforcement learning.
After specifying the problem, we initialize the algorithm, which then gets to work trying to solve it.
The reward condition in this example is very simple: the algorithm is rewarded for each timestep in which it can keep the pendulum bolt upright.
A note to the interested viewer: in this case, the number of timesteps per episode is fixed.
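For viewers who like to see it spelled out, the reward condition and the fixed episode length can be sketched in a few lines of Python. This is only an illustration, not NXS Core's actual code: the names `reward` and `episode_return`, the tolerance of 0.1 rad for what counts as "upright", and the episode length of 200 timesteps are all assumptions made for the example.

```python
import math

# Assumed values for illustration only; the video does not state them.
UPRIGHT_TOLERANCE_RAD = 0.1   # how far from vertical still counts as "upright"
STEPS_PER_EPISODE = 200       # fixed number of timesteps per episode


def wrap_angle(theta: float) -> float:
    """Map any angle into [-pi, pi], where 0 means pointing straight up."""
    return math.atan2(math.sin(theta), math.cos(theta))


def reward(theta: float) -> float:
    """Return 1.0 for a timestep in which the pendulum is upright, else 0.0."""
    return 1.0 if abs(wrap_angle(theta)) <= UPRIGHT_TOLERANCE_RAD else 0.0


def episode_return(angles: list[float]) -> float:
    """Sum the per-timestep rewards over one fixed-length episode."""
    if len(angles) != STEPS_PER_EPISODE:
        raise ValueError("every episode must have the same number of timesteps")
    return sum(reward(theta) for theta in angles)
```

Because every episode has the same number of timesteps, the total reward per episode is directly comparable from one episode to the next: under these assumptions it can never exceed `STEPS_PER_EPISODE`, and it only reaches that value if the pendulum is upright the whole time.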
In about two to three minutes, NXS Core’s reinforcement learning algorithm learns to keep the pendulum upright and stable.