Imagine you are sitting in your battered old car in the middle of a valley. There are slopes on both sides and you need to get up to the highest point of the hill to the right. You know that your car won’t make it using its engine’s power alone. You’ll have to use momentum as well!
It’s a formidable problem for reinforcement learning!
After specifying the problem, we initialize the algorithm, which then gets to work trying to solve the problem.
For every timestep in which the algorithm does not reach the highest point, it gets a negative reward. Only when it successfully reaches the highest point on the hill is it granted a positive reward.
The algorithm is only given the options of driving left, driving right or not using the engine at all.
An episode ends when the car has successfully reached the highest point on the hill.
In about 3-5 minutes, NXS Core’s reinforcement learning algorithm learns the problem well enough to get the car safely up to the highest point on the hill.