#ReinforcementLearning
#OpenAIGym
#Stable-Baselines3
Racing Car
I've used reinforcement learning, built on OpenAI's Gym library, to train a racing car agent. The car must navigate a track that is randomly regenerated at the start of every episode, so each episode presents a unique challenge.
Timeline
2023 - present
About
The car starts at the centre of the road. The generated track is random every episode. Some indicators are shown at the bottom of the window along with the state RGB buffer. From left to right: true speed, four ABS sensors, steering wheel position, and gyroscope.
Action space: In the default continuous setting, the action space consists of three controls:
[0: Steering, 1: Gas, 2: Brake]
Observation space: A top-down 96x96 RGB image of the car and race track.
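As a quick illustration of these spaces, the minimal sketch below creates the environment and prints them. It assumes the CarRacing-v2 environment ID and the classic Gym API; the exact ID and space bounds can vary between Gym releases.

```python
import gym

# Assumed environment ID; older Gym releases use CarRacing-v0/v1 instead.
env = gym.make("CarRacing-v2")

print(env.observation_space)  # Box(0, 255, (96, 96, 3), uint8) - top-down RGB image
print(env.action_space)       # Box([-1, 0, 0], [1, 1, 1], (3,), float32) - steering, gas, brake

env.close()
```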
Training
1
Algorithm: Proximal Policy Optimisation (PPO)
Compared to other algorithms, Proximal Policy Optimisation offers simplicity, stability, and sample efficiency. PPO constrains each policy update to a small step so the agent can reliably approach a good policy: too large a step can push the policy in the wrong direction with little chance of recovery, while too small a step slows training.
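A minimal training sketch with Stable-Baselines3's PPO is shown below. The clip_range argument is what keeps each policy update small; the environment ID, hyperparameter values, and policy choice here are illustrative assumptions rather than the exact settings used in this project.

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2")  # assumed environment ID

# "CnnPolicy" processes the 96x96 RGB observations with a convolutional network.
# clip_range limits how far each update can move the policy (PPO's "small step").
model = PPO("CnnPolicy", env, clip_range=0.2, verbose=1)
model.learn(total_timesteps=10_000)  # short run, just to exercise the setup
```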
2
Training method: Vectorized Environments
Vectorized environments stack multiple independent environments into a single environment. Instead of training the RL agent on one environment per step, we can train it on n environments per step, which speeds up experience collection. When using vectorized environments, each environment is automatically reset at the end of its own episode.
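A sketch of this setup using Stable-Baselines3's make_vec_env helper is shown below; the environment ID and the choice of four parallel copies are assumptions for illustration.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Stack 4 independent CarRacing environments into one vectorized environment.
# Each copy is reset automatically when its own episode ends.
vec_env = make_vec_env("CarRacing-v2", n_envs=4)

# The agent now receives a batch of 4 observations per step.
model = PPO("CnnPolicy", vec_env, verbose=1)
```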
3
Timesteps
The agent was trained for 120,000 timesteps, which gave the best results. Training for between 120,000 and 140,000 timesteps proved optimal without leading to overfitting.
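Putting the pieces together, an end-to-end training call for that budget might look like the sketch below; the environment ID, number of parallel environments, and save filename are hypothetical choices.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CarRacing-v2", n_envs=4)
model = PPO("CnnPolicy", vec_env, verbose=1)

# Train within the 120,000-140,000 timestep range reported above.
model.learn(total_timesteps=120_000)
model.save("ppo_car_racing")  # hypothetical filename
```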
Libraries
1
Gym
OpenAI Gym is a Pythonic API that provides simulated environments for training and testing reinforcement learning agents. It has become the de facto standard interface for RL environments.
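For context, the core Gym interaction loop looks like the sketch below. It assumes the classic Gym API, where step returns four values; newer Gym/Gymnasium releases return a five-tuple from step and a (observation, info) pair from reset.

```python
import gym

env = gym.make("CarRacing-v2")  # assumed environment ID
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random actions, just to exercise the API
    obs, reward, done, info = env.step(action)  # classic 4-tuple step API
    total_reward += reward

env.close()
print(f"Episode return: {total_reward:.1f}")
```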
2
Stable-Baselines3
Stable-Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides a unified structure for all algorithms, clean code, and TensorBoard support.
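As an example of the TensorBoard support, SB3 can write training metrics to a log directory via the tensorboard_log argument; the directory path and environment ID below are hypothetical choices.

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2")  # assumed environment ID

# Training metrics (episode reward, losses, etc.) are written to ./ppo_carracing_tb/
# and can be viewed with: tensorboard --logdir ./ppo_carracing_tb/
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log="./ppo_carracing_tb/")
model.learn(total_timesteps=10_000)
```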
Tools
Jupyter Notebook
OpenAI Gym
Conclusion
With optimal training, the agent drives through the track with good acceleration, controlled braking, and well-judged turns at apex corners. It reaches scores of 900+, finishing episodes with clean driving.
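A score like this can be checked with Stable-Baselines3's evaluation helper, as in the sketch below; the model filename matches the hypothetical one used earlier, and the number of evaluation episodes is an arbitrary choice.

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CarRacing-v2")               # assumed environment ID
model = PPO.load("ppo_car_racing", env=env)  # hypothetical saved model from training

# Average episode return over a handful of episodes; 900+ indicates the track
# is being completed with little off-track driving.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=5)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```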