#ReinforcementLearning
#OpenAIGym
#Stable-Baselines3
Racing Car
I've used reinforcement learning, built on OpenAI's Gym library, to train a racing car agent. The car must navigate a track that is randomly regenerated at the start of every episode, so each episode presents a unique challenge.
Timeline
2023 - present
About
The car starts at the centre of the road. The generated track is random every episode. Some indicators are shown at the bottom of the window along with the state RGB buffer. From left to right: true speed, four ABS sensors, steering wheel position, and gyroscope.
Action space: In the default continuous setting, the action space consists of three controls:
[0: Steering, 1: Gas, 2: Brake]
Observation space: A top-down 96x96 RGB image of the car and race track.
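As a quick illustration of these spaces, the minimal sketch below creates the environment and prints them. It assumes the CarRacing-v2 environment ID and the classic Gym API; the exact ID and space bounds can vary between Gym releases.

```python
import gym

# Assumed environment ID; older Gym releases use CarRacing-v0/v1 instead.
env = gym.make("CarRacing-v2")

print(env.observation_space)  # Box(0, 255, (96, 96, 3), uint8) - top-down RGB image
print(env.action_space)       # Box([-1, 0, 0], [1, 1, 1], (3,), float32) - steering, gas, brake

env.close()
```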
Training
1
Algorithm: Proximal Policy Optimisation (PPO)
Compared to other algorithms, Proximal Policy Optimisation offers simplicity, stability, and sample efficiency. PPO constrains each policy update to a small step so the agent can reliably approach a good policy: too large a step can push the policy in the wrong direction with little chance of recovery, while too small a step slows training.
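A minimal training sketch with Stable-Baselines3's PPO is shown below. The clip_range argument is what keeps each policy update small; the environment ID, hyperparameter values, and policy choice here are illustrative assumptions rather than the exact settings used in this project.

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2")  # assumed environment ID

# "CnnPolicy" processes the 96x96 RGB observations with a convolutional network.
# clip_range limits how far each update can move the policy (PPO's "small step").
model = PPO("CnnPolicy", env, clip_range=0.2, verbose=1)
model.learn(total_timesteps=10_000)  # short run, just to exercise the setup
```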
2
Training method: Vectorized Environments
Vectorized environments stack multiple independent environments into a single environment. Instead of training the RL agent on one environment per step, we can train it on n environments per step, which speeds up experience collection. When using vectorized environments, each environment is automatically reset at the end of its own episode.
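A sketch of this setup using Stable-Baselines3's make_vec_env helper is shown below; the environment ID and the choice of four parallel copies are assumptions for illustration.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Stack 4 independent CarRacing environments into one vectorized environment.
# Each copy is reset automatically when its own episode ends.
vec_env = make_vec_env("CarRacing-v2", n_envs=4)

# The agent now receives a batch of 4 observations per step.
model = PPO("CnnPolicy", vec_env, verbose=1)
```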
3
Timesteps
The agent was trained for 120,000 timesteps, which gave the best results. Training for between 120,000 and 140,000 timesteps proved optimal without leading to overfitting.
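Putting the pieces together, an end-to-end training call for that budget might look like the sketch below; the environment ID, number of parallel environments, and save filename are hypothetical choices.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CarRacing-v2", n_envs=4)
model = PPO("CnnPolicy", vec_env, verbose=1)

# Train within the 120,000-140,000 timestep range reported above.
model.learn(total_timesteps=120_000)
model.save("ppo_car_racing")  # hypothetical filename
```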
Libraries
1
Gym
OpenAI Gym is a Pythonic API that provides simulated environments for training and testing reinforcement learning agents. It has become the de facto standard interface for RL environments.
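For context, the core Gym interaction loop looks like the sketch below. It assumes the classic Gym API, where step returns four values; newer Gym/Gymnasium releases return a five-tuple from step and a (observation, info) pair from reset.

```python
import gym

env = gym.make("CarRacing-v2")  # assumed environment ID
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random actions, just to exercise the API
    obs, reward, done, info = env.step(action)  # classic 4-tuple step API
    total_reward += reward

env.close()
print(f"Episode return: {total_reward:.1f}")
```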
2
Stable-Baselines3
Stable-Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides a unified structure for all algorithms, clean code, and TensorBoard support.
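As an example of the TensorBoard support, SB3 can write training metrics to a log directory via the tensorboard_log argument; the directory path and environment ID below are hypothetical choices.

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2")  # assumed environment ID

# Training metrics (episode reward, losses, etc.) are written to ./ppo_carracing_tb/
# and can be viewed with: tensorboard --logdir ./ppo_carracing_tb/
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log="./ppo_carracing_tb/")
model.learn(total_timesteps=10_000)
```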
Tools
Jupyter Notebook
OpenAI Gym
Conclusion
With optimal training, the agent drives through the track with good acceleration, controlled braking, and well-judged turns at apex corners. It reaches scores of 900+, finishing episodes with clean driving.
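A score like this can be checked with Stable-Baselines3's evaluation helper, as in the sketch below; the model filename matches the hypothetical one used earlier, and the number of evaluation episodes is an arbitrary choice.

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CarRacing-v2")               # assumed environment ID
model = PPO.load("ppo_car_racing", env=env)  # hypothetical saved model from training

# Average episode return over a handful of episodes; 900+ indicates the track
# is being completed with little off-track driving.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=5)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```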