#Reinforcement Learning · #OpenAI Gym · #Stable-Baselines3

Breakout

One of the most famous Atari games: you move a paddle and bounce a ball into a brick wall at the top of the screen. Your goal is to destroy the wall. You can even try to break through it and let the ball wreak havoc on the other side, all on its own!

It's fun to play and try to hit all the bricks yourself, but curious to witness gameplay where every move is calculated to perfection? That's what I did below. Watch as the trained agent effortlessly conquers every brick, achieving the ultimate high score.

Timeline

2023 - present

Background

The goal of the game is to clear all the bricks by hitting them with the ball. The agent can only control the paddle at the bottom of the screen.

The agent's action space is discrete: [0: NOOP, 1: FIRE, 2: RIGHT, 3: LEFT].

The game is observed as humans see it: the environment returns the RGB image of the screen (a 210×160×3 array by default).

Rewards depend on the colour of the brick that is destroyed. For the detailed reward table, see the Atari documentation page.
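
As a quick sanity check, here is a minimal sketch that creates the environment and prints the spaces described above (it assumes the Atari ROMs are installed, e.g. via ale-py/AutoROM):

```python
import gym

# Create the Breakout environment; "Breakout-v4" assumes the classic Gym API.
# On newer Gymnasium installs the ID is "ALE/Breakout-v5" instead.
env = gym.make("Breakout-v4")

print(env.action_space)                     # Discrete(4)
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
print(env.observation_space)                # Box(0, 255, (210, 160, 3), uint8)
```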

Training

1. Algorithm: A2C (Advantage Actor-Critic)

A2C is a variant of the actor-critic algorithm that introduces the advantage function, A(s, a) = Q(s, a) - V(s), which measures how much better an action is than the average action in a given state. By incorporating this advantage signal, A2C focuses learning on actions whose value is significantly higher than that of the typical action taken in that state.
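
To make the idea concrete, here is a toy sketch of the advantage computation; the action values are invented purely for illustration:

```python
# Hypothetical action values Q(s, a) for one Breakout state (illustrative only).
q_values = {"NOOP": 0.8, "FIRE": 0.5, "RIGHT": 2.0, "LEFT": 1.2}

# V(s): the value of the state, here taken as the mean over actions
# (i.e. the expected value under a uniform policy) for simplicity.
v_state = sum(q_values.values()) / len(q_values)  # 1.125

# A(s, a) = Q(s, a) - V(s): positive advantage => better than the average action.
advantages = {action: q - v_state for action, q in q_values.items()}
print(advantages)  # {'NOOP': -0.325, 'FIRE': -0.625, 'RIGHT': 0.875, 'LEFT': 0.075}
```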

2. Training Method: Vectorized environment

In a vectorized environment, multiple copies of the game run in parallel to speed up training, which is especially useful for deep reinforcement learning algorithms like A2C. On top of the vectorized environment, the agent uses the VecFrameStack wrapper, which stacks the last few consecutive frames of each environment into a single observation, so the agent can perceive motion (such as the ball's direction and speed) that a single frame cannot convey.
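
Here is a sketch of this setup with Stable-Baselines3; the env ID, the 4 parallel environments, and the 4-frame stack are conventional choices, not necessarily the project's exact settings:

```python
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Run 4 Breakout environments in parallel; each copy is wrapped with the
# standard Atari preprocessing (grayscale, frame skip, screen resizing).
env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=4, seed=0)

# Stack the last 4 frames per environment so the agent can perceive motion.
env = VecFrameStack(env, n_stack=4)
```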

3. Timesteps

The agent is trained for roughly 100,000 timesteps, which was enough for strong performance in this setup. After training and evaluation, the agent is able to play the game and achieve high scores.
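
Putting the pieces together, a minimal training-and-evaluation sketch might look like this (hyperparameters beyond the 100,000-timestep budget are SB3 defaults, not necessarily what was used here):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Vectorized, frame-stacked Breakout as described above.
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", n_envs=4, seed=0), n_stack=4)

# A2C with a CNN policy, trained for ~100,000 timesteps.
model = A2C("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Evaluate the trained agent over a few episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```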

Libraries

1. OpenAI Gym

OpenAI Gym is a Pythonic API that provides simulated environments for training and testing reinforcement learning agents. It has become the industry-standard interface for reinforcement learning and is essentially a toolkit of ready-made environments behind a common reset/step API.
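
That reset/step API is the whole interaction loop; a minimal random-agent sketch (the classic Gym API is shown, where step returns four values; Gymnasium's step returns five):

```python
import gym

env = gym.make("Breakout-v4")
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random action, just to show the loop
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    total_reward += reward

print(f"episode reward: {total_reward}")
env.close()
```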

2. Stable-Baselines3

Stable-Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides a unified structure across all algorithms, clean and well-documented code, and TensorBoard support.
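
The unified structure means every algorithm shares the same constructor/learn/save/load interface; a short sketch (the log directory and file names are placeholders):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", n_envs=4), n_stack=4)

# tensorboard_log enables TensorBoard logging to the given directory.
model = A2C("CnnPolicy", env, tensorboard_log="./a2c_breakout_tb/", verbose=1)
model.learn(total_timesteps=10_000)

model.save("a2c_breakout")        # serialize the model to disk
model = A2C.load("a2c_breakout")  # restore it later through the identical API
```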

3. make_atari_env

This Stable-Baselines3 helper sets up Atari environments with the standard preprocessing required for reinforcement learning, such as grayscale observations, frame skipping, and screen resizing (frame stacking is added separately via VecFrameStack). It greatly simplifies preparing Atari games for training agents.
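
A sketch of the call, with a couple of the preprocessing knobs it forwards to SB3's AtariWrapper via wrapper_kwargs (the values shown are the usual defaults, not tuned settings):

```python
from stable_baselines3.common.env_util import make_atari_env

# make_atari_env wraps each copy of the game with SB3's AtariWrapper;
# wrapper_kwargs forwards options such as frame skip and screen size.
env = make_atari_env(
    "BreakoutNoFrameskip-v4",
    n_envs=4,
    seed=0,
    wrapper_kwargs={"frame_skip": 4, "screen_size": 84},
)
```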

Tools

Jupyter Notebook

OpenAI Gym

Conclusion

In conclusion, this project demonstrates the power of reinforcement learning in training agents to play Atari games. Through training and optimisation, we have watched the agent evolve from a novice player into a skilled contender, capable of achieving remarkable scores.

Made by Vinayak CM · Copyright © 2024 - All rights reserved