In this tutorial, we’ll solve the BipedalWalker-v3 environment, one of the harder environments in Gym. Our agent should run fast, should not trip itself up, and should use as little energy as possible…
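Since the walker is a continuous-control task, a quick way to see what makes it hard is to inspect its spaces and roll out a random policy. A minimal sketch using the classic Gym API (nothing here is tutorial-specific):

```python
import gym

env = gym.make("BipedalWalker-v3")
print(env.action_space)       # Box(4,): four joint torques in [-1, 1]
print(env.observation_space)  # Box(24,): hull state, joint state, lidar readings

state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # random torques
    state, reward, done, info = env.step(action)
    total_reward += reward  # moving forward is rewarded; torque use and falling are penalized
print("random-policy return:", total_reward)
```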

  • LunarLander-v2 has a Discrete(4) action space. This means there are four possible actions (left engine, right engine, main engine, and do nothing), and we send to the environment…
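To make the Discrete(4) interface concrete, here is a minimal sketch of querying the action space and stepping the environment once, using the classic Gym API:

```python
import gym

env = gym.make("LunarLander-v2")
print(env.action_space)  # Discrete(4)
# 0 = do nothing, 1 = fire left engine, 2 = fire main engine, 3 = fire right engine

state = env.reset()
action = env.action_space.sample()                 # a single integer in {0, 1, 2, 3}
next_state, reward, done, info = env.step(action)  # one action sent per step
```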

In this step-by-step reinforcement learning tutorial with Gym and TensorFlow 2, I’ll show you how to implement PPO to teach an AI agent how to land a rocket (LunarLander-v2).
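At the heart of that implementation is PPO’s clipped surrogate objective. A minimal TensorFlow 2 sketch of the loss (the function name and signature are illustrative, not necessarily the tutorial’s):

```python
import tensorflow as tf

def ppo_clip_loss(old_log_probs, log_probs, advantages, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), computed in log space for stability.
    ratio = tf.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping the ratio keeps each update close to the policy that collected the data.
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # We maximize the surrogate, so the training loss is its negation.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))
```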


In this tutorial, we’ll dive into the PPO architecture and implement a Proximal Policy Optimization (PPO) agent that learns to play Pong-v0.


In this tutorial, I will provide an implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm in TensorFlow and Keras. We will use it to solve a simple challenge in the Pong environment.
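The distinctive part of A3C is the asynchronous loop itself: several workers each copy the shared global network, compute gradients on their own rollouts, and push updates back. A toy sketch of just that pattern, with a quadratic stand-in for the real actor-critic gradient (every name here is hypothetical):

```python
import threading
import numpy as np

N_WORKERS = 4
lock = threading.Lock()
global_weights = np.zeros(8)             # parameters shared by all workers

def local_gradient(weights):
    # Stand-in for the policy/value gradient a real worker would compute
    # from its own environment rollout; its optimum is all ones.
    return weights - np.ones(8)

def worker(steps=200, lr=0.05):
    for _ in range(steps):
        with lock:
            local = global_weights.copy()    # 1) sync: copy global parameters
        grad = local_gradient(local)         # 2) compute gradients locally
        with lock:
            global_weights[:] -= lr * grad   # 3) apply them to the shared network

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(global_weights)  # converges toward the toy optimum (all ones)
```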


Today, we’ll study a Reinforcement Learning method that we can call a ‘hybrid’ method: Actor-Critic. This algorithm combines the value-optimization and policy-optimization approaches (see the network sketch after the list below):

  • Actor: a Policy Gradient (PG)…
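A minimal Keras sketch of that hybrid, assuming a single network with two heads: the actor head is the policy-optimization half (a softmax policy over actions), and the critic head is the value-optimization half (a scalar state-value estimate). Layer sizes and names are illustrative, not the tutorial’s:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_actor_critic(state_dim=8, n_actions=4):
    inputs = layers.Input(shape=(state_dim,))
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(64, activation="relu")(x)
    policy = layers.Dense(n_actions, activation="softmax", name="actor")(x)  # pi(a|s)
    value = layers.Dense(1, name="critic")(x)                                # V(s)
    return Model(inputs, [policy, value])

model = build_actor_critic()
model.summary()
```

Sharing the trunk between the two heads is a common design choice: the features useful for judging a state tend to be useful for acting in it too.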


To wrap up deep reinforcement learning, I’ll introduce the types of agents beyond DQNs (value optimization, model-based, policy optimization, and imitation learning). We’ll implement Policy Gradient! (A minimal loss sketch follows the list below.)

  1. If the number of possible state-action pairs in a given environment is relatively large, the Q-function can become highly complicated, and it becomes intractable to estimate the optimal Q-value.
  2. Even in situations where finding Q is…
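These two limitations are exactly what motivate optimizing the policy directly. A minimal REINFORCE-style sketch of the policy gradient loss, assuming discrete actions and Monte-Carlo returns (the helper names are mine, not necessarily the tutorial’s):

```python
import numpy as np
import tensorflow as tf

def discounted_returns(rewards, gamma=0.99):
    # G_t = r_t + gamma * G_{t+1}, computed backwards over one episode.
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def pg_loss(action_probs, actions, returns):
    # action_probs: (T, n_actions) policy outputs; actions: (T,) taken actions.
    actions = tf.cast(actions, tf.int32)
    idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
    log_probs = tf.math.log(tf.gather_nd(action_probs, idx) + 1e-8)
    # Push up log pi(a|s) in proportion to the return that followed the action.
    return -tf.reduce_mean(log_probs * returns)
```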


In this tutorial, I’ll implement a Deep Neural Network for Reinforcement Learning (a Deep Q Network), and we will see it learn and finally become good enough to beat the computer at Pong! (A minimal sketch of the core Q-learning update follows the list below.)

  • Write a Neural Network from scratch;
  • Implement a Deep Q Network with Reinforcement Learning;
  • Build an A.I. for Pong that can beat the computer in less than 300 lines of Python;
  • Use OpenAI Gym.
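Here is that core Q-learning update sketched out, assuming a network that maps a batch of states to per-action Q-values (names and shapes are illustrative):

```python
import numpy as np

def dqn_targets(q_values, next_q_values, actions, rewards, dones, gamma=0.99):
    # q_values / next_q_values: (batch, n_actions) arrays from the Q-network;
    # dones: 0/1 floats marking episode ends (no bootstrapping past them).
    targets = q_values.copy()
    bootstrap = gamma * next_q_values.max(axis=1) * (1.0 - dones)
    # Bellman target: r + gamma * max_a' Q(s', a'), written into the taken action.
    targets[np.arange(len(actions)), actions] = rewards + bootstrap
    return targets  # regress the network's q_values toward these
```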


Now I will show you how to implement a DQN with a CNN. After this tutorial, you’ll be able to create an agent that successfully plays almost ‘any’ game using only pixel inputs.
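A minimal Keras sketch of such a pixel-input Q-network, in the spirit of the classic Atari DQN architecture; the exact frame size, stack depth, and layer sizes here are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cnn_q_network(input_shape=(84, 84, 4), n_actions=6):
    inputs = layers.Input(shape=input_shape)          # 4 stacked grayscale frames
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    q_values = layers.Dense(n_actions)(x)             # one Q-value per action
    return Model(inputs, q_values)
```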


Now we will change the sampling distribution, using a criterion to define the priority of each experience tuple.
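A minimal sketch of the proportional variant of that idea, where a transition’s priority grows with the magnitude of its TD error (names and constants are illustrative):

```python
import numpy as np

def sample_indices(td_errors, batch_size, alpha=0.6, eps=1e-5):
    # p_i = (|TD error| + eps)^alpha; eps keeps every transition sampleable.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=probs), probs

# Transitions with large TD error get replayed far more often than under uniform sampling.
td_errors = np.array([0.01, 2.0, 0.5, 0.05])
idx, probs = sample_indices(td_errors, batch_size=2)
```

Note that a full PER implementation also applies importance-sampling weights to correct the bias this non-uniform sampling introduces.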


In this part, we’ll cover the Epsilon Greedy method used in Deep Q Learning, and we’ll fix and prepare our source code for the PER (Prioritized Experience Replay) method.
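As a refresher, a minimal sketch of the Epsilon Greedy rule with a decaying exploration rate (the decay constants are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    # Explore with probability epsilon, otherwise act greedily on Q-estimates.
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # explore: random action
    return int(np.argmax(q_values))               # exploit: best known action

def decayed_epsilon(step, eps_start=1.0, eps_min=0.01, decay=1e-4):
    # Exponential decay from eps_start toward eps_min over training steps.
    return eps_min + (eps_start - eps_min) * np.exp(-decay * step)
```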

