⛷️ Chapter-1, Getting Started with RL

 
 
Welcome to the first chapter! Here we will discuss:
  • About RL
  • The RL problem
  • Inside an RL Agent
  • Problems within RL
 

🐵 What is the environment state in reinforcement learning?

In reinforcement learning (RL), the environment state is the current situation or condition of the world that the agent interacts with. It encapsulates all the necessary information the agent needs to make a decision about what action to take next.
Think of it like a snapshot of the world at a specific moment in time. This snapshot includes all relevant details that can influence the agent's actions and the consequences of those actions.
Example: Chess Game
In a game of chess, the environment state would be the current configuration of the chessboard. This includes:
  • The position of each piece: Where are the pawns, rooks, knights, bishops, queens, and kings located?
  • Whose turn it is: Is it white's turn to move or black's?
  • Castling rights: Can either player still castle?
  • En passant possibilities: Is there an opportunity for a pawn to capture en passant?
The combination of these factors forms the environment state. Based on this state, the chess-playing agent (either a human or an AI) can assess the situation and decide which move to make. Each move changes the environment state, leading to a new configuration of the chessboard and a new decision to be made.
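As a rough illustration, here is a minimal Python sketch of such a snapshot. The ChessState fields and names are hypothetical (this is not a real chess engine's API); the point is that every detail listed above lives in one object, and the agent's decision is a function of that object alone.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A minimal sketch (hypothetical names, not a real chess engine): the
# environment state is one snapshot holding everything listed above.
@dataclass(frozen=True)
class ChessState:
    board: Tuple[str, ...]                    # piece placement for all 64 squares
    white_to_move: bool                       # whose turn it is
    castling_rights: str                      # e.g. "KQkq", as in FEN notation
    en_passant_square: Optional[int] = None   # capture target square, if any

def choose_move(state: ChessState) -> str:
    # The agent's decision is a function of the current snapshot alone.
    return "e2e4" if state.white_to_move else "e7e5"
```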
 

Importance of Environment State

The environment state is crucial in reinforcement learning because it allows the agent to understand its surroundings and make informed decisions. The agent's policy, which determines its actions, is often a function of the environment state. By observing the state, the agent can choose actions that are more likely to lead to positive outcomes (rewards) and avoid actions that are more likely to lead to negative outcomes.
Key Points
  • The environment state is a representation of the current situation in the world the agent interacts with.
  • It includes all relevant information needed for decision-making.
  • The agent's actions can change the environment state.
  • The agent's policy is often based on the observed environment state.
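To make the last point concrete, here is a toy sketch of a state-conditioned policy, assuming a tiny discrete state and action space (all state and action names are hypothetical):

```python
import random

# Toy sketch: a policy maps the observed environment state to an action.
# Here the "policy" is just a lookup table over a tiny discrete state space.
policy = {
    "low_battery":    "return_to_dock",
    "obstacle_ahead": "turn_left",
    "clear_path":     "move_forward",
}

def act(state: str) -> str:
    # Fall back to a random exploratory action for unseen states.
    return policy.get(state, random.choice(["turn_left", "turn_right"]))

print(act("clear_path"))     # move_forward
print(act("unknown_state"))  # random exploratory action
```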
What is the agent state in reinforcement learning?
The agent state refers to the agent's internal representation of its knowledge or understanding of the environment at a given time. It's the information the agent carries with it as it interacts with the environment to make decisions and choose actions.
The agent state can be different from the environment state, which is the complete description of the environment's situation. The agent state is usually a summary or a filtered version of the environment state, containing only the information deemed relevant by the agent for its decision-making process.
 
Example: Robot Navigation
Consider a robot navigating a maze. The environment state would include the complete layout of the maze, the positions of all walls and obstacles, and the robot's current position. The agent state, however, might be much simpler. It could be a combination of:
  • Local Sensor Readings: Information about the immediate surroundings from the robot's sensors (e.g., distance to the nearest wall in each direction).
  • Internal Map: A partial or incomplete map of the maze built by the robot as it explores.
  • Goal Information: The coordinates of the target location the robot is trying to reach.
The robot uses this agent state to decide where to move next. For example, if the sensors indicate a wall directly ahead, the robot's agent state might suggest turning left or right. If the internal map shows a potential shortcut, the agent state might suggest taking that path.
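A minimal sketch of such an agent state might look like this (the structure and names are hypothetical, chosen to mirror the three components above):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Sketch of an agent state: a filtered summary of the environment,
# not the full maze layout.
@dataclass
class AgentState:
    sensor_readings: Dict[str, float]   # distance to nearest wall per direction
    internal_map: Dict[Tuple[int, int], str] = field(default_factory=dict)  # cells seen so far
    goal: Tuple[int, int] = (0, 0)      # target coordinates

def next_action(state: AgentState) -> str:
    # If a wall is directly ahead, turn; otherwise keep moving toward the goal.
    if state.sensor_readings.get("front", float("inf")) < 0.5:
        return "turn_left"
    return "move_forward"

s = AgentState(sensor_readings={"front": 0.3, "left": 2.0, "right": 1.2}, goal=(5, 5))
print(next_action(s))  # turn_left
```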
Importance of Agent State
The agent state is crucial for several reasons:
  • Memory: It allows the agent to remember past experiences and observations, enabling it to learn from its interactions with the environment.
  • Planning: It provides the information necessary for the agent to plan its actions and make decisions that maximize long-term rewards.
  • Generalization: It can help the agent generalize its knowledge to new situations that are similar to previously encountered ones.
Types of Agent States
  • Markovian State: The agent state contains all the information necessary to make optimal decisions. The future depends only on the current state and the chosen action, not on the history.
  • Partially Observable State: The agent has limited information about the environment state and needs to make decisions based on incomplete information.
  • History-Based State: The agent state includes not only the current observation but also a history of past observations and actions.
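As a rough sketch of the last case, a history-based agent state can be built by stacking the most recent observations and actions (the window size of 3 here is arbitrary):

```python
from collections import deque

# Sketch: a history-based agent state keeps the last k observation/action
# pairs -- a common way to recover an approximately Markov state from a
# partially observable one (k=3 is an arbitrary choice).
history = deque(maxlen=3)

def update_agent_state(observation, action):
    history.append((observation, action))
    return tuple(history)  # the agent state is the recent history itself

state = update_agent_state("wall_ahead", "turn_left")
state = update_agent_state("clear", "move_forward")
print(state)
```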
What is a Markov state in reinforcement learning?
A Markov state is a state of the environment that satisfies the Markov property. The Markov property states that the future is independent of the past given the present. In other words, the current state contains all the necessary information to predict the future, and knowing the history of past states doesn't provide any additional information.
 
Mathematically:
A state s_t is a Markov state if and only if:

P(s_{t+1} | s_t) = P(s_{t+1} | s_t, s_{t-1}, ..., s_0)

This means the probability of the next state s_{t+1} depends only on the current state s_t and not on any of the previous states.
Example: Grid World
Imagine a robot navigating a grid world:
+---+---+---+
|   |   | G |
+---+---+---+
|   | x |   |
+---+---+---+
| S |   |   |
+---+---+---+
  • S: Starting position of the robot
  • G: Goal position
  • x: Obstacle
In this environment, the Markov state is the robot's current position in the grid. Knowing the robot's current position is enough to determine the possible next positions it can move to and the associated probabilities of reaching those positions. It doesn't matter how the robot got to its current position; the past is irrelevant.
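A small sketch of this grid world (hypothetical coordinates, deterministic moves) makes the point explicit: the next state is computed from the current position and action alone, with no reference to the path taken so far.

```python
# Sketch of the 3x3 grid above: transitions depend only on the current
# position and the chosen action, never on how the robot got there.
OBSTACLE = (1, 1)   # 'x' in the diagram, as (row, col) counted from the top
GOAL = (0, 2)       # 'G'
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    # Next state is a deterministic function of (state, action) alone.
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 3 and 0 <= nc < 3) or (nr, nc) == OBSTACLE:
        return state  # blocked: stay in place
    return (nr, nc)

print(step((2, 0), "up"))     # (1, 0) -- from S, moving up
print(step((1, 0), "right"))  # (1, 0) -- blocked by the obstacle
```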

🤷🏻‍♂️ Why Markov States Are Important

Markov states are crucial in reinforcement learning because they simplify the problem of learning optimal policies. If the environment has the Markov property, the agent can focus on learning the optimal action to take in each state, without needing to consider the entire history of previous states.
Many reinforcement learning algorithms, such as Q-learning and SARSA, rely on the Markov property to efficiently learn optimal policies. By assuming Markov states, these algorithms can focus on learning the value or action-value functions for each state, which significantly simplifies the learning process.
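For instance, the tabular Q-learning update reads only the current transition (s, a, r, s'), which is exactly where the Markov assumption does its work. A minimal sketch, with arbitrary hyperparameters and an illustrative reward:

```python
from collections import defaultdict

# Minimal tabular Q-learning sketch (alpha, gamma, and the reward below are
# arbitrary illustrative values). The update uses only the one-step
# transition (s, a, r, s'); no history is needed because the state is Markov.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.9
actions = ["up", "down", "left", "right"]

def q_update(s, a, r, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Hypothetical transition in the grid world above: stepping onto G earns +1.
q_update(s=(0, 1), a="right", r=1.0, s_next=(0, 2))
print(Q[((0, 1), "right")])  # 0.1
```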

Non-Markovian Environments

While many environments can be modeled as Markov decision processes (MDPs), not all environments satisfy the Markov property. In some cases, the history of past states can be relevant to predicting the future. In such non-Markovian environments, more complex techniques like Partially Observable Markov Decision Processes (POMDPs) are needed to account for the partial observability of the underlying state.
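In a POMDP the agent typically tracks a belief, a probability distribution over the hidden states, and updates it with Bayes' rule after each action and observation. A toy sketch with made-up transition and observation probabilities:

```python
import numpy as np

# Sketch of a POMDP belief update (toy numbers): since the true state is
# hidden, the agent maintains a distribution over states and updates it as
#   b'(s') proportional to O(o | s') * sum_s T(s' | s, a) * b(s)
T = np.array([[0.8, 0.2],   # T[s, s']: transition probs under one fixed action
              [0.3, 0.7]])
O = np.array([0.9, 0.1])    # O[s']: prob of the received observation per state

belief = np.array([0.5, 0.5])   # uniform prior over the two hidden states
belief = O * (belief @ T)       # predict with T, then weight by O
belief /= belief.sum()          # normalize back to a distribution
print(belief)                   # posterior belief over the hidden states
```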