Reinforcement Learning
Published in SJTU-UM JI, ECE Department, 2024
This project aims to implement reinforcement learning for the Pac-Man problem using Python.
What is Reinforcement Learning?
Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with its environment. The goal of the agent is to maximize a reward signal through trial and error.
Key Concepts in Reinforcement Learning
Agent: The learner or decision maker. It takes actions in the environment to achieve a goal.
Environment: The external system with which the agent interacts. It provides feedback to the agent based on its actions.
State: A representation of the current situation or configuration of the environment. It contains all the information needed for the agent to make a decision.
Action: A decision or move that the agent can take to interact with the environment. Actions affect the state of the environment.
Reward: A scalar value that indicates how good or bad the agent’s action was in the context of its objective. The agent’s goal is to maximize the cumulative reward.
Policy: A strategy that the agent follows to choose actions given states. The policy can be deterministic or stochastic.
Value Function: A function that estimates the expected return (future cumulative reward) for being in a particular state or taking a particular action.
Q-Function: A function that estimates the expected return of taking a particular action in a particular state and following a given policy thereafter. The optimal Q-function assumes the optimal policy is followed after that action.
Discount Factor (γ): A factor that determines how much future rewards are taken into consideration. A discount factor of 1 means future rewards are as important as immediate ones, while a factor close to 0 makes the agent focus mostly on immediate rewards.
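To make the effect of the discount factor concrete, the following is a minimal sketch of how a discounted return could be computed from a list of rewards. The function name `discounted_return` and the sample rewards are illustrative assumptions, not part of the project code.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t].

    Accumulating from the last reward backwards avoids computing powers.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequence: +1 on each of three consecutive steps.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=1.0))  # 3.0  (all rewards count equally)
print(discounted_return(rewards, gamma=0.5))  # 1.75 (later rewards count less)
print(discounted_return(rewards, gamma=0.0))  # 1.0  (only the immediate reward)
```

With γ = 1 the three rewards are simply summed, while with γ = 0 only the first reward contributes, matching the description above.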
RL Process
The RL process typically follows these steps:
- The agent observes the state of the environment.
- The agent selects an action based on its policy.
- The environment reacts to the action and provides the agent with a new state and a reward.
- The agent updates its policy based on the received reward, aiming to maximize future rewards.
- This process is repeated over many steps, grouped into episodes, until the agent's policy converges, ideally to an optimal one.
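The steps above can be sketched as a short training loop. The example below assumes a hypothetical five-cell corridor in place of Pac-Man and uses tabular Q-learning with an epsilon-greedy policy as one possible update rule. The environment, the function names (`step`, `choose_action`, `train`), and the hyperparameters are all illustrative assumptions, not the project's actual implementation.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal (assumed toy setup)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy with random tie-breaking among best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False                 # each episode starts in cell 0
        while not done:
            action = choose_action(q, state, epsilon, rng)    # select action
            next_state, reward, done = step(state, action)    # env responds
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]: move right in every cell
```

Here the policy is never stored explicitly. It is derived from the Q-table, so updating the Q-values is what changes the agent's behavior.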
Types of Reinforcement Learning
- Model-Free RL: The agent learns a policy directly from interactions without building a model of the environment.
  - Value-based: The agent tries to estimate the value function (e.g., Q-learning).
  - Policy-based: The agent directly learns the policy (e.g., REINFORCE).
  - Actor-Critic: Combines value-based and policy-based methods to learn both a value function and a policy (e.g., A3C).
- Model-Based RL: The agent builds a model of the environment to plan ahead and make decisions.
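To illustrate what planning with a model can look like, here is a minimal sketch of value iteration on a hypothetical five-cell corridor whose dynamics are assumed to be known in advance. In a full model-based method the model would typically be learned from experience. The function names (`model`, `value_iteration`) and the environment are illustrative assumptions only.

```python
N_STATES = 5      # corridor cells 0..4; cell 4 is the terminal goal (assumed)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9


def model(state, action):
    """Assumed known deterministic model: returns (next_state, reward)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def backup(state, action, values):
    """One-step lookahead through the model."""
    next_state, reward = model(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-9):
    values = [0.0] * N_STATES  # the terminal cell keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # sweep over non-terminal cells
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration()
# Plan: pick the action with the best one-step lookahead in each cell.
plan = [max(ACTIONS, key=lambda a: backup(s, a, values)) for s in range(N_STATES - 1)]
print(values)
print(plan)
```

No interaction with the environment is needed here: the plan is computed entirely from the model, which is the key difference from the model-free methods listed above.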
Implementation:
The implementation requirements are described in the Project explanation document.
The code files can be found here
