Reinforcement Learning: A Beginner's Tutorial
By: Omar Enayet
(Presentation Version)
The Problem
Agent-Environment Interface
Environment Model
Goals & Rewards
Returns
Credit-Assignment Problem
Markov Decision Process
An MDP is defined by <S, A, p, r, γ>
S – set of states of the environment
A(s) – set of actions possible in state s
p(s′ | s, a) – probability of transitioning from s to s′ when executing a
r(s, a) – expected reward when executing a in s
γ – discount rate for expected reward
Assumption: discrete time t = 0, 1, 2, . . .
[Figure: agent-environment interaction trajectory s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, . . .]
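The quantities named on this slide are usually formalized as below (Sutton & Barto conventions); the exact symbols are an assumption, since the slide only names them. The return G_t is the discounted sum of future rewards referred to on the "Returns" slide.

```latex
% Standard formalization of the MDP quantities named above (notation assumed,
% following Sutton & Barto conventions).
\begin{align*}
  p(s' \mid s, a) &= \Pr\{\, S_{t+1} = s' \mid S_t = s,\ A_t = a \,\} \\
  r(s, a)         &= \mathbb{E}\,[\, R_{t+1} \mid S_t = s,\ A_t = a \,] \\
  G_t             &= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
                   = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
                   \qquad 0 \le \gamma \le 1
\end{align*}
```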
Value Functions
Value Functions
Value Functions
Optimal Value Functions
Exploration-Exploitation Problem
Policies
Elementary Solution Methods
Dynamic Programming
Perfect Model
Bootstrapping
Generalized Policy Iteration
Efficiency of DP
Monte-Carlo Methods
Episodic Return
Advantages over DP
No Model
Simulation OR part of Model
Focus on small subset of states
Less Harmed by Violations of the Markov Property
First-Visit VS Every-Visit
On-Policy VS Off-Policy
Action-value instead of State-value
Temporal-Difference Learning
Advantages of TD Learning
SARSA (On-Policy)
Q-Learning (Off-Policy)
Actor-Critic Methods (On-Policy)
R-Learning (Off-Policy): Average Expected Reward per Time-Step
Eligibility Traces


Editor's Notes

  • #5 By a model of the environment we mean anything that an agent can use to predict how the environment will respond to its actions. Given a state and an action, a model produces a prediction of the resultant next state and next reward. If the model is stochastic, then there are several possible next states and next rewards, each with some probability of occurring. Some models produce a description of all possibilities and their probabilities; these we call distribution models. Other models produce just one of the possibilities, sampled according to the probabilities; these we call sample models. For example, consider modeling the sum of a dozen dice. A distribution model would produce all possible sums and their probabilities of occurring, whereas a sample model would produce an individual sum drawn according to this probability distribution. (A small code sketch of this dice example follows these notes.)
  • #8 Credit assignment problem: How do you distribute credit for success among the many decisions that may have been involved in producing it?
  • #14 One of the challenges that arise in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. The exploration-exploitation dilemma has been intensively studied by mathematicians for many decades. (An ε-greedy action-selection sketch, one simple way to balance the two, also follows these notes.)
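For the distribution-model versus sample-model distinction in note #5, here is a minimal Python sketch of the dozen-dice example. The function names and structure are illustrative assumptions, not taken from the tutorial.

```python
import random
from collections import Counter

# Illustrative sketch (not from the slides): the dozen-dice example as a
# sample model versus a distribution model.

def sample_model(num_dice=12):
    """Sample model: produce one possible sum, drawn according to the
    underlying probabilities, by simply rolling the dice."""
    return sum(random.randint(1, 6) for _ in range(num_dice))

def distribution_model(num_dice=12):
    """Distribution model: produce every possible sum together with its
    probability, built by repeatedly convolving the single-die distribution."""
    dist = {0: 1.0}
    for _ in range(num_dice):
        nxt = Counter()
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] += p / 6.0
        dist = dict(nxt)
    return dist

if __name__ == "__main__":
    print("one sampled sum:", sample_model())
    probs = distribution_model()
    print("P(sum = 42) = %.4f" % probs[42])
```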
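For the exploration-exploitation trade-off in note #14, below is a minimal sketch of ε-greedy action selection, one common way to balance the two. The ε value, the Q-table layout, and all names here are assumptions chosen for illustration.

```python
import random

def epsilon_greedy(q_values, actions, epsilon=0.1):
    """With probability epsilon, explore by picking a random action;
    otherwise exploit by picking the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(actions)                              # explore
    return max(actions, key=lambda a: q_values.get(a, 0.0))        # exploit

# Example: value estimates for three actions in some state (made-up numbers).
q = {"left": 0.2, "right": 0.5, "stay": 0.1}
print("selected action:", epsilon_greedy(q, list(q.keys())))
```

With ε = 0.1 the agent mostly picks "right" (the current best estimate) but still tries the other actions about 10% of the time, which is what keeps its estimates improving.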