INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Remarks. You are expected to write a short essay that covers in detail your approaches and answers to the questions below. It is highly recommended that you first state your approaches and ideas at a high level and then show how they apply to the two concrete examples given here. Your score on this project will be based on both your answers to the specific questions and your overall writing.

Consider the following game. There is a special die with N sides, where the ith side shows the number i for each 1 ≤ i ≤ N. Let [N] = {1, 2, 3, ..., N}, the set of integers ranging from 1 to N. Let p ∈ [0, 1]^N be a vector of length N whose ith entry, denoted p_i, is the probability that a single roll of the die lands on the ith side (so we see the number i). For example, N = 4 and p = (0, 1/2, 1/4, 1/4) means that if we roll the die once, we will see the numbers 1, 2, 3, and 4 with probabilities 0, 1/2, 1/4, and 1/4, respectively. There is another binary vector q ∈ {0, 1}^N, whose ith entry, denoted q_i, indicates whether the ith side is BAD (q_i = 1) or not (q_i = 0).
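A minimal sketch of how (N, p, q) might be stored and sampled, using the N = 4 example above. The text gives no q for that example, so the q here (side 2 marked BAD) is a hypothetical choice, as is the helper name `roll`. Sampling uses Python's `random.choices` with p as the weights.

```python
import random

# The text's example die: N = 4, p = (0, 1/2, 1/4, 1/4).
# The text gives no q for this example; q below is a hypothetical choice.
N = 4
p = [0.0, 0.5, 0.25, 0.25]  # p[i - 1] is the probability of landing on side i
q = [0, 1, 0, 0]            # q[i - 1] == 1 marks side i as BAD (hypothetical)

assert len(p) == len(q) == N and abs(sum(p) - 1.0) < 1e-12


def roll(rng: random.Random) -> int:
    """Roll the die once and return a side X in [N] = {1, ..., N}, drawn according to p."""
    return rng.choices(range(1, N + 1), weights=p, k=1)[0]


rng = random.Random(0)
counts = {i: 0 for i in range(1, N + 1)}
bad_rolls = 0
for _ in range(10_000):
    X = roll(rng)
    counts[X] += 1
    bad_rolls += q[X - 1]  # adds 1 exactly when the rolled side is BAD

print(counts)     # side 1 never appears because p_1 = 0
print(bad_rolls)  # about half of the rolls, since the hypothetical BAD side 2 has p_2 = 1/2
```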

Game Rules. At the beginning, you have $0 at hand. Suppose that at some point you have x < K dollars at hand, where K is a parameter known in advance. You have two choices: "accept" the challenge or "quit". (Case 1) If you choose "quit", the game is over and you walk away with x dollars. (Case 2) If you choose "accept", you roll the die once and see a random number X ∈ [N] drawn according to p. There are two subcases. (1) If q_X = 1, i.e., the Xth side is BAD, you lose all the money currently at hand. (2) If q_X = 0, i.e., the Xth side is not BAD, you receive a reward of f(X), where f is a function of X, so you now have x + f(X) dollars. Here is the tricky part: if x + f(X) ≥ K (bear in mind that K is known in advance), the game is over and you walk away with x + f(X) dollars; otherwise, you continue the game with x + f(X) dollars at hand. Attention: if you accept the challenge, roll the die, and get X such that q_X = 1, you lose all the money at hand, but the game is NOT over: you can still continue to play with $0 at hand. The game is over only when you choose to quit or you have at least K dollars at hand. Note that the following key components uniquely define the game: (N, p, q, f, K).
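The rules above can be sketched as a simulator for a single game. This is a minimal sketch: `play_game`, `threshold_policy`, and the "accept below $10" rule are hypothetical names and choices for illustration (the policy is not claimed to be optimal); the parameters are Question 2's.

```python
import random
from typing import Callable, Sequence


def play_game(p: Sequence[float], q: Sequence[int], f: Callable[[int], int], K: int,
              policy: Callable[[int], str], rng: random.Random) -> int:
    """Simulate one game from $0 under `policy`; return the dollars you walk away with."""
    x = 0
    sides = range(1, len(p) + 1)
    while True:  # invariant: x < K, so the player must choose an action
        if policy(x) == "quit":
            return x                               # Case 1: walk away with x dollars
        X = rng.choices(sides, weights=p, k=1)[0]  # Case 2: roll the die once
        if q[X - 1] == 1:
            x = 0                                  # BAD side: lose everything, keep playing
        else:
            x += f(X)
            if x >= K:
                return x                           # at least K dollars: game over


# Question 2's parameters, with a hypothetical "accept below $10" policy (not claimed optimal).
p2 = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 16]
q2 = [0, 1, 0, 1, 0]


def f2(X: int) -> int:
    return min(5, 2 * X)


def threshold_policy(x: int) -> str:
    return "accept" if x < 10 else "quit"


rng = random.Random(42)
results = [play_game(p2, q2, f2, 150, threshold_policy, rng) for _ in range(20_000)]
print(sum(results) / len(results))  # Monte Carlo estimate of this policy's expected payout
```

Averaging many simulated games gives a rough check on any value computed analytically for the same policy.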

(Question 1) Consider a simple case where N = 6 and p = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6). In other words, we have a "normal" six-sided die, and each side appears with the same chance when we roll once. Let q = (1, 0, 1, 0, 1, 0), f(X) = max(X^2, 23), and K = 150. You are asked to do the following.

(a) Formulate the above game as a reinforcement learning system. Please specify the key components of the game (S, A, P, R), where S is the state space, A is the action space, P is the transition probability matrix, and R is the reward function. For simplicity, you can assume the discount factor γ = 1. Please clearly specify the terminal state space (S_T) and the non-terminal state space (S_N).
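One possible encoding of (S, A, P, R) for Question 1, offered as a sketch rather than the expected answer. Assumptions: f(X) = max(X^2, 23), reading the flattened "X2" as X^2 and "23" literally; non-terminal states are the integer amounts 0 ≤ x < K; terminal states are tagged ("T", y), meaning the game ended with y dollars; and the whole payout y is paid as reward on entering a terminal state (another valid choice is to reward each change in money). The names `transitions`, `S_N`, and `S_T` are hypothetical. Fractions keep the probabilities exact.

```python
from fractions import Fraction

# Question 1. Assumption: f(X) = max(X^2, 23), with "23" read literally
# (if it is a flattened 2^3, replace 23 by 8).
N, K = 6, 150
p = [Fraction(1, 6)] * N
q = [1, 0, 1, 0, 1, 0]


def f(X: int) -> int:
    return max(X * X, 23)


S_N = list(range(K))    # non-terminal states: dollars at hand, 0 <= x < K
A = ["accept", "quit"]  # action space


def transitions(s: int, a: str) -> list:
    """P and R as a list of (probability, next_state, reward) triples.

    Terminal states are tagged ("T", y): the game has ended with y dollars.
    The reward y is paid only on entering a terminal state (gamma = 1).
    Entries sharing a next_state add up to one entry of the matrix P.
    """
    if a == "quit":
        return [(Fraction(1), ("T", s), s)]
    out = []
    for X in range(1, N + 1):
        if q[X - 1] == 1:
            out.append((p[X - 1], 0, 0))                       # BAD: back to $0, game continues
        elif s + f(X) >= K:
            out.append((p[X - 1], ("T", s + f(X)), s + f(X)))  # reached K: game over
        else:
            out.append((p[X - 1], s + f(X), 0))                # continue with more money
    return out


S_T = {ns for s in S_N for a in A for _, ns, _ in transitions(s, a) if isinstance(ns, tuple)}
print(len(S_N), len(S_T))  # S_T = {("T", y) : 0 <= y <= 185} under this assumption
```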

(b) Compute the optimal value function V* and the optimal policy π*. You can use either the value iteration method or the dynamic programming method. Please state explicitly the values of V*(s) and π*(s) for all s ∈ S_N, where S_N is the non-terminal state space. Based on your results, state explicitly the maximum expected total reward you will get in this game when starting with $0. (If you use the value iteration method, please try different tolerance parameters ϵ to make sure your algorithm converges properly.)

(c) Please use linear programming (LP) to compute the optimal value function V* and the optimal policy π*. You should explicitly specify the following elements of the LP: variables, objective function, and constraints. Again, please state explicitly the values of V*(s) and π*(s) for all s ∈ S_N. Based on your results, state explicitly the maximum expected total reward you will get in this game when starting with $0.
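One standard way to write the LP, sketched under the same assumptions as a terminal-payout formulation (non-terminal states 0 ≤ s < K, quit pays s, reaching K pays s + f(i)). Minimizing the sum of values selects the smallest function that satisfies both Bellman inequalities, which for this stopping-type problem with finite values is V*.

```latex
\begin{aligned}
\text{variables: }\; & V(s), \quad s \in S_N = \{0, 1, \dots, K-1\} \\
\text{minimize }\; & \sum_{s \in S_N} V(s) \\
\text{subject to }\; & V(s) \ge s \quad \forall s \in S_N \qquad \text{(quit)} \\
& V(s) \ge \sum_{i:\, q_i = 1} p_i \, V(0)
  + \sum_{\substack{i:\, q_i = 0 \\ s + f(i) < K}} p_i \, V\big(s + f(i)\big)
  + \sum_{\substack{i:\, q_i = 0 \\ s + f(i) \ge K}} p_i \, \big(s + f(i)\big)
  \quad \forall s \in S_N \qquad \text{(accept)}
\end{aligned}
```

At the optimum, π*(s) can be read off as the action whose constraint is tight: "quit" if V*(s) = s, otherwise "accept".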

(Question 2) Consider a special case where N = 5, p = (1/2, 1/4, 1/8, 1/16, 1/16), q = (0, 1, 0, 1, 0), f(X) = min(5, 2X), and K = 150. Answer the same questions (a), (b), and (c) as shown in Question 1.

 
