

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Remarks. You are expected to write a short essay that covers in detail your approaches and answers to the questions below. It is highly recommended that you first state your approaches and ideas at a high level and then show how they apply to the two concrete examples given here. Your score on this project will be based on both your answers to the specific questions and your overall writing.

Consider the following game. There is a special die with N sides, where the ith side shows the number i for each 1 ≤ i ≤ N. Let [N] = {1, 2, 3, . . . , N}, the set of integers from 1 to N. Let p ∈ [0, 1]^N be a vector of length N whose ith entry, denoted by p_i, is the probability that the die lands on the ith side (so we see the number i) when rolled once. For example, N = 4 and p = (0, 1/2, 1/4, 1/4) means that a single roll shows 1, 2, 3, and 4 with probability 0, 1/2, 1/4, and 1/4, respectively. There is also a binary vector q ∈ {0, 1}^N, whose ith entry q_i indicates whether the ith side is BAD (q_i = 1) or not (q_i = 0).
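A minimal Python sketch of the die, assuming sides are numbered 1 to N and that p is stored as a list whose entry i - 1 holds p_i. The helpers `roll` and `prob_bad`, and the BAD vector used below, are hypothetical illustrations rather than part of the problem.

```python
import random
from fractions import Fraction

# Example die from the text: N = 4, p = (0, 1/2, 1/4, 1/4).
# p[i - 1] holds p_i; Fractions keep the probabilities exact.
p = [Fraction(0), Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
N = len(p)
assert sum(p) == 1  # p must be a probability vector over [N]


def roll(p, rng):
    """Roll the die once and return a side X in [N] = {1, ..., N}, drawn according to p."""
    sides = range(1, len(p) + 1)
    return rng.choices(sides, weights=[float(pi) for pi in p], k=1)[0]


def prob_bad(p, q):
    """Probability that one roll shows a BAD side: the sum of p_i over i with q_i = 1."""
    return sum((pi for pi, qi in zip(p, q) if qi == 1), Fraction(0))


q = [0, 1, 0, 0]  # hypothetical BAD vector: only side 2 is BAD
print(prob_bad(p, q))  # 1/2

rng = random.Random(0)
counts = {i: 0 for i in range(1, N + 1)}
for _ in range(10_000):
    counts[roll(p, rng)] += 1
print(counts[1])  # 0, since side 1 has probability 0
```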

Game Rules. At the beginning, you have $0 at hand. Suppose that at some point you have x < K dollars at hand, where K is a parameter known in advance. You have two choices: “accept” the challenge or “quit”. (Case 1) If you choose “quit”, the game is over and you walk away with x dollars. (Case 2) If you choose “accept”, you roll the die once and see a random number X ∈ [N] drawn according to p. There are two subcases. (1) If q_X = 1, i.e., the Xth side is BAD, you lose all the money currently at hand; (2) if q_X = 0, i.e., the Xth side is not BAD, you receive a reward of f(X), where f is a function of X, so you now have x + f(X) dollars. Here is the tricky part: if x + f(X) ≥ K (recall that K is known in advance), the game is over and you walk away with x + f(X) dollars; otherwise, you continue the game with x + f(X) dollars at hand.

Attention: if you accept the challenge, roll the die, and get X such that q_X = 1, you lose all the money at hand, but the game is NOT over: you can still continue to play with $0 at hand. The game is over only when you choose to quit or you have at least K dollars at hand. Note that the following key components uniquely define the game: (N, p, q, f, K).
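The rules above can be sketched as a small simulator. `play_episode` and the `policy` callable are hypothetical names; the sketch assumes p and q are plain 0-indexed lists and that the policy eventually quits or K is reachable, since otherwise the loop never ends. The usage lines plug in the Question 1 parameters, reading f(X) as max(X^2, 23).

```python
import random


def play_episode(N, p, q, f, K, policy, rng):
    """Simulate one game (N, p, q, f, K) and return the money you walk away with.

    policy(x) returns "accept" or "quit" for the current amount x < K.
    Assumes the policy eventually quits or K is reachable; otherwise the loop never ends.
    """
    x = 0  # you start with $0
    while True:
        if policy(x) == "quit":
            return x  # Case 1: walk away with x dollars
        X = rng.choices(range(1, N + 1), weights=p, k=1)[0]  # Case 2: roll once
        if q[X - 1] == 1:
            x = 0  # BAD side: lose everything, but the game is NOT over
        else:
            x += f(X)
            if x >= K:
                return x  # holding at least K dollars ends the game


# Question 1 parameters, reading f(X) as max(X^2, 23).
game1 = dict(N=6, p=[1 / 6] * 6, q=[1, 0, 1, 0, 1, 0], f=lambda X: max(X**2, 23), K=150)
rng = random.Random(42)
print(play_episode(**game1, policy=lambda x: "quit", rng=rng))  # 0
final = play_episode(**game1, policy=lambda x: "accept", rng=rng)
print(150 <= final <= 149 + 36)  # True
```

Averaging `play_episode` over many seeds gives a Monte Carlo estimate of a policy's expected payoff, which is a useful sanity check for the exact values asked for in part (b).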

(Question 1) Consider a simple case where N = 6 and p = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6). In other words, we have a “normal” six-sided die, and each side appears with the same probability on a single roll. Let q = (1, 0, 1, 0, 1, 0), f(X) = max(X^2, 23), and K = 150. You are asked to do the following.

(a) Formulate the above game as a reinforcement learning system. Please specify the key components of the game (S, A, P, R), where S is the state space, A is the action space, P is the transition probability matrix, and R is the reward function. For simplicity, you can assume the discount factor γ = 1. Please specify clearly the terminal state space (S_T) and the non-terminal state space (S_N).
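One possible encoding of (S, A, P, R), sketched in Python under assumptions of our own rather than the assignment's: S_N = {0, 1, ..., K - 1} (money at hand), terminal states ("quit", x) for x ∈ S_N and ("end", y) for y ≥ K, A = {accept, quit}, and a reward paid only on entering a terminal state and equal to the amount taken home, so that with γ = 1 the return equals the final amount. `build_mdp` is a hypothetical helper.

```python
from fractions import Fraction


def build_mdp(N, p, q, f, K):
    """Return (S_N, A, model) for the game (N, p, q, f, K).

    Hypothetical encoding: non-terminal states are the amounts 0..K-1; terminal
    states are ("quit", x) and ("end", y) with y >= K. The reward is paid on
    entering a terminal state and equals the money taken home, so with
    gamma = 1 the return of an episode is the final amount.
    model[(x, a)] lists (probability, next_state, reward) triples.
    """
    S_N = list(range(K))
    A = ["accept", "quit"]
    model = {}
    for x in S_N:
        model[(x, "quit")] = [(Fraction(1), ("quit", x), x)]
        outcomes = {}  # (next_state, reward) -> total probability
        for i in range(1, N + 1):
            pi = Fraction(p[i - 1])
            if pi == 0:
                continue
            if q[i - 1] == 1:
                key = (0, 0)  # BAD: back to $0, game continues, no reward yet
            elif x + f(i) >= K:
                key = (("end", x + f(i)), x + f(i))  # game ends with x + f(i)
            else:
                key = (x + f(i), 0)  # keep playing with x + f(i)
            outcomes[key] = outcomes.get(key, Fraction(0)) + pi
        model[(x, "accept")] = [(pr, s, r) for (s, r), pr in outcomes.items()]
    return S_N, A, model


# Question 2 parameters.
p2 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 16)]
S_N, A, model = build_mdp(5, p2, [0, 1, 0, 1, 0], lambda X: min(5, 2 * X), 150)
for pr, s, r in model[(0, "accept")]:
    print(pr, s, r)  # 1/2 2 0, then 5/16 0 0, then 3/16 5 0
```

Merging outcomes that lead to the same (next state, reward) pair keeps each row of P short; the two BAD sides of Question 2, for instance, collapse into a single transition back to $0 with probability 5/16.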

(b) Compute the optimal value function V* and the optimal policy π*. You can use either the value iteration method or the dynamic programming method. Please make sure to state explicitly the values of V*(s) and π*(s) for all s ∈ S_N, where S_N is the non-terminal state space. Based on your results, state explicitly the maximum expected total reward you will get in this game when starting with $0. (If you use the value iteration method, please try different tolerance parameters ϵ to make sure your algorithm converges properly.)
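A minimal value-iteration sketch for part (b), demonstrated on Question 1 and assuming the same encoding as a terminal payoff equal to the final amount. The function name and the `eps`/`max_iter` parameters are hypothetical. Starting from V = 0, each sweep applies V(x) ← max(x, Q(x, accept)), and the loop stops once the largest change falls below ϵ.

```python
def value_iteration(N, p, q, f, K, eps=1e-9, max_iter=1_000_000):
    """Value iteration with gamma = 1; returns (V, policy) over S_N = {0, ..., K-1}.

    Terminal values are the amounts taken home: quitting at x is worth x, and
    reaching y >= K is worth y. Stops when the largest change in a sweep is < eps.
    """
    good = [(p[i - 1], f(i)) for i in range(1, N + 1) if q[i - 1] == 0 and p[i - 1] > 0]
    p_bad = sum(p[i - 1] for i in range(1, N + 1) if q[i - 1] == 1)

    def q_accept(V, x):
        """Expected value of rolling once at x and then following V."""
        total = p_bad * V[0]  # BAD side: back to $0
        for pi, r in good:
            y = x + r
            total += pi * (y if y >= K else V[y])
        return total

    V = [0.0] * K
    for _ in range(max_iter):
        new_V = [max(x, q_accept(V, x)) for x in range(K)]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < eps:
            break
    else:
        raise RuntimeError("value iteration did not converge within max_iter sweeps")
    policy = ["accept" if q_accept(V, x) > x else "quit" for x in range(K)]
    return V, policy


# Question 1, reading f(X) as max(X^2, 23); compare two tolerances as part (b) suggests.
for eps in (1e-4, 1e-8):
    V, policy = value_iteration(6, [1 / 6] * 6, [1, 0, 1, 0, 1, 0], lambda X: max(X**2, 23), 150, eps)
    print(eps, V[0], policy[:5])
```

Because γ = 1, a small change between sweeps does not by itself bound the distance to V*, which is why comparing several tolerances, as the task asks, is a sensible check.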

(c) Please try the linear programming (LP) approach to compute the optimal value function V* and the optimal policy π*. You should explicitly specify the following elements of the LP: variables, objective function, and constraints. Again, please state explicitly the values of V*(s) and π*(s) for all s ∈ S_N. Based on your results, state explicitly the maximum expected total reward you will get in this game when starting with $0.
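One standard way to pose part (c), offered as a sketch rather than the required answer. Because γ = 1 and the terminal payoffs are constants, each Bellman inequality is linear in the unknowns V(s). When every policy ends the game with probability 1, the minimal feasible V is V*. That condition holds in both questions, because a BAD roll only resets you to $0 and some non-BAD side has positive probability and positive reward. The auxiliary notation \bar V is ours.

```latex
\begin{aligned}
\text{variables:}\quad & V(s), \quad s \in S_N = \{0, 1, \dots, K-1\} \\
\text{minimize:}\quad & \sum_{s \in S_N} V(s) \\
\text{subject to:}\quad & V(s) \ge s, \quad s \in S_N \qquad \text{(quit)} \\
& V(s) \ge \sum_{i \in [N]:\, q_i = 0} p_i \, \bar V\bigl(s + f(i)\bigr) + \Bigl(\sum_{i \in [N]:\, q_i = 1} p_i\Bigr) V(0), \quad s \in S_N \qquad \text{(accept)} \\
\text{where:}\quad & \bar V(y) = V(y) \text{ if } y < K, \qquad \bar V(y) = y \text{ if } y \ge K .
\end{aligned}
```

At the optimum, π*(s) is the action whose right-hand side attains V(s).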

(Question 2) Consider a special case where N = 5, p = (1/2, 1/4, 1/8, 1/16, 1/16), q = (0, 1, 0, 1, 0), f(X) = min(5, 2X), and K = 150. Answer the same questions (a), (b), and (c) as in Question 1.
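For Question 2, value iteration can need a very large number of sweeps, because reaching K = 150 from $0 requires a long unbroken run of non-BAD rolls. A sketch of an exact dynamic-programming alternative follows. It assumes that always accepting is optimal and checks this assumption at the end. In both questions a BAD roll only resets you to $0, so a player who always accepts eventually reaches K with probability 1, which gives V*(x) ≥ K > x. Writing c = V(0), each V(x) is affine in c and can be filled in backward from x = K - 1, because every non-BAD reward is positive. `always_accept_values` is a hypothetical helper; exact fractions avoid round-off in the very small success probabilities.

```python
from fractions import Fraction


def always_accept_values(N, p, q, f, K):
    """Exact V(x) for x in S_N = {0, ..., K-1} under the "always accept" policy.

    Assumes f(i) > 0 on every non-BAD side, so money only grows until a BAD
    roll resets it to $0. With c = V(0), each value is affine in c:
        V(x) = a(x) + (1 - d(x)) * c,
    where d(x) is the probability of reaching K from x before any BAD roll and
    a(x) sums (probability * final amount) over those successful paths.
    At x = 0 this gives c = a(0) + (1 - d(0)) * c, i.e. c = a(0) / d(0).
    """
    good = [(Fraction(p[i - 1]), f(i)) for i in range(1, N + 1) if q[i - 1] == 0]
    a = [Fraction(0)] * K
    d = [Fraction(0)] * K
    for x in range(K - 1, -1, -1):  # backward, since x + f(i) > x
        for pi, r in good:
            y = x + r
            if y >= K:  # the game ends with y dollars
                a[x] += pi * y
                d[x] += pi
            else:  # continue from y
                a[x] += pi * a[y]
                d[x] += pi * d[y]
    c = a[0] / d[0]
    return [a[x] + (1 - d[x]) * c for x in range(K)]


p2 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 16)]
V = always_accept_values(5, p2, [0, 1, 0, 1, 0], lambda X: min(5, 2 * X), 150)

# Bellman check: accepting is worth V(x) and quitting is worth x, so if V(x) >= x
# everywhere, V satisfies the optimality equation and "always accept" is optimal.
print(all(v >= x for x, v in enumerate(V)))  # True
print(float(V[0]))  # V*(0) as a float
```

The same function applies to Question 1 (with p = [Fraction(1, 6)] * 6), which gives an independent cross-check of the value-iteration results.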
