
INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Description and instructions

Please follow these submission instructions carefully, so that I can focus on grading and providing helpful feedback. Not following these instructions will result in a -5 point penalty, and I may ask you to resubmit in the correct format. Upload your submission to Moodle as a single PDF comprising photos of your (legibly!) hand-written work or a document typeset in TeX. This PDF should be named <last_name>.pdf, replacing <last_name> with your last name(s).

Working in groups is allowed (and even encouraged), provided you do all of the following:

1. clearly include the names of the people you worked with (inside the write-up, not in the file name)

2. do all of the write-up yourself, in your own words

3. use the group for discussing ideas and not just sharing answers

For any questions, email me (Alex) at alexander.markham@univie.ac.at, or post on the Question discussion forum on Moodle.

Stochastic Gradient Descent

You might find this section easier if you wait until gradient descent is covered in lecture (still well before the due date of this assignment). Nevertheless, all the necessary information is here and you should be able to complete these tasks before the lecture.

Recall logistic regression: the hypothesis hθ(x), given a sample x and a vector of parameters θ (this replaces w and b from the notes), returns a probability of the sample being from a certain class. So, you are given a matrix X ∈ ℝ^{m×n} of samples (with m rows of samples and n columns of features) and a vector y ∈ {0, 1}^m of labels to train the model.

Then, do not forget to add a column of 1s to X for the bias term in θ, so that the decision boundary is not constrained to pass through the origin. Using the trained model, you can classify a sample x(i) in class 1 if P(y = 1 | x(i)) = hθ(x(i)) ≥ 1/2

and class 0 otherwise. We define the loss function as

ℓ_l.r.(θ, X, y) = − ∑_{i=1}^m [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ],

and we can thus compute the gradient (over the entire training set, but it could analogously be defined for any subset of samples) as

∂ℓ_l.r.(θ, X, y)/∂θ = ∑_{i=1}^m ( hθ(x(i)) − y(i) ) x(i)ᵀ.
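For concreteness, a minimal NumPy sketch of these two formulas might look as follows, assuming the usual sigmoid hypothesis hθ(x) = σ(θᵀx); the function names and the example data here are purely illustrative, and the graded tasks must of course still be worked out by hand:

import numpy as np

def h(theta, X):
    # Hypothesis h_theta applied to every row of X, assuming the sigmoid form
    # h_theta(x) = 1 / (1 + exp(-theta^T x)).
    return 1.0 / (1.0 + np.exp(-X @ theta))

def loss_lr(theta, X, y):
    # l_l.r.(theta, X, y) = -sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]
    p = h(theta, X)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def grad_lr(theta, X, y):
    # Gradient of l_l.r. w.r.t. theta: sum_i (h_theta(x_i) - y_i) x_i, as a length-n vector.
    return X.T @ (h(theta, X) - y)

# Column of 1s for the bias term (prepended here; the position is just a convention).
X_raw = np.array([[2.0, 1.0], [0.5, 3.0]])   # made-up samples, for illustration only
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
y = np.array([1.0, 0.0])
theta = np.zeros(X.shape[1])
print(loss_lr(theta, X, y), grad_lr(theta, X, y))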

Recall the stochastic gradient descent algorithm:

Algorithm 1: Stochastic gradient descent

" x(1) #

" y(1) #

Data: Training samples X =x(m)and corresponding labels y =y(m)

Parameters: T, number of iterations; ηinit, initial learning rate; and k, batch size

1 initialize θˆ = 0;

2 for t in (1, ..., T) do

3 η ← η√init ;

4 randomly draw k samples from (X, y) to get Xk and yk;

5 θˆ ← θˆ − η • 1 • ∂ l(θˆ, Xk, yk);

6 end

k ∂θ

Result: θˆ that minimizes l
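Read as code, Algorithm 1 could be sketched roughly as follows; this assumes the learning-rate schedule η = ηinit/√t in line 3 and a batch-gradient function such as grad_lr above, and it only illustrates the control flow (it is not to be used for the hand-worked tasks):

import numpy as np

def sgd(X, y, T, eta_init, k, grad, rng=None):
    # Stochastic gradient descent as in Algorithm 1.
    # grad(theta, Xk, yk) must return the gradient of l w.r.t. theta on the batch.
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    theta = np.zeros(n)                                # line 1: initialize theta-hat = 0
    eta = eta_init
    for t in range(1, T + 1):                          # line 2
        eta = eta_init / np.sqrt(t)                    # line 3 (assumed decay schedule)
        idx = rng.choice(m, size=k, replace=False)     # line 4: draw k samples
        Xk, yk = X[idx], y[idx]
        theta = theta - eta * grad(theta, Xk, yk) / k  # line 5: update theta-hat
    return theta, eta                                  # theta-hat and the final eta

For example, sgd(X, y, T=3, eta_init=0.1, k=3, grad=grad_lr) mirrors the setup of Task 1 below, except that there the batch is fixed rather than redrawn each iteration.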

Tasks

For each of the following tasks, show your detailed calculations at every step. Simply providing the value without showing how it was calculated will not earn any credit. Round to at least three decimal places. You may use a scientific calculator, and you can represent the sigma notation summation as matrix multiplication to (slightly) condense your calculations, but you may not use more advanced calculators or features of e.g. Python or GNU Octave that automatically compute gradients or perform gradient descent.
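As an example of the matrix-multiplication shorthand mentioned above (using the batch notation of Algorithm 1, with hθ applied to each row of Xk), the gradient sum collapses to a single product:

∂ℓ_l.r.(θ, Xk, yk)/∂θ = ∑_{i=1}^k ( hθ(x(i)) − y(i) ) x(i)ᵀ = ( hθ(Xk) − yk )ᵀ Xk.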

1. Using an initial learning rate of ηinit = 0.1, perform T = 3 iterations of stochastic gradient descent. Assume we have a large data set X, from which we draw k = 3 samples:

x(1) = (1, 3); y(1) = 0

x(2) = (3, 1); y(2) = 1

x(3) = (1, 4); y(3) = 0

or, equivalently, written in matrix notation:

Xk = [ x(1); x(2); x(3) ] = [ 1 3; 3 1; 1 4 ] and the corresponding labels y = [ y(1); y(2); y(3) ] = [ 0; 1; 0 ].

Provide the learned values for θˆ and the final value for η. Note that you should use all of Xk for each iteration, and that Xk ⊂ X is the same for each iteration, even though in practice it would likely be a different subset each iteration, because it is randomly drawn from X. (Tip: make sure you understand the dimensions of θ, X, y, x(i), y(i), and how they are all related.)

2. What is the loss ℓ_l.r. of your learned θˆ (computed over the three samples) after each iteration?

3. Using the θˆ parameters learned in the first task, classify the following data point and state the probability that it belongs in that class (i.e., estimate y(4) and state the corresponding posterior): x(4) = (2, 2)
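If, after writing out every step by hand, you want to sanity-check your arithmetic, a small self-contained sketch along these lines reproduces the setup of Tasks 1-3; the prepended bias column, the η = ηinit/√t schedule, and the 1/2 classification threshold are the assumptions discussed above, and only the hand-worked derivations earn credit:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Batch from Task 1, with a leading column of 1s added for the bias term (assumed convention).
Xk = np.array([[1.0, 1.0, 3.0],
               [1.0, 3.0, 1.0],
               [1.0, 1.0, 4.0]])
yk = np.array([0.0, 1.0, 0.0])

def loss(theta):
    # l_l.r. over the three samples.
    p = sigmoid(Xk @ theta)
    return -np.sum(yk * np.log(p) + (1.0 - yk) * np.log(1.0 - p))

theta = np.zeros(3)
k = Xk.shape[0]
eta_init = 0.1
for t in range(1, 4):                                   # T = 3 iterations
    eta = eta_init / np.sqrt(t)                         # assumed schedule from Algorithm 1
    grad = Xk.T @ (sigmoid(Xk @ theta) - yk)            # batch gradient
    theta = theta - eta * grad / k                      # SGD update (line 5)
    print(f"t={t}: eta={eta:.3f}, theta={np.round(theta, 3)}, loss={loss(theta):.3f}")

# Task 3: classify x(4) = (2, 2) with the learned theta and report the posterior.
x4 = np.array([1.0, 2.0, 2.0])
p4 = sigmoid(x4 @ theta)
print("P(y=1 | x(4)) =", round(p4, 3), "-> class", int(p4 >= 0.5))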
