logo Hurry, Grab up to 30% discount on the entire course
Order Now logo

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

expert
Colin JenkinssSocial sciences
(5/5)

822 Answers

Hire Me
expert
Riley ReynoldsEnglish
(5/5)

703 Answers

Hire Me
expert
Sinaa AntiqueNursing
(5/5)

955 Answers

Hire Me
expert
Alan DuderMarketing
(5/5)

756 Answers

Hire Me
Others
(5/5)

Simply providing the value without showing how it was calculated will not earn any credit

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Description and instructions

please follow these submission instructions carefully, so that I can focus on grading and providing helpful feedback. Not following these instructions will result in a -5 point penalty, and I may ask you to resubmit it in the correct format. Upload your submission to Moodle as a single PDF comprising photos of your (legibly!) hand- written work or a document that has been typeset in TEX. This PDF should be named <last_name>.pdf, replacing <last_name> with your last name(s).

Working in groups is allowed (and even encouraged), provided you do all of the following:

1. clearly include the names of the people you worked with (inside the write-up, not in the file name)

2. do all of the write-up yourself, in your own words

3. use the group for discussing ideas and not just sharing answers

For any questions, email me (Alex) at alexander.markham@univie.ac.at, or post on the Question discussion forum on Moodle.

Stochastic Gradient Descent

You might find this section easier if you wait until gradient descent is covered in lecture (still well before the due date of this assignment). Nevertheless, all the necessary information is here and you should be able to complete these tasks before the lecture.

vector of parameters (this replaces w and b from the notes ), returns a probability of the sample being from a certain class. So, you are given a matrix X ⊂ R m,n of samples (with m rows of samples and n columns of features) and a vector y ⊂ {0, 1}m of labels to train the model.

Then, do not forget to add a column of 1s to X for the bias term in θ so that the decision boundary is not constrained to pass through the originusing the trained model, you can classify a sample x(i) in class 1 if P(y = 1 | x(i)) = hθ(x(i)) ≥ 1

and class 0 otherwise. We define the loss function as m ll.r.(θ, X, y) = ∑ y(i) log hθ(x(i)) (1 y(i)) log 1 hθ(x(i)) ,i=1and we can thus compute the gradient (over the entire training set; but it could analogously be defined for any subset of samples) as follows(θ, X, y) =m∑i=1 hθ(x(i)) − y(i) x(i) T.

Recall the stochastic gradient descent algorithm:

Algorithm 1: Stochastic gradient descent

" x(1) #

" y(1) #

Data: Training samples X =x(m)and corresponding labels y =y(m)

Parameters: T, number of iterations; ηinit, initial learning rate; and k, batch size

1 initialize θˆ = 0;

2 for t in (1, ..., T) do

3 η ← η√init ;

4 randomly draw k samples from (X, y) to get Xk and yk;

5 θˆ ← θˆ − η • 1 • ∂ l(θˆ, Xk, yk);

6 end

k ∂θ

Result: θˆ that minimizes l

Tasks

For each of the following tasks, show your detailed calculations at every step. Simply providing the value without showing how it was calculated will not earn any credit. Round to at least three decimal places. You may use a scientific calculator, and you can represent the sigma notation summation as matrix multiplication to (slightly) condense your calculations, but you may not use more advanced calculators or features of e.g. Python or GNU Octave that automatically compute gradients or perform gradient descent.

1. Using an initial learning rate of ηinit = 0.1, perform T = 3 iterations of stochastic gradient descent. Assume we have a large data set X, from which we draw k = 3 samples:

x(1) = (1, 3); y(1) = 0

x(2) = (3, 1); y(2) = 1

x(3) = (1, 4); y(3) = 0

or equvalently written using matrix notation:

Xk = x(1) x(2) x(3) 1 3 = 3 1 1 4 and the corresponding labels y = y(1) 0 y(2) = 1 y(3) 0

Provide the learned values for θˆ and the final value for η. Note that you should use all of Xk for each iteration and that Xk X is the same for each iteration, even though in practice it would likely be a different subset each iteration, because it is randomly drawn from X. (tip: make sure you understand the dimensions of θ, X, Y, x(i), y(i), and how they are all related.)

2. What is the loss ll.r. of your learned θˆ (computed over the three samples) after each iteration?

3. Using the θˆ parameters learned in the first task, classify the following data point and state the probability that it belongs in that class (i.e., estimate y(4) and state the corresponding posterior):x(4) = (2, 2)

(5/5)
Attachments:

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

CS 340 Milestone One Guidelines and Rubric  Overview: For this assignment, you will implement the fundamental operations of create, read, update,

. Develop a program to emulate a purchase transaction at a retail store. This  program will have two classes, a LineItem class and a Transaction class

Retail Transaction Programming Project  Project Requirements:  Develop a program to emulate a purchase transaction at a retail store. This

. The following program contains five errors. Identify the errors and fix them

7COM1028   Secure Systems Programming   Referral Coursework: Secure

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

CS 340 Final Project Guidelines and Rubric  Overview The final project will encompass developing a web service using a software stack and impleme