Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Drop Files Here Or Click to Upload

Or Get Complete Course Help

Gabriel GaboardiLaw

(5/5)

773 Answers

Hire Me

Nitesh BhardwajEconomics

(5/5)

596 Answers

Hire Me

Joan DomettEngineering

(4/5)

593 Answers

Hire Me

Martin ClarkMarketing

(5/5)

603 Answers

Hire Me

R Programming

(5/5)

Given the definition of noise, derive the corresponding mean and variance parameters of the normal distribution for y|x1,x2; 0.Write also down its probability density function.

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Problem 1: Linear Regression and Gradient Learning [30 points]

In class, we derived linear regression and various learning algorithms based on gradient descent. In addition to the least square objective, we also learned its probabilistic perspective where each observation is assumed to have a Gaussian noise. (i.e., Noise of each example is an independent and identically distributed sample from a normal distribution) In this problem, you are supposed to deal with the following regression model that includes two linear features and one quadratic feature.y = θ0 + θ1x1 + θ2x2 + θ3x2 + ϵ where ϵ ∼ N (0, σ2)

Your goal is to develop a gradient descent learning algorithm that will estimate the best pa- rameters θ = {θ0, θ1, θ2, θ3}.

(a) Given the definition of noise, derive the corresponding mean and variance parameters of the normal distribution for y|x1, x2; θ. Write also down its probability density function.

(b) You are provided with a training observations D = {(x(i), x(i), y(i))|1 ≤ i ≤ m}. Derive

the conditional log-likelihood that will be later maximized to make D most likely.

(c) If you omit all the constant that does not relate to our parameters θ, what will be the objective function J(θ0, θ1, θ2, θ3) that you are going to perform Maximum Likelihood Estimation? Does J look similar to the Least Square objective for this problem?

(d) Compute the gradient of J(θ) with respect to each parameter. (Hint: You should evaluate the partial derivatives of J(θ) with respect to each θj for 0 ≤ j ≤ 3)

(e) [Coding] Develop two learning algorithms: batch and stochastic gradient descent for this problem on the Auto dataset given in Problem 4 in Homework 1. Try to find the two best input features for predicting the output mpg. Compare and report the difference between your full coding and R’s built-in function call: lm. (Hint: At least your stochastic gradient algorithm must learn parameters θ comparable to the result from calling the R’s built-in function. Otherwise try to tune the learning rate α < 1.0)

Problem 2: Logistic Regression and Evaluations [30 points]

Recall the grading problem to predict pass/fail in the class. Suppose you collect data for a group of students in the class that consist of two input features X1 = hours studied and X2 = undergrad GPA. Your goal is to predict the output Y ∈ {pass, fail }. Suppose that you fit a logistic regression, learning its parameter (θ0, θ1, θ2) = (−6, 0.05, 1).

(a) What will be the probability for a student who studies for 40 hours and has a GPA of

3.5 to pass the class?

(b) How many hours would the student in part (a) needs to study in order to have at least 50% chance of passing the class?

The following questions must be answered using Weekly dataset in ISLR package. It contains 1,089 weekly returns for 21 years from the beginning of 1990 to the end of 2010. You will use its 1990-2008 as a training data and 2009-2010 as a test data.

(c) [Coding] Given the training data, you are to perform a logistic regression where the input features are five of Lag variables and Volume, and the binary output is Direction. Use the summary function to print our the results. Report the confusion matrix and the accuracy on both training and test data given the learned model. (Hint: As nothing is specified, your default threshold to decide Up/Down must be 0.5)

(d) [Coding] Now you will run logistic regression five times with only one input features Lagj(1 ≤ j ≤ 5) for each time. Compute the confusion matrix and the accuracy on both training and test data given each of the learned models. Which are the best models among the five models here and the earlier model in part (c) in terms of the accuracy and F-score, respectively? Does the best model also achieve the best accuracy or F-score

on the training data? (Hint: The best model must be chosen based on the test data, not the training data!)

(e) [Coding] Try to draw six ROC curves for 6 models from the part (c) and (d) with varying thresholds. Determine the best model in terms of the Area Under Curve (AUC). (Note: 6 curves must be plotted at the same time as a single graph. You can use R’s built-in function to compute the AUC)

(f) [Coding] Try to draw six Precision-Recall curves for 6 models from the part (c) and (d) with varying thresholds. Determine the best model in terms of the Area Under Curve (AUC). (Note: 6 curves must be plotted at the same time as a single graph. You can use R’s builtin function to compute the AUC)

(g) Report your observation about different evaluation measures, and pick the one that you think the most appropriate for this problem. Explain why.

(5/5)

Hurry, Grab up to 30% discount on the entire course

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Gabriel GaboardiLaw

Nitesh BhardwajEconomics

Joan DomettEngineering

Martin ClarkMarketing

R Programming

Given the definition of noise, derive the corresponding mean and variance parameters of the normal distribution for y|x1,x2; 0.Write also down its probability density function.

ANSWER ALL QUESTIONS

Problem 1: Linear Regression and Gradient Learning [30 points]

Problem 2: Logistic Regression and Evaluations [30 points]

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

Other Services

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Gabriel GaboardiLaw

Nitesh BhardwajEconomics

Joan DomettEngineering

Martin ClarkMarketing

R Programming

Given the definition of noise, derive the corresponding mean and variance parameters of the normal distribution for y|x1,x2; 0.Write also down its probability density function.

ANSWER ALL QUESTIONS

Problem 1: Linear Regression and Gradient Learning [30 points]

Problem 2: Logistic Regression and Evaluations [30 points]

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer