In class, we derived linear regression and several learning algorithms based on gradient descent. In addition to the least-squares objective, we also studied its probabilistic interpretation, in which each observation is assumed to carry Gaussian noise (i.e., the noise of each example is an independent and identically distributed sample from a normal distribution). In this problem, you will work with the following regression model, which includes two linear features and one quadratic feature:
y = θ0 + θ1x1 + θ2x2 + θ3x1^2 + ϵ, where ϵ ∼ N(0, σ^2)
Your goal is to develop a gradient descent learning algorithm that estimates the best parameters θ = {θ0, θ1, θ2, θ3}.
(a) Given the definition of the noise, derive the corresponding mean and variance parameters of the normal distribution for y | x1, x2; θ. Also write down its probability density function.
(b) You are provided with a set of training observations D = {(x1^(i), x2^(i), y^(i)) | 1 ≤ i ≤ m}. Derive the conditional log-likelihood that will later be maximized to make D most likely.
(c) If you omit all constants that do not depend on the parameters θ, what objective function J(θ0, θ1, θ2, θ3) will you optimize to perform Maximum Likelihood Estimation? Does J look similar to the least-squares objective for this problem?
(d) Compute the gradient of J(θ) with respect to each parameter. (Hint: You should evaluate the partial derivatives of J(θ) with respect to each θj for 0 ≤ j ≤ 3)
(e) [Coding] Develop two learning algorithms, batch and stochastic gradient descent, for this problem on the Auto dataset given in Problem 4 of Homework 1. Try to find the two best input features for predicting the output mpg. Compare your own implementation against R’s built-in function lm and report the differences. (Hint: At the very least, your stochastic gradient algorithm must learn parameters θ comparable to the result of calling R’s built-in function. Otherwise, try tuning the learning rate α < 1.0.)
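The two update rules in part (e) can be sketched as follows. This is a minimal illustration in Python rather than the R implementation the assignment asks for, and it rests on several assumptions that are not in the problem statement: the quadratic feature is taken to be x1^2, the objective is taken to be the mean squared error J(θ) = (1/2m) Σ (h(x) − y)^2, the data are synthetic and noise-free (not the Auto dataset), and the helper names and learning rates are hypothetical.

```python
import random

def features(x1, x2):
    # Assumed feature map: intercept, two linear terms, and x1 squared.
    return [1.0, x1, x2, x1 * x1]

def predict(theta, phi):
    # Linear prediction: dot product of parameters and features.
    return sum(t * f for t, f in zip(theta, phi))

def batch_gd(data, alpha=0.3, epochs=3000):
    # Each step uses the gradient averaged over the whole training set.
    theta = [0.0] * 4
    m = len(data)
    for _ in range(epochs):
        grad = [0.0] * 4
        for phi, y in data:
            err = predict(theta, phi) - y
            for j in range(4):
                grad[j] += err * phi[j] / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def sgd(data, alpha=0.05, epochs=500, seed=0):
    # Each step uses the gradient of a single example, visited in shuffled order.
    rng = random.Random(seed)
    theta = [0.0] * 4
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for phi, y in data:
            err = predict(theta, phi) - y
            theta = [t - alpha * err * f for t, f in zip(theta, phi)]
    return theta

# Hypothetical noise-free data generated from known parameters.
true_theta = [1.0, 2.0, -1.0, 0.5]
rng = random.Random(1)
data = []
for _ in range(50):
    x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    phi = features(x1, x2)
    data.append((phi, predict(true_theta, phi)))

theta_batch = batch_gd(data)
theta_sgd = sgd(data)
print(theta_batch)
print(theta_sgd)
```

On noise-free data like this, both loops should recover the generating parameters. On real data such as Auto, features on very different scales will usually need standardizing or a much smaller α before either loop is stable.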
Recall the grading problem of predicting pass/fail in the class. Suppose you collect data for a group of students in the class, consisting of two input features: X1 = hours studied and X2 = undergrad GPA. Your goal is to predict the output Y ∈ {pass, fail}. Suppose that you fit a logistic regression and learn the parameters (θ0, θ1, θ2) = (−6, 0.05, 1).
(a) What will be the probability for a student who studies for 40 hours and has a GPA of 3.5 to pass the class?
(b) How many hours would the student in part (a) need to study in order to have at least a 50% chance of passing the class?
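Parts (a) and (b) both come down to the logistic function applied to the linear score θ0 + θ1·X1 + θ2·X2. The sketch below, in Python for illustration only, evaluates that function and inverts it for the hours studied; the function names are hypothetical, and "pass" is assumed to be the class whose probability the model outputs.

```python
import math

# Parameters given in the problem statement.
theta0, theta1, theta2 = -6.0, 0.05, 1.0

def pass_probability(hours, gpa):
    # Logistic function of the linear score.
    z = theta0 + theta1 * hours + theta2 * gpa
    return 1.0 / (1.0 + math.exp(-z))

def hours_for_probability(p, gpa):
    # Invert the logistic: the score must equal the log-odds log(p / (1 - p)).
    z = math.log(p / (1.0 - p))
    return (z - theta0 - theta2 * gpa) / theta1

print(pass_probability(40, 3.5))
print(hours_for_probability(0.5, 3.5))
```

With 40 hours and a GPA of 3.5 the score is −0.5, so the probability comes out at about 0.378. A 50% chance corresponds to a score of zero, which this student reaches at 50 hours.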
The following questions must be answered using the Weekly dataset in the ISLR package. It contains 1,089 weekly returns over 21 years, from the beginning of 1990 to the end of 2010. Use 1990-2008 as the training data and 2009-2010 as the test data.
(c) [Coding] Given the training data, perform a logistic regression where the input features are the five Lag variables and Volume, and the binary output is Direction. Use the summary function to print out the results. Report the confusion matrix and the accuracy of the learned model on both the training and the test data. (Hint: As nothing is specified, your default threshold to decide Up/Down must be 0.5.)
(d) [Coding] Now run logistic regression five times, each time with only one input feature Lagj (1 ≤ j ≤ 5). Compute the confusion matrix and the accuracy of each learned model on both the training and the test data. Among these five models and the earlier model from part (c), which is the best in terms of accuracy, and which is the best in terms of F-score? Does the best model also achieve the best accuracy or F-score on the training data? (Hint: The best model must be chosen based on the test data, not the training data!)
(e) [Coding] Draw six ROC curves, one for each of the six models from parts (c) and (d), by varying the threshold. Determine the best model in terms of the Area Under the Curve (AUC). (Note: All six curves must be plotted together on a single graph. You can use R’s built-in function to compute the AUC.)
(f) [Coding] Draw six Precision-Recall curves, one for each of the six models from parts (c) and (d), by varying the threshold. Determine the best model in terms of the Area Under the Curve (AUC). (Note: All six curves must be plotted together on a single graph. You can use R’s built-in function to compute the AUC.)
(g) Report your observations about the different evaluation measures, and pick the one that you think is the most appropriate for this problem. Explain why.
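The evaluation measures in parts (c) through (f) can all be derived from thresholded confusion counts. The sketch below shows one way to compute them. It is written in Python for illustration rather than in R, the labels and scores are made-up toy values (not the Weekly data), the positive class "Up" is assumed to be coded as 1, and all function names are hypothetical. The ROC area uses the trapezoidal rule, which is one common convention.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    # Predict positive (1) when the score is at least the threshold.
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(tp, fp, tn, fn):
    # F1 = 2*TP / (2*TP + FP + FN); returns 0 when that denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_points(y_true, scores):
    # One (FPR, TPR) point per distinct score used as threshold, strictest first.
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        points.append((fp / neg, tp / pos))
    return points

def pr_points(y_true, scores):
    # One (recall, precision) point per distinct score used as threshold.
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        points.append((tp / (tp + fn), tp / (tp + fp)))
    return points

def trapezoid_auc(points):
    # Area under a piecewise-linear curve given as (x, y) points sorted by x.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

counts = confusion_counts(y_true, scores)
print(counts, accuracy(*counts), f1_score(*counts))
print(trapezoid_auc(roc_points(y_true, scores)))
print(pr_points(y_true, scores))
```

Because accuracy and F-score depend on the single 0.5 threshold while the ROC and Precision-Recall curves sweep over all thresholds, the measures can rank the same set of models differently, which is the comparison part (g) asks about.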