INSTRUCTIONS TO CANDIDATES

Covariance, Correlation and Linear Regression

1. For each of the datasets below, perform the following steps. Use the functions provided by the instructor for COV, COR and OLS.

1.   Calculate the covariance and correlation.
2.   Estimate the regression line by calculating the slope and intercept using the OLS function.
3.   Draw a scatter plot and the best-fitting line in one plot.
4.   Calculate R².
5.   Interpret the fitted slope. Is the intercept meaningful? Explain.
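
The five steps above can be sketched in plain Python. This is only an illustration, not the instructor's COV, COR or OLS functions, which are not shown in this handout: the names sample_cov, correlation, ols_line and r_squared are hypothetical stand-ins, the sample covariance is assumed to use the n - 1 denominator, and the x/y values are made-up toy numbers rather than any of the datasets below.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_cov(x, y):
    """Sample covariance (assumed n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

def ols_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    slope = sample_cov(x, y) / sample_cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = ols_line(x, y)
    my = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Made-up toy data lying exactly on y = 1 + 2x (not one of the datasets).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

slope, intercept = ols_line(x, y)
print("cov:", sample_cov(x, y))
print("cor:", correlation(x, y))
print("slope:", slope, "intercept:", intercept)
print("R^2:", r_squared(x, y))
```

For simple linear regression, R² equals the squared correlation, which gives a quick consistency check between steps 1 and 4.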

Dataset 1: Study Time and Exam Scores (n = 10 students) [ScoreData.txt]

 Student     Study Hours   Exam Score
 Tom                   1           53
 Mary                  5           74
 Sarah                 7           59
 Oscar                 8           43
 Cullyn               10           56
 Jaime                11           84
 Theresa              14           96
 Knut                 15           69
 Jin-Mae              15           84
 Courtney             19           83

Dataset 2: Portfolio Returns on Selected Mutual Funds (n = 17 funds) [LastThisYear.txt]

 Last Year (X)   This Year (Y)
          11.9            15.4
          19.5            26.7
          11.2            18.2
          14.1            16.7
          14.2            13.2
           5.2            16.4
          20.7            21.1
          11.3            12.0
          -1.1            12.1
           3.9             7.4
          12.9            11.5
          12.4            23.0
          12.5            12.7
           2.7            15.1
           8.8            18.7
           7.2             9.9
           5.9            18.9

Dataset 3: Number of Orders and Shipping Cost (n = 12 months) [OrderShipCost.txt]

 Orders (X)   Ship Cost (Y)
      1,068           4,489
      1,026           5,611
        767           3,290
        885           4,113
      1,156           4,883
      1,146           5,425
        892           4,414
        938           5,506
        769           3,346
        677           3,673
      1,174           6,542
      1,009           5,088

2. Perform multiple regression analysis on the SalesAdvertising dataset.

The SalesAdvertising dataset contains statistics about the sales of a product in 200 different markets, together with the advertising budgets in each of these markets for three media channels: TV, radio and newspaper. Sales are in thousands of units and budgets are in thousands of dollars.

Your task is to explore the dataset and build a multiple regression model that takes TV, radio and newspaper as predictors and Sales as the target variable.

1.   Load the data into NumPy.
2.   Summarize the statistics of the dataset (mean, standard deviation, max and min for each column).
3.   Visualize the data using a correlation matrix and scatter plots.
4.   Split the data into training and test datasets.
5.   Fit the multiple linear regression to the training dataset.
6.   Estimate the regression model and explain the regression equation.
7.   Predict the test dataset.
8.   Visualize the residuals.
9.   Calculate regression error metrics (Accuracy, MSE and RMSE).
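
The workflow above could look roughly like the following NumPy sketch. It rests on several assumptions: the SalesAdvertising file and its layout are not given here, so the code generates synthetic stand-in data with made-up coefficients instead of loading the real dataset; the 80/20 split and the random seed are arbitrary choices; and "Accuracy" is read as the test-set R², which is an interpretation rather than something the task states. Plotting is left out, so the residuals are only computed, ready to be passed to a plotting library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 200 markets (NOT the real SalesAdvertising data).
# Columns of X: TV, radio, newspaper budgets; y: sales.
n = 200
X = rng.uniform(0, 100, size=(n, 3))
made_up_coef = np.array([0.05, 0.2, 0.0])   # arbitrary illustrative values
y = 3.0 + X @ made_up_coef + rng.normal(0, 1.0, size=n)

# Step 2: summary statistics for each column (predictors, then sales).
data = np.column_stack([X, y])
summary = {
    "mean": data.mean(axis=0),
    "stdev": data.std(axis=0, ddof=1),
    "max": data.max(axis=0),
    "min": data.min(axis=0),
}

# Step 3: correlation matrix (columns are the variables).
corr = np.corrcoef(data, rowvar=False)

# Step 4: random 80/20 split into training and test rows.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]

def add_intercept(A):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(len(A)), A])

# Steps 5-6: least-squares fit; beta = [intercept, b_TV, b_radio, b_newspaper].
beta, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# Step 7: predictions on the test rows.
y_pred = add_intercept(X[test]) @ beta

# Step 8: residuals (plot these against y_pred or as a histogram).
residuals = y[test] - y_pred

# Step 9: error metrics on the test set.
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

print("coefficients:", beta)
print("MSE:", mse, "RMSE:", rmse, "R^2:", r2)
```

The fitted equation reads Sales = beta[0] + beta[1]·TV + beta[2]·radio + beta[3]·newspaper, where each slope is the expected change in sales (thousands of units) per additional thousand dollars in that channel, holding the other two budgets fixed.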

