# Produce the summary statistics (the Five-number Summary, Mean, and SD) for x and y. [3 marks]

Data were collected on auto insurance claims from 63 geographical zones in Sweden. In particular, we want to predict Y = the total auto insurance payout (in thousands of Swedish kronor (kr)) using X = the number of claims.

1. Produce the summary statistics (the Five-number Summary, Mean, and SD) for x and y. [3 marks]

2. Produce the Scatter Plot of X and Y, along with the Correlation Coefficient. [2 marks]

3. Based on the data, can we conclude that X and Y are linearly related? Support your answer by referring to the appropriate outputs. [3 marks]

4. Note the two observations in the upper right-hand corner of the scatter plot. Are these observations outliers? Briefly explain your answer. [2 marks]

5. Fit the Simple Regression model of Y on X. Report the Least Squares Line equation and the RMSE (R calls it the “residual standard error”). [2 marks]

6. One of the observations, the one in row 43 of the Excel file, has x = 60 and y = 202.4. Calculate the residual for this observation and interpret what the number means. [4 marks]

7. Predict the auto insurance payout in a zone with 50 claims. Give the prediction as an interval, using a calculation that would be correct 95% of the time. [2 marks]

8. Produce the Residual Plot and the Density Trace of the Residuals. [2 marks]

9. Explain what the Normality assumption means in the context of this problem, and assess whether this assumption seems reasonable here. [4 marks]
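
The numerical parts of questions 1, 2, 5, 6, and 7 all rest on a few standard formulas, which the sketch below works through. It is only an illustration and rests on several assumptions: the assignment expects R output, whereas this is plain Python using the standard library; the `x` and `y` values are made-up placeholders, not the Swedish data; the function names `summary`, `fit_line`, and `prediction_interval` are hypothetical; and the t critical value has to be supplied by hand (for example from a t table with n - 2 degrees of freedom). Quartile conventions also differ between tools, so Q1 and Q3 may not match another package exactly.

```python
import math
import statistics

# Made-up placeholder values, NOT the Swedish insurance data.
x = [1, 2, 3, 4, 5]            # stand-in for X = number of claims
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # stand-in for Y = payout (thousands of kr)


def summary(values):
    """Five-number summary, mean, and sample SD (n - 1 denominator)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def fit_line(x, y):
    """Least squares fit of y on x, with correlation, RMSE, and residuals."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed y minus the y predicted by the fitted line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # RMSE ("residual standard error") divides by n - 2, not n.
    rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "intercept": intercept, "slope": slope,
        "r": sxy / math.sqrt(sxx * syy),
        "rmse": rmse, "residuals": residuals,
    }


def prediction_interval(fit, x0, t_crit):
    """Interval for a single new observation at x0.

    t_crit is the caller-supplied t critical value with n - 2 degrees
    of freedom (e.g. the 97.5th percentile for a 95% interval).
    """
    y_hat = fit["intercept"] + fit["slope"] * x0
    half_width = t_crit * fit["rmse"] * math.sqrt(
        1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - half_width, y_hat + half_width


fit = fit_line(x, y)
print(summary(x))
print(summary(y))
print(fit["intercept"], fit["slope"], fit["r"], fit["rmse"])
# 3.182 is the 97.5th percentile of t with 3 degrees of freedom,
# which matches this 5-point placeholder data set only.
print(prediction_interval(fit, 3, 3.182))
```

With the real data, the residual asked for in question 6 would be 202.4 minus the fitted value at x = 60, and the interval in question 7 would use x0 = 50 with the t critical value for 61 degrees of freedom.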

