R Programming


General guidelines

The report should be made up of two parts. Make sure that both files contain your full name along with your student number.

1. Main body: It will contain all your analyses, numerical results (tables, figures, ...), comments, and answers to the questions below. Every statistical procedure (regression, test, ...) should be reported with all the relevant numerical values (coefficients, statistics, p-values, ...). Moreover, all writing should be in complete sentences. The main body should not be hand-written, and it will be a PDF file.

2. R script: Your R script will contain all the instructions that are necessary to run the regressions, tests, ... reported in the main body. It should be free from errors and, as much as possible, linear (i.e. in the same order as the questions below). It should be a standard R file.

If you have any questions, or if you find typos and/or mistakes in the questions, do not hesitate to send an e-mail. The full report should be sent [...]. Late submissions will not be graded.


In this study, we focus on the sleep75 dataset, which can be found in the wooldridge package in R.

If this is not already the case, you can install the wooldridge package by typing install.packages("wooldridge"). Then you can open and attach the sleep75 dataset by executing the following block of instructions:

library(wooldridge)

attach(sleep75)

We are interested in relating the variable slpnaps, corresponding to the weekly amount of time (in minutes) an individual sleeps (including naps), to the following 6 variables: totwrk, educ, age, male, yngkid, and south. The definition of each variable can be found by executing the instruction ?sleep75. The goal of this study is to understand which factors are important in determining the variable slpnaps.


1. For each V ∈ {totwrk, educ, age, male, yngkid, south}:

• Describe the variable in a few words. Give (and briefly explain) your a priori belief on the impact that V may have on slpnaps (there are no wrong answers!).

• Run the simple linear regression:

$\mathrm{slpnaps}_i = \beta_0 + \beta_1 V_i + U_i$.

• Report the LSE, the related 95% confidence intervals, $R^2$, the t-statistics, and the p-values for both coefficients.

• Report the scatter plot (V, slpnaps) along with the best-fit regression line.

• Analyse the significance and the sign of the relationship between slpnaps and each explanatory variable according to the simple regression. Compare this with your belief from the first bullet point.

• Explain what the average impact (in minutes) on slpnaps is of an increase of one unit in the explanatory variable V. (You will need to analyse the impacts of the variables male, south and yngkid differently, since they are categorical.)
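The regression, reporting, and plotting steps above can be sketched as follows. This is only one possible layout, not a required one: the helper names simple_fit and plot_simple_fit are hypothetical (they are not part of the wooldridge package or of the assignment), and the usage part assumes the wooldridge package is installed.

```r
# Hypothetical helper (an assumption, not part of the assignment): fits
# slpnaps ~ V for one explanatory variable and collects the estimates,
# standard errors, t-statistics, p-values, 95% confidence intervals and R^2.
simple_fit <- function(data, v, response = "slpnaps") {
  fit <- lm(reformulate(v, response = response), data = data)
  est <- summary(fit)$coefficients    # Estimate, Std. Error, t value, Pr(>|t|)
  ci  <- confint(fit, level = 0.95)   # one row per coefficient
  list(
    fit = fit,
    table = cbind(est, ci),
    r_squared = summary(fit)$r.squared
  )
}

# Hypothetical helper: scatter plot of (V, slpnaps) with the fitted line.
plot_simple_fit <- function(data, v, response = "slpnaps") {
  res <- simple_fit(data, v, response)
  plot(data[[v]], data[[response]], xlab = v, ylab = response)
  abline(res$fit, col = "red")
  invisible(res)
}

# Usage, assuming the wooldridge package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("sleep75", package = "wooldridge")
  res <- plot_simple_fit(sleep75, "totwrk")
  print(res$table)
  print(res$r_squared)
}
```

For a 0/1 variable such as male, the slope of the simple regression equals the difference between the mean of slpnaps in the two groups, which is why the categorical variables call for a different interpretation.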

2. Rank the above regressions according to their $R^2$, and find the best explanatory variable according to the simple regressions.
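A minimal sketch of the ranking, assuming the wooldridge package is installed for the usage part. The helper name rank_by_r2 is hypothetical and introduced here only for illustration.

```r
# Hypothetical helper (an assumption): R^2 of the simple regression
# slpnaps ~ V for each V in vars, sorted from largest to smallest.
rank_by_r2 <- function(data, vars, response = "slpnaps") {
  r2 <- sapply(vars, function(v) {
    summary(lm(reformulate(v, response = response), data = data))$r.squared
  })
  sort(r2, decreasing = TRUE)
}

# Usage, assuming the wooldridge package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("sleep75", package = "wooldridge")
  v <- c("totwrk", "educ", "age", "male", "yngkid", "south")
  print(rank_by_r2(sleep75, v))
}
```

The first name in the returned vector is the variable whose simple regression has the highest $R^2$.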

3. We now turn to the multiple linear regression.

(a) We first check that the numerical variables are not too close to collinearity. Calculate and report all the correlations between any pair of variables among {totwrk, educ, age, male, yngkid, south}. What are the two most correlated variables? Usually, we consider that if at least one of the correlations exceeds 0.8 in absolute value, there is a risk of multicollinearity. Does any correlation exceed 0.8 in absolute value?

Hint: You can type

v <- c('totwrk', 'educ', 'age', 'male', 'yngkid', 'south')

cor(sleep75[, v])

to calculate all the correlations at once.
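Reading the most correlated pair off a 6 × 6 matrix by eye is error-prone, so the check can also be sketched in code. The helper name most_correlated and its threshold argument are hypothetical; the usage part assumes the wooldridge package is installed.

```r
# Hypothetical helper (an assumption): correlation matrix of the chosen
# variables, the pair of distinct variables with the largest absolute
# correlation, and whether any pair exceeds the threshold in absolute value.
most_correlated <- function(data, vars, threshold = 0.8) {
  cm <- cor(data[, vars])
  off <- abs(cm)
  off[!upper.tri(off)] <- NA   # keep each pair once and drop the diagonal
  idx <- which(off == max(off, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(
    matrix = cm,
    pair = c(rownames(cm)[idx["row"]], colnames(cm)[idx["col"]]),
    value = cm[idx["row"], idx["col"]],
    exceeds_threshold = any(off > threshold, na.rm = TRUE)
  )
}

# Usage, assuming the wooldridge package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("sleep75", package = "wooldridge")
  v <- c("totwrk", "educ", "age", "male", "yngkid", "south")
  res <- most_correlated(sleep75, v)
  print(round(res$matrix, 3))
  print(res$pair)
  print(res$exceeds_threshold)
}
```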

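(b) Fit the multiple linear model

$$\mathrm{slpnaps}_i = \beta_0 + \beta_1 \mathrm{totwrk}_i + \beta_2 \mathrm{educ}_i + \beta_3 \mathrm{age}_i + \beta_4 \mathrm{male}_i + \beta_5 \mathrm{yngkid}_i + \beta_6 \mathrm{south}_i + U_i \tag{1}$$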
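Model (1) can be fitted with lm in the same way as the simple regressions. The sketch below is one way to do it; the helper name fit_multiple is hypothetical, and the usage part assumes the wooldridge package is installed.

```r
# Hypothetical helper (an assumption): fits the response on all variables in
# vars at once, i.e. slpnaps ~ totwrk + educ + age + male + yngkid + south
# when called with the six variables of model (1).
fit_multiple <- function(data, vars, response = "slpnaps") {
  lm(reformulate(vars, response = response), data = data)
}

# Usage, assuming the wooldridge package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("sleep75", package = "wooldridge")
  v <- c("totwrk", "educ", "age", "male", "yngkid", "south")
  fit <- fit_multiple(sleep75, v)
  print(summary(fit))   # estimates, standard errors, t-statistics, p-values, R^2
  print(confint(fit))   # 95% confidence intervals for all coefficients
}
```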


