
Applied Statistics



INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Instructions:

• Complete all questions in this homework assignment.

• You should spend about two hours a week for the next three weeks on this assignment (put aside at least 5 hours to complete it).

• You must upload your answers to Assignment 2 - 2021-05-11 for 2021-05-31 on the STTN316 eFundi Assignment page.

• You must submit your answers before 23h59 on the 31st of May, 2021.

• Please write your name, surname, and student number on every page that you upload.

• How to submit: Here are some options on how to submit this assignment (you’ll need to pick one):

– Type the answers out in Word using the equation editor (please note that this is very time-consuming). Save the Word document and upload it onto the STTN316 eFundi Assignment page. Use your student number as part of the file name.

– Write out the answers on a sheet of paper and scan the pages using an App like CAMSCANNER on your phone to create a single PDF file. Save the PDF document and upload that PDF document onto the STTN316 eFundi Assignment page. Use your student number as part of the file name.

– Write your answers on a sheet of paper, take photos of the assignment pages, and then upload the photos. Try to zip these photos together or create a combined PDF file from all the photos, and upload that single zip or PDF document onto the STTN316 eFundi Assignment page. If you cannot make the zip file or the PDF, then you can upload all the photos as separate files. Use your student number as part of the file names.

Question 1:

Consider the linear regression model

Y = Xβ + ε,

where $Y$ is an $(n \times 1)$ vector of observations on a dependent variable; $X$ is an $(n \times p)$ matrix of observations on non-stochastic independent variables; $\varepsilon$ is an $(n \times 1)$ vector of unobserved disturbances with $E(\varepsilon) = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2 I$ and $E(\varepsilon'\varepsilon) = n\sigma^2$; and $\beta$ is a $(p \times 1)$ vector of unknown coefficients.

1. Define the least squares residuals as $e = Y - Xb$, where $b = (X'X)^{-1}X'Y$. Show that $X'e = 0$. (1)

2. Show that $e'\hat{Y} = 0$, where $\hat{Y} = Xb$. (1)

3. Show that $e = (I - H)Y = (I - H)\varepsilon$, where $(I - H) = I - X(X'X)^{-1}X'$. (1)
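These identities can be checked numerically before proving them. The sketch below is a minimal illustration under assumed values. The simulated design (`n = 30`, an intercept plus two normal predictors), the coefficients `beta` and the noise scale are all hypothetical, and `numpy` does the linear algebra. It confirms that the residuals are orthogonal to the columns of $X$ and to the fitted values $\hat{Y} = Xb$, while $e'Y$ equals $e'e$.

```python
import numpy as np

# Hypothetical simulated data: n = 30, p = 3 (intercept plus two predictors).
rng = np.random.default_rng(316)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([2.0, -1.0, 0.5])            # assumed true coefficients
eps = rng.normal(scale=1.5, size=n)          # unobserved disturbances
Y = X @ beta + eps

# b = (X'X)^{-1} X'Y, computed with solve() instead of an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b                                # fitted values
e = Y - Y_hat                                # least squares residuals

H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix X (X'X)^{-1} X'
I = np.eye(n)

print(np.allclose(X.T @ e, 0))               # part 1: X'e = 0
print(np.isclose(e @ Y_hat, 0))              # residuals orthogonal to fitted values
print(np.isclose(e @ Y, e @ e))              # e'Y = e'e, which is not 0 in general
print(np.allclose(e, (I - H) @ Y), np.allclose(e, (I - H) @ eps))  # part 3
```

A numerical check like this only illustrates the algebra for one data set. The proofs still follow from $X'(I - H) = 0$.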

Question 2:

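Assume we are working with a model $Y = X\beta + \varepsilon$, where

$$X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1,p-1} \\ 1 & X_{21} & \cdots & X_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{n,p-1} \end{pmatrix},$$

$E(\varepsilon) = 0_n$, and the covariance matrix of $\varepsilon$ is $\Sigma = \sigma^2 V$, where

$$V = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$

with $0 \le \rho \le 1$. Recall that $e = (I - H)Y$ and that we define $SSE = e'e$.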

1. Show that $\Sigma$ can be written in the form $\Sigma = \sigma^2[(1 - \rho)I + \rho J]$, where $J$ is the $(n \times n)$ matrix of ones. (1)

2. Show that $E(MSE) = \sigma^2(1 - \rho) \le \sigma^2$ (thus MSE underestimates $\sigma^2$ in this scenario). (4)

HINT:

• You may use some of the results that are stated in Chapter 5 of the notes to answer this question; you do not have to prove results that are already stated in the notes.

• You may also use the following result: the sum of each of the rows of $H$ is equal to 1. Since $H$ is symmetric, the sum of each of the columns of $H$ is also equal to 1.
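The expected value in part 2 can be checked numerically, as a sketch. The design matrix, `sigma2` and `rho` below are assumed values. The check relies on the standard result $E(Y'AY) = \operatorname{tr}(A\Sigma) + \mu'A\mu$, applied with $A = I - H$ and $\mu = X\beta$. The second term vanishes because $(I - H)X = 0$.

```python
import numpy as np

# Hypothetical design: n = 12, p = 4 (intercept + 3 predictors); sigma2 and rho are assumed.
rng = np.random.default_rng(2021)
n, p = 12, 4
sigma2, rho = 2.0, 0.3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(n)
J = np.ones((n, n))                          # n x n matrix of ones

# Compound-symmetry covariance: sigma^2 [(1 - rho) I + rho J]
Sigma = sigma2 * ((1 - rho) * I + rho * J)

# Since (I - H)X = 0, E(SSE) = tr[(I - H) Sigma] and E(MSE) = E(SSE) / (n - p).
E_MSE = np.trace((I - H) @ Sigma) / (n - p)
print(E_MSE, sigma2 * (1 - rho))

# Row sums of H equal 1 because X contains a column of ones, so (I - H)J = 0.
print(np.allclose(H.sum(axis=1), 1), np.allclose((I - H) @ J, 0))
```

The last line shows the role of the hint: the $\rho J$ part of $\Sigma$ is annihilated by $I - H$, leaving only the $(1 - \rho)$ term.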

Question 3:

Throughout this module, we have assumed that $SSE/\sigma^2 \sim \chi^2_{n-p}$; however, we have never seen a proof of it! We now look at how to tackle this proof using matrix notation. We start by proving two simple theorems:

Theorem 1: Let $Y$ be an $(n \times 1)$ vector where $Y \sim N_n(0, I)$. Next, let $A$ be an $(n \times n)$ symmetric matrix with the following two properties:

• $A$ is idempotent, and

• $A$ has rank $r$, i.e., $\operatorname{rank}(A) = r$.

We then have that $Y'AY \sim \chi^2_r$.

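Note that the two properties listed for $A$ imply that $A$ permits the following eigendecomposition:

$$A = W\Lambda W',$$

where $W$ is the orthogonal matrix whose columns are the eigenvectors of $A$ (these eigenvectors are denoted $w_1, \ldots, w_n$ and are orthonormal). $\Lambda$ is the $(n \times n)$ diagonal matrix formed from the eigenvalues of $A$, i.e., $\Lambda = \operatorname{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Since $A$ is idempotent, each eigenvalue is either 0 or 1. Since $\operatorname{rank}(A) = r$, exactly $r$ of them equal 1. Ordering them so that $\lambda_j = 1$ for $j = 1, 2, \ldots, r$ and $\lambda_j = 0$ for $j = r + 1, \ldots, n$, we have

$$\Lambda = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad A = \sum_{j=1}^{r} w_j w_j'.$$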

Using the above information, answer the following questions to prove the result:

1. Suppose we define $Z = W'Y$. Determine $E(Z)$ and $\operatorname{Var}(Z)$. (2)

2. Let $Z_i$ denote the $i$th element of the vector $Z$. State the distribution of $Z_i$ and $Z_i^2$. (2)

3. Use the above two results to show that $Y'AY \sim \chi^2_r$. (3)
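Theorem 1 can be illustrated by simulation. This minimal sketch assumes $n = 8$ and $r = 3$. It builds $A$ as the orthogonal projection onto the column space of a random matrix, which is one convenient way (among many) to obtain a symmetric idempotent matrix of rank $r$. A $\chi^2_r$ variable has mean $r$ and variance $2r$, so the simulated values of $Y'AY$ should match those moments. The last line checks the identity $Y'AY = \sum_{j=1}^{r} Z_j^2$ with $Z = W'Y$.

```python
import numpy as np

# Assumed setup: n = 8, r = 3. A is the orthogonal projection onto the column
# space of a random n x r matrix M, so A is symmetric, idempotent and of rank r.
rng = np.random.default_rng(1)
n, r = 8, 3
M = rng.normal(size=(n, r))
A = M @ np.linalg.solve(M.T @ M, M.T)

# Eigendecomposition A = W Lambda W' (eigh returns eigenvalues in ascending order).
lam, W = np.linalg.eigh(A)
print(np.round(lam, 10))                     # n - r zeros followed by r ones

# Monte Carlo: Y ~ N_n(0, I); Y'AY should behave like chi-square_r (mean r, variance 2r).
reps = 200_000
Y = rng.normal(size=(reps, n))
Q = np.einsum("ij,jk,ik->i", Y, A, Y)        # Y'AY for each simulated row
print(Q.mean(), Q.var())                     # close to r = 3 and 2r = 6

# Same quantity via Z = W'Y: Y'AY is the sum of Z_j^2 over eigenvectors with eigenvalue 1.
Z = Y @ W                                    # each row is (W'y)'
print(np.allclose(Q, (Z[:, lam > 0.5] ** 2).sum(axis=1)))
```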

An extension of the previous theorem is to change the variance/covariance structure of the random vector.

Theorem 2: We now define $X$ as an $(n \times 1)$ vector with $X \sim N_n(0, \Sigma)$, where $\Sigma$ is positive definite, so that a non-singular matrix $S$ exists such that $\Sigma = SS'$. Next, let $B$ be an $(n \times n)$ symmetric matrix with the following two properties:

• $B\Sigma$ is idempotent, and

• $B\Sigma$ has rank $r$, i.e., $\operatorname{rank}(B\Sigma) = r$.

We then have that $X'BX \sim \chi^2_r$.

Using the above information, answer the following questions to prove the result:

4. Show that if $B\Sigma$ is idempotent and $\Sigma$ is positive definite (and thus non-singular), then $B\Sigma B = B$. (1)

5. Suppose we define $Y = S^{-1}X$. Show that $Y$ has an $N_n(0, I)$ distribution, as required for Theorem 1. (2)

6. Since $Y$ satisfies the conditions of Theorem 1, show that $X'BX$ can be written in the form $Y'AY$ (clearly define this new $A$ matrix using the other quantities introduced above). Determine whether this $A$ matrix satisfies the two conditions stated in Theorem 1. (3)

HINT: You may use the following two results without proof:

• The trace of an idempotent matrix equals the rank of the matrix.

• If A is (m × n) and B is (n × m), then tr(AB) = tr(BA).

7. Use the above results and Theorem 1 to confirm that $X'BX \sim \chi^2_r$. (1)
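Theorem 2 can be sketched numerically under assumed values. $\Sigma$ is a random positive definite matrix, and $S$ is taken to be its Cholesky factor (one valid choice of $S$ with $\Sigma = SS'$). The hypothetical matrix $B = (S')^{-1} P S^{-1}$, with $P$ a rank-$r$ orthogonal projection, is one way to obtain a symmetric $B$ for which $B\Sigma$ is idempotent. The code checks $B\Sigma B = B$, the trace identity, $A = S'BS$ and the $\chi^2_r$ moments of $X'BX$. It illustrates the result but does not replace the proof.

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 6, 2

# Assumed positive definite Sigma and its Cholesky factor S, so Sigma = S S'.
G = rng.normal(size=(n, n))
Sigma = G @ G.T + n * np.eye(n)
S = np.linalg.cholesky(Sigma)                # lower triangular, Sigma = S @ S.T

# Hypothetical construction of a symmetric B with B Sigma idempotent of rank r:
# take a rank-r orthogonal projection P and set B = S'^{-1} P S^{-1}.
M = rng.normal(size=(n, r))
P = M @ np.linalg.solve(M.T @ M, M.T)
S_inv = np.linalg.inv(S)
B = S_inv.T @ P @ S_inv

BS = B @ Sigma
print(np.allclose(BS @ BS, BS), np.allclose(B @ Sigma @ B, B))   # idempotent; B Sigma B = B
print(np.trace(BS))                                               # equals the rank r

# The Theorem 1 matrix A = S'BS is symmetric idempotent with tr(A) = tr(B Sigma).
A = S.T @ B @ S
print(np.allclose(A @ A, A), np.isclose(np.trace(A), np.trace(BS)))

# Monte Carlo: X ~ N_n(0, Sigma) generated as X = S Y with Y ~ N_n(0, I).
reps = 200_000
X = rng.normal(size=(reps, n)) @ S.T         # each row is (S y)'
Q = np.einsum("ij,jk,ik->i", X, B, X)        # X'BX for each row
print(Q.mean(), Q.var())                     # close to r and 2r
```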

Now, assume we have the normal error regression model $Y = X\beta + \varepsilon$, where $\varepsilon \sim N_n(0, \sigma^2 I)$, the sample residuals are defined as $e = (I - H)Y$, and the error sum of squares is given by $SSE = e'e = Y'(I - H)Y$.

8. Set $r = e/\sigma$ so that $r'r = e'e/\sigma^2 = SSE/\sigma^2$. Use the results of Theorems 1 and 2 to show that $r'r$ follows a $\chi^2$ distribution with $n - p$ degrees of freedom (thereby proving the result we originally wanted to show). (2)

HINT: You may once again use the following result without proof: The trace of an idempotent matrix equals the rank of the matrix.
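The degrees of freedom in part 8 can be checked by simulation. In this sketch the design, `beta` and `sigma` are assumed values. The trace of $I - H$ should equal $n - p$, and $SSE/\sigma^2$ should have mean $n - p$ and variance $2(n - p)$.

```python
import numpy as np

# Hypothetical normal-error regression: n = 15, p = 3, assumed beta and sigma.
rng = np.random.default_rng(316)
n, p, sigma = 15, 3, 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
beta = np.array([1.0, 0.5, -0.25])
I_minus_H = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# I - H is symmetric idempotent, so its rank equals its trace, n - p.
print(np.trace(I_minus_H))

# Simulate many data sets and compute SSE / sigma^2 = Y'(I - H)Y / sigma^2 for each.
reps = 100_000
Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))   # each row is one data set
sse_scaled = np.einsum("ij,jk,ik->i", Y, I_minus_H, Y) / sigma**2
print(sse_scaled.mean(), sse_scaled.var())               # close to n - p = 12 and 2(n - p) = 24
```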

Question 4:

Suppose we conduct an experiment in which we study the amount of time taken ($Y$) for a matrix multiplication operation to be completed when a large square matrix is multiplied by itself using specific computer code (run-time is measured in minutes). The variables used in the experiment as factors that potentially affect the run-time include the version of the compiler used to compile the code,

$X_1 \in \{\text{Compiler version 0.1.0}, \text{Compiler version 1.2.0}, \text{Compiler version 2.0.0}\}$,

as well as the variable $X_2$, which denotes the number of rows and columns of the square matrix used in the calculation (this variable can take on values anywhere between 1 000 and 1 000 0000).

NOTE: The same code is compiled on different versions of the compiler and then executed after compilation; it is thought that some versions of the compiler will perform better or worse for these kinds of matrix operations.

The experiment is conducted 9 times and the following results are obtained:

Interest lies in comparing the mean run-times of the different compilers to one another.

1. Describe one technique of dealing with the qualitative ‘compiler type’ variable. Carefully define any new variables created using this technique. (1)

2. Briefly explain why you chose this particular technique. (1)

3. You must now decide whether to use a model that incorporates interactions or not. Make a decision and fully motivate your answer. (1)

4. Provide an expression for the model you chose above. Write out the full form of the design matrix for the model you specified. (2)

5. Use your specified model to test the hypothesis that Compiler v0.1.0 produces shorter run times than Compiler v2.0.0 when calculating matrices of size $1.00 \times 10^5$. Carefully state the hypothesis used and test it at $\alpha = 0.05$. (3)
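The table of 9 observations is not reproduced here, so the following is only a template, not a worked answer. It assumes one common choice: indicator (dummy) variables with Compiler v0.1.0 as the baseline and no interactions. The model is then $Y = \beta_0 + \beta_1 X_2 + \beta_2 D_{1.2.0} + \beta_3 D_{2.0.0} + \varepsilon$. The function names, the version labels as strings, and the rescaling of $X_2$ to units of $10^5$ (to keep $X'X$ well conditioned) are hypothetical. Under this model, the difference in mean run-time between v2.0.0 and v0.1.0 is $\beta_3$ at every matrix size. "v0.1.0 is faster" therefore corresponds to $H_a: \beta_3 > 0$, tested with $c = (0, 0, 0, 1)'$.

```python
import numpy as np

def design_matrix(versions, sizes, scale=1e5):
    """Design matrix for the assumed no-interaction model
    Y = b0 + b1*(X2/scale) + b2*D_1.2.0 + b3*D_2.0.0 + error,
    with Compiler v0.1.0 as the baseline level."""
    versions = np.asarray(versions)
    x2 = np.asarray(sizes, dtype=float) / scale
    d12 = (versions == "1.2.0").astype(float)   # 1 for Compiler v1.2.0, else 0
    d20 = (versions == "2.0.0").astype(float)   # 1 for Compiler v2.0.0, else 0
    return np.column_stack([np.ones(len(x2)), x2, d12, d20])

def one_sided_contrast_t(X, Y, c):
    """t statistic and error df for H0: c'beta <= 0 versus Ha: c'beta > 0."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ Y                       # least squares estimates
    e = Y - X @ b
    mse = (e @ e) / (n - p)
    se = np.sqrt(mse * (c @ XtX_inv @ c))       # s{c'b}
    return (c @ b) / se, n - p

# Contrast for mean(v2.0.0) - mean(v0.1.0); in this model it equals b3 at any matrix size.
c = np.array([0.0, 0.0, 0.0, 1.0])
```

With $n = 9$ and $p = 4$ there are 5 error degrees of freedom. $H_0$ would be rejected at $\alpha = 0.05$ when the returned statistic exceeds $t(0.95; 5) \approx 2.015$. If an interaction model is chosen instead, the contrast vector must also include the interaction terms evaluated at $X_2 = 1.00 \times 10^5$.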

Question 5:

A study is conducted to determine how the attention span of small children is affected by various factors. The variables in the study are:

• $Y$: Attention span (in minutes),

• $X_1$: Amount of sugary drinks consumed (in ml),

• $X_2$: Age of child (in months).

The following multiple linear regression model is fit using data obtained from 20 children:

$$(X'X)^{-1} = \begin{pmatrix} 0.550983 & -0.000797 & -0.011573 \\ -0.000797 & 0.000003 & 0.000004 \\ -0.011573 & 0.000004 & 0.000397 \end{pmatrix}$$

Sorted Leverages:

NOTE: For the questions below, accept that all model assumptions of the normal error multiple linear regression model are satisfied.

1. Using this information, construct a 99% confidence interval for the expected response value (from a model based on these 2 predictor variables) for children who consume 100 ml of sugary drinks and are 20 months old. (3)

2. Using this information, construct a 99% prediction interval for a specific child who consumes 100 ml of sugary drinks and is 20 months old. (1)

3. Interpret both intervals and, in particular, explain how the interpretation differs between these two intervals. In which circumstances would you recommend calculating the one over the other? (2)

4. Is the prediction interval calculated above a form of ‘extrapolation’ or ‘interpolation’? Calculate a value that will help motivate your answer. (1)

5. Test the hypothesis that the age of the child is a significant predictor in the model. State the hypothesis, calculate the test statistic, determine the critical value, and state your conclusion. (2)
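Both intervals in this question depend on $x_h'(X'X)^{-1}x_h$ with $x_h = (1, 100, 20)'$, which can be computed directly from the $(X'X)^{-1}$ matrix given for this question. The sketch below does this with `numpy`. The fitted coefficients and MSE from the regression output are not reproduced here, so the `intervals` helper (a hypothetical name) takes $\hat{Y}_h$ and MSE as inputs. It uses $s^2\{\hat{Y}_h\} = MSE \cdot x_h'(X'X)^{-1}x_h$ for the confidence interval and $s^2\{pred\} = MSE\,(1 + x_h'(X'X)^{-1}x_h)$ for the prediction interval, with $t(0.995; 17) \approx 2.898$ for $n = 20$ and $p = 3$.

```python
import numpy as np

# (X'X)^{-1} as printed in the question (entries rounded to 6 decimals).
XtX_inv = np.array([
    [ 0.550983, -0.000797, -0.011573],
    [-0.000797,  0.000003,  0.000004],
    [-0.011573,  0.000004,  0.000397],
])
x_h = np.array([1.0, 100.0, 20.0])           # intercept, 100 ml sugary drinks, 20 months

h_new = x_h @ XtX_inv @ x_h                  # x_h'(X'X)^{-1} x_h
print(round(float(h_new), 6))                # 0.133463

def intervals(y_hat, mse, t_crit, h):
    """Return (confidence interval, prediction interval) at x_h.
    y_hat and mse must come from the fitted-model output; t_crit = t(0.995; n - p)."""
    s_mean = np.sqrt(mse * h)                # s{Y_hat_h}
    s_pred = np.sqrt(mse * (1.0 + h))        # s{pred}
    return ((y_hat - t_crit * s_mean, y_hat + t_crit * s_mean),
            (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred))

T_995_17 = 2.898                             # t(0.995; 17) from tables, n = 20, p = 3
```

The printed entries are rounded, so the value of about 0.1335 is approximate. For part 4, this value can be compared with the sorted leverages $h_{ii}$. A value well within their range suggests that no hidden extrapolation is involved. For part 5, the same matrix gives $s\{b_2\} = \sqrt{MSE \times 0.000397}$.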

Question 6:

Suppose that you have to find the best linear regression model for a response variable $Y$ using four predictor variables $X_1$, $X_2$, $X_3$ and $X_4$. Using an “all possible subsets” approach, you calculate the Mallows’ $C_p$, $R^2$ and $R^2_a$ measures for all possible combinations of models that you could build using these four predictor variables, and you tabulate the results below.

1. Use the above output and state which model you would consider using. Motivate your answer. (2)

2. What happens to the $R^2$ measure as the number of variables added to the model becomes very large, i.e., as $p \to \infty$? Does this result change when $n \to 1$ or $n \to \infty$? Motivate your answer. (1)

3. For fixed $n$, $p$ and SSTO values, what are the highest and lowest values that $R^2$ can assume? (2)
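The three criteria in the table can be computed from each model's SSE. This sketch uses the usual textbook definitions. Here $p$ counts all parameters including the intercept, and `mse_full` is the MSE of the model with all four predictors. The definitions are $R^2_p = 1 - SSE_p/SSTO$, $R^2_{a,p} = 1 - \frac{n-1}{n-p}\frac{SSE_p}{SSTO}$ and $C_p = SSE_p/MSE(X_1, \ldots, X_4) - (n - 2p)$. The function names are hypothetical.

```python
def r2(sse, ssto):
    """Coefficient of multiple determination R^2 = 1 - SSE/SSTO."""
    return 1.0 - sse / ssto

def r2_adj(sse, ssto, n, p):
    """Adjusted R^2; p is the number of parameters, including the intercept."""
    return 1.0 - (n - 1) / (n - p) * sse / ssto

def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p; mse_full is the MSE of the model with all candidate predictors."""
    return sse_p / mse_full - (n - 2 * p)
```

For the full model, $C_p$ equals $p$ exactly, and $R^2_a$ can be negative when SSE is close to SSTO.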

Question 7:

A research psychologist approaches you with a research proposal. He wants to publish an article that details the role that gender, weight, and income levels play in determining the ‘Life Satisfaction’ (LS) of university students who graduated from South African universities in 2020. The LS variable is measured with a questionnaire that the participants complete, which produces an LS score between 0 and 100 (0 being the worst life satisfaction score and 100 being the best). The researcher wants to determine the relationship between the various levels of these factors and the expected value of the LS score for the population under study.

Write a short proposal (100 to 300 words) that details your approach to addressing this research. Explain the steps required to collect the data, arrive at an appropriate model, and ensure that the model is adequate. In addition to taking the researcher’s problem statement into account, your proposal must also keep in mind the real-world limitations of such studies.
