R Programming

Briefly describe what is meant by the “bias-variance trade-off” and give two examples

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Question 1: (9 Total Marks)

Provide a Short Answer (3 marks each)

a) Briefly describe what is meant by the “bias-variance trade-off” and give two examples of how it operates in modelling techniques we have studied this term.

b) When employing a Poisson regression model to assess contingency table data, how do we test whether the two discrete variables involved are independent or not?

c) Linear discriminant analysis (LDA) finds class boundaries in the predictor space that are linear in the predictors to determine which category a data point belongs to. Quadratic discriminant analysis (QDA) does essentially the same thing, except the boundaries are quadratic. What is the key difference in the model assumptions for QDA that creates these quadratic boundaries?
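As background for part b), here is a minimal sketch of one standard way the independence question is usually framed. It assumes the Poisson model is the usual log-linear model for the cell counts, in which case the model with row and column main effects only has fitted values (row total × column total) / n, and its residual deviance is the likelihood-ratio statistic G² against the saturated model. The function name `independence_deviance` and the toy counts are invented for illustration and do not come from the course material.

```python
import math


def independence_deviance(table):
    """Return (G2, df) for a two-way table of counts.

    G2 is the residual deviance of the main-effects-only (independence)
    Poisson log-linear model, i.e. 2 * sum(O * log(O / E)) with
    E = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes 0 to the sum
                g2 += 2.0 * observed * math.log(observed / expected)

    # Degrees of freedom = number of interaction terms left out of the model.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df


# Toy counts, invented purely for illustration.
toy_table = [[30, 10], [20, 40]]
g2, df = independence_deviance(toy_table)
print(f"G2 = {g2:.3f} on {df} df")
```

A large G² relative to a chi-squared distribution on `df` degrees of freedom is evidence against independence; equivalently, it indicates that the interaction term is needed in the log-linear model.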

Question 2: (18 Total Marks)

The file baby.dat contains data on 247 premature births. For each, the birthweight (in grams), the gestational age (in weeks), the one- and five-minute Apgar scores (on a scale of 1 to 9) and the pH level of the venous blood are recorded. In addition, the first column holds an indicator of whether the baby survived (1 = survived, 0 = died).

a) A pregnant woman is experiencing complications and her doctors are considering inducing labour at either 30 or 31 weeks. Assuming the other predictors would be the same regardless of when the induction is performed, the doctors want an estimate of how much the extra week of gestational age would change the baby’s survival chances. Build an appropriate model and give an estimate (and confidence interval) as requested. [9 marks]

b) At the time of the one-minute Apgar score calculation, the only variables available are the one-minute Apgar value itself, the birthweight and the gestational age. Use these variables to develop a model to predict the five-minute Apgar score. [9 marks]
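For part a), one possible reading (an assumption, not necessarily the intended solution) is a logistic regression of survival on the predictors. Under that model the coefficient of gestational age is the change in log-odds of survival per extra week with the other predictors held fixed, so exponentiating the estimate and its Wald limits gives an odds ratio with a confidence interval. The sketch below shows only that final arithmetic; `odds_ratio_ci` is a hypothetical helper, and the coefficient and standard error are made-up placeholders, not values estimated from baby.dat.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval for a one-unit increase in a predictor.

    beta: estimated log-odds coefficient; se: its standard error;
    z: normal quantile (1.96 gives an approximate 95% interval).
    Returns (odds_ratio, lower, upper).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Placeholder values for illustration only, NOT fitted to baby.dat.
beta_gestation = 0.40
se_gestation = 0.15

or_hat, lower, upper = odds_ratio_ci(beta_gestation, se_gestation)
print(f"Odds ratio per extra week: {or_hat:.2f} "
      f"(approx. 95% CI {lower:.2f} to {upper:.2f})")
```

Note that this interval is on the odds scale. A change in survival probability between 30 and 31 weeks would additionally depend on the values of the other predictors.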

 

Question 3: (18 Total Marks)

The file crckt.dat contains data for ball-by-ball outcomes of 217 men’s international 50-over matches, 151 men’s international 20-over matches, 72 women’s international 50-over matches and 70 women’s international 20-over matches played during 2015 and 2016. Each row of the data corresponds to a single ball of a match. For each ball, the information recorded is:

• BallsRem – the number of balls still remaining in the match, including the current one (e.g., for the first ball of a 50-over match, this value would be 300, as there are 6 balls/over).

• Runs – the total number of runs scored on the ball (including extras for illegal deliveries).

• Wckts – the number of wickets down at the time of the ball.

• WLastBall – an indicator of whether a wicket fell on the previous ball (1 = Yes, 0 = No).

• Year – the calendar year of the match.

• GameType – an indicator of the type of match (1 = 50-over, 2 = 20-over).

• Gender – an indicator of the gender for the match (1 = men, 2 = women).
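The BallsRem convention above (6 balls per over, counting the current ball) can be made concrete with a short sketch. The helper names `starting_balls_rem` and `legal_balls_completed` are hypothetical; the GameType coding is taken from the variable list.

```python
BALLS_PER_OVER = 6

# GameType coding from the variable list: 1 = 50-over, 2 = 20-over.
OVERS_BY_GAME_TYPE = {1: 50, 2: 20}


def starting_balls_rem(game_type):
    """BallsRem value on the first ball for the given GameType code."""
    return OVERS_BY_GAME_TYPE[game_type] * BALLS_PER_OVER


def legal_balls_completed(game_type, balls_rem):
    """Legal deliveries already completed when BallsRem equals balls_rem.

    BallsRem includes the current ball, so the first ball gives 0.
    """
    return starting_balls_rem(game_type) - balls_rem


print(starting_balls_rem(1))          # 50 overs * 6 balls = 300
print(starting_balls_rem(2))          # 20 overs * 6 balls = 120
print(legal_balls_completed(1, 300))  # first ball of a 50-over match: 0
```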

a) There is debate about whether scoring rates for 20-over matches are the same as those in the final 20 overs of a 50-over match (i.e., balls 120 down to 1). Use an appropriate technique to model the relationship between expected runs scored and balls remaining which allows comparison across match types and appropriately adjusts for any other important factors. Further, produce a plot (or small collection of plots) to illustrate the similarity or dissimilarity in the expected runs scored versus balls remaining relationship across the two match types. [9 marks]

b) When a wicket falls, a new batter must start their innings. It is often claimed this is the most difficult time for a batter to score. Investigate whether there is a difference in the runs scored at any given stage of a match depending on whether a wicket has fallen on the previous delivery or not. [9 marks]
