Programming Assignment 1¶

Frequentist Statistics¶

Group Members: Rene Padilla and Shanmukh Sain Karri¶


Prompt:¶

According to the Bureau of Crime Statistics and Research of Australia, the mean length of imprisonment for motor-vehicle-theft offenders in Australia is 16.7 months. A group of researchers would like to perform a hypothesis test to decide whether the mean length of imprisonment for motor-vehicle-theft offenders in Sydney differs from the national mean in Australia. They have found out that Sydney population standard deviation is 6.0. They have also decided to choose a random sample of size 100 and perform the test at the significance level of 0.05. Suppose that, in reality, the mean length of imprisonment in Sydney is 15.5 months.


In [1]:
# Importing Libraries
library(stats)

# Defining Given Values
n = 100
alpha = 0.05
mu = 16.7
sd = 6.0
sigma = sd/sqrt(n)  # standard error of the sample mean, i.e. sd / sqrt(n) = 0.6
true_mean = 15.5

a. State the null and alternative hypotheses. (3 points)¶

The null hypothesis can be stated as: The mean length of imprisonment for motor-vehicle-theft offenders in Sydney is equal to the national mean in Australia, 16.7 months.

  • $ H_0: \mu = 16.7 \text{ months} $

The alternative hypothesis can be stated as: The mean length of imprisonment for motor-vehicle-theft offenders in Sydney differs from the national mean in Australia.

  • $ H_a: \mu \neq 16.7 \text{ months} $

b. Determine the probability of a Type I error. (2 points)¶

The probability of a Type I error is denoted by $ \alpha $, where $ \alpha $ is also known as the significance level of the hypothesis test. So here the probability of a Type I Error is: 0.05 or 5%.

This represents the probability of rejecting the null hypothesis when it is in fact true.
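As a quick sanity check on this definition, the sketch below simulates samples for which the null hypothesis is actually true and counts how often the two-sided z-test rejects it. The seed, the number of replications (10,000), and the variable names `mu0`, `pop_sd`, `se` and `type1_rate` are assumptions made for this illustration and are not part of the assignment.

```r
# Sketch: estimate the Type I error rate by simulation (illustrative only).
set.seed(1)          # arbitrary seed, used only for reproducibility
n      <- 100        # sample size from the prompt
alpha  <- 0.05       # significance level from the prompt
mu0    <- 16.7       # mean under H0
pop_sd <- 6.0        # population standard deviation from the prompt
se     <- pop_sd / sqrt(n)        # standard error of the sample mean
z_crit <- qnorm(1 - alpha / 2)    # two-sided critical z value

# Draw 10,000 sample means with the TRUE mean equal to mu0, so H0 holds
null_means <- replicate(10000, mean(rnorm(n, mean = mu0, sd = pop_sd)))

# Fraction of samples whose z statistic falls in the rejection region
z <- (null_means - mu0) / se
type1_rate <- mean(abs(z) > z_crit)
type1_rate
```

The resulting rate should land close to 0.05, differing only by simulation noise.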


c. Determine the probability of a Type II error. (5 points)¶

The probability of a Type II error is denoted by $ \beta $. This represents the probability of NOT rejecting the null hypothesis when it is in fact false. To solve for $ \beta$, the critical values must be found.

Note: We have a two-sided hypothesis, so there are two critical values.

In [2]:
# Solving for critical value 1 (following slide 48):
# the alpha/2 quantile of the sampling distribution of the mean under H0
x_left = qnorm(alpha/2, mean = mu, sd = sigma, lower.tail = TRUE)
x_left
15.524021609276
In [3]:
# Solving for critical value 2 (following slide 48):
# the upper alpha/2 quantile of the sampling distribution of the mean under H0
x_right = qnorm(alpha/2, mean = mu, sd = sigma, lower.tail = FALSE)
x_right
17.875978390724
In [4]:
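# Solving for beta (slide 49): the probability that the sample mean falls
# between the two critical values when the true mean is 15.5
curve1 = pnorm(x_right, mean = true_mean, sd = sigma, lower.tail = TRUE)  # P(x-bar <= x_right)
curve2 = pnorm(x_left, mean = true_mean, sd = sigma, lower.tail = TRUE)   # P(x-bar <= x_left)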

beta = curve1 - curve2
beta
0.483994726023825

From above, we see that our probability of a Type II Error, $\beta$ is approx. 0.48 or 48%.
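The same result can be checked by hand. In the worked equations below, $\sigma = 6.0$ denotes the population standard deviation (the variable `sd` in the code, not the code's `sigma`, which already holds the standard error $\sigma/\sqrt{n} = 0.6$), $\mu_0 = 16.7$ is the hypothesized mean, and $\Phi$ is the standard normal CDF. These symbols are introduced here for illustration only.

$$ x_{\text{left}},\ x_{\text{right}} = \mu_0 \pm z_{0.025}\,\frac{\sigma}{\sqrt{n}} = 16.7 \pm 1.96 \times 0.6 \approx 15.524,\ 17.876 $$

$$ \beta = \Phi\!\left(\frac{17.876 - 15.5}{0.6}\right) - \Phi\!\left(\frac{15.524 - 15.5}{0.6}\right) \approx \Phi(3.96) - \Phi(0.04) \approx 1.000 - 0.516 = 0.484 $$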


d. Simulate 1,000 samples, each of size 100. (5 points)¶

Here we first generate a sample of 100 normally distributed data points with a mean of 15.5 and a standard deviation of 6.0 using rnorm. This call is passed to the replicate function, which repeats it 1,000 times and returns a matrix with one sample per column (100 rows by 1,000 columns).
link to walkthrough we followed

In [5]:
set.seed(111)
# replicate function "repeatedly evaluate some expression in R a certain number of times."
samples = replicate(1000, rnorm(n, true_mean, sd))
head(samples,2)
16.91132 19.097718 13.70780 16.35879 13.173022 23.87493  5.026768 12.870508 13.88540 13.74923 ... 4.075963 22.7285 13.12917 15.33994 11.75197  7.827524 15.71353 16.97241 17.33399 23.80199
13.51558  8.538023 11.93879 17.68144  6.992768 10.14808 20.238205  6.237921 31.25065 15.34757 ... 9.393428 15.7630 26.93070 14.61639 11.88793 20.296309 11.09280 16.15424 10.96369 20.24181

e. Determine the mean of each sample in part (d). (5 points)¶

Here we used the apply function to get the mean of every column/sample.
link

In [6]:
# margin = 2 --> applies to columns 
sample_means = apply(samples, 2, mean)
# get into matrix
sample_means = as.matrix(sample_means)
head(sample_means, 5)
15.42309
15.39880
16.31830
15.63875
14.33606

f. For the 1,000 samples obtained in part (d), about how many would you expect to lead to nonrejection of the null hypothesis? Explain your answer. (5 points)¶

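Because the true Sydney mean (15.5 months) differs from the hypothesized 16.7 months, the null hypothesis is false here, so every non-rejection is a Type II error and the probability that a given sample leads to non-rejection is $\beta$. Here our $\beta$ is approx. 0.48 or 48%, and 48% of 1,000 is 480, so we expect about 480 samples to lead to nonrejection of the null hypothesis (about 484 if the unrounded $\beta \approx 0.484$ is used).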


g. For the 1,000 samples obtained in part (d), determine the number that lead to nonrejection of the null hypothesis. (5 points)¶

Here we need to find the samples whose sample mean lies between our two critical values, i.e. (rounded to two decimals): $$ 15.52 < \bar{x} < 17.88 $$ To do so, we first keep the sample means that are above the lower critical value x_left, and then filter those again to keep the ones below the upper critical value x_right.

In [7]:
above_xleft = sample_means[sample_means > x_left]
In [8]:
between = above_xleft[above_xleft < x_right]
length(between)
486

From above, we see that we have 486 samples that lead to nonrejection of the null hypothesis.
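The two-step filter can also be written as a single logical test. The sketch below is self-contained and rebuilds the samples with the same seed and the same sequence of `rnorm` calls as part (d), so under R's default random number generator it should reproduce the same sample means. The names `mu0`, `pop_sd`, `se`, `not_rejected` and `n_not_rejected` are assumptions made for this illustration, and `colMeans` is used in place of `apply(samples, 2, mean)`.

```r
# Sketch: count non-rejections with one vectorized comparison (illustrative only).
set.seed(111)                      # same seed as in part (d)
n <- 100; alpha <- 0.05
mu0 <- 16.7; pop_sd <- 6.0; true_mean <- 15.5
se <- pop_sd / sqrt(n)             # standard error of the sample mean

# Critical values of the two-sided test under H0
x_left  <- qnorm(alpha / 2, mean = mu0, sd = se)
x_right <- qnorm(alpha / 2, mean = mu0, sd = se, lower.tail = FALSE)

# 1,000 samples of size 100 (one per column) and their means
samples <- replicate(1000, rnorm(n, true_mean, pop_sd))
sample_means <- colMeans(samples)

# TRUE where the sample mean lies strictly between the critical values
not_rejected <- sample_means > x_left & sample_means < x_right
n_not_rejected <- sum(not_rejected)
n_not_rejected
```

Summing a logical vector counts its TRUE entries, so no intermediate filtered vectors are needed.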


h. Compare your answers from parts (f) and (g), and comment on any observed difference. (5 points)¶

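Here our actual value (486) was higher than our expected value (480) by 6 samples. Part of this gap comes from rounding $\beta$ to 0.48: the unrounded $\beta \approx 0.484$ gives an expected count of about 484. The rest is ordinary sampling variability. The number of non-rejections among 1,000 independent samples is a binomial count with probability $\beta$, whose standard deviation is $\sqrt{1000 \times 0.484 \times 0.516} \approx 15.8$, so a difference of 6 is well within one standard deviation of the expected value.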
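To put a number on how much the count in part (g) can vary from run to run, the sketch below treats the number of non-rejections as a binomial count. It uses the value of $\beta$ reported in part (c) and the observed count of 486 from part (g). The names `n_sims`, `expected`, `sd_count`, `range95` and `z_obs` are assumptions made for this illustration.

```r
# Sketch: sampling variability of the non-rejection count (illustrative only).
n_sims   <- 1000                 # number of simulated samples
beta     <- 0.483994726023825    # Type II error probability reported in part (c)
observed <- 486                  # count reported in part (g)

expected <- n_sims * beta                       # mean of the binomial count
sd_count <- sqrt(n_sims * beta * (1 - beta))    # its standard deviation

# Central 95% range of the binomial count
range95 <- qbinom(c(0.025, 0.975), size = n_sims, prob = beta)

# How many standard deviations the observed count is from the expected one
z_obs <- (observed - expected) / sd_count

c(expected = expected, sd = sd_count, z = z_obs)
range95
```

The observed count sits a small fraction of a standard deviation above the expected count, which is consistent with pure chance.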


i. Plot the power curve for the range of true μ between 14 to 19. Interpret your plot. (5 points)¶

The power of a test is defined mathematically as: $$1 - \beta $$ To plot it, we re-use our code from Part C. The critical values x_left and x_right depend only on the null hypothesis, so they do not change. We only need to recompute $\beta$ for each candidate value of the true mean.

In [9]:
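# Grid of candidate true means (this overwrites the scalar true_mean used earlier)
true_mean = seq(14, 19, by = 0.001)

# pnorm is vectorized over the mean, so beta is computed for every grid point at once
curve1 = pnorm(x_right, mean = true_mean, sd = sigma, lower.tail = TRUE)
curve2 = pnorm(x_left, mean = true_mean, sd = sigma, lower.tail = TRUE)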

beta = curve1 - curve2
power = 1 - beta
In [12]:
plot(true_mean, power, type = 'l', xlab = expression(mu),
     ylab = 'Power', main = 'Power Curve')
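To help read the curve, the sketch below wraps the same calculation in a small helper and evaluates it at a few true means. The function name `power_at` and its argument names are hypothetical, introduced only for this illustration. The default values come from the prompt.

```r
# Sketch: power of the two-sided z-test at selected true means (illustrative only).
power_at <- function(true_mu, mu0 = 16.7, pop_sd = 6.0, n = 100, alpha = 0.05) {
  se <- pop_sd / sqrt(n)                               # standard error of the mean
  x_left  <- qnorm(alpha / 2, mean = mu0, sd = se)     # lower critical value
  x_right <- qnorm(alpha / 2, mean = mu0, sd = se,
                   lower.tail = FALSE)                 # upper critical value
  # beta: probability that the sample mean falls between the critical values
  beta <- pnorm(x_right, mean = true_mu, sd = se) -
          pnorm(x_left,  mean = true_mu, sd = se)
  1 - beta
}

# Power at the ends of the plotted range, at the Sydney mean, and at the null value
power_values <- power_at(c(14, 15.5, 16.7, 19))
power_values
```

At a true mean of 16.7 the power equals the significance level, 0.05, which is the lowest point of the curve. At 15.5 it is about 0.52, matching $1 - \beta$ from part (c). Toward the ends of the range it rises to roughly 0.99 at 14 and 0.97 at 19. In other words, the further the true mean is from 16.7, the more likely the test is to detect the difference.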