According to the Bureau of Crime Statistics and Research of Australia, the mean length of imprisonment for motor-vehicle-theft offenders in Australia is 16.7 months. A group of researchers would like to perform a hypothesis test to decide whether the mean length of imprisonment for motor-vehicle-theft offenders in Sydney differs from the national mean in Australia. They have found out that Sydney population standard deviation is 6.0. They have also decided to choose a random sample of size 100 and perform the test at the significance level of 0.05. Suppose that, in reality, the mean length of imprisonment in Sydney is 15.5 months.
# Importing Libraries
library(stats)
# Defining Given Values
n = 100
alpha = 0.05
mu = 16.7
sd = 6.0
sigma = sd/sqrt(n)
true_mean = 15.5
The Nulll Hypothesis can be stated as: The mean length of imprisonment for motor-vehicle-theft offenders in Australia is 16.7 months.
The alternative Hypothesis can be stated as: The mean length of imprisonment for motor-vehicle-theft offenders in Sydney differs from the national mean in Australia.
The probability of a Type I error is denoted by $ \alpha $, where $ \alpha $ is also known as the significance level of the hypothesis test. So here the probability of a Type I Error is: 0.05 or 5%.
This represents the probability of rejecting the null hypothessis when it is in fact TRUE.
The probability of a Type II error is denoted by $ \beta $. This represents the probability of NOT rejecting the null hypothesis when it is in fact false. To solve for $ \beta$, the critical values must be found.
Note: We have a 2-sided hyptohesis => 2 Critical Values.
# Solving For Critical Value 1
# folliwng slide 48
x_left = qnorm(alpha/2, mean = mu, sd = sigma, lower.tail = T)
x_left
# Solving For Critical Value 2
# folliwng slide 48
x_right = qnorm(alpha/2, mean = mu, sd = sigma, lower.tail = F)
x_right
# solving for Beta
# slide 49
curve1 = pnorm(x_right, mean = true_mean, sd = sigma, lower.tail = T)
curve2 = pnorm(x_left, mean = true_mean, sd = sigma, lower.tail = T)
beta = curve1 - curve2
beta
From above, we see that our probability of a Type II Error, $\beta$ is approx. 0.48 or 48%.
Here we first created a normalized standard deviation of 100 datapoints with a mean of 15.5 and a standard deviation of 6.0 using rnorm
. This is fed into the replicate function where we then create 1000 samples of the output of the rnorm
function.
link to walkthrough we followed
set.seed(111)
# replicate function "repeatedly evaluate some expression in R a certain number of times."
samples = replicate(1000, rnorm(n, true_mean, sd))
head(samples,2)
16.91132 | 19.097718 | 13.70780 | 16.35879 | 13.173022 | 23.87493 | 5.026768 | 12.870508 | 13.88540 | 13.74923 | ... | 4.075963 | 22.7285 | 13.12917 | 15.33994 | 11.75197 | 7.827524 | 15.71353 | 16.97241 | 17.33399 | 23.80199 |
13.51558 | 8.538023 | 11.93879 | 17.68144 | 6.992768 | 10.14808 | 20.238205 | 6.237921 | 31.25065 | 15.34757 | ... | 9.393428 | 15.7630 | 26.93070 | 14.61639 | 11.88793 | 20.296309 | 11.09280 | 16.15424 | 10.96369 | 20.24181 |
# margin = 2 --> applies to columns
sample_means = apply(samples, 2, mean)
# get into matrix
sample_means = as.matrix(sample_means)
head(sample_means, 5)
15.42309 |
15.39880 |
16.31830 |
15.63875 |
14.33606 |
A non-rejection of the null hypothesis is equal to the probability of a Type II Error. Here our $\beta$ is 0.48 or 48%. So then, 48% of 1,000 is 480. We expect 480 samples to lead to nonrejection of the null hypothesis.
Here we need to find the samples that have a Sample Mean that is between our to critical values. I.e: $$ 15.52 < \bar{x} < 17.88 $$ To do so, we will first filter out the values that are above 15.52, and then filter those values again to get the values below 17.88.
above_xleft = sample_means[sample_means > x_left]
between = above_xleft[above_xleft < x_right]
length(between)
From above, we see that we have 486 samples that lead to nonrejection of the null hypothesis.
Here our actual value was higher than our expected value by 6 samples. This can be due to some variations in calculations, randomness of data, etc.
Power of a test can be defined mathematically as: $$1 - \beta $$ To do so, we will re-use our code from Part C. Since we know our critical values x_left
, and x_right
, will not change, we will only need to solve for our new values of $\beta$ as it is dependent on the true mean of the data.
true_mean = seq(14,19, by = 0.001)
curve1 = pnorm(x_right, mean = true_mean, sd = sigma, lower.tail = T)
curve2 = pnorm(x_left, mean = true_mean, sd = sigma, lower.tail = T)
beta = curve1 - curve2
power = 1 - beta
plot(true_mean, power,'l', xlab = expression(mu),
ylab = 'Power', main = 'Power Curve')