1. Answer EITHER (a) OR (b).
EITHER
(a) (i) With the use of an example, evaluate the difference between probabilistic and non-probabilistic sampling methods.
(ii) State and carefully explain the difference between stratified random sampling and simple random sampling.
(iii) Explain carefully the sources of error that arise from simple random sampling. Total survey error is defined as a combination of observational and non-observational errors. Describe the main sources of observational and non-observational errors and how they combine to create total survey error.
A mental health charity organisation is promoting university students’ well-being and has decided to provide a trial counselling session to students at a university in China. A simple random sample of 4 students is drawn from a population of 16 students in one tutorial group.
• What is the probability of any given student being selected?
• How many different samples of 4 students are possible?
• What is the probability of any given sample of 4 students being selected?
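The three quantities asked for above follow from the definition of simple random sampling. The sketch below is one way to check the arithmetic, not a model answer. It assumes sampling without replacement and that every subset of 4 students is equally likely; the variable names are illustrative only.

```python
from math import comb

N = 16  # population size (students in the tutorial group)
n = 4   # sample size

# Inclusion probability of any one student under SRS without replacement
p_student = n / N

# Number of distinct samples of size n, ignoring order
n_samples = comb(N, n)

# Every possible sample is equally likely under SRS
p_sample = 1 / n_samples

print(p_student, n_samples, p_sample)
```

Here `comb(16, 4)` evaluates to 1820, so each individual sample has probability 1/1820, while each student has a 4/16 chance of inclusion.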
OR
(b) A coffee shop chain has branches across three regions in Southern Kenya. The management team is considering adding a new item to the menus. To evaluate the likely demand, the new item was piloted on the menus of coffee shops across the three regions, selected by stratified random sampling. The mean and variance of the number of orders received for this item per coffee shop in a week were reported as follows.
Let
$h$ = the stratum (region).
$N_h$ = stratum population size (the number of coffee shop branches in region $h$).
$n_h$ = stratum sample size.
$X_{ih}$ = number of orders (per week) of coffee shop $i$ in region $h$, $i = 1, 2, \ldots, n_h$.
| | Region A | Region B | Region C |
| $N_h$ | 70 | 60 | 50 |
| $n_h$ | 14 | 12 | 10 |
| Stratum mean number of orders $\bar{X}_h$ | 20.5 | 11.2 | 25.9 |
| Stratum variance of number of orders $\mathrm{var}(X_{ih})$ | 164.2 | 129.9 | 85.3 |
(i) What is the probability of selection for an individual coffee shop within each region (stratum)?
(ii) Calculate a 95% confidence interval for the mean number of orders in each stratum.
(iii) Calculate a point estimate for the overall mean number of orders.
(iv) Calculate a 95% confidence interval for the overall mean number of orders.
(v) Explain what is meant by the design effect, the design factor and the effective sample size in sampling designs.
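Parts (i), (iii) and (iv) can be checked numerically with the stratum figures in the table. The sketch below rests on stated assumptions and is not a model answer. It assumes the reported variances are sample variances, applies a finite population correction within each stratum, and uses the normal value 1.96 for the 95% interval; with strata this small a t-based multiplier could be argued for instead. All variable names are illustrative.

```python
from math import sqrt

# Summary data from the table (regions A, B, C)
N_h = [70, 60, 50]             # branches per region
n_h = [14, 12, 10]             # sampled branches per region
mean_h = [20.5, 11.2, 25.9]    # stratum sample means
var_h = [164.2, 129.9, 85.3]   # stratum sample variances (assumed s^2)

N = sum(N_h)

# (i) probability of selection within each stratum
p_select = [n / Nh for n, Nh in zip(n_h, N_h)]

# (iii) stratified point estimate: mean weighted by population share
overall_mean = sum(Nh * m for Nh, m in zip(N_h, mean_h)) / N

# (iv) variance of the stratified mean, with finite population correction
var_overall = sum(
    (Nh / N) ** 2 * (1 - n / Nh) * v / n
    for Nh, n, v in zip(N_h, n_h, var_h)
)
se_overall = sqrt(var_overall)

Z = 1.96  # ASSUMPTION: normal approximation for a 95% interval
ci_overall = (overall_mean - Z * se_overall, overall_mean + Z * se_overall)

print(p_select, overall_mean, se_overall, ci_overall)
```

Because the sampling fraction is the same in every region, this is a proportionate design, and the weighted mean coincides with the mean of all sampled shops.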
(a) Explain the difference between a one-sample t-test and a paired-samples t-test.
(b) The Korean Tennis Association wants to compare the mean distances associated with four different brands of tennis balls when struck by a player. A random sample of 15 balls of each brand is hit by a machine, and the distance (in meters) is recorded for each hit. The summary statistics and the test statistic for equal variances are also reported. The ANOVA table computed from these data is given below, but with some of the entries removed.
| | N | Mean | Standard deviation |
| Brand A | 15 | 245.88 | 4.815 |
| Brand B | 15 | 262.05 | 3.798 |
| Brand C | 15 | 268.98 | 4.499 |
| Brand D | 15 | 248.36 | 5.186 |
| Total | 60 | 256.52 | 9.334 |
Analysis of Variance (ANOVA)
| Source | Sum of Squares | Degrees of freedom | Mean Square | F | Sig (Prob > F) |
| Between groups | 2786.33 | ? | ? | ? | 0.00 |
| Within groups | ? | ? | ? | | |
| Total | 3654.27 | 59 | | | |
Test of Homogeneity of Variances
Levene Statistic = 0.017 df(3, 56) Pr > F = 0.995
(i) Using $\alpha = 0.05$, test the null hypothesis that the population mean distance of brand C is 265.
(ii) Calculate the values left out of the ANOVA table above.
(iii) Using $\alpha = 0.05$, test the hypothesis that mean distances of the four brands are equal.
(iv) State and comment on the assumptions of the ANOVA test for the equality of means.
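The missing ANOVA entries in part (ii) and the test statistic for part (i) follow from the usual identities. The sketch below is offered as a check on the arithmetic and not as a model answer. It takes the sums of squares and the Brand C summary figures exactly as printed in the tables; the variable names are illustrative.

```python
from math import sqrt

# Values given in the ANOVA table
ss_between = 2786.33
ss_total = 3654.27
df_total = 59
k = 4  # number of brands

# (ii) Fill in the missing entries
df_between = k - 1
df_within = df_total - df_between
ss_within = ss_total - ss_between
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# (i) One-sample t statistic for Brand C against a hypothesised mean of 265
n_c, mean_c, sd_c = 15, 268.98, 4.499
t_c = (mean_c - 265) / (sd_c / sqrt(n_c))

print(df_between, df_within, ss_within, ms_between, ms_within, f_stat, t_c)
```

The F statistic is referred to an F distribution with (3, 56) degrees of freedom, and the Brand C statistic to a t distribution with 14 degrees of freedom, whose two-sided 5% critical value in standard tables is 2.145.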
(c) The Ministry of Transport in the Democratic Republic of Congo has carried out a study to examine whether domestic air fares are falling. The average air fares (in US$) during the first quarter of 2018 and the first quarter of 2019 have been collected for a random sample of 10 airlines. The summary statistics are reported below.
| | N | Mean | Standard deviation |
| First quarter 2018 | 10 | 183.24 | 53.65 |
| First quarter 2019 | 10 | 172.53 | 55.97 |
| Difference | 10 | 10.71 | 24.69 |
(i) Using $\alpha = 0.05$, test the null hypothesis that there was no change in the average airline fares between the first quarter of 2018 and 2019. Indicate which test you use.
(ii) State the assumptions of the test.
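Since the same 10 airlines are observed in both years, a paired-samples t-test on the differences is one natural choice for part (i). The sketch below assumes a two-sided alternative, because the null is stated as "no change"; a one-sided test that fares are falling would use a different critical value. The critical value 2.262 is taken from standard t tables for 9 degrees of freedom, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics for the paired differences (2018 minus 2019)
n = 10
mean_diff = 10.71
sd_diff = 24.69

# Standard error of the mean difference and the t statistic
se_diff = sd_diff / sqrt(n)
t_stat = mean_diff / se_diff
df = n - 1

T_CRIT = 2.262  # ASSUMPTION: two-sided 5% critical value, t with 9 df
reject_null = abs(t_stat) > T_CRIT

print(t_stat, df, reject_null)
```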
(d) A researcher wants to know if there is a relationship between annual salary and years of experience and has obtained a random sample of 2450 call centre workers in India. A simple bivariate regression is estimated with their annual salary (Y), measured in US$, as the dependent variable, and workers’ experience (X), measured in years, as the independent variable. The estimated coefficients, standard errors and t-statistics are given below.
$\hat{Y}_i = 12453.24 + 52.46 X_i$
se = (437.66) (23.62)
$R^2 = 0.14$
In the above equation, ‘se’ stands for standard errors. Diagnostic tests (not reported here) do not reject the assumption of normally distributed, spherical disturbances.
(i) Interpret the estimated coefficients.
(ii) Test for the statistical significance of the relationship between annual salary and years of experience at the 5% significance level.
(iii) State the assumptions of the test in (ii).
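The significance test in part (ii) reduces to dividing the slope estimate by its standard error. The sketch below is a check on the arithmetic under stated assumptions, not a model answer. It assumes a two-sided test and uses the normal value 1.96 as the critical value, which is a close approximation to the t distribution with 2448 degrees of freedom; the variable names are illustrative.

```python
# Reported slope estimate and its standard error
b1 = 52.46
se_b1 = 23.62
n = 2450

# t statistic for H0: slope = 0, with n - 2 degrees of freedom
t_stat = b1 / se_b1
df = n - 2

Z_CRIT = 1.96  # ASSUMPTION: normal approximation to t with 2448 df, two-sided 5%
significant = abs(t_stat) > Z_CRIT

print(t_stat, df, significant)
```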
(a) The government of Singapore has conducted a survey of women aged 20-29, which gives the number of individuals by employment status (either in-work or unemployed) and by educational qualification (with higher education or without higher education qualification). The result of the survey is summarised in the following table:
| | With higher education | Without higher education qualification | Total |
| In-work | 523 | 714 | 1237 |
| Unemployed | 45 | 149 | 194 |
| Total | 568 | 863 | 1431 |
Evaluate the relationship between employment status and educational status using three appropriate statistical techniques covered during the Statistical Research Techniques module. Comment on their relative strengths and weaknesses.
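Two techniques that could be applied to a 2×2 table like this one are the Pearson chi-square test of independence and the odds ratio. The sketch below illustrates both and is not a model answer. Choosing these two is an assumption about what the module covered. The chi-square statistic is computed without a continuity correction, and the critical value 3.841 is the standard 5% value for 1 degree of freedom.

```python
# Observed counts: rows = in-work, unemployed; columns = with, without HE
observed = [[523, 714], [45, 149]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction)
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Odds of being in work with higher education relative to without
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

CHI2_CRIT = 3.841  # 5% critical value, chi-square with 1 df
reject_independence = chi2 > CHI2_CRIT

print(chi2, odds_ratio, reject_independence)
```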
(b) Using annual data for Bangladesh from 2000 to 2019, a researcher has calculated the Pearson’s correlation coefficient between average annual household electricity consumption and the real price of electricity. The result was 0.798. Interpret the correlation coefficient and test whether it is statistically different from zero.
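The significance test for a Pearson correlation can be carried out with a t statistic on n − 2 degrees of freedom. The sketch below assumes 20 annual observations (2000 to 2019 inclusive) and a two-sided test. The critical value 2.101 is taken from standard t tables for 18 degrees of freedom, and the variable names are illustrative.

```python
from math import sqrt

r = 0.798
n = 20  # ASSUMPTION: one observation per year, 2000 to 2019 inclusive

# t statistic for H0: population correlation = 0
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)
df = n - 2

T_CRIT = 2.101  # two-sided 5% critical value, t with 18 df
significant = abs(t_stat) > T_CRIT

print(t_stat, df, significant)
```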
(a) Explain the coefficient of determination $R^2$ and the adjusted coefficient of determination.
(b) Explain the assumptions of the classical linear regression model (CLRM).
(c) Explain what is meant by Gauss-Markov conditions.
(d) Explain the differences between Pearson’s Correlation Coefficient and Spearman’s Rank Correlation Coefficient.
(e) Explain the F test for linear restrictions in multiple regression models.
Using a random sample of 30 observations, a regression model $C = \beta_0 + \beta_1 Y + \beta_2 W$ is estimated for consumption expenditures (C) against disposable income (Y) and wealth (W), all in thousands of euros. The results are given below:
$\hat{C}_i = 1.64 + [\ldots]\,Y_i - [\ldots]\,W_i$
se = ( ) ( ) (0.29)
$R^2 = 0.991$, $F(2, 27) = 1045.33$