A large sample tends to give a better approximation of the population parameter than a small sample. This reflects a simple notion supported by common sense: in only a few trials, results can vary widely, but in a large sample, results tend to be fairly stable and consistent. For example, it would not be unusual to get 3 heads when flipping a fair coin 3 times (the probability is 1 in 8), but it would be very unusual to get 300 heads when flipping a fair coin 300 times. Insurance companies rely on this notion to estimate the total claims they expect to pay in any one year. Governments use it to estimate tax collections. It tells casinos that when the odds favor the house, even a little bit, the house will win if enough people can be induced to gamble long enough.
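The coin example can be checked with a few lines of Python. This is only an illustrative sketch: the seed, the number of repetitions, and the choice of 30,000 flips for the "large" case are arbitrary assumptions, not values from the assignment.

```python
import random

# Exact probability that every flip of a fair coin is heads: 0.5 ** n.
p_3_heads = 0.5 ** 3      # 0.125, i.e. 1 chance in 8
p_300_heads = 0.5 ** 300  # so small it is effectively impossible

def proportion_heads(n_flips, rng):
    """Simulate n_flips fair coin flips and return the share that were heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable

# Five repetitions of a small experiment and five of a large one.
small = [proportion_heads(3, rng) for _ in range(5)]
large = [proportion_heads(30_000, rng) for _ in range(5)]

print("P(3 heads in 3 flips)     =", p_3_heads)
print("P(300 heads in 300 flips) =", p_300_heads)
print("share of heads, 3 flips each:     ", small)
print("share of heads, 30,000 flips each:", large)
```

The proportions from 3 flips can only be 0, 1/3, 2/3, or 1, so they jump around, while the proportions from 30,000 flips should all sit very close to 0.5.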
The central limit theorem states that when sampling from a normal, a uniform, or even a skewed distribution, the means of the samples will be approximately normally distributed if the sample size is sufficiently large. Point estimates are sample measures of central tendency such as the mean, median, and mode. Sample measures of dispersion such as the variance and standard deviation are also point estimates. Each is a single value used to estimate a population parameter. Interval estimates state the range within which a population parameter probably lies. A 95 percent confidence interval means that about 95 percent of similarly constructed intervals will contain the parameter being estimated.
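The central limit theorem can be illustrated by simulation. The sketch below draws many samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1) and looks at how the sample means behave. The population, sample size, number of samples, and seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary fixed seed for repeatability
SAMPLE_SIZE = 40        # observations in each sample (assumed)
N_SAMPLES = 5_000       # number of samples drawn (assumed)

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(draws)

means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

center = statistics.fmean(means)            # should be near the population mean, 1
spread = statistics.stdev(means)            # should be near 1 / sqrt(SAMPLE_SIZE)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5

# For a normal distribution, about 95% of values lie within 1.96 standard
# deviations of the center; the sample means should come close to that.
within = sum(abs(m - 1.0) <= 1.96 * expected_spread for m in means) / N_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {expected_spread:.3f})")
print(f"share within 1.96 sd: {within:.3f}")
```

Even though individual exponential values are far from normal, the distribution of the sample means is already close to the normal pattern at this sample size.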
Although the 95% confidence interval is the most commonly reported, you can calculate intervals for any confidence level. What changes is the width of the interval. For example, to construct a 99% confidence interval for a large sample, you use the value 2.58 instead of 1.96. That’s because, for large samples, 99% of sample means fall within 2.58 standard error units of the unknown population mean (parameter).
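The multipliers 1.96 and 2.58 are quantiles of the standard normal distribution, and they can be recovered with Python's standard library. The helper name `z_critical` is made up for this sketch.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

These values apply to large samples. For a small sample such as the 18 tourists used below, the multiplier comes from the t distribution instead and is somewhat larger.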
A. Computing Confidence Interval
It is very easy to perform confidence interval computation by using SPSS Explore Statistics.
Example: Let’s use the senior tourist age data from the last SPSS assignment to determine the 90%, 95%, and 99% confidence intervals for the variable SeniorAge (or whatever you named this variable), which records the ages of the 18 senior tourists visiting Flagstaff, Arizona.
1. Retrieve the tourist age data you used for last week’s assignment.
2. Click Analyze
3. Click Descriptive Statistics
4. Click Explore … The Explore dialog box comes up.
5. Click the SeniorAge variable (or whatever you named it) and move it to the Dependent List: box by clicking the arrow.
6. In the Display box, select Statistics
7. Click Statistics … to the right
8. In the Explore: Statistics box, you will see that Confidence Interval for Mean is set at 95% by default. Replace 95% with 90%.
9. Click Continue
10. Click OK and the following two output tables are generated:
The first table is the Case Processing Summary table reporting the total number of cases analyzed in this procedure, showing no missing data.
The second table reports the descriptive summary statistics for the SeniorAge variable. The 90% confidence interval for the mean is reported with a lower bound of 70.16 years and an upper bound of 71.84 years. These lower and upper bound values are called the confidence limits. Based on the sample of 18 senior tourists, the 90% confidence interval for the mean age is from 70.16 years to 71.84 years.
What do these results mean? Suppose we selected many samples of 18 senior tourists. For each sample, we compute the mean and the standard deviation and then construct a 90% confidence interval. We would expect about 90% of these confidence intervals to contain the true population mean. About 10% of the intervals would not contain the population mean of senior tourist age. When an interval does not contain the population mean, we attribute this to sampling error, and it is the risk we assume when we select the level of confidence.
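This repeated-sampling interpretation can be tried out by simulation. The sketch below assumes a hypothetical normal population with mean 71 and standard deviation 2 (values picked only to resemble the tourist data, not taken from it) and uses the t table value 1.740 for 90% confidence with 17 degrees of freedom.

```python
import random
import statistics

rng = random.Random(1)          # arbitrary fixed seed for repeatability
TRUE_MEAN, TRUE_SD = 71.0, 2.0  # hypothetical population, assumed for illustration
N = 18                          # sample size, as in the tourist example
T_90_DF17 = 1.740               # t table value: 90% confidence, 17 degrees of freedom
REPS = 5_000                    # number of repeated samples (assumed)

def interval_90(sample):
    """Return the (lower, upper) 90% confidence limits for the mean of sample."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - T_90_DF17 * se, mean + T_90_DF17 * se

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    lower, upper = interval_90(sample)
    hits += lower <= TRUE_MEAN <= upper  # True counts as 1

coverage = hits / REPS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.90, and the remaining intervals, roughly 10%, miss the true mean purely because of sampling error.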
Defining the Standard Error of the Mean and the 5% Trimmed Mean Reported in the Table Above
The standard error of the mean is a statistic that estimates the variability in the sample mean you would expect if you took repeated samples of the same size from the population. It is calculated by dividing the standard deviation of the observations by the square root of the number of observations (SE = s / √n). Since it is unlikely that a sample of observations would all be unusually high or low, we would expect the variability of the mean to be less than that of an individual value.
The trimmed mean is a measure of central tendency similar to the mean, except that it is based on only the “inner” portion of the data after trimming the top and bottom p percent of values. For instance, the age variable has 18 cases, so a 5% trimmed mean discards the smallest 5% of values (18 x .05 = .90 ≈ 1 case, the smallest value, which is the age of 67) and the largest 5% of values (18 x .05 = .90 ≈ 1 case, the largest value, which is the age of 75), and then computes the average of the remaining 16 cases: (68+69+69+70+70+70+71+71+71+71+72+72+72+73+73+74)/16 = 71. When p=0, the trimmed mean is equal to the mean. As p approaches 50, the trimmed mean approaches the median. Please note that because this data set is symmetric about its mean (it is roughly normal in shape), the 5% trimmed mean equals the sample mean. Trimmed means offer an advantage over the mean for variables with extreme values on either end, such as the variable length of stay in the Mongolia data. However, the ages of the 18 senior tourists do not vary much from the mean, since the tourists are all in much the same age group.
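Both statistics can be reproduced by hand. The sketch below uses the 18 ages implied by the trimmed mean example (the 16 listed values plus the trimmed 67 and 75). The function names are made up for illustration, and the trimming follows the round-to-one-case description given here; SPSS's own routine may treat the fractional 0.9 case differently.

```python
import math
import statistics

# Ages of the 18 senior tourists, as implied by the trimmed mean example.
ages = [67, 68, 69, 69, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 73, 73, 74, 75]

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

def trimmed_mean(values, percent):
    """Mean after dropping `percent` percent of cases from each end.

    The number of cases dropped per end is rounded to a whole case
    (an assumption matching the description above). If trimming removes
    every case, the median is returned instead.
    """
    ordered = sorted(values)
    k = round(len(ordered) * percent / 100)  # cases trimmed from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        return statistics.median(ordered)
    return statistics.fmean(kept)

print(f"mean            = {statistics.fmean(ages):.2f}")
print(f"standard error  = {standard_error(ages):.4f}")
print(f"5% trimmed mean = {trimmed_mean(ages, 5):.2f}")
```

For these ages, the mean and the 5% trimmed mean are both 71, and the standard error is about 0.485 years.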
Repeating the above procedure to generate 95% and 99% confidence intervals for the SeniorAge variable results in the following two tables:
Therefore, the 95% confidence interval for the mean age, based on the 18 senior tourists, is from 69.98 years to 72.02 years, and the 99% confidence interval is from 69.59 years to 72.41 years. Note that raising the level of confidence widens the confidence interval.
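The three intervals can be cross-checked outside SPSS. The sketch below assumes the 18 ages implied by the trimmed mean example (the 16 listed values plus the trimmed 67 and 75) and hard-codes critical values from a standard t table for 17 degrees of freedom, since a sample this small calls for t rather than the large-sample values 1.96 and 2.58. The names are hypothetical.

```python
import math
import statistics

# Ages of the 18 senior tourists, as implied by the trimmed mean example.
ages = [67, 68, 69, 69, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 73, 73, 74, 75]

# Two-sided critical values from a t table, 17 degrees of freedom (n - 1).
T_CRITICAL_DF17 = {90: 1.740, 95: 2.110, 99: 2.898}

def confidence_interval(values, level):
    """Return (lower, upper) limits for the mean, rounded to 2 decimals."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    margin = T_CRITICAL_DF17[level] * se
    return round(mean - margin, 2), round(mean + margin, 2)

for level in (90, 95, 99):
    print(level, confidence_interval(ages, level))
# 90 (70.16, 71.84)
# 95 (69.98, 72.02)
# 99 (69.59, 72.41)
```

The limits agree with the SPSS output quoted above, and each step up in confidence level widens the interval.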
B. Plotting Data
In previous chapters, you have used a variety of graphical displays to summarize your data. You have made pie charts, bar charts, histograms, box plots, and stem-and-leaf plots. All of these plots show the distribution of the values of a single variable. In this exercise, you will learn how to display the values of two variables that are measured on a meaningful scale. By using scatterplots, you can examine the relationships between pairs of variables, such as a country’s marketing budget and total tourism arrivals, movement in foreign exchange rates and total tourism arrivals, team salary and total number of wins, hours of training and quality of service, and corporate profits and CEO compensation.
A scatterplot is one of the best ways to look for relationships and patterns among variables. It is simple to understand, yet it conveys much information about the data.