Outliers are data values that differ considerably from the bulk of a data set. They lie outside the overall trend present in the rest of the data. Outliers are extremely low or extremely high stragglers that can distort your statistics. For instance, if one measured the length of children’s noses, the average could be thrown off if Pinocchio were in the class.
To study outliers, one has to examine the data set, and finding them can pose some challenges. Sometimes they are easy to recognize, for example with a stemplot in which a few values sit apart from the rest. But how far must a value deviate to count as an outlier? We will look at a specific test that provides an objective standard for what constitutes an outlier.
What are outliers in statistics?
An outlier can be defined as a piece of data that lies an abnormal distance from the other points. In other words, it is data that falls outside the other values in a data set. If one had Pinocchio within a class of children, the length of his nose would be an outlier compared to the other children.
Examples of outliers in statistics:
In the given set of random values, 5 and 199 are outliers:
5, 94, 95, 96, 99, 104, 105, 199
“5” is an extremely low value, whereas “199” is an extremely high value. But outliers are not always this obvious. Let’s assume one received the following paychecks last month:
$220, $245, $20, $230.
The average paycheck is $178.75. But the small paycheck ($20) may be because that person went on holiday, so the $178.75 average is not a true representation of what they typically earn. Their average is closer to $232 if one excludes the outlier ($20) from the data set. Spotting outliers is not always as simple as it seems, though. The data set might look like this:
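The arithmetic behind the paycheck example can be checked with a few lines of Python. This is only an illustrative sketch using the standard library's `statistics.mean`; the variable names are our own and are not part of the original example.

```python
from statistics import mean

# The four paychecks from the example above.
paychecks = [220, 245, 20, 230]

# Assumption for illustration: treat the $20 paycheck as the outlier and drop it.
without_low = [p for p in paychecks if p != 20]

mean_all = mean(paychecks)        # mean of all four paychecks
mean_trimmed = mean(without_low)  # mean of the three remaining paychecks

print(f"with the $20 paycheck:    ${mean_all:.2f}")
print(f"without the $20 paycheck: ${mean_trimmed:.2f}")
```

A single low value pulls the mean down by more than $50, which is why it helps to identify outliers before summarizing data.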
60, 9, 31, 18, 21, 28, 35, 13, 48, 2.
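One might guess that 2 is an outlier, and possibly 60. But a guess made by eye is unreliable: under the 1.5 × IQR rule described below, with quartiles taken as the medians of the lower and upper halves, the limits for this data set are -20 and 68, so neither value is actually flagged.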
Box and whiskers charts often show outliers.
However, one might not have access to a box and whiskers chart. And even if one does, some boxplots do not show outliers. For instance, a chart may have whiskers that extend far enough to include the outliers.
That is why one should not rely on a box and whiskers chart to find outliers. That said, box and whiskers charts can be a valuable tool for presenting outliers after one has determined what they are. The most effective way to find all outliers is to use the interquartile range (IQR). The IQR covers the middle 50% of the data, so outliers can be determined quickly once one knows the IQR.
How to find outliers in statistics using the Interquartile Range (IQR)?
An outlier is defined as a data point that lies more than 1.5 IQRs below the first quartile (Q1) or more than 1.5 IQRs above the third quartile (Q3) of a data set.
Low = Q1 – 1.5 × IQR
High = Q3 + 1.5 × IQR
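The two limits can be sketched as a small Python helper. The function names `iqr_limits` and `is_outlier` are hypothetical, and the quartile values passed in below (10 and 20) are made-up illustration values, not taken from any data set in this article.

```python
def iqr_limits(q1, q3, k=1.5):
    """Return (low, high) limits given the first and third quartiles."""
    iqr = q3 - q1                     # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, low, high):
    """A value is an outlier when it falls outside the two limits."""
    return x < low or x > high


# Illustrative quartiles only: Q1 = 10, Q3 = 20, so IQR = 10.
low, high = iqr_limits(10, 20)
print(low, high)                      # limits for the made-up quartiles
print(is_outlier(40, low, high))      # 40 lies above the high limit
print(is_outlier(12, low, high))      # 12 lies inside the limits
```

The multiplier `k` is kept as a parameter so that the conventional value of 1.5 is visible rather than buried in the formula.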
Sample Problem: Find all of the outliers in the given data set: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.
Step 1: Find the interquartile range, Q1 (25th percentile) and Q3 (75th percentile).
IQR = 50
Q1 (25th percentile) = 30
Q2 (50th percentile) = 55
Q3 (75th percentile) = 80
|How to calculate the IQR of the above data set|
|1. Put all the data values in order and draw a line through the middle to split them into a lower half (10, 20, 30, 40, 50) and an upper half (60, 70, 80, 90, 100).|
|2. Find the median of each half: Q1 = 30 (lower half) and Q3 = 80 (upper half).|
|3. Subtract Q1 from Q3: 80 – 30 = 50.|
|4. IQR = 50.|
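The procedure in the table can be sketched in Python as follows. The helper name `quartiles` is our own, and the sketch assumes the "median of each half" convention used in the table; other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # lower half of the ordered data
    upper = data[len(data) - half:]   # upper half (middle value skipped if n is odd)
    return median(lower), median(data), median(upper)


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

For this data set the sketch gives the quartiles listed in Step 1: Q1 = 30, Q2 = 55, Q3 = 80, and an IQR of 50.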
Step 2: Multiply the IQR obtained in Step 1 by 1.5:
IQR * 1.5 = 50 * 1.5 = 75.
Step 3: Add the number from Step 2 to Q3 [calculated in Step 1]:
75 + 80 = 155.
This is the upper limit. Set this number aside for a moment.
Step 4: Subtract the number found in Step 2 from Q1 (from Step 1):
30 – 75 = -45.
This is the lower limit. Set this number aside for a moment.
Step 5: Put the values from the data set in order:
10, 20, 30, 40, 50, 60, 70, 80, 90, 100.
Step 6: Insert the low and high limits into the data set, in order:
-45, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 155.
Step 7: Highlight any value that falls below the lower limit or above the upper limit inserted in Step 6:
-45, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 155.
Every data value lies between -45 and 155, so nothing is highlighted: this data set has no outliers.
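The whole procedure, Steps 1 to 7, can be sketched as one Python function. The name `find_outliers` is hypothetical, and the sketch assumes quartiles are taken as the medians of the lower and upper halves, as in the table in Step 1.

```python
from statistics import median


def find_outliers(values, k=1.5):
    """Return (low, high, outliers) using the k * IQR rule."""
    data = sorted(values)                      # Step 5: order the data
    half = len(data) // 2
    q1 = median(data[:half])                   # Step 1: median of the lower half
    q3 = median(data[len(data) - half:])       # Step 1: median of the upper half
    step = k * (q3 - q1)                       # Step 2: 1.5 * IQR
    high = q3 + step                           # Step 3: upper limit
    low = q1 - step                            # Step 4: lower limit
    # Steps 6 and 7: keep only values outside the two limits.
    outliers = [x for x in data if x < low or x > high]
    return low, high, outliers


low, high, outliers = find_outliers([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(low, high, outliers)
```

Run on the sample data, the limits come out as -45 and 155 and the list of outliers is empty, since no value falls below -45 or above 155.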
How to find the outliers in statistics using the Tukey method?
Tukey’s method for finding outliers uses the interquartile range to filter out very small or very large numbers. It is essentially the same as the method above, but the formulas are written slightly differently and the terminology differs a little. For instance, the Tukey method uses the idea of “fences.”
The formulas are:
Upper fence = Q3 + 1.5(Q3 – Q1) = Q3 + 1.5(IQR)
Lower fence = Q1 – 1.5(Q3 – Q1) = Q1 – 1.5(IQR)
Q1 = first quartile
Q2 = second quartile (the median)
Q3 = third quartile
IQR = interquartile range
The above equations give two values. One can think of them as a fence that separates the outliers from the rest of the values in the data set. Now, let’s see how to find outliers with them.
Sample Problem: Use Tukey’s method to find the outliers in the following data: 3, 4, 6, 8, 9, 11, 14, 17, 20, 21, 42.
Step 1: Calculate the interquartile range [following the same procedure shown in the table above], which gives:
Q1 = 6
Q3 = 20
IQR = 14
Step 2: Calculate 1.5 * IQR:
1.5 * IQR = 1.5 * 14 = 21
Step 3: Subtract this value from Q1 to obtain the lower fence:
6 – 21 = -15
Step 4: Add this value to Q3 to obtain the upper fence:
20 + 21 = 41.
Step 5: Insert these fences into the data, in order, to identify the outliers:
-15, 3, 4, 6, 8, 9, 11, 14, 17, 20, 21, 41, 42.
Anything outside the fences is considered an outlier. For the given data set, 42 is the only outlier.
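The Tukey example can also be reproduced with Python's standard library. This sketch uses `statistics.quantiles` with its default "exclusive" method, which happens to give the same quartiles as the hand calculation for this data set. That agreement is an assumption that holds here and not a general guarantee, because quartile conventions differ.

```python
from statistics import quantiles

data = [3, 4, 6, 8, 9, 11, 14, 17, 20, 21, 42]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers)
```

For this data the fences are -15 and 41, and 42 is the only value flagged. On other data sets the library's quartiles may differ from the median-of-halves values, so the fences can shift slightly.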
Many students have difficulty finding outliers in statistics, which is why we have described two different methods for doing it. Besides these, there are more advanced methods for detecting outliers, such as Dixon’s Q Test, Generalized ESD, and many more. Use the IQR and Tukey methods described above to solve problems involving outliers.