{"id":1052,"date":"2020-03-26T07:24:59","date_gmt":"2020-03-26T07:24:59","guid":{"rendered":"https:\/\/statanalytica.com\/blog\/?p=1052"},"modified":"2021-08-14T12:26:49","modified_gmt":"2021-08-14T11:26:49","slug":"statistics-basics","status":"publish","type":"post","link":"https:\/\/statanalytica.com\/blog\/statistics-basics\/","title":{"rendered":"Top 3 Statistics Basics Concepts For The Beginners"},"content":{"rendered":"\n<p>Statistics is a powerful tool for data science. From a high-level view, statistics is a branch of mathematics used for the technical analysis of data. A <strong>statistics basics<\/strong> visualization such as a bar chart can give you some high-level information, but with statistics it is possible to operate on the data in a more informative and targeted way. This branch of mathematics helps you draw concrete conclusions about the data instead of just guesstimating.<\/p>\n\n\n\n<p>With the help of statistics, one can gain deeper insight into how exactly the data is organized and, based on that, how data science techniques can be applied to gain more information. 
This blog therefore describes 3 <strong>statistics basics <\/strong>concepts that every data scientist should know, so let\u2019s discuss them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"top-3-statistics-basics\"><\/span><strong>Top 3 Statistics Basics<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"statistical-features\"><\/span><strong>Statistical Features<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Statistical features are the most widely used <strong>statistics basics <\/strong>concept in data science. They are usually the first statistics technique applied when you need to explore the data, and they include things such as bias, variance, mean, median, percentiles, and much more. Let\u2019s take an example of this.<br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/JxBcohVchsABjAkY3nWdec99WKtNnh5BNviHWvGdjvgpQ9OFe0wAcTDVpmxwEnA0TXjTQDfKy6Nk-yy0gXaKJYiO_XYln-dTrj7t9869bmrqFm1TTgM_6nRvZsi9djhdF16NsMMe\" alt=\"\"\/><\/figure>\n\n\n\n<p>The line in the middle of the box is the data\u2019s median value. The first quartile marks the 25th percentile of the data, and the third quartile marks the 75th percentile. 
The max and min values mark the upper and lower ends of the data range.<\/p>\n\n\n\n<p>A box plot illustrates these statistical features as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A short box implies that most of your data points are similar, because many values fall within a small range.<\/li><li>A tall box implies that most of your data points differ from each other, because the values are spread over a wide range.<\/li><li>If the median is closer to the bottom of the box, most of the data has lower values; if it is closer to the top, most of the data has higher values. If the median line is not in the middle of the box, the data is skewed.<\/li><li>Are the whiskers very long? Then the data has a high variance and standard deviation; that is, the values are spread out and highly variable. If the whisker on one side of the box is long and the one on the other side is short, the data varies widely in only one direction.<\/li><\/ul>\n\n\n\n<p>All of these statistical features are easy to calculate. Try them whenever you need a quick, informative view of your data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"bayesian-statistics\"><\/span><strong>Bayesian Statistics<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To understand Bayesian <strong>statistics basics<\/strong>, you first need to know where frequency statistics fails. Frequency statistics is the kind of <strong>statistics basics<\/strong> that most people think of when they hear the word \u201cprobability\u201d. It applies mathematics to analyze the probability of an event happening, where the only data used in the computation is prior data. 
Let\u2019s look at Bayes\u2019 theorem:<br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/PjSmPFm04URgKSlt8Uc18mvGF-J3BGoUBp0nhi9y4WCIZeF8inh0DkDgWN5NN8QWzahisf9jgno3dTwRPBoBnQY42a0Z6WobuSwCPlzMVbSnKEBm48qoQGFAwlNIub_PM50qa2Bt\" alt=\"\"\/><\/figure>\n\n\n\n<p>The probability P(H) in the equation represents the frequency analysis: given the prior data, it is the probability of the event happening. The P(E|H) in the equation is known as the likelihood, that is, the probability of seeing the evidence E if the hypothesis H is true. For instance, if you set out to roll a die 1,000 times and the first 100 rolls all come up 6, you would become quite confident that the die is loaded. P(E) is the probability that the actual evidence is true. If someone tells you that the die is loaded, you still have to judge whether that claim can be trusted.<\/p>\n\n\n\n<p>At the same time, you take the evidence of a loaded die into account, whether or not it turns out to be true. As the layout of the equation shows, Bayesian statistics takes everything into account. Use it when you find that the prior data is not a good representation of future data and results.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"over-and-under-sampling\"><\/span><strong>Over and Under Sampling<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>These are <strong>statistics basics<\/strong> techniques used for classification problems. A classification dataset may be tipped too heavily toward one side. For example, you may have 200 examples for class 5 but only 20 for class 6. Such an imbalance can throw off many of the machine learning techniques used to model the data and make predictions. 
Over- and under-sampling are techniques that deal with this imbalance, as the picture shows:<br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/201OuQ1ZMHkMb8YhPyICIlTWipDkCkUW-U80d6zI0NPHPeNUBN9uRBq9IFCMSsyrlm1NBFVFP5SmSBCl-24y_u5loSGAsKbXoAG8UgDTlMbvtbpTfpv5MLykW8FVcDDag2S2tnUW\" alt=\"\"\/><\/figure>\n\n\n\n<p>On both the left and right sides of the picture, the blue class has many more samples than the orange class. In this case, there are 2 pre-processing choices that can aid the training of machine learning models. Undersampling means selecting only some of the data from the majority class, using only as many examples as the minority class has. This selection should be made so that the probability distribution of the class is maintained.<\/p>\n\n\n\n<p>Oversampling means creating copies of the minority class so that it has the same number of examples as the majority class. The copies should be made in such a way that the distribution of the minority class is maintained.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"conclusion\"><\/span><strong>Conclusion<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>This blog has explained three <strong>statistics basics<\/strong>: Statistical Features, Bayesian Statistics, and Over and Under Sampling, with supporting examples. Understanding them will help you grasp the details of statistics and solve statistical problems more easily. These three concepts are used to analyze many different problems in data science, and they also apply to real-life problems.<\/p>\n\n\n\n<p>If you face any difficulty with statistics, you can avail of our services. 
Our team of <a href=\"https:\/\/statanalytica.com\/statistics-homework-help\">statistics homework helpers<\/a> is well qualified in their respective fields, so they can deliver high-quality work at an affordable price. You can get our experts\u2019 help anytime, as we are available 24\/7.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Statistics is a powerful tool for data science. From a high-level view, statistics is a branch of mathematics used for the technical analysis of data. A statistics basics visualization such as a bar chart can give you some high-level information, but with statistics it is possible to operate on the data in a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1055,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"default","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[76],"tags":[],"class_list":["post-1052","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-statistics"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/posts\/1052","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/comments?post=1052"}],"version-history":[{"count":0,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/posts\/1052\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/media\/1055"}],"wp:attachment":[{"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/media?parent=1052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/categories?post=1052"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/statanalytica.com\/blog\/wp-json\/wp\/v2\/tags?post=1052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}