**General Instructions**

I expect you to upload your solutions on Moodle as a single R Markdown file (.Rmd) that knits without errors, plus its .html output, both named with your surnames.

**R Markdown Test**

To be sure that everything is working fine, start RStudio and create an empty project called HW3. Now open a new R Markdown file (File > New File > R Markdown...); set the output to HTML mode, press OK, and then click on Knit HTML. This should produce a web page with the knitting procedure executing the default code blocks. You can now start editing this file to produce your homework submission.

**Please Notice**

• For more info on R Markdown, check the support webpage that explains the main steps and ingredients: R Markdown from RStudio.

• For more info on how to write math formulas in LaTeX: Wikibooks.

• Remember our policy on collaboration: Collaboration on homework assignments with fellow students is encouraged. However, such collaboration should be clearly acknowledged, by listing the names of the students with whom you have had discussions concerning your solution. You may not, however, share written work or code after discussing a problem with others. The solutions should be written by you.

**Exercise: Estimating a population mean... in 2020...**

**1. Introduction**

Given $\{X_1, \ldots, X_n\}$ iid from some (univariate for now) distribution, in this exercise we consider a seemingly trivial goal:

estimate the population mean $\mu = \mathbb{E}(X)$

An obvious choice would be the plug-in estimator, the empirical mean that you all know and love: $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

This estimator is computationally attractive, requires no prior knowledge, and automatically scales with the population variance $\sigma^2$. In addition, tweaking the Central Limit Theorem a bit, we also know that, with probability at least $1 - \alpha$, the empirical mean $\bar{X}_n$ lies within a distance of order $\sigma \sqrt{\log(1/\alpha)/n}$ from $\mu$, a result that also holds non-asymptotically under some suitable technical conditions. If these conditions are not met, we still have Chebyshev's inequality, which says that with probability at least $1 - \alpha$ we have $|\bar{X}_n - \mu| \le \sigma / \sqrt{n \alpha}$: an exponentially weaker bound in $\alpha$ that will especially hurt in modern applications where many means have to be estimated simultaneously (e.g. empirical risk minimization methods).
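To get a feel for how much weaker the Chebyshev guarantee is, here is a minimal sketch that compares two half-widths around the empirical mean. The Chebyshev radius $\sigma/\sqrt{n\alpha}$ follows directly from applying Chebyshev's inequality to the empirical mean. The sub-Gaussian radius $\sigma\sqrt{2\log(2/\alpha)/n}$ is one common textbook form and is an assumption here, as are the function names and the values of `sigma`, `n` and `alpha`.

```r
# Half-widths of the (1 - alpha) intervals around the empirical mean.
# The sub-Gaussian constant below is an assumed textbook form,
# not necessarily the one used in the handout.
subgauss_radius <- function(sigma, n, alpha) {
  sigma * sqrt(2 * log(2 / alpha) / n)
}

# Chebyshev: P(|mean - mu| >= t) <= sigma^2 / (n * t^2); solve for t at level alpha.
chebyshev_radius <- function(sigma, n, alpha) {
  sigma / sqrt(n * alpha)
}

# Illustrative values only.
sigma <- 1
n <- 1000
alpha <- 10^(-(1:6))

bounds <- data.frame(
  alpha = alpha,
  subgauss = subgauss_radius(sigma, n, alpha),
  chebyshev = chebyshev_radius(sigma, n, alpha)
)
# How many times wider the Chebyshev interval is at each confidence level.
bounds$ratio <- bounds$chebyshev / bounds$subgauss
print(bounds)
```

The `ratio` column grows as `alpha` shrinks, which is exactly the regime you end up in when many means must be controlled simultaneously.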

One may in fact try to extend the MoM estimator to the multivariate case. There is only one problem: what is a median in the multivariate case? Given $n$ points $x_1, \ldots, x_n$ in $\mathbb{R}^d$, the center of the smallest ball that contains at least half of the points may be considered as a notion of a multivariate median. Computing such a median is a highly nontrivial problem! The multivariate MoM estimator may be defined as the geometric median of the sample means of the $k$ blocks defined before. As in the univariate case, the theoretically optimal number of blocks is $k^\star = \lceil 8 \log(1/\alpha) \rceil$. As a quick final remark, let me note that, for the purpose of experimenting a bit, we could replace this last summary with any robust multivariate location estimator.
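A rough sketch of the multivariate MoM estimator could look as follows. It is only one possible implementation: the geometric median is approximated with Weiszfeld's iteration (my choice, not something prescribed above), the blocks are formed by a random split into nearly equal sizes (an assumption), and the names `geometric_median` and `mom_multivariate` are hypothetical.

```r
# Geometric median of the rows of `points`, approximated by Weiszfeld's iteration.
geometric_median <- function(points, max_iter = 200, tol = 1e-8) {
  center <- colMeans(points)                     # start from the ordinary mean
  for (iter in seq_len(max_iter)) {
    diffs <- sweep(points, 2, center)            # subtract center from every row
    dists <- pmax(sqrt(rowSums(diffs^2)), 1e-12) # guard against division by zero
    weights <- 1 / dists
    new_center <- colSums(points * weights) / sum(weights)
    shift <- sqrt(sum((new_center - center)^2))
    center <- new_center
    if (shift < tol) break                       # stop once the update is negligible
  }
  center
}

# Multivariate median of means: block means summarised by their geometric median.
mom_multivariate <- function(x, alpha = 0.05, k = ceiling(8 * log(1 / alpha))) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- max(1, min(k, n))                         # no more blocks than points
  # Random assignment of rows to k blocks of nearly equal size (assumed scheme).
  block <- sample(rep(seq_len(k), length.out = n))
  block_means <- rowsum(x, block) / tabulate(block, nbins = k)
  geometric_median(block_means)
}

# Illustrative use on simulated heavy-tailed data with true mean (0, 0, 0).
set.seed(1)
x <- matrix(rt(2000 * 3, df = 3), ncol = 3)
mom_est <- mom_multivariate(x, alpha = 0.05)
print(mom_est)
print(colMeans(x))
```

Swapping `geometric_median()` for another robust location estimator, as suggested in the final remark, only requires changing the last line of `mom_multivariate()`.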

**1. On robust procedures and heavy-tailed distributions in Data Science**

As mentioned, MoM estimators should have an edge over the beloved sample average in multivariate, heavy-tailed cases. By now, heavy-tailed distributions have been accepted as realistic models for various phenomena:

• www-session characteristics (e.g. sizes and durations of sub-sessions; sizes of responses; inter-response time intervals)

• on/off-periods of packet traffic

• file sizes

• service times in queueing models

• flood levels of rivers

• major insurance claims

• extreme levels of ozone concentrations

• high wind-speed values

• wave heights during a storm

• low and high temperatures

But there’s more. As you probably know, recent technological developments have allowed companies and state organizations to collect and store huge datasets. Big datasets have also challenged scientists in statistics and computer science to develop new methods. In fact, because of the very “unstructured” way in which these datasets are collected, oftentimes they tend to be corrupted by nasty outliers and/or exhibit heavy tails.
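A small simulation can make the effect of outliers and heavy tails concrete. The sketch below assumes the usual univariate definition of MoM (split the sample into $k$ blocks, average each block, take the median of the block means). The function name `mom_univariate`, the random block assignment, the Student-t model with 2.1 degrees of freedom and all tuning values are illustrative assumptions.

```r
# Univariate median of means with k randomly assigned, nearly equal-sized blocks.
mom_univariate <- function(x, k) {
  block <- sample(rep(seq_len(k), length.out = length(x)))
  median(tapply(x, block, mean))
}

# A single gross outlier: 99 zeros and one corrupted value.
x_bad <- c(rep(0, 99), 1e6)
print(mean(x_bad))                    # dragged to 10000 by the outlier
print(mom_univariate(x_bad, k = 10))  # 0: only one of the ten block means is affected

# Heavy tails: absolute errors of both estimators over repeated samples.
set.seed(123)
n <- 1000
k <- 24
n_rep <- 500
err <- replicate(n_rep, {
  x <- rt(n, df = 2.1)                # true mean is 0, variance barely finite
  c(mean = abs(mean(x)), mom = abs(mom_univariate(x, k)))
})
# Median, 95% and 99% quantiles of the absolute error for each estimator.
summary_tab <- apply(err, 1, quantile, probs = c(0.5, 0.95, 0.99))
print(summary_tab)
```

The upper quantiles in `summary_tab` are the ones to look at, since the confidence-bound argument is about rare large deviations rather than typical errors.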
