Given X1, ..., Xn iid from some (univariate, for now) distribution, in this exercise we consider a seemingly trivial goal

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

General Instructions

I expect you to upload your solutions on Moodle as a single running R Markdown file (.Rmd) plus its .html output, both named with your surnames.

R Markdown Test

To be sure that everything is working fine, start RStudio and create an empty project called HW3. Now open a new R Markdown file (File > New File > R Markdown...); set the output to HTML mode, press OK, and then click on Knit HTML. This should produce a web page with the knitting procedure executing the default code blocks. You can now start editing this file to produce your homework submission.

Please Notice

•  For more info on R Markdown, check the support webpage that explains the main steps and ingredients: R Markdown from RStudio.

•  For more info on how to write math formulas in LaTeX: Wikibooks.

•  Remember our policy on collaboration: Collaboration on homework assignments with fellow students is encouraged. However, such collaboration should be clearly acknowledged, by listing the names of the students with whom you have had discussions concerning your solution. You may not, however, share written work or code after discussing a problem with others. The solutions should be written by you.

Exercise: Estimating a population mean. . . in 2020. . .

1.   Introduction

Given {X1, . . . , Xn} iid from some (univariate for now) distribution, in this exercise we consider a seemingly trivial goal:

 estimate the population mean µ = E(X)  

An obvious choice would be the plug-in estimator, the empirical mean that you all know and love:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

This estimator is computationally attractive, requires no prior knowledge, and automatically scales with the population variance $\sigma^2$. In addition, tweaking a bit the Central Limit Theorem, we also know that, for large n and with probability at least 1 − α,

$$|\bar{X}_n - \mu| \le \sigma \sqrt{\frac{2 \log(2/\alpha)}{n}},$$

a result that also holds non-asymptotically under some suitable technical conditions. If these conditions are not met, we still have Chebyshev’s inequality, which says that, with probability at least 1 − α,

$$|\bar{X}_n - \mu| \le \sigma \sqrt{\frac{1}{n \alpha}},$$

an exponentially weaker bound that will especially hurt in modern applications where many means have to be estimated simultaneously (e.g. empirical risk minimization methods).
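To get a feeling for what "exponentially weaker" means, here is a minimal R sketch that compares the half-widths of the two intervals. It assumes the standard forms σ·sqrt(2 log(2/α)/n) for the CLT-type bound and σ/sqrt(nα) for the Chebyshev bound; the function names and the values of `n` and `alphas` are illustrative choices, not part of the assignment.

```r
# Half-widths of the two (1 - alpha) deviation bounds for the sample mean.
# Assumed forms: CLT-type   sigma * sqrt(2 * log(2 / alpha) / n)
#                Chebyshev  sigma / sqrt(n * alpha)
clt_halfwidth <- function(n, alpha, sigma = 1) {
  sigma * sqrt(2 * log(2 / alpha) / n)
}

chebyshev_halfwidth <- function(n, alpha, sigma = 1) {
  sigma / sqrt(n * alpha)
}

# Illustrative settings (hypothetical, chosen only for the comparison)
n      <- 1000
alphas <- c(0.1, 0.01, 0.001, 1e-6)

bounds <- data.frame(
  alpha     = alphas,
  clt       = clt_halfwidth(n, alphas),
  chebyshev = chebyshev_halfwidth(n, alphas)
)
# How many times wider the Chebyshev interval is
bounds$ratio <- bounds$chebyshev / bounds$clt
print(bounds)
```

The CLT-type width grows like sqrt(log(1/α)) as α shrinks, while the Chebyshev width grows like sqrt(1/α), so the ratio in the last column increases quickly when many means must be controlled at once with a small α.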

One may in fact try to extend the median-of-means (MoM) estimator to the multivariate case. There is only one problem: what is a median in the multivariate case? Given n points x1, . . . , xn in $\mathbb{R}^d$, the center of the smallest ball that contains at least half of the points may be considered as one notion of a multivariate median. Computing such a median is a nontrivial problem! The multivariate MoM estimator may be defined as the geometric median of the sample means of the k blocks defined before. As in the univariate case, the theoretically optimal number of blocks is $k^{*} = \lceil 8 \log(1/\alpha) \rceil$. As a quick final remark, let me note that, for the purpose of experimenting a bit, we could replace this last summary with any robust multivariate location estimator.
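The construction above can be sketched in a few lines of R. This is only a rough sketch under stated assumptions: the block number is taken as the ceiling of 8 log(1/α) and capped at n, blocks are formed by a random, nearly equal-sized split, and the geometric median is approximated with Weiszfeld's iteration, a standard algorithm swapped in here because the text does not prescribe one. All function names (`mom_blocks`, `mom_univariate`, `geometric_median`, `mom_multivariate`) are hypothetical.

```r
# Assumed block number: k = ceiling(8 * log(1 / alpha)), kept within [1, n]
mom_blocks <- function(n, alpha) {
  min(n, max(1, ceiling(8 * log(1 / alpha))))
}

# Univariate MoM: random split into k blocks, mean within each block,
# then the median of the k block means
mom_univariate <- function(x, alpha = 0.05, k = mom_blocks(length(x), alpha)) {
  block_id <- sample(rep_len(seq_len(k), length(x)))
  median(tapply(x, block_id, mean))
}

# Geometric median of the rows of `pts` via Weiszfeld's iteration
# (swapped-in technique; tolerance and iteration cap are arbitrary choices)
geometric_median <- function(pts, tol = 1e-8, max_iter = 500) {
  m <- colMeans(pts)
  for (iter in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(pts, 2, m)^2))  # distances to current center
    w <- 1 / pmax(d, 1e-12)                 # guard against division by zero
    m_new <- colSums(pts * w) / sum(w)      # weighted average of the points
    if (sqrt(sum((m_new - m)^2)) < tol) return(m_new)
    m <- m_new
  }
  m
}

# Multivariate MoM: geometric median of the k block means
mom_multivariate <- function(X, alpha = 0.05, k = mom_blocks(nrow(X), alpha)) {
  X <- as.matrix(X)
  block_id <- sample(rep_len(seq_len(k), nrow(X)))
  block_means <- rowsum(X, block_id) / as.vector(table(block_id))
  geometric_median(block_means)
}

# Illustrative use on simulated heavy-tailed data (Student t, true mean 0)
set.seed(1)
x <- rt(2000, df = 2.5)
X <- matrix(rt(2000 * 3, df = 2.5), ncol = 3)
mom_uni   <- mom_univariate(x, alpha = 0.01)
mom_multi <- mom_multivariate(X, alpha = 0.01)
```

Replacing `geometric_median()` in the last step with any other robust multivariate location estimator gives the variants mentioned in the final remark.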

1. On robust procedures and heavy-tailed distributions in Data Science

As mentioned, MoM estimators should have an edge on the beloved sample average in multivariate, heavy-tailed cases. At this point in time, heavy-tailed distributions have been accepted as realistic models for various phenomena:

•  www-session characteristics (e.g. sizes and durations of sub-sessions; sizes of responses; inter-response time intervals)

•  on/off-periods of packet traffic

•  file sizes

•  service times in queueing models

•  flood levels of rivers

•  major insurance claims

•  extreme levels of ozone concentrations

•  high wind-speed values

•  wave heights during a storm

•  low and high temperatures

But there’s more. As you probably know, recent technological developments have allowed companies and state organizations to collect and store huge datasets. Big datasets have also challenged scientists in statistics and computer science to develop new methods. In fact, because of the very “unstructured” way in which these datasets are collected, oftentimes they tend to be corrupted by nasty outliers and/or exhibit heavy tails.
