Applied Statistics

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Problem 1

Consider the distribution with density

 f(x) = { 2νx,                  0 < x < 1
        { (1 − ν)λ / x^(λ+1),   x ≥ 1          (1)

where 0 ≤ ν ≤ 1 and λ > 0.

a) Compute the cdf and then employ it to obtain a sampler from the distribution.
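As a sketch of what such an inverse-cdf sampler might look like (the function name sample_f and the NumPy packaging are mine; the piecewise cdf in the comments is one way the computation in part a) can go):

```python
import numpy as np

def sample_f(n, nu, lam, rng=None):
    """Draw n values from density (1) by inverse-transform sampling.

    The cdf works out piecewise: F(x) = nu * x**2 on (0, 1) and
    F(x) = 1 - (1 - nu) * x**(-lam) on [1, inf); inverting each branch
    on a uniform draw u gives the sampler.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)
    x = np.empty(n)
    below = u < nu                              # u < nu maps to the (0, 1) branch
    x[below] = np.sqrt(u[below] / nu)           # invert F(x) = nu * x^2
    x[~below] = ((1.0 - u[~below]) / (1.0 - nu)) ** (-1.0 / lam)  # invert the tail
    return x
```

Note that ν is exactly the probability of landing in the (0, 1) branch, which previews part b).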

b) The meaning of the two parameters ν and λ.

Use the result in part a) to determine what ν is.

As for the meaning of λ, show that E(log X | X > 1) = 1/λ.

c) The method-of-moments estimator of λ, λ̂_MM, using a sample of size n, when ν is known.

Show or argue why there is no method-of-moments estimator if the true value of λ is smaller than or equal to 1. Then assume λ > 1 and find λ̂_MM using the first sample moment.

d) Obtain the MLE of λ, λ̂_MLE, using a sample of size n, assuming that ν is known.

e) Comparison of the two estimators λ̂_MLE and λ̂_MM. First address some questions mathematically.

Do the two estimators always exist for all possible values of the true parameter λ? Do the estimators exist when ν = 0? Do they exist when ν = 1? Recall the meaning of ν obtained in part b) and comment.

Then compare the two estimators using the MSE, or, in fact, a Monte Carlo estimate of the MSE, since the MSE is difficult to compute analytically. Monte Carlo estimation is another name for estimation by simulation: drawing a large number of samples from the true distribution and employing these samples to estimate the quantity of interest. [This is justified by the law of large numbers.] More precisely, consider the following 60 simulation schemes, corresponding to all possible triples (n, ν, λ), with ν = 0, 0.25, 0.5, 0.75, 0.9; λ = 1.2, 2, 3; and sample sizes n = 50, 100, 500, 1000.

For each case defined by (n, ν, λ), using the sampler obtained in part a), draw from the distribution independently n times to get (x1, . . . , xn); evaluate the estimators λ̂_MLE and λ̂_MM on this sample; do so N times (say, N = 200000) and then use the resulting N estimates of each estimator to compute (estimates of) its variance, its squared bias and its MSE. Namely, the MSE

 R = E[(λ̂ − λ)²]

is approximated as follows:

 E[(λ̂ − λ)²] ≈ (1/N) Σ_{j=1}^{N} (λ̂(x_j) − λ)²,

where x_j = (x1, . . . , xn)_j is a size-n sample drawn from the distribution given in (1) and λ̂(x_j) is the estimate of λ computed from the formulae obtained in c) and d). Since the question also asks for the squared bias (λ − E(λ̂))² and the variance Var(λ̂), compute these quantities first, using the N estimates {λ̂(x_j)}_{j=1}^{N}, and then use the MSE in the following form,

 R = E[(λ̂ − λ)²] = (λ − E(λ̂))² + Var(λ̂),

after showing that this decomposition does hold. Plot the MSE, variance and bias "curves". Comment on the results: is one estimator always better than the other? Which one would you choose? Do the estimators become more accurate when n increases? [Pay attention to the scale of the MSE values; if they are small you should phrase your statements with caution.]
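The simulation loop just described can be sketched as below. Here sampler stands for the part-a) sampler and estimator is a placeholder for the formula derived in c) or d); all names are mine, not part of the assignment, and N is kept small for illustration:

```python
import numpy as np

def mc_mse(estimator, sampler, n, nu, lam, N=2000, seed=0):
    """Monte Carlo estimates of squared bias, variance and MSE of an
    estimator of lam.  `sampler(n, nu, lam, rng)` draws a size-n sample
    from (1); `estimator(x, nu)` implements the formula from c) or d).
    """
    rng = np.random.default_rng(seed)
    est = np.array([estimator(sampler(n, nu, lam, rng), nu) for _ in range(N)])
    bias2 = (lam - est.mean()) ** 2          # squared bias
    var = est.var()                          # variance of the N estimates
    return bias2, var, bias2 + var           # MSE = bias^2 + variance
```

Running this over all 60 triples (n, ν, λ) for both estimators yields the values needed for the MSE, variance and bias curves.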

 

Problem 2

The attached file p2data.txt contains data d = (x1, . . . , x50) consisting of 50 values that were drawn independently from some distribution F. You do not know this generating distribution and I will not reveal it to you (or not yet, at least).

The concern in this problem is the estimation of the true median θ of the distribution. You will probably estimate θ using the median of the sample, θ̂. Naturally, it is important to assess how accurate an estimate the sample median is of the true median and, for this, it is necessary to employ some measure of error. But how can you compute the standard error of θ̂ or a confidence interval (CI) for the true median θ if the distribution F of the data is not known to you, and in this case not even the family to which it may belong? Here is a method (known as the bootstrap) that allows you to do that. The method is quite general and works for any quantity θ of interest and any estimate θ̂ of it.

The method uses the one sample one has, d, from the unknown F to find an approximation F̂ of F. Then F̂ is used as if it were the true F, and Monte Carlo estimates of the standard deviation of θ̂ (as in Problem 1) and even CIs for θ can be computed. Precisely, draw n values independently from F̂, where n is the number of values in the data d (the sample so obtained, d*, is called a bootstrap sample); compute the statistic of interest on this sample (here the median); repeat these two steps a large number of times B. The procedure therefore yields B replicates of the median:

 

 θ̂*_1 = s(d*_1), . . . , θ̂*_B = s(d*_B),

 

where s is the algorithm that outputs the median of the bootstrap sample.

These replicates are then employed to get a measure of error for the statistic θ̂ (here the median of the sample d), and a CI for θ (the true median).

 

What to do with the bootstrap replications?

The fundamental goal of the bootstrap is that of obtaining a standard error for θ̂. This is estimated as the sample standard deviation of the B replications (θ̂*_1, . . . , θ̂*_B): i.e. the square root of the sample variance

 V̂ar(θ̂) = (1/B) Σ_{b=1}^{B} (θ̂*_b − θ̄*)²,   where θ̄* = (1/B) Σ_{b=1}^{B} θ̂*_b.   (2)

[Use B − 1 in place of B in formula (2) if you prefer the usual sample variance.]

The square root of V̂ar(θ̂) is known as the bootstrap estimate σ̂_boot of the standard error of the statistic θ̂. It is just the Monte Carlo estimate of the standard deviation of θ̂, obtained by using not the true distribution F, which is unknown, but its available approximation F̂.
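As a small sketch, formula (2) translates directly into code (the function name is mine):

```python
import numpy as np

def boot_se(replicates):
    """Bootstrap standard error: square root of the sample variance (2)
    of the B replicates (this divides by B; use B - 1 if you prefer the
    usual sample variance)."""
    theta_bar = replicates.mean()                        # average replicate
    var_hat = ((replicates - theta_bar) ** 2).mean()     # formula (2)
    return np.sqrt(var_hat)
```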

The bootstrap replicates can also be used to compute the CI of the parameter of interest θ of which the statistic θˆis an estimator.  There exist different versions of bootstrap CIs. We will consider two of them.

One is the bootstrap percentile interval, which uses the sample percentiles of the bootstrap replicates. Namely, the 100(1 − α)% bootstrap two-sided CI is

 (θ̂*_{α/2}, θ̂*_{1−α/2}),

where θ̂*_β is the β sample percentile of the bootstrap replicates: the value that separates the 100β% smallest values of the set (θ̂*_1, . . . , θ̂*_B) from the rest.

Another version of the bootstrap CI uses the bootstrap estimate of the stan- dard error:

 θ̂ ± z_{α/2} σ̂_boot.

This is known as the normal-based 100(1 − α)% bootstrap CI since it uses the normal cutoff points z_{α/2} [P(Z ≥ z_α) = α, where Z ∼ N(0, 1)]. [Notice that θ̂ is the statistic evaluated on the original data d = (x1, . . . , xn), not on bootstrap samples.]
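A compact sketch computing both intervals from the replicates (the function name and packaging are mine; the standard library's NormalDist supplies the cutoff z_{α/2}):

```python
import numpy as np
from statistics import NormalDist

def boot_cis(theta_hat, replicates, alpha=0.05):
    """Percentile and normal-based 100(1 - alpha)% bootstrap CIs for theta.
    theta_hat is the statistic computed on the original data d; replicates
    holds the B bootstrap replicates."""
    # Percentile interval: sample percentiles of the replicates.
    lo, hi = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Normal-based interval: theta_hat +/- z_{alpha/2} * sigma_boot.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = replicates.std()                    # bootstrap SE, dividing by B
    return (lo, hi), (theta_hat - z * se, theta_hat + z * se)
```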

 

How to approximate F ?

There are two versions of the bootstrap method: they differ in how F is approximated.

 

Version 1. Non-parametric bootstrap

This version of the method assumes nothing about F .

The estimate F̂ of F obtained from the one sample d = (x1, . . . , xn) is the discrete distribution placing equal mass at each of x1, . . . , xn. Mathematically, the pmf of the approximating distribution is

 P(X = xi) = 1/n,   i = 1, . . . , n.

This is known as the empirical distribution. This is tantamount to considering the values we have observed as the only possible values, and each equally likely (at least if the xi are distinct).

In other words, a non-parametric bootstrap sample is obtained as follows. Draw with replacement n balls from an urn that contains the n balls (x1, . . . , xn), each ball labeled by one of the data values. The n values you draw are the bootstrap sample. Therefore, in a bootstrap sample some values may appear more than once.
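The urn scheme above is essentially one call to np.random.choice; a possible sketch (names are mine):

```python
import numpy as np

def np_boot_medians(d, B=5000, rng=None):
    """B non-parametric bootstrap replicates of the median: each bootstrap
    sample is n draws with replacement from the data d (the urn scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    samples = rng.choice(d, size=(B, d.size), replace=True)  # B resamples at once
    return np.median(samples, axis=1)                        # one median per row
```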

 

Version 2. The parametric bootstrap

The parametric bootstrap is a different version of the method, which is applied when some information about the true distribution F of the data is known. Specifically, it requires that F = F_λ be known up to some parameter(s) λ. The data d are employed to estimate the unknown parameter(s) λ, the only missing information, and F is approximated by F̂ = F_λ̂. For λ̂ one can use the MLE (or some other estimator). Since F_λ̂ is completely known, one can sample from it as many times as one wants and generate the bootstrap samples.

What changes between the two versions is how F is estimated. The procedure is otherwise the same. To sum up: draw B data sets of (generally, and for this exercise) the same size as the original data set from the bootstrap approximation F̂ to F; evaluate the statistic of interest on the B bootstrap samples to get B bootstrap replicates.
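For contrast with the non-parametric version, here is a parametric-bootstrap sketch; the exponential model is purely illustrative (the assignment does not reveal F), and all names are mine:

```python
import numpy as np

def param_boot_medians(d, B=5000, rng=None):
    """Parametric bootstrap replicates of the median, assuming (purely for
    illustration) F = Exponential(lam): fit lam by MLE from d, then draw
    the bootstrap samples from F_lamhat."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    lam_hat = 1.0 / d.mean()             # MLE of the exponential rate
    samples = rng.exponential(scale=1.0 / lam_hat, size=(B, d.size))
    return np.median(samples, axis=1)
```

Only the resampling step differs from the non-parametric version; the replicates feed into the same standard-error and CI formulas.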

 

 

 
