Problem 1
Consider the distribution with density
f(x) = 2νx,                  0 < x < 1;
f(x) = (1 − ν)λ/x^(λ+1),     x ≥ 1,                  (1)
where 0 ≤ ν ≤ 1 and λ > 0.
a) Compute the cdf and then employ it to obtain a sampler from the distribution.
b) The meaning of the two parameters ν, λ.
Use the result in part a) to determine what ν is.
As for the meaning of λ, show that E(log X|X > 1) = 1/λ.
c) The method-of-moments estimator of λ, λ̂_MM, using a sample of size n, when ν is known.
Show or argue why there is no method-of-moments estimator if the true value of λ is smaller than or equal to 1. Then assume λ > 1 and find λˆM M using the first sample moment.
d) Obtain the MLE of λ, λ̂_MLE, using a sample of size n, assuming that ν is known.
e) Comparison of the two estimators λ̂_MLE and λ̂_MM. First address some questions mathematically.
Do the two estimators always exist for all possible values of the true parameter λ? Do the estimators exist when ν = 0? Do they exist when ν = 1? Recall the meaning of ν obtained in part b) and comment.
Then compare the two estimators using the MSE, or, in fact, a Monte Carlo estimate of the MSE, since it is difficult to compute the MSE analytically. Monte Carlo estimation is another name for estimation by simulation: drawing samples from the true distribution a large number of times and employing these samples to estimate what is of interest. [This is justified by the law of large numbers.] More precisely, consider the following 60 simulation schemes, corresponding to all possible triples (n, ν, λ), with ν = 0, 0.25, 0.5, 0.75, 0.9; λ = 1.2, 2, 3; sample sizes n = 50, 100, 500, 1000.
For each case defined by (n, ν, λ), using the sampler obtained in part a), draw from the distribution independently n times to get (x1, . . . , xn); evaluate the estimators λ̂_MLE and λ̂_MM on this sample; do so N times (say, N = 200000) and then use the resulting N estimates of each estimator to compute (the estimates of) its variance, its bias squared and its MSE. Namely, the MSE

R = E[(λ̂ − λ)²]

is approximated as follows:

E[(λ̂ − λ)²] ≈ (1/N) Σ_{j=1}^{N} (λ̂(x_j) − λ)²,

where x_j = (x1, . . . , xn)_j is a size-n sample drawn from the distribution given in (1) and λ̂(x_j) is the estimate of λ using the formulae obtained in c) and d). Since the question asks to compute also the bias squared (λ − E(λ̂))² and the variance Var(λ̂), compute these quantities first, using the N estimates {λ̂(x_j)}_{j=1}^{N}, and then use the MSE in the following form

R = E[(λ̂ − λ)²] = (λ − E(λ̂))² + Var(λ̂),
after showing that this decomposition does hold. Plot the MSE, variance and bias “curves”. Comment on the results: is one estimator always better than the other? Which one would you choose? Do the estimators become more accurate when n increases? [Pay attention to the scale of the MSE values: if they are small, you should phrase your statements with caution.]
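As a rough illustration of the simulation scheme described above, here is a minimal Python sketch of the Monte Carlo estimation of bias squared, variance and MSE. The `sampler` and `estimator` arguments are placeholders to be filled with your answers to parts a), c) and d); the toy exponential example at the bottom is not distribution (1) and is only there to make the sketch runnable.

```python
import random

def mc_mse(sampler, estimator, true_param, n, N, seed=0):
    """Monte Carlo estimates of bias^2, variance and MSE of an estimator.

    sampler(n, rng) must return n draws from the true distribution;
    estimator(xs) returns a point estimate.  Both are placeholders to be
    replaced by the sampler from part a) and the formulae from c)/d).
    """
    rng = random.Random(seed)
    ests = [estimator(sampler(n, rng)) for _ in range(N)]
    mean_est = sum(ests) / N
    var = sum((e - mean_est) ** 2 for e in ests) / N      # Var(lam_hat)
    bias_sq = (true_param - mean_est) ** 2                # (lam - E lam_hat)^2
    mse = sum((e - true_param) ** 2 for e in ests) / N    # E[(lam_hat - lam)^2]
    return bias_sq, var, mse

# Toy illustration (NOT distribution (1)): the sample mean estimating the
# mean 1/lam of an exponential distribution.
lam = 2.0
b2, v, mse = mc_mse(lambda n, rng: [rng.expovariate(lam) for _ in range(n)],
                    lambda xs: sum(xs) / len(xs), 1 / lam, n=50, N=2000)
```

Note that the three returned quantities satisfy mse = b2 + v exactly (up to rounding), which is the decomposition the exercise asks you to prove.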
Problem 2
The attached file p2data.txt contains data d = (x1, . . . , x50) consisting of 50 values that were drawn independently from some distribution F . You do not know this generating distribution and I will not reveal it to you (or, not yet, at least).
The concern in this problem is the estimation of the true median θ of the distribution. You will probably estimate θ using the median of the sample, θˆ. Naturally, it is important to assess how accurate an estimate the sample median
is of the true median and, for this, it is necessary to employ some measure of error. But how can you compute the standard error of θˆ or a confidence interval (CI) for the true median θ if the distribution F of the data is not known to you
and in this case not even the family to which it may belong? Here’s a method (known as the bootstrap) that allows you to do that. The method is quite general and works for any quantity θ of interest and any estimate θˆ of it.
The method uses the one sample one has, d, from the unknown F to find an approximation Fˆ of F . Then Fˆ is used as if it were the true F and Monte Carlo estimates of the standard deviation of θˆ (as in problem 1) and even CIs for θ can be computed. Precisely, draw n values independently from Fˆ, where n is the number of values in the data d (the sample so obtained, d∗, is called a bootstrap sample); compute the statistic of interest on this sample (here the
median); repeat these two steps a large number of times B. The procedure
therefore yields B replicates of the median:
θˆ∗1 = s(d∗1), . . . , θˆ∗B = s(d∗B),
where s is the algorithm that outputs the median of the bootstrap sample.
These replicates are then employed to get a measure of error for the statistic
θˆ (here the median of the sample d), and a CI for θ (the true median).
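The replicate-generating loop just described can be sketched in Python as follows; `draw_from_Fhat` is a hypothetical stand-in for whichever sampler from F̂ is used (Version 1 or Version 2 below), and the toy data are made up purely for illustration.

```python
import random

def bootstrap_replicates(data, statistic, B, draw_from_Fhat, seed=0):
    """Return B bootstrap replicates of `statistic`.

    draw_from_Fhat(data, n, rng) stands in for a sampler from the
    approximation Fhat of F (nonparametric or parametric version).
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistic(draw_from_Fhat(data, n, rng)) for _ in range(B)]

def sample_median(xs):
    s = sorted(xs)
    m = len(s) // 2
    return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

# Toy data; the nonparametric Fhat resamples the data with replacement.
reps = bootstrap_replicates([3, 1, 4, 1, 5, 9, 2, 6], sample_median, B=500,
                            draw_from_Fhat=lambda d, n, rng: rng.choices(d, k=n))
```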
What to do with the bootstrap replications?
The fundamental goal of the bootstrap is to obtain a standard error for θ̂. This is estimated as the sample standard deviation of the B replications (θ̂*1, . . . , θ̂*B), i.e. the square root of the sample variance

V̂ar(θ̂) = (1/B) Σ_{b=1}^{B} (θ̂*b − θ̄*)²,   where θ̄* = (1/B) Σ_{b=1}^{B} θ̂*b,     (2)

[use B − 1 in place of B in formula (2) if you prefer the usual sample variance]. The square root of V̂ar(θ̂) is known as the bootstrap estimate σ̂_boot of the standard error of the statistic θ̂. It is just the Monte Carlo estimate of the standard deviation of θ̂, obtained by using not the true distribution F, which is unknown, but its available approximation F̂.
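In code, formula (2) and the resulting standard error might be computed as follows (a small sketch; `replicates` stands for the collection of θ̂*b values, here made-up numbers):

```python
import math

def bootstrap_se(replicates, ddof=0):
    """Square root of the sample variance of the replicates, formula (2).
    Pass ddof=1 to divide by B - 1 instead of B."""
    B = len(replicates)
    mean = sum(replicates) / B
    return math.sqrt(sum((t - mean) ** 2 for t in replicates) / (B - ddof))

se = bootstrap_se([2.0, 2.5, 1.5, 2.2, 1.8])   # toy replicates
```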
The bootstrap replicates can also be used to compute the CI of the parameter of interest θ of which the statistic θ̂ is an estimator. There exist different versions of bootstrap CIs. We will consider two of them.
One is the bootstrap percentile interval, which uses the sample percentiles of the bootstrap replicates. Namely, the 100(1 − α)% bootstrap two-sided CI is

(θ̂*_{α/2}, θ̂*_{1−α/2}),

where θ̂*_β is the β sample percentile of the bootstrap replicates: the value that separates the 100β% smallest values of the set (θ̂*1, . . . , θ̂*B) from the rest.
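A minimal sketch of the percentile interval follows. Note that there are several conventions for computing sample percentiles; this uses a simple order-statistic one (`numpy.percentile` would be a common alternative).

```python
def percentile_ci(replicates, alpha=0.05):
    """Bootstrap percentile interval: the empirical alpha/2 and 1 - alpha/2
    sample percentiles of the sorted replicates (one simple convention)."""
    s = sorted(replicates)
    B = len(s)
    lo = s[int((alpha / 2) * B)]
    hi = s[min(B - 1, int((1 - alpha / 2) * B))]
    return lo, hi

# toy replicates 1..100, 90% interval
ci = percentile_ci(list(range(1, 101)), alpha=0.10)
```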
Another version of the bootstrap CI uses the bootstrap estimate of the stan- dard error:
θ̂ ± z_{α/2} σ̂_boot.
This is known as the normal-based 100(1 − α)% bootstrap CI since it uses the normal cutoff points zα/2 [P (Z ≥ zα) = α where Z ∼ N (0, 1)]. [Notice that θˆ is the statistic evaluated on the original data d = (x1, . . . , xn) not on bootstrap
samples.]
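The normal-based interval needs only θ̂, σ̂_boot and the normal cutoff, which the standard library provides via `statistics.NormalDist` (a sketch; the numbers are made up):

```python
from statistics import NormalDist

def normal_ci(theta_hat, se_boot, alpha=0.05):
    """Normal-based bootstrap CI: theta_hat +/- z_{alpha/2} * se_boot."""
    # z_{alpha/2} satisfies P(Z >= z) = alpha/2 for Z ~ N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se_boot, theta_hat + z * se_boot

lo, hi = normal_ci(10.0, 0.5)   # 95% CI around a toy estimate
```

Here θ̂ is evaluated on the original data, as noted above, while σ̂_boot comes from the replicates.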
How to approximate F ?
There are two versions of the bootstrap method: they differ in how F is approximated.
Version 1. Non-parametric bootstrap
This version of the method assumes nothing about F .
The estimate F̂ of F obtained from the one sample d = (x1, . . . , xn) is the discrete distribution placing equal mass on x1, . . . , xn. Mathematically, the pmf of the approximate distribution is

P(X = x_i) = 1/n,   i = 1, . . . , n.
This is known as the empirical distribution. This is tantamount to considering the values we have observed as the only possible values, and each equally likely (at least if the xi are distinct).
In other words, a non-parametric bootstrap sample is obtained as follows. Draw with replacement n balls from an urn that contains the n balls (x1, . . . , xn) (each ball is labeled by one of the data values). The n values you draw are the bootstrap sample. Therefore in a bootstrap sample some values may appear more than once.
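The urn scheme is exactly sampling with replacement, e.g. (with made-up data):

```python
import random

rng = random.Random(1)
data = [2.3, 5.1, 0.7, 3.3, 4.8]          # the n "balls" in the urn
boot = rng.choices(data, k=len(data))     # draw n balls with replacement
# boot is one nonparametric bootstrap sample; values may repeat
```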
Version 2. The parametric bootstrap
The parametric bootstrap is a different version of the method, which is applied when some information about the true distribution F of the data is known. Specifically, it requires that F = F_λ be known up to some parameter(s) λ. The data d are employed to estimate the unknown parameter(s) λ, the only missing information, and F is approximated by F̂ = F_λ̂. For λ̂ one can use the MLE (or
some other estimator). Since Fλˆ is completely known, one can sample as many
times as one wants and generate the bootstrap samples.
What changes in the two versions is how F is estimated. The procedure is otherwise the same. To sum up, draw B data sets of (generally and for this exercise) the same size as the original data set from the bootstrap approximation Fˆ to F ; evaluate the statistic of interest on the B bootstrap samples to get B bootstrap replicates.
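As a sketch of the parametric version, suppose (purely hypothetically, and not a claim about the undisclosed F of Problem 2) that the family were exponential with unknown rate. Then one would estimate the rate by its MLE and draw the bootstrap samples from the fitted distribution:

```python
import random

def parametric_bootstrap_samples(data, B, seed=0):
    """Parametric bootstrap sketch under a *hypothetical* exponential
    assumption: estimate the rate lam by its MLE lam_hat = n / sum(data),
    then draw B size-n samples from Exp(lam_hat)."""
    rng = random.Random(seed)
    n = len(data)
    lam_hat = n / sum(data)   # MLE of the exponential rate
    return [[rng.expovariate(lam_hat) for _ in range(n)] for _ in range(B)]

samples = parametric_bootstrap_samples([0.5, 1.2, 0.8, 2.0], B=100)
```

From here, the replicate/SE/CI machinery above is applied unchanged to these samples.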