
Advanced Topics in Statistics Assignment

Please make sure that the submitted work is your own. This is NOT a group assignment, so approaches and solutions should not be discussed with other students. Plagiarism and collusion with other students are examples of academic misconduct and will be reported. More information on academic honesty can be found here.

The assignment has three main parts. Part A involves (i) fitting a Poisson regression model to assess the effect of using different priors, and (ii) fitting an auto-regressive process to time series data using the BUGS language in order to estimate missing data. Part B involves using different methods for classification of data into two groups. Part C involves producing a narrated PowerPoint presentation based on Question 3 of Part B.

Parts A and B together give 80% of your final mark and Part C gives 20%. [Assignment: 125 marks in total]

A. Bayesian Inference [66 marks]

1. The first question of part A involves fitting a Poisson regression model using the Ohio_Data dataset, which contains the observed and expected counts of lung cancer for counties in Ohio for 1988.

(i) [3 marks] Calculate the Standardised Mortality Ratios (SMRs) for each county and plot the distribution of the SMRs. Next, plot a map of the SMRs by county. You may want to use the following code, which uses the OhioMap function with random numbers (the file with the code is on the ELE page), or you can write your own.
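As a reminder of the definition, the SMR for an area is its observed count divided by its expected count. The following is a minimal sketch only: the county labels and counts are made-up illustrative values, not taken from Ohio_Data, and the function name `smr` is hypothetical.

```python
# Hypothetical observed and expected counts for three areas (illustrative values only).
observed = [30, 12, 45]
expected = [25.0, 15.0, 45.0]


def smr(obs, exp):
    """Return the ratio observed/expected for each area."""
    return [o / e for o, e in zip(obs, exp)]


smrs = smr(observed, expected)
for county, value in zip(["A", "B", "C"], smrs):
    # An SMR above 1 means more cases were observed than expected.
    print(f"County {county}: SMR = {value:.2f}")
```

An SMR of 1 indicates that the observed count matches the expected count exactly.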

We are interested in estimating the relative risk (RR) for each county and we are going to fit a Poisson model of the following form:

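\[
\begin{aligned}
\mathrm{Obs}_i &\sim \mathrm{Pois}(\mu_i) \\
\log(\mu_i) &= \log(\mathrm{Exp}_i) + \beta_0 + \log(\theta_i) \\
\mathrm{RR}_i &= \exp(\beta_0)\,\theta_i
\end{aligned}
\]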

Where the prior distributions for $\theta_i$ are Gamma($\alpha$, $\alpha$). Here, the Exp(ected) numbers are an ‘offset’, i.e. we don’t assign a coefficient to them (or another way of putting it is that we fix the coefficient to be one).
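To see what the offset does, it may help to exponentiate the linear predictor. This short worked step uses only the symbols defined in the model:

\[
\mu_i = \exp\{\log(\mathrm{Exp}_i) + \beta_0 + \log(\theta_i)\} = \mathrm{Exp}_i \,\exp(\beta_0)\,\theta_i = \mathrm{Exp}_i \,\mathrm{RR}_i
\]

So the Poisson mean is the expected count scaled by the relative risk, which is why the expected numbers enter with a fixed coefficient of one.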

(ii) [4 marks] Describe the role of $\beta_0$ and the set of $\theta_i$ in this model and how they contribute to the estimation of RR.

(iii) [14 marks] Code up this Poisson-Gamma model in JAGS to analyse the Ohio data. Use the priors $\beta_0 \sim \mathrm{Unif}(-100, 100)$ and $\alpha \sim \mathrm{Gamma}(1, 1)$. Initialise 2 chains and run the model with these two chains. You will have to decide on the appropriate values of n.iter and burnin. Produce trace plots for the chains and summaries of all the parameters. Investigate whether the chains for all the parameters have converged.

(iv) [6 marks] Extract the posterior means for the RR and map them. Then calculate the posterior probabilities that the relative risk in each area exceeds 1.2. Extract these probabilities and map them.
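A posterior exceedance probability is usually estimated as the proportion of posterior draws above the threshold. The sketch below illustrates that calculation only: the draws are simulated stand-ins generated with `random.gauss`, not output from a fitted JAGS model, and the names `rr_samples` and `exceedance_probability` are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Stand-in for MCMC output: hypothetical posterior draws of RR for two areas.
# In practice these would come from the fitted model, not from random.gauss.
rr_samples = {
    "area_1": [random.gauss(1.3, 0.1) for _ in range(5000)],
    "area_2": [random.gauss(1.0, 0.1) for _ in range(5000)],
}


def exceedance_probability(samples, threshold=1.2):
    """Proportion of posterior draws strictly above the threshold."""
    return sum(s > threshold for s in samples) / len(samples)


for area, draws in rr_samples.items():
    posterior_mean = sum(draws) / len(draws)
    prob = exceedance_probability(draws)
    print(f"{area}: posterior mean RR = {posterior_mean:.3f}, P(RR > 1.2) = {prob:.3f}")
```

The same proportion could equally be computed inside the model by monitoring an indicator node, but that is a design choice rather than a requirement.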

(v) [6 marks] Repeat the analysis with different priors for β0 and α. The exact choice is yours, but explain why you have chosen them and what they mean. Map the two sets of RRs and explain any differences you see in the summaries of the posteriors for the parameters of the model.

2. One factor that affects the relative risk of lung cancer is air pollution. The dataset ohio_pm25.csv contains measurements of particulate matter (PM2.5) air pollution in Ohio for 1988-1989. However, some of the data are missing. We will use JAGS to impute the missing data so that the PM2.5 measurements can be fed into the relative risk analysis at a later stage (note that this last step is not part of the assignment).

(i) [4 marks] Do some exploratory data analysis: summarise the data, then plot the PM2.5 measurements against time, highlighting (showing clearly) the periods of missing data.

We are going to fit a model that allows us to estimate the missing data by treating them as model parameters, so that we obtain posterior distributions for them. As we have time series data, we will use the fact that day-to-day measurements are correlated, i.e. today’s measurement will correlate with yesterday’s.

A random walk process of order 1, RW(1), is defined at time t as

\[
\begin{aligned}
Y_t - Y_{t-1} &= w_t \\
Y_t &= Y_{t-1} + w_t
\end{aligned}
\]

Where $w_t$ are a set of realisations of random (or white) noise, e.g. $w_t \sim N(0, \sigma_w^2)$. Note that the first line says that the differences between the values at consecutive time points are white noise.
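The definition can be checked numerically by simulating a short random walk and confirming that its first differences recover the noise terms. This is a minimal sketch; the function name `simulate_rw1`, the starting value and the noise standard deviation are illustrative assumptions.

```python
import random


def simulate_rw1(n, y1, sd_noise, seed=0):
    """Simulate n points of a random walk of order 1 starting at y1.

    Returns the series and the white-noise increments that generated it.
    """
    rng = random.Random(seed)
    series = [y1]
    noise = []
    for _ in range(n - 1):
        step = rng.gauss(0.0, sd_noise)  # white-noise increment
        noise.append(step)
        series.append(series[-1] + step)  # current value = previous value + noise
    return series, noise


y, w = simulate_rw1(n=10, y1=6.0, sd_noise=0.5)

# The differences between consecutive values are the white-noise terms.
diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
print(all(abs(d - s) < 1e-9 for d, s in zip(diffs, w)))
```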

We are interested in fitting a random walk model to the Ohio data. The model will be of the following form:

\[
\begin{aligned}
\mathrm{Ohio}_t &\sim N(Y_t, \sigma_v^2) \\
Y_t &\sim N(Y_{t-1}, \sigma_w^2)
\end{aligned}
\]

Where $\sigma_w^2$ is the variance of the white noise process associated with the random walk. We then make noisy measurements of this random walk process, thus $\mathrm{Ohio}_t$, the measurement we have at time $t$, equals the true value of the underlying process $Y_t$ plus some measurement error. In the formula above, $\sigma_v^2$ is the variance of this measurement error.
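The two layers of this model (a latent random walk, then a noisy measurement of it) can be illustrated by simulation. The sketch below is illustrative only: the function name `simulate_noisy_rw` and all numerical values are assumptions. Note also that `random.gauss` takes a standard deviation, whereas `dnorm` in JAGS is parameterised by the precision (one over the variance).

```python
import random


def simulate_noisy_rw(n, y1, sd_process, sd_measure, seed=0):
    """Simulate a latent random walk and noisy measurements of it."""
    rng = random.Random(seed)
    latent = [y1]
    for _ in range(n - 1):
        # Each latent value is centred on the previous latent value.
        latent.append(rng.gauss(latent[-1], sd_process))
    # Each measurement is centred on the latent value at the same time point.
    measured = [rng.gauss(y_t, sd_measure) for y_t in latent]
    return latent, measured


latent, measured = simulate_noisy_rw(n=8, y1=6.0, sd_process=0.3, sd_measure=0.1)
for t, (y_t, obs_t) in enumerate(zip(latent, measured), start=1):
    print(f"t={t}: latent={y_t:.3f}, measured={obs_t:.3f}")
```

Setting `sd_measure` to zero makes the measurements coincide with the latent process, which shows that the measurement error is a separate source of variation from the random walk noise.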

(ii) [12 marks] Code this model in JAGS, using the model definition below, to analyse the Ohio data for 1988 only. Due to the nature of the model, you will have to explicitly specify a value for $Y_1$ in the model (i.e. for the first time point, as $Y_0$ doesn’t exist). One suggestion might be $Y_1 \sim \mathrm{dnorm}(6, 0.001)$.

Run the model for 10,000 iterations, with 2 chains, discarding the first 5,000 as ‘burn-in’. Produce trace plots for the chains and summaries for the fitted parameters (including the missing data). In your solution file you should include a representative sample of this output.

Hint: You will have to initialise both chains. One suggestion might be using the mean and median to initialise the missing values of Ohio, and using random uniforms (with a narrow interval centred around say 6) to initialise Y.
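One way to read this hint is sketched below, in Python for illustration only. The short series, the helper name `missing_inits` and the interval (5.9, 6.1) are all hypothetical. The sketch also assumes that starting values are supplied only for the missing entries, with observed entries left empty (`None` here, the analogue of NA in R); check this against the interface you actually use to call JAGS.

```python
import random
from statistics import mean, median

# Hypothetical short series; None marks a missing measurement.
ohio = [5.8, None, 6.4, 6.1, None, 5.9]


def missing_inits(series, fill):
    """Starting values for missing entries only; observed entries stay None."""
    return [fill if x is None else None for x in series]


observed = [x for x in ohio if x is not None]

# Chain 1 starts the missing values at the mean, chain 2 at the median.
inits_chain1 = missing_inits(ohio, mean(observed))
inits_chain2 = missing_inits(ohio, median(observed))

# Latent process Y: random uniforms in a narrow interval centred around 6.
rng = random.Random(42)
y_inits = [rng.uniform(5.9, 6.1) for _ in ohio]

print(inits_chain1)
print(inits_chain2)
```

Using different starting values for the two chains makes it easier to judge from the trace plots whether they have converged to the same distribution.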
