Recent years have seen dramatic growth in the development and application of Bayesian inference in statistics. In Bayesian analysis, Markov chain Monte Carlo (MCMC) methods are often used to simulate draws from complicated distributions of interest. R is a well-developed programming language with a wide range of functions for data manipulation and graphical display. The purpose of this master’s paper is to use Gibbs sampling in R to generate three posterior distributions of parameters from a Bayesian hierarchical model of response times for non-schizophrenics and schizophrenics, and to predict the probability that a non-schizophrenic reacts faster than a schizophrenic. Because little prior information is available, the simulated MCMC samples may or may not approximate the true distribution. Several statistical diagnostic methods were therefore adopted to assess the convergence of the simulated Markov chains.
First, I cannot thank my advisor, Dr. Jun Ye, enough for careful guidance and continued support throughout this master’s paper. I would also like to express my sincere gratitude to my reader, Dr. Sujay Datta, for reviewing this paper and offering suggestions. The knowledge and skills I have gained from this paper with their help are priceless and very important for my future development. Finally, to my family, and especially my husband: I am so grateful for your support and encouragement.
Bayesian inference is a method of statistical inference based on Bayes’ theorem, which relates two conditional probabilities that are the reverse of each other. The main difference between frequentist (classical) and Bayesian inference is that in the former the unknown parameters are fixed and probabilities are objective, whereas in the latter parameters are treated as random variables and probabilities are subjective, interpreted as a “degree of belief” (Rathnayake, R.C., 2010). In Bayesian statistics, a prior distribution over the unknown parameters θ is formulated according to some beliefs; after observing data y, a posterior distribution is obtained that takes account of both the prior and the data. The posterior probability is proportional to the prior probability times the likelihood, which can be expressed as p(θ|y) ∝ p(θ) × p(y|θ). Bayesian methods are not always used, because the required calculations can be overwhelming and because the prior distribution is subjective, so different priors may lead to different results (Parker, M., 2005).
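The proportionality p(θ|y) ∝ p(θ) × p(y|θ) can be illustrated with a small grid approximation. The sketch below is in Python rather than the R used for this paper, and everything in it is an assumption made for illustration: the binomial likelihood, the Beta(2, 2) prior, the data (7 successes in 10 trials) and the function name grid_posterior do not come from the schizophrenia analysis.

```python
import math


def grid_posterior(successes, trials, prior_a=2.0, prior_b=2.0, n_grid=1001):
    """Approximate p(theta | y) on a grid: posterior is prior times likelihood.

    Hypothetical example: binomial likelihood with a Beta(prior_a, prior_b) prior.
    """
    # Midpoints of equal-width cells in (0, 1); avoids log(0) at the endpoints.
    thetas = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = []
    for t in thetas:
        # Unnormalised log prior and log likelihood (constants are dropped).
        log_prior = (prior_a - 1) * math.log(t) + (prior_b - 1) * math.log(1 - t)
        log_lik = successes * math.log(t) + (trials - successes) * math.log(1 - t)
        log_post.append(log_prior + log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalise so the grid sums to 1
    return thetas, probs


thetas, probs = grid_posterior(successes=7, trials=10)
post_mean = sum(t * p for t, p in zip(thetas, probs))
# For this conjugate case the exact posterior is Beta(9, 5), with mean 9/14.
exact_mean = (2.0 + 7) / (2.0 + 2.0 + 10)
print(round(post_mean, 4), round(exact_mean, 4))
```

In this conjugate case the grid answer can be checked against the closed-form posterior mean. For hierarchical models such as the one in this paper no closed form is available, which is why simulation methods like Gibbs sampling are used instead.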
The dataset for this paper consists of response times (in milliseconds) for 11 non-schizophrenics and 6 schizophrenics (Belin and Rubin, 1990). The same data are shown in Figure 18.1 of “Bayesian Data Analysis”, third edition, by Gelman et al. Psychologists at Harvard University performed an experiment measuring thirty reaction times for each of these seventeen male subjects. The measurements are manual reaction times to visual cues: subjects watch a screen and move a finger from one button to another when a signal appears. Fig. 1 and Fig. 2 show histograms for the two groups with normal curves overlaid.
The posterior comparison of non-schizophrenics and schizophrenics is shown in Fig. 11. According to the R program, the probability that schizophrenics have a longer response time than non-schizophrenics is 0.8804.
If we assume that the variability in the populations of schizophrenics and non-schizophrenics is higher than that in the observed data, we obtain the comparison in Fig. 12. In this situation, the probability that schizophrenics have a longer response time than non-schizophrenics is 0.8466.
If we assume that the between-group variability is higher than the within-group variability for non-schizophrenics and schizophrenics, we obtain the comparison in Fig. 13. Here, the probability that schizophrenics have a longer response time than non-schizophrenics is 0.8620.
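Probabilities of this kind are typically estimated as the fraction of posterior draws in which one quantity exceeds the other. The Python sketch below shows that calculation under stated assumptions: the two sets of draws are simulated from made-up normal distributions on a log-millisecond scale, not taken from the paper's Gibbs sampler, and the name prob_greater is hypothetical. It is not the R script that produced the values reported above.

```python
import random


def prob_greater(draws_a, draws_b):
    """Monte Carlo estimate of P(a > b) from paired simulated draws."""
    if len(draws_a) != len(draws_b):
        raise ValueError("both sets of draws must have the same length")
    exceed = sum(1 for a, b in zip(draws_a, draws_b) if a > b)
    return exceed / len(draws_a)


rng = random.Random(1)
n_draws = 20000
# Hypothetical predictive draws (log milliseconds); these parameter values
# are invented for illustration and are not estimates from the paper.
non_schiz = [rng.gauss(5.7, 0.3) for _ in range(n_draws)]
schiz = [rng.gauss(6.0, 0.4) for _ in range(n_draws)]

p_longer = prob_greater(schiz, non_schiz)
# For these two normals the difference has mean 0.3 and sd 0.5,
# so the analytic value is Phi(0.6), about 0.726.
print(round(p_longer, 3))
```

The estimate carries Monte Carlo error of order sqrt(p(1 − p)/n), so with 20,000 independent draws it is accurate to roughly two decimal places. Autocorrelated MCMC draws would give less precision for the same n.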
Like all statistical methods, MCMC has its own disadvantages. The key one is the difficulty of determining whether the algorithm has converged. After a sample chain has been constructed by an MCMC algorithm, we should assess whether the chain has reached a stationary distribution, how many steps are needed to converge to it, and whether it is reached quickly. This stationary distribution is the true posterior distribution that we are interested in. Assessing convergence is important because an MCMC simulation is often started at a random point in parameter space, which is often far from the true high-density regions of the posterior distribution (Sahli, K., 2011), and we should check the efficiency of our simulated samples before making any inference.
CODA is a menu-driven set of S-Plus functions that serves as an output processor for the BUGS (Bayesian inference Using Gibbs Sampling) software (Best, N., Cowles, M.K. and Vines, K., 1995), and it is also available as the R package coda. It can be used to perform convergence diagnostics, so in this paper we mainly use functions from this R package to assess the convergence of the MCMC output. We first use visual inspection, such as trace plots and stationary boxplots, to see how well the chain mixes and how the samples are distributed. We then use statistical diagnostics in coda, such as “geweke.diag” and “heidel.diag”, to generate output for assessing MCMC convergence.
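To give a feel for what a diagnostic such as “geweke.diag” measures, the Python sketch below computes a simplified Geweke-style z-score that compares the mean of the first 10% of a chain with the mean of the last 50%. It is a deliberate simplification and not the coda implementation: it uses plain sample variances, which are only appropriate for nearly independent draws, where the actual diagnostic uses spectral density estimates to allow for autocorrelation. The function names and both test chains are hypothetical.

```python
import math
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def geweke_style_z(chain, first=0.1, last=0.5):
    """Simplified Geweke-style z-score: early-segment mean vs late-segment mean.

    Assumption: naive variances are used, so autocorrelation is ignored.
    """
    n = len(chain)
    head = chain[: int(first * n)]
    tail = chain[n - int(last * n):]
    mean_head, var_head = mean_and_var(head)
    mean_tail, var_tail = mean_and_var(tail)
    std_err = math.sqrt(var_head / len(head) + var_tail / len(tail))
    return (mean_head - mean_tail) / std_err


rng = random.Random(42)
# Hypothetical chain that is stationary from the first iteration.
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical chain whose first 1000 iterations are far from the target.
drifting = ([rng.gauss(3.0, 1.0) for _ in range(1000)]
            + [rng.gauss(0.0, 1.0) for _ in range(4000)])

z_stationary = geweke_style_z(stationary)
z_drifting = geweke_style_z(drifting)
print(round(z_stationary, 2), round(z_drifting, 2))
```

A z-score that is large in absolute value suggests that the early part of the chain has not yet reached the same distribution as the later part, which is the usual argument for discarding a longer burn-in.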
One property of a Markov chain is that the members of a sample are generally correlated with each other, which slows the algorithm in its attempt to sample from the entire stationary distribution (Cowles, M.K. and Carlin, B.P., 1996). Autocorrelation measures how far the successive samples from the posterior distribution are from being independent. A Markov chain with high autocorrelation moves around the parameter space very slowly, taking a long time to achieve the correct balance among the different regions of the parameter space. The higher the autocorrelation, the more MCMC samples we need to attain a given level of precision for our approximation (Hoff, P.D., 2009). Lower autocorrelation therefore means a more efficient chain. In this paper, we use the R function “acf” to produce an autocorrelation plot for each MCMC output (see the R script in the appendix).
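The link between autocorrelation and efficiency can be made concrete with a short Python sketch. It computes the sample autocorrelation at a given lag and a rough effective sample size, n / (1 + 2 Σ ρ_k), with the sum truncated at the first non-positive autocorrelation. The truncation rule, the lag cap, the function names and the simulated AR(1) chain are all assumptions made for illustration; this is not the paper's R script and not the estimator used by any particular package.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation at a given lag (autocovariances divided by n)."""
    n = len(chain)
    mean = sum(chain) / n
    c0 = sum((x - mean) ** 2 for x in chain) / n
    ck = sum((chain[i] - mean) * (chain[i + lag] - mean)
             for i in range(n - lag)) / n
    return ck / c0


def effective_sample_size(chain, max_lag=200):
    """Rough ESS: n / (1 + 2 * sum of autocorrelations).

    Assumption: the sum stops at the first non-positive autocorrelation
    or at max_lag, whichever comes first.
    """
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(chain, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(7)
n = 5000
# Hypothetical independent draws: autocorrelation near zero at every lag.
iid_chain = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical AR(1) chain with coefficient 0.9: strongly autocorrelated.
ar_chain = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    ar_chain.append(0.9 * ar_chain[-1] + rng.gauss(0.0, 1.0))

rho1_ar = autocorr(ar_chain, 1)
ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
print(round(rho1_ar, 2), round(ess_iid), round(ess_ar))
```

Both chains have the same length, but the strongly autocorrelated one is worth far fewer independent draws. This is the sense in which lower autocorrelation means a more efficient chain.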