People are notoriously bad at generating random numbers in their heads. In this problem, we will compare random binary sequences generated experimentally versus ones that are made up.
A. Write down a sequence of 100 binary digits (each digit is a ‘0’ or a ‘1’) that you make up off the top of your head without any experimental or computational help.
B. Find a way to generate a sequence of 100 binary digits that represent an independent sample of size 100 from a Bernoulli distribution with p=0.5.
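One computational way to produce the sequence for part B is a pseudorandom number generator. The sketch below is only an illustration: the function name `bernoulli_sequence` and the `seed` argument are assumptions made here, and a physical experiment such as 100 coin flips would serve equally well.

```python
import random

def bernoulli_sequence(n=100, p=0.5, seed=None):
    """Return n independent pseudorandom draws, each 1 with probability p."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [1 if rng.random() < p else 0 for _ in range(n)]

# Hypothetical usage: one sample of size 100 with p = 0.5.
seq_b = bernoulli_sequence(100, 0.5, seed=0)
print("".join(str(d) for d in seq_b))
```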
C. Let x_k be the kth digit in either of the sequences you generated. Compute the sample mean for both sequences. What is the expected value for independent Bernoulli random variables with p=0.5?
Compute the sample covariance between adjacent samples, x_k and x_(k+1), for both sequences.
What is the theoretical covariance for independent Bernoulli random variables? How do the sample statistics from each sequence compare to the theoretical values?
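The sample statistics in part C could be computed along the following lines. This is a minimal sketch: the helper names are hypothetical, and averaging over the n-1 adjacent pairs is one of several reasonable normalizations for the covariance between neighbouring digits.

```python
def sample_mean(x):
    """Arithmetic mean of the digits."""
    return sum(x) / len(x)

def lag1_covariance(x):
    """Average of (x[k] - mean) * (x[k+1] - mean) over all adjacent pairs."""
    m = sample_mean(x)
    pairs = len(x) - 1
    return sum((x[k] - m) * (x[k + 1] - m) for k in range(pairs)) / pairs

# A strictly alternating sequence, the kind of pattern people tend to overuse.
alternating = [0, 1] * 50
print(sample_mean(alternating))      # 0.5
print(lag1_covariance(alternating))  # -0.25
```

A clearly negative value like this one signals that neighbouring digits switch more often than independent draws would.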
D. A run is a sequence of adjacent ‘0’s or ‘1’s.
If a sequence of random Bernoulli variables is drawn independently, after drawing a first ‘0’ or ‘1’, what is the probability that the first run will be of length 1?
What is the probability that the first run will be of length 2? What is the probability that the first run will be of length n?
Plot the empirical PMF and empirical CDF of the lengths of runs from both sequences, and compare these to the theoretical distributions.
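Before plotting, the run lengths and their empirical PMF and CDF have to be extracted from each sequence. The sketch below shows one way to do that with the standard library. All function names are assumptions made here, and `geometric_pmf` encodes the assumption that each new digit ends the current run with probability q (q = 0.5 for independent fair digits), ignoring truncation of the last run at the end of the sequence.

```python
from collections import Counter
from itertools import groupby

def run_lengths(seq):
    """Lengths of maximal blocks of identical adjacent digits."""
    return [len(list(block)) for _, block in groupby(seq)]

def empirical_pmf(lengths):
    """Fraction of runs having each observed length."""
    counts = Counter(lengths)
    return {n: counts[n] / len(lengths) for n in sorted(counts)}

def empirical_cdf(pmf):
    """Cumulative sums of the PMF, in increasing order of run length."""
    cdf, running = {}, 0.0
    for n in sorted(pmf):
        running += pmf[n]
        cdf[n] = running
    return cdf

def geometric_pmf(n, q=0.5):
    """P(run length = n) if each next digit ends the run with probability q."""
    return (1 - q) ** (n - 1) * q

lengths = run_lengths([0, 0, 0, 1, 1, 0, 1])
print(lengths)                 # [3, 2, 1, 1]
print(empirical_pmf(lengths))  # {1: 0.5, 2: 0.25, 3: 0.25}
```

The dictionaries returned here can be passed to any plotting tool as bar heights (PMF) or step heights (CDF).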
E. Discuss how well your made up binary sequence resembles an actual sequence of independent Bernoulli[0.5] random variables. If you wanted to determine whether a binary sequence was generated from an independent Bernoulli process or made up by a person, which statistics would you check?
F. Write code to generate 0s and 1s in the following way.
1. For the first sample, randomly select a 0 or 1 with probability 1/2 for each.
2. If the last sample was a 1, the next sample will be a 1 with probability p and a 0 with probability 1-p. If the last sample was a 0, the next sample will be a 1 with probability 1-p and a 0 with probability p.
3. Repeat until the sample is length 100.
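The three steps above can be sketched as follows. The function name `sticky_sequence` and the `seed` argument are hypothetical. Step 2 amounts to repeating the previous digit with probability p and switching with probability 1-p.

```python
import random

def sticky_sequence(p, n=100, seed=None):
    """Generate n digits where each digit repeats the previous one with probability p."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1)]             # step 1: fair first digit
    while len(seq) < n:                   # step 3: stop at length n
        last = seq[-1]
        repeat = rng.random() < p         # step 2: repeat with probability p
        seq.append(last if repeat else 1 - last)
    return seq

# Hypothetical usage: p < 0.5 switches often, p > 0.5 produces long runs.
print("".join(str(d) for d in sticky_sequence(0.3, seed=1)))
print("".join(str(d) for d in sticky_sequence(0.8, seed=1)))
```

With p = 0.5 this procedure reduces to independent fair draws.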
G. Come up with a statistic to estimate p in the above problem by computing the number of times the following sequential pairs appear in the sequence: 00, 01, 10, 11. For example, in the sequence 0001101 you would make this table.
Pair   Count
00     2
01     2
10     1
11     1
Use this statistic to estimate p for your sequence from part A. Do you think the above model is a good model for your sequence? Is there something different about your sequence?
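One candidate statistic, offered here as an assumption and not as the intended answer, is the fraction of overlapping adjacent pairs in which the digit repeats, that is (count of 00 + count of 11) divided by the total number of pairs. The helper names below are hypothetical.

```python
from collections import Counter

def pair_counts(seq):
    """Count overlapping adjacent pairs 00, 01, 10, 11 in a digit string."""
    counts = Counter(seq[k:k + 2] for k in range(len(seq) - 1))
    return {pair: counts[pair] for pair in ("00", "01", "10", "11")}

def estimate_p(seq):
    """Fraction of adjacent pairs in which the second digit repeats the first."""
    counts = pair_counts(seq)
    total = sum(counts.values())
    return (counts["00"] + counts["11"]) / total

example = "0001101"
print(pair_counts(example))  # {'00': 2, '01': 2, '10': 1, '11': 1}
print(estimate_p(example))   # 0.5
```

Comparing the 00 and 11 counts separately is also worthwhile, since the model above assumes both digits repeat with the same probability.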
Problem 2
The dataset contains (made up) pilot data for a test of a new cholesterol drug. Twenty high-cholesterol patients were assigned to a drug group and 20 patients were assigned to a placebo group. Their blood cholesterol levels were measured before and after a 1-month regimen of the drug or placebo. The dataset contains the patients' blood triglyceride levels in mg/dL before and after the regimen. The group variable contains a ‘0’ for the placebo group and a ‘1’ for the drug group.
A. Load the data into a statistics software package (such as R or MATLAB). Visualize the data before and after the intervention.
Use some descriptive statistics to describe and compare the data between the before and after periods.
Describe the structure of the data in words based on your statistics and visualizations.
B. Compute the change in triglyceride level for each patient, both as a change in the raw value (in mg/dL) and as a percentage change from the period before the regimen.
Visualize and use descriptive statistics to characterize features of the levels of change for both the drug and placebo groups.
Also compute the fraction of patients who saw a reduction of triglyceride level from each of the drug and placebo groups.
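The quantities in part B could be computed roughly as below. This is a sketch under stated assumptions: the variable names `before`, `after`, and `group` are hypothetical, and the four values are placeholders for illustration only, not values from the actual dataset.

```python
def changes(before, after):
    """Raw change (mg/dL) and percentage change relative to the before value."""
    raw = [a - b for b, a in zip(before, after)]
    pct = [100.0 * (a - b) / b for b, a in zip(before, after)]
    return raw, pct

def fraction_reduced(before, after, group, which):
    """Fraction of patients in group `which` whose level went down."""
    members = [i for i, g in enumerate(group) if g == which]
    reduced = sum(1 for i in members if after[i] < before[i])
    return reduced / len(members)

# Placeholder values only; replace with the columns loaded from the dataset.
before = [200, 250, 300, 220]
after = [180, 260, 240, 220]
group = [1, 1, 0, 0]  # 1 = drug, 0 = placebo

raw, pct = changes(before, after)
print(raw)  # [-20, 10, -60, 0]
print(pct)  # [-10.0, 4.0, -20.0, 0.0]
print(fraction_reduced(before, after, group, 1))  # 0.5
```

Negative changes indicate a reduction, so the sign convention should be stated alongside any plot or summary table.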
Is there evidence for an improved effect of the drug relative to a placebo?
C. When you show your results to the clinicians studying the drug, they hypothesize that some of the measurements might be abnormally high because patients did not fast before their blood was drawn.
Abnormal data points are often called outliers.
Is there any evidence in the data of abnormally high measurements? If so, remove these outliers and repeat the analyses from part B.
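The problem does not specify how to decide that a measurement is abnormally high. One common convention, used here purely as an assumption, is the upper Tukey fence: flag any value above the third quartile plus 1.5 times the interquartile range. The function name and the sample values below are hypothetical placeholders.

```python
import statistics

def high_outlier_indices(values, k=1.5):
    """Indices of values above Q3 + k * IQR (upper Tukey fence)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    fence = q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v > fence]

# Placeholder measurements in mg/dL, not taken from the actual dataset.
levels = [150, 160, 170, 180, 190, 200, 900]
print(high_outlier_indices(levels))  # [6]
```

Because each patient has a before and an after value, a flagged measurement would normally mean dropping that patient's whole record before repeating part B.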
D. What would you recommend to the researchers regarding this dataset? What are the potential drawbacks of removing the outlier points in part C?