To complete the assignment, submit two files electronically: a Stata "do-file" that produces all the necessary numbers and output, and your written answers to the questions, with the appropriate tables and figures inserted.
Your written answers should include any tables, graphs, etc. that are requested or needed to support your answers. Please don't submit raw Stata output in your write-up.
Part 1
USA Facts collected the fall 2020 reopening plans of the 255 largest school districts in the US. They have an article about it here. First, download the Excel spreadsheet with the raw data here. It also includes data from the US Census at the school district level and information reported by states to the federal Dept of Ed.
You can import the Excel file into Stata with the following command (assuming Stata's working directory is where the file is):
import excel using schoolskids_data.xlsx, sheet("Data") firstrow
You may also want to refer to the Excel file itself, as it has a separate worksheet with descriptions of what each variable means.
Create a frequency table of districts' reopening plans. What proportion of these districts are reopening with online-only education?
Create a set of boxplots that examines whether districts that are reopening with online-only education differ in their percentage of non-Hispanic white students from those with other plans. Describe what each feature of the boxplot tells us about the difference between groups.
Create a density plot of the percentage of non-Hispanic white students that has two lines: one for the density of districts with online instruction and one for all other districts. What can you learn from this? (There are multiple ways to do this; one is to use the addplot() option of the kdensity command. By putting the command for a new plot (i.e., another kdensity command) inside the parentheses following addplot, you can add a second plot to the graph.)
If online instruction were randomly assigned to districts (while holding fixed the overall percentage of districts that are using online-only instruction), how would the mean percentage of non-Hispanic white students differ between the two groups? Follow the steps below and report your results.
Save your dataset so we can use it later: save schoolskids.dta, replace
Create a random variable called random that is uniformly distributed between zero and one.
Use the cumul command to calculate, for each observation, the fraction of observations with a random value at or below its own (the empirical cumulative distribution). You can do that with: cumul random, generate(cumulrandom). The result ranges from just above zero to one.
Create a variable called fake_online that is equal to 1 if cumulrandom is less than the fraction of districts that actually had online instruction (i.e., your answer in Q1) and 0 otherwise.
Calculate the mean percentage of non-Hispanic white students among districts with "fake online" instruction using the sum (summarize) command. Store that mean in a "local" called online_pctwhite with the command local online_pctwhite=r(mean). Repeat this for districts without "fake online" instruction and store the result in another local with local notonline_pctwhite=r(mean). Create a new variable that contains the difference between these means with the command gen fake_diffpctwhite=`online_pctwhite'-`notonline_pctwhite'. Run the command tab fake_diffpctwhite to make sure your code worked.
Report these means and the difference. How does it compare to the difference between those with and without actual online instruction?
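For intuition, the logic of this single randomization can be sketched outside Stata. The following is a minimal Python illustration, not the do-file the assignment asks for. The ten district values, the count of four "online" districts, and the function names are all hypothetical. It also flags an exact count of districts by rank rather than comparing against a fraction, which is an assumption of the sketch.

```python
import random
from statistics import mean

def assign_fake_online(n, n_online, rng):
    """Randomly flag exactly n_online of n districts as 'fake online'."""
    draws = [rng.random() for _ in range(n)]          # uniform draws on [0, 1)
    order = sorted(range(n), key=lambda i: draws[i])  # indices from lowest to highest draw
    flags = [0] * n
    for rank, i in enumerate(order, start=1):
        # rank / n plays the role of the empirical CDF value;
        # the districts with the n_online lowest draws are flagged
        flags[i] = 1 if rank <= n_online else 0
    return flags

def diff_in_means(values, flags):
    """Mean of values where flag == 1 minus mean of values where flag == 0."""
    online = [v for v, f in zip(values, flags) if f == 1]
    other = [v for v, f in zip(values, flags) if f == 0]
    return mean(online) - mean(other)

# Hypothetical data: percent non-Hispanic white in 10 made-up districts
pct_white = [12.0, 35.5, 48.0, 60.2, 22.1, 71.3, 55.0, 18.4, 40.0, 66.6]

rng = random.Random(2020)  # fixed seed so the run is reproducible
flags = assign_fake_online(len(pct_white), n_online=4, rng=rng)
fake_diff = diff_in_means(pct_white, flags)
print(sum(flags), round(fake_diff, 2))
```

Because the flags come from random draws alone, any difference in means here reflects chance rather than a real relationship between district composition and reopening plan.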
That was only one randomization. Now we will do it 1000 times. Take the code you created in Q4 and put it within a "loop". Loops in Stata allow you to do the same thing many times over. There are several loop commands, but we will use forvalues. Look at the help file (help forvalues) to understand what it does. Follow the steps below.
Before starting your loop use clear to remove all the data we have open. Save an empty dataset called fake_online_simulation with save fake_online_simulation, replace emptyok.
Your loop will start with forvalues i=1/1000 {. This makes the "local" i take on the values 1, then 2, then 3, and so on up to 1000, one value for each pass through the loop.
Load the dataset: use schoolskids.dta, clear
Now copy and paste your code from Q4 (the steps that create random, cumulrandom, fake_online, and fake_diffpctwhite).
Keep only the first observation: keep in 1. Then keep only the variable with the end result: keep fake_diffpctwhite
Use append to add the new iteration of the result to the results that came before it: append using fake_online_simulation
Save the updated dataset: save fake_online_simulation, replace. (If you get an error message when you run your program that says your file can't be modified or erased, add the command sleep 200 before the save command.)
Now close your loop with }
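The loop above repeats the random assignment many times and collects one difference in means per pass. A minimal Python sketch of the same idea follows, again on made-up data. The district values, the count of four "online" districts, and the function name simulate_diffs are hypothetical, and the sketch uses a list in memory where the Stata steps use append and save.

```python
import random
from statistics import mean, stdev

def simulate_diffs(values, n_online, reps, seed=0):
    """Return reps simulated differences in means under random assignment."""
    rng = random.Random(seed)
    n = len(values)
    diffs = []
    for _ in range(reps):
        # Pick which units are 'fake online' this round, keeping the count fixed
        online_idx = set(rng.sample(range(n), n_online))
        online = [values[i] for i in range(n) if i in online_idx]
        other = [values[i] for i in range(n) if i not in online_idx]
        diffs.append(mean(online) - mean(other))
    return diffs

# Hypothetical data: percent non-Hispanic white in 10 made-up districts
pct_white = [12.0, 35.5, 48.0, 60.2, 22.1, 71.3, 55.0, 18.4, 40.0, 66.6]

diffs = simulate_diffs(pct_white, n_online=4, reps=1000, seed=2020)
print(len(diffs), round(mean(diffs), 2), round(stdev(diffs), 2))
```

The collected differences should center near zero, since random assignment has no systematic link to district composition. Their spread shows how large a difference can arise by chance alone.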
Report a histogram that shows the distribution of simulated differences. Describe the shape and position of the distribution. Is it symmetric? Roughly normal? Skewed? What can you conclude from this histogram about our research question?
A different way of thinking about this question would be to ask whether non-Hispanic white students (who are in one of these districts) are experiencing online education at a different rate than other students. Why is this different from the analysis we just did? (BONUS Q: Answer that question, i.e., start by making a contingency table of frequencies showing students' experiences.)
Part 2
For this part of the homework we are going to use the Census Household Pulse Survey (Week 7) dataset hhpulseweek7.dta.
Create a histogram of tspndprpd, spending on prepared food in the last 7 days, with a normal density overlaid (use the normal option). Describe the distribution. Is it symmetric? Roughly normal? Skewed?
What are the mean and median of tspndprpd?
We are now going to treat the set of survey respondents as a "population" and take "samples" from it. We will examine what those samples tell us about spending on prepared food and how that compares to the figure for all survey respondents.
Suppose we surveyed a random set of 200 people among all the survey respondents. Do this in Stata with: sample 200, count
Use egen to compute the sample mean of tspndprpd. Call the sample mean sm_spndprpd. How does it compare to the mean among all survey respondents (the "population")?
Now we are going to do that 500 times.
Clear your data (clear) and save an empty dataset with save spnd_samples, replace emptyok
Start a loop with forvalues i=1/500 {
Load your data
Now copy and paste from Q3.1-2
Keep only the first observation (keep in 1) and keep only the variable sm_spndprpd.
Use append to add the new iteration of the result to the results that came before it (append using spnd_samples) and then save the new file (save spnd_samples, replace)
Now close your loop with }
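The loop above draws many samples of 200 from the "population" and records each sample's mean. A minimal Python sketch of that procedure follows. It is an illustration only: the population is a made-up, right-skewed set of 5000 spending values (not the Household Pulse data), and the function name sample_means is hypothetical. Like sample 200, count, it samples without replacement.

```python
import random
from statistics import mean, median

def sample_means(population, sample_size, reps, seed=0):
    """Draw reps samples without replacement and return each sample's mean."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(reps)]

# Hypothetical right-skewed "population" of weekly spending values (mean near 80)
pop_rng = random.Random(7)
population = [round(pop_rng.expovariate(1 / 80), 2) for _ in range(5000)]

means_200 = sample_means(population, sample_size=200, reps=500, seed=1)

# Population mean and median, then the center of the 500 sample means
print(round(mean(population), 2), round(median(population), 2))
print(round(mean(means_200), 2))
```

Comparing the two printed lines shows where the sample means are centered relative to the population mean, even though the population itself is skewed.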
Create a histogram that displays the distribution of these sample means. Does it look similar to the histogram that you made in Q1? Where is this new distribution centered? Is it symmetric? Is it approximately normal?
Repeat Q4 but this time draw samples that each have 1000 observations (sample 1000, count). What is different?
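One way to see the effect of sample size is to compare the spread of the sample means at the two sizes. The Python sketch below does this on a made-up, right-skewed population of 5000 values. The data and the function name spread_of_sample_means are hypothetical, and the result says nothing about what the actual survey data will show.

```python
import random
from statistics import mean, stdev

def spread_of_sample_means(population, sample_size, reps, seed=0):
    """Standard deviation of reps sample means, sampling without replacement."""
    rng = random.Random(seed)
    means = [mean(rng.sample(population, sample_size)) for _ in range(reps)]
    return stdev(means)

# Hypothetical right-skewed "population" (not the survey data)
pop_rng = random.Random(7)
population = [pop_rng.expovariate(1 / 80) for _ in range(5000)]

sd_200 = spread_of_sample_means(population, sample_size=200, reps=500, seed=1)
sd_1000 = spread_of_sample_means(population, sample_size=1000, reps=500, seed=1)
print(round(sd_200, 2), round(sd_1000, 2))
```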