STATA

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

The final paper is designed to be the culmination of your learning throughout the semester. You will write an original research paper centered around a multiple regression analysis. In this paper, you will: 

●  Ask a research question 

●  Construct a short but dense literature review 

●  Narrow your research question down into a specific testable hypothesis or hypotheses 

●  Find and clean an appropriate data set 

●  Run a regression analysis 

●  Explain the results of the regression, including which variables are statistically significant, which have the largest effects, and any limitations in extrapolating from the results. 

The organization of your final paper should follow the outline below. 

Introduction 

300 to 500 words (1.5 pages max) 

The short introduction frames your paper, providing the focus and rationale for your research. The introduction needs to pique the reader's interest, and – even more importantly! – provide big-picture context for why we care about your results. It must be short, a page or page and a half (double-spaced) at most. 

In the introduction’s final paragraph or two, mention the data set you used and the specific hypothesis or hypotheses you tested. Your concluding sentences to the introduction should -- very briefly! -- tease the results of your analysis. Conclude the intro with either a one or two sentence summary of the results, or with a sentence that hints at but doesn’t fully reveal what you found. 

Literature review 

700 to 1000 words (3 pages max) 

The literature review is a crucial part of the paper. Like those of other regression analysis papers, your lit review should be short -- only a few pages in length -- but quite dense. 

The goal of the lit review is to explain existing scholarly knowledge on your chosen topic. It helps give us a fuller picture of why we care and what you are adding to our understanding. Just as importantly, the lit review helps build informed expectations about the results of your regression analysis. 

20 to 25 academic references are typical. However, do not spend an equal amount of time on all of these recent references! 

Identify the 4-6 pieces of research that are either foundational to the work that you’re trying to do, or the recent studies most similar in methods and data. Spend the most time and space on these most important studies. 

Other scholarly references can be summed up in a sentence or two, often in sentences and paragraphs that group them with similar studies. For example: “Several recent studies using field experiments have found that social pressure mail increases voter turnout (Smith 2012, Jones 2019, Wu and Hahn 2020).” 

Google Scholar is your friend in constructing this literature review efficiently and comprehensively. Protip: copying references directly from Google Scholar into your reference list is just fine, and will make this much faster. 

Hypotheses 

300 words (1 page max, possibly shorter) 

Explain the relationship(s) you expect to find with your core regression analysis. Remember: a hypothesis is a directional relationship between two or more variables! 

Here's an example: 

H1: Controlling for other factors, I expect that individuals who live in urban areas will have a larger increase in turnout after receiving the mail treatment than those who live in rural areas. 

Make sure that your hypotheses are connected to the findings from the literature review! The literature review should flow naturally into your hypotheses. 

Data and methodology 

Length varies 

From here, the format of the final paper follows the regression template closely. 

Explain the data set you used. Tell us where you found the data, who collected it, when the survey was in the field (if applicable), etc. etc. 

Tell us briefly about any data cleaning and/or variable creation you had to do. 

Focus particularly on your key variables of interest. That includes the dependent variable, and the explanatory variables you hope to examine with your hypothesis above. Make sure we understand how the variable is coded, after you have done any data wrangling or data cleaning necessary. 

Finally, do a summ command with the list of all of the variables you include in at least one of your regressions. Copy those results directly into your paper. 

Results 

Length varies 

Now, perform the regressions. Copy and paste the results from the regression directly into your final paper (resized and in a fixed-width font like Courier). 

Your interpretation of the regression is just as important as getting the Stata command right! So spend the time to carefully tell us what these results mean. Which variables in your regression are statistically significant? Which variables are in your judgment substantively significant, with X having a big effect on Y? 

Gentle reminder: in considering which variables are substantively significant, you have to consider the range of possible X values. For example, the coefficient for age may be small, but adults can range in age from 20 to 100. 
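
The point about the range of X can be checked with a little arithmetic: multiply the coefficient by the distance between the smallest and largest observed X. The sketch below is only an illustration. It assumes the nlsw88 example data that ships with Stata stands in for your own data set, with wage as the dependent variable and ttl_exp (total work experience) as the X of interest.

```stata
* Assumption: nlsw88.dta (bundled with Stata) stands in for your data.
sysuse nlsw88, clear
regress wage ttl_exp grade age

* Observed range of X; summ leaves the min and max in r(min) and r(max)
summ ttl_exp

* Coefficient times range: predicted change in wage when experience moves
* from its minimum to its maximum, holding the other variables fixed
display "Min-to-max effect of ttl_exp on wage: " _b[ttl_exp] * (r(max) - r(min))
```

A coefficient that looks small per unit can still imply a large difference once it is scaled by the full range in this way.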

For students who are replicating previous work, make sure that you try out several different regression models, explaining if the results differ when including slightly different sets of variables. 

Conclusion 

300-400 words 

In about a page, sum up what we can learn from your regression. Emphasize the most important findings and strongest relationships in the regression. Tell us whether the evidence was consistent with your hypotheses. 

The conclusion is also the place to explain the limitations of your results, or any worries we should have about generalizing from them. Remember the discussions of regression mistakes in the Allison text and the Wheelan book -- are any of these likely to be a worry for your analysis? For example, is there a potentially important variable you would have liked to include in this analysis that was not present in the data set? Only discuss those you think are relevant. 

If you choose, this is also the place to discuss where you think future work on the topic needs to focus. 

A final caution: don’t oversell or undersell your results! A successful final project is not one that answers every question with finality. Instead, good papers explain both what a data set can tell us, and what it CANNOT tell us. It’s ok to leave key questions for different studies and data sets. 

References 

We’re looking for 20 to 25 references here. 

No specific format is required for the references, but APA format and Chicago format are easiest. 

Appendix: .do file 

Include your full .do file as an appendix. Copying and pasting it 

Template for Basic Regression Analysis 

Version 1.1 

STEP ONE:

State Your Hypotheses 

Good research starts with good questions—preferably questions that you, yourself find interesting! 

In this class, often the research question will be given to you. In other classes or other contexts, figuring out good research questions usually starts with a literature review. If you are NOT given a research question, start a lit review by examining 2-3 high-profile, recent articles on the topic, paying special attention to their discussion and framing of recent research. Look at the most cited pieces in your area of interest, using tools such as Google Scholar or the Social Science Citation Index. Build outward from there. 

 

You need to express your research questions in terms of hypotheses. A hypothesis is the expected directional relationship between two or more variables. For example: “I expect that more years of education will be associated with higher income, even controlling for other factors.” 

OUTPUTS FOR STEP ONE:

In two or three sentences, before you start looking at the data, write down what you expect to find: 

● Explain what your hypotheses are.

● Tell us which of the explanatory variables are most important to test your hypothesis.

● Explain which control variables are likely to be most important, to prevent other effects from confounding the relationship. (NOTE: Think especially about common demographic and socioeconomic variables: age, income, education, race/ethnicity, gender, locality, etc.) 

STEP TWO:

Data Check: Look At Your Data—Directly! 

The first thing you should always do after opening up a data set for the first time is to look at your data! This is very important. Open the data editor in browse mode; it should look like a spreadsheet. Scroll through, both up and down, and left to right. 

Look at the data both with the value labels (if any) shown, and with them hidden (Data Editor > Value Labels). Remember that Stata denotes missing data with a period (e.g. “ . “). 

OUTPUTS FOR STEP TWO Answer the following questions: 

●  Do you see any missing values? 

●  Do the values of the variables make sense? 

●  How many observations do you have? 
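
For those who prefer the command line, the three questions above can also be answered with a few commands. This is a minimal sketch, assuming the auto example data that ships with Stata in place of your own file; the variable names (make, price, rep78, foreign) belong to that example data set.

```stata
* Assumption: auto.dta (bundled with Stata) stands in for your data.
sysuse auto, clear

* browse                          // opens the Data Editor in browse mode (interactive Stata only)
describe                          // variable names, storage types, and labels
count                             // how many observations?
misstable summarize               // lists only the variables that contain missing values
list make price rep78 in 1/10     // eyeball the first ten rows
list make foreign in 1/10, nolabel   // same rows, showing the underlying numeric codes
```

Scrolling through the data editor is still worth doing; these commands only supplement it.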

STEP THREE:

Data Cleaning and Creating New Variables (If Needed) 

You began by listing the variables you needed, above. Does the data set provide all of these in usable form? Often you will need to create one or more new variables. 

In creating new variables, think especially about: 

 

● Turning categorical data into dummy variables, or using the i. or ib. factor-variable prefixes to do this automatically. If you do, make sure that you make the right choice about the base category. 

DO NOT make unnecessary dummy variables! Don’t take an income variable and turn it into a rich / poor dummy. Don’t take an age variable and turn it into a young / old dummy, etc. This throws away information and makes your regression perform worse. 

●  Rescaling variables that are non-normal or highly skewed (e.g. taking the log of income -- discussed later in the course) 

●  Creating interaction variables (discussed later in the course) 

●  Adding squared versions of variables if you expect nonlinear relationships 
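
The variable-creation options in this list might look like the following in practice. This is a sketch, not a recipe: it assumes the nlsw88 example data that ships with Stata, and the choice of wage, race, union, grade, and ttl_exp is purely illustrative.

```stata
* Assumption: nlsw88.dta (bundled with Stata) stands in for your data.
sysuse nlsw88, clear

* Dummies from a categorical variable, built on the fly
regress wage i.race grade        // base category defaults to the lowest value (race == 1)
regress wage ib2.race grade      // ib2. makes race == 2 the base category instead

* Log of a right-skewed variable
gen lnwage = ln(wage)

* Squared term and interaction, written with factor-variable operators:
* c.ttl_exp##c.ttl_exp expands to ttl_exp plus ttl_exp squared;
* i.union##c.grade expands to union, grade, and their interaction
regress lnwage c.ttl_exp##c.ttl_exp i.union##c.grade
```

Letting Stata build the dummies and products inside the regress command avoids cluttering the data set with extra variables, and the base category stays visible in the command itself.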

In some data sets, you may have so-called “string” variables, where the answer is actually text instead of a numeric value. These have to go! The gen, replace, and recode commands are your friends. Also: the encode command is a good way to make a string variable numeric, and destring (with the force option if needed) handles strings that contain numbers stored as text. 

For example, you might have a simple party preference variable that has three categories (Democrat | Independent | Republican). Usually this is a label for an underlying numerical value. Turn off the labels using the options on the data viewer to see the underlying data. Or alternatively, using the command line, you can use the nolabel option to show the underlying numerical value. For example: 

● tab partypref

● tab partypref, nolabel 

In the most common case, the real variable is stored as a number. So for example, a variable might have three categories: Democrat = 1, Republican = 2, and Independent = 3. (NOTE: It’s actually better to do a FIVE category scale with partisan leaners, but for this example we’ll keep it simple.) 

● gen partypref2 = 0

● replace partypref2 = 1 if partypref == 1

● replace partypref2 = -1 if partypref == 2

● summ partypref2

● tab partypref2 

More rarely, you will find cases where there is no underlying number, and the real “value” is just a text string. In this case you might want to convert it to a number variable like so: 

● gen partypref2 = 0

● replace partypref2 = 1 if partypref == "Republican"

● replace partypref2 = -1 if partypref == "Democrat"

● summ partypref2

● tab partypref2

Even easier: 

● encode partypref, generate(partypref2)

.... And then replace the values so that -1 = Democrat, 0 = Independent, etc. 

In both cases above, partypref2 is now a -1 | 0 | 1 variable, and can be included in the regression. Make sure to note the double equal signs == and the quotation marks. Make sure the capitalization matches. 

Explain and write down very briefly what you’ve done. Before moving on, check that the variable creation worked by both 1.) browsing with the data editor and 2.) using the summ and tab (for discrete data) commands. 
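
The encode route, including the follow-up recoding and the final check, can be sketched end to end as below. The five rows of toy data and the intermediate variable name partycode are made up for illustration; only partypref and partypref2 come from the text above. Note that encode numbers the categories alphabetically, which is why the recode step is needed.

```stata
* Hypothetical toy data: a party preference stored as text
clear
input str11 partypref
"Democrat"
"Independent"
"Republican"
"Democrat"
"Republican"
end

* encode assigns 1, 2, 3 in alphabetical order:
* Democrat = 1, Independent = 2, Republican = 3
encode partypref, generate(partycode)
tab partycode, nolabel

* Map the codes onto the -1 | 0 | 1 scale in a new variable
recode partycode (1 = -1) (2 = 0) (3 = 1), generate(partypref2)

* Check the result against the original text values
tab partypref partypref2
```

Using recode with generate() leaves the encoded variable untouched, so a mistake in the mapping can be fixed without starting over.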

OUTPUTS FOR STEP THREE 

●  Note any variables you had to clean or recode. 

●  Did you need to create any variables? 

●  Copy and paste the Stata code you used for cleaning and variable creation. 

STEP FOUR:

Summarize the Key Variables 

Next, use the summ command to summarize the characteristics of all of the variables. What are their means and standard deviations? Their maximum and minimum values? This is especially important for your dependent variables, and for important explanatory variables. But you should do this for all of the control variables, too! As always, cut and paste the results into your assignment. Make sure that we know the mean and s.d. for all of the variables used in the regression. 

summ works great with both continuous variables, and also with dummy (0|1) variables. But for variables with discrete categories (e.g. if the value is 0, 1, 2, or 3) it can be misleading. For these variables, use the tab command to show us how the data is distributed (how many 1s, how many 2s, how many 3s, etc.) 

OUTPUTS FOR STEP FOUR 

●  Copy and paste the output of the summ command for all variables used in the regression. 

●  If any of your key explanatory variable(s) have just a few values, copy and paste the output of the tab command as well. 
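
As an illustration of the two outputs, here is a short sketch that assumes the auto example data shipped with Stata. In that data set price, mpg, and weight are continuous, foreign is a 0|1 dummy, and rep78 is a repair rating that takes only the values 1 through 5.

```stata
* Assumption: auto.dta (bundled with Stata) stands in for your data.
sysuse auto, clear

* Mean, s.d., min, and max for continuous and dummy variables
summ price mpg weight foreign

* A variable with only a few discrete values: show the count in each category,
* with the missing option so that missing values appear as their own row
tab rep78, missing
```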

STEP FIVE:

Present the Full Regression Model (or Models) 

Next: run the regression, and then present the regression model, with both the theoretically interesting explanatory variables, and the full set of control variables. 

Note that in most assignments you want 8-12 variables or sets of variables (counting sets of dummies as one variable). Too few and you aren’t getting the advantages of multiple regression. 

In writing up your results, you always need to mention three things: 

●  Are the coefficients in the expected direction (positive, negative)? For example, if you expected those with more education to have higher income, is that what you find? Compare with your initial expectations above. 

●  Which variables are statistically significant? 

●  Which variables are substantively significant, i.e., which have the largest estimated effects? 

Note that in many cases, it makes sense to present multiple versions of the same model. This is especially true if: 

●  You have several different measures of a variable of interest. You can’t put them all in at once, since they are likely highly correlated: use them one at a time in different models. 

●  You have different dependent variables to examine. (e.g. If you’re interested in Internet use, you might have one regression on email use, one on Facebook use, one on Twitter use, etc.) 
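
One way to present several versions of a model side by side is to store each set of estimates and then tabulate them together. The sketch below assumes the nlsw88 example data that ships with Stata; the model names m1 and m2 and the choice of variables are illustrative only.

```stata
* Assumption: nlsw88.dta (bundled with Stata) stands in for your data.
sysuse nlsw88, clear

* Model 1: a short specification
regress wage grade ttl_exp
estimates store m1

* Model 2: the same model with additional controls
regress wage grade ttl_exp tenure i.race i.union
estimates store m2

* Side-by-side coefficients with significance stars,
* plus sample size, R-squared, and root MSE for each model
estimates table m1 m2, b(%9.3f) star stats(N r2 rmse)
```

Check the N row when comparing models: adding a variable with missing values shrinks the sample, so two columns may not be estimated on the same observations.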

Explain: what portion of the total variance does the model explain (hint: R-squared)? What is the “average miss” of the model on the in-sample data (i.e. root-MSE)? We’re less concerned about R-squared and RMSE with hypothesis testing than when we’re trying to make accurate predictions, but these stats should always be reported. 
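
Both statistics appear in the header of the regress output (as "R-squared" and "Root MSE"), and Stata also keeps them as stored results after the command runs. A brief sketch, assuming the auto example data that ships with Stata:

```stata
* Assumption: auto.dta (bundled with Stata) stands in for your data.
sysuse auto, clear
regress price mpg weight foreign

* The same numbers shown in the output header, pulled from stored results
display "R-squared = " %5.3f e(r2)
display "Root MSE  = " %9.2f e(rmse)
```

The root MSE is in the units of the dependent variable (dollars, in this example), which is what makes it readable as an "average miss".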

OUTPUTS FOR STEP FIVE:

● Run the regression, and copy and paste the regression tables. 

○ Unless the assignment specifies otherwise, use a logit for dummy/binary dependent variables. 

●  Explain how the regression performed overall, with reference to the R or R-squared and the RMSE. 

●  For the regression, explain the regression results. For your key variables only, discuss whether: 

○  the coefficients were in the expected direction (positive, negative) 

○  the coefficients were statistically significant. 

○  the coefficients are substantively significant. What counts as a large effect is up to your own judgment! Just show that you understand what the results are telling you. 
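
For the logit case mentioned in the list above, a minimal sketch follows. It assumes the nlsw88 example data that ships with Stata, with the 0|1 variable union as the dependent variable; the explanatory variables are illustrative only. Logit coefficients are on the log-odds scale, so the margins step is included here as one common way to express effects as changes in probability.

```stata
* Assumption: nlsw88.dta (bundled with Stata) stands in for your data.
sysuse nlsw88, clear

* Binary dependent variable: use logit instead of regress
logit union grade ttl_exp i.race south

* Average marginal effects: the change in the predicted probability of
* union == 1 associated with each explanatory variable
margins, dydx(*)
```

Note that logit output reports a pseudo R-squared rather than R-squared and root MSE, so the overall-fit discussion has to be adapted for these models.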

STEP SIX:

Explain the Limits of the Regression Analysis 

Conclude by explaining the limits of the analysis. What are the potential concerns? Mentally walk through the list of regression problems featured in Ch. 2 of the Allison text. List and explain the concerns that might be relevant here. Especially think about: 

●  Possible missing explanatory variables 

●  Reverse causation or selection bias 

●  Measurement error, in both the dependent and explanatory variables 

●  Sample bias. Is the sample a good reflection of the relevant population? If not, how might the relationship between the variables be skewed? 

●  Sample size issues (both too big, where trivial effects may be statistically significant, or too small, which requires caution on several fronts) 

●  Strong correlation among two or more explanatory variables. 

●  Mediating variables, which may present problems of interpretation. These could be between two or more explanatory variables (the family income -> GPA -> SAT scores example). It could even be a third variable that affects both x and y. 

Only discuss factors that you think are relevant. 
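
Of the concerns listed, strong correlation among explanatory variables is one that can be checked directly in Stata. The sketch below assumes the auto example data that ships with Stata, where weight, length, and displacement all measure roughly the same thing (how big the car is).

```stata
* Assumption: auto.dta (bundled with Stata) stands in for your data.
sysuse auto, clear

* Pairwise correlations among the suspect explanatory variables
correlate weight length displacement

* Variance inflation factors, computed after the regression
regress price weight length displacement foreign
estat vif
```

High pairwise correlations and large variance inflation factors both suggest that the coefficients on those variables are hard to separate from one another. How large is too large is a judgment call that belongs in your write-up.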

OUTPUTS FOR STEP SIX: 

● Discuss and explain any of the above issues that you have reason to be concerned about, ignoring those you don’t think are relevant. 
