In this project you will explore the effect of socio-economic factors on children's school performance. The dataset covers 229 elementary schools in one US state in 2001.
■ maths - % of children passing the 4th-grade maths test
■ reading - % of children passing the 4th-grade reading test
■ enroll - number of enrolled pupils
■ expendp - expenditure per pupil in $
■ flunch - % of children eligible for the 'free lunch' program
■ medinc - median family income in the school's zipcode (in $)
■ ctotal - total number of children (in zipcode)
■ cmarried - number of children in married-couple families (in zipcode)
■ csingle - number of children not in married-couple families (in zipcode)
When answering questions, always write down all regression equations estimated. State any assumptions that you have made. Always justify and explain how you reached your conclusions, and make sure that all regressions and results are interpreted clearly.
1. What are your a priori hypotheses about the relationship between the maths pass rate and other variables in the dataset?
2. The last two variables are in levels. Explain why you might prefer to use their relative counterparts in your analysis (e.g., pmarried - % of children in married-couple families).
3. Graphically display the relevant data, using whichever graphs you think are appropriate. What information do they provide about the data?
4. Generate summary statistics for the data. What information do these give about the data? Would you say there are large differences between schools?
5. Generate correlation coefficients for the data and interpret the relevant coefficients. Do they have the expected signs? Are you surprised by the sign and size of the correlation between maths and reading? Between medinc and flunch?
6. Which test is harder to pass - maths or reading?
7. One economist suggests that the federal program, which provides poor children with free lunches, has a positive effect on their performance. Run a simple (bivariate) regression of maths on flunch. Explain carefully what conclusion you draw from your regression.
8. Another economist suggests that children from two-parent households perform better, as they enjoy more support. Run a simple (bivariate) regression of maths on the percentage of children from two-parent households (pmarried). Interpret the slope coefficient. Does this effect seem large?
9. Add the variable flunch to the above regression. What has happened to the estimated effect of pmarried? Was this expected given the correlation between the two explanatory variables? Explain.
10. Now develop a multiple regression model for the determinants of school performance in mathematics. Think about which variables you wish to use and which functional form. Estimate the model, interpreting your results clearly and testing any relevant hypotheses.
11. Which explanatory variable has the largest effect on the maths pass rate?
12. Obtain residuals from your model. Which school has the largest positive residual? Interpret this residual value.
13. What do you make of the R-squared in your model?
14. What is your overall conclusion about the effect of socio-economic factors on school performance? Discuss limitations and state clearly any problems of the empirical analysis in this project. What are some suggested directions for future research? What other variables could be used to improve the analysis?
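The mechanics behind questions 2, 8 and 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four records below are invented toy values, not rows from the project dataset, and the helper names `percent_share` and `simple_ols` are hypothetical. The sketch derives pmarried from the level variables, fits a bivariate least-squares line of maths on pmarried, and computes the residuals.

```python
# Hypothetical toy records: the values are invented for illustration only
# and are NOT taken from the project dataset.
schools = [
    {"maths": 62.0, "ctotal": 1000, "cmarried": 700},
    {"maths": 75.0, "ctotal": 800, "cmarried": 640},
    {"maths": 48.0, "ctotal": 1200, "cmarried": 600},
    {"maths": 81.0, "ctotal": 500, "cmarried": 450},
]


def percent_share(part, total):
    """Convert a level (count) into a percentage of the total."""
    return 100.0 * part / total


def simple_ols(x, y):
    """Bivariate least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Question 2: relative counterpart of cmarried.
pmarried = [percent_share(s["cmarried"], s["ctotal"]) for s in schools]
maths = [s["maths"] for s in schools]

# Question 8: maths = intercept + slope * pmarried + residual.
intercept, slope = simple_ols(pmarried, maths)

# Question 12: residual = observed pass rate minus fitted pass rate.
residuals = [m - (intercept + slope * p) for p, m in zip(pmarried, maths)]
largest = max(range(len(residuals)), key=lambda i: residuals[i])

print(f"maths = {intercept:.2f} + {slope:.3f} * pmarried")
print(f"largest positive residual: school index {largest}")
```

With the real data, a statistics package would normally replace the hand-written `simple_ols`, since it also reports standard errors and R-squared, which the later questions require.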