All answers must be submitted as a .txt or .r file via the assignment link in Module 7 of myelearning.
The first set of data you will work on comes from an analysis of the levels of an enzyme (HK) in butterfly populations across a range of climatic conditions: altitude (ALT), precipitation (PRECIP), minimum temperature (MINTEMP), and maximum temperature (MAXTEMP). These variables are provided in the "butterflies.csv" file. You ought to know how to bring this data into R, so go ahead and do so now.
1) Produce scatter plots of the dependent variable (HK) against each of the 4 independent variables (ALT, PRECIP, MAXTEMP, MINTEMP). Put these graphs into one figure split into 4 quadrants, and label all axes. Comment on the shape of each graph. Is there a discernible pattern in any of them? Judging by the graphs, which of the independent variables might have the greatest effect on the levels of HK? (5mks)
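A minimal base-R sketch of the 2 × 2 layout. The synthetic data frame below is a hypothetical stand-in for the contents of "butterflies.csv" (normally loaded with read.csv("butterflies.csv")), and the helper name plot_hk_scatter() is illustrative only.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data);
# with the real file use: butterflies <- read.csv("butterflies.csv")
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

# Hypothetical helper: HK against each predictor, one panel per quadrant
plot_hk_scatter <- function(df, predictors = c("ALT", "PRECIP", "MAXTEMP", "MINTEMP")) {
  old_par <- par(mfrow = c(2, 2))  # split the plotting device into a 2 x 2 grid
  on.exit(par(old_par))            # restore the previous layout when done
  for (v in predictors) {
    plot(df[[v]], df$HK, xlab = v, ylab = "HK", main = paste("HK vs", v))
  }
  invisible(length(predictors))    # number of panels drawn
}

plot_hk_scatter(butterflies)
```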
2) Produce histograms of each of the variables, both dependent and independent, and include them (with labels) in your answer. Judging from these, which of the variables appear to be normally distributed? Calculate the skewness and kurtosis of each variable, report the values, and interpret them. You will need to do some investigation to work out how these values are interpreted. (Hint: look up the "moments" package.) (6mks)
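The sketch below draws a labelled histogram for every column and computes skewness and kurtosis in base R. The two helper functions are written to follow the formulas used by moments::skewness() and moments::kurtosis() (the latter reports ordinary, not excess, kurtosis, so a normal distribution gives roughly 3); calling the package functions directly should give the same numbers. The data are a synthetic stand-in, not the assignment data.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

# Sample skewness: 0 for a symmetric distribution
skew <- function(x) {
  d <- x - mean(x)
  sqrt(length(x)) * sum(d^3) / sum(d^2)^1.5
}
# Pearson (non-excess) kurtosis: about 3 for a normal distribution
kurt <- function(x) {
  d <- x - mean(x)
  length(x) * sum(d^4) / sum(d^2)^2
}

# One labelled histogram per variable on a 2 x 3 grid
old_par <- par(mfrow = c(2, 3))
for (v in names(butterflies)) {
  hist(butterflies[[v]], main = paste("Histogram of", v), xlab = v)
}
par(old_par)

shape_stats <- data.frame(skewness = sapply(butterflies, skew),
                          kurtosis = sapply(butterflies, kurt))
print(round(shape_stats, 3))
```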
3) Calculate the correlations among the four independent variables and include the results in your answer. Is there anything we should be concerned about with these correlations? (2mks)
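One way to tabulate the correlations, assuming the predictors sit in a data frame with the column names used above. The data are synthetic (MINTEMP is deliberately derived from MAXTEMP so that one pair is strongly correlated), and the helper high_pairs() and its 0.8 cut-off are illustrative choices, not a fixed rule.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)

predictors <- butterflies[c("ALT", "PRECIP", "MAXTEMP", "MINTEMP")]
cor_mat <- cor(predictors)  # Pearson correlations (the default method)
print(round(cor_mat, 2))

# Hypothetical helper: list each predictor pair whose |r| exceeds the cut-off
high_pairs <- function(m, cutoff = 0.8) {
  idx <- which(abs(m) > cutoff & upper.tri(m), arr.ind = TRUE)
  data.frame(var1 = rownames(m)[idx[, 1]], var2 = colnames(m)[idx[, 2]],
             r = m[idx])
}
print(high_pairs(cor_mat))
```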
4) The first regression analysis to be conducted is a multiple regression of HK on ALT, PRECIP, MAXTEMP, and MINTEMP. You should know how to do this based on what was demonstrated in class. Interpret the results of the regression analysis in terms of which variables were significant in the model (at an alpha level of 0.05), how well the model explains the variance in the levels of HK, and the significance of the model. Do we have a problem with the variance inflation factors? (Hint: install and load the car package, and read the reference "An R Companion to Applied Regression – Chapter 6" provided for information about interpreting VIF values.) If so, what is this problem and how does it affect our results? Comment on the diagnostic plots. (8mks)
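A sketch of the full model plus VIFs, on synthetic stand-in data. Rather than requiring the car package, the hypothetical helper vif_from_vcov() computes VIFs from the correlation matrix of the coefficient estimates; for single-degree-of-freedom terms like these it should give the same values as car::vif(), which is what the assignment asks you to use.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

full_fit <- lm(HK ~ ALT + PRECIP + MAXTEMP + MINTEMP, data = butterflies)
print(summary(full_fit))  # coefficient t-tests, R-squared and the overall F-test

# Hypothetical helper: VIF_j is the j-th diagonal element of the inverse of the
# correlation matrix of the slope estimates (intercept row/column dropped)
vif_from_vcov <- function(fit) {
  R <- cov2cor(vcov(fit)[-1, -1])
  diag(solve(R))
}
print(round(vif_from_vcov(full_fit), 2))
```

With the real data you would simply run library(car) followed by vif(full_fit).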
5) Look at the ANOVA table for the model. Does it differ from the coefficients table of the regression analysis? Why is this? (2mks)
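To compare the two tables, the sketch below prints anova(fit) and the coefficient table from summary(fit) for the same synthetic model, then refits it with the terms listed in reverse order. Note that anova() on an lm object tests terms sequentially, in the order they appear in the formula; comparing the two ANOVA tables is one way to explore the question.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

fit     <- lm(HK ~ ALT + PRECIP + MAXTEMP + MINTEMP, data = butterflies)
fit_rev <- lm(HK ~ MINTEMP + MAXTEMP + PRECIP + ALT, data = butterflies)

anova_tab <- anova(fit)          # one row per term, in formula order, plus Residuals
anova_rev <- anova(fit_rev)      # same model, terms entered in reverse order
coef_tab  <- coef(summary(fit))  # estimate, SE, t value and p-value per coefficient

print(anova_tab)
print(anova_rev)
print(coef_tab)
```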
6) Given the problems above, you should know that you ought to remove one or more variables. Rerun the regression analysis without this/these variable(s), including the VIFs (if applicable; VIFs cannot be calculated for a simple linear regression) as well as the 4 diagnostic plots (in a single frame using the par function). Does this resolve the problem? Discuss which final model you have chosen and provide its results, interpreting them in terms of which variable/s were significant in the model (at an alpha level of 0.05), how well the model explains the variance in the levels of HK, and the significance of the model. Briefly discuss the effect that each variable has in your model (i.e. whether an increase in that variable results in an increase or a decrease in the dependent variable). For your chosen model, comment on the diagnostic plots. You should also compare the ANOVA table to the coefficients table in the regression output. (12mks)
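A sketch of the refit and the 2 × 2 diagnostic frame. Dropping MINTEMP here is purely a placeholder on synthetic data; which variable/s you drop should follow from your own results.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

# Placeholder reduced model (MINTEMP dropped for illustration only)
reduced_fit <- lm(HK ~ ALT + PRECIP + MAXTEMP, data = butterflies)
print(summary(reduced_fit))
print(anova(reduced_fit))

# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
old_par <- par(mfrow = c(2, 2))
plot(reduced_fit)
par(old_par)
```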
7) Plug the following values into your final regression equation to derive a value for HK (which values you use will of course depend on which variable/s are in your final model). Comment on the usefulness of regression equations for extrapolating data. Is this wise? Explain your answer. (5mks)
ALT = 4
PRECIP = 65
MAXTEMP = 110
MINTEMP = -20
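predict() applies the fitted equation for you; computing the same value by hand from coef() is a useful check that you understand the equation. The model below is a hypothetical placeholder fitted to synthetic data, and the last line prints the range of each predictor in the fitting data so the new values can be compared against it.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(42)
n <- 30
butterflies <- data.frame(ALT = runif(n, 0, 10), PRECIP = runif(n, 10, 100),
                          MAXTEMP = runif(n, 60, 110))
butterflies$MINTEMP <- butterflies$MAXTEMP - 50 + rnorm(n, sd = 3)
butterflies$HK <- 20 + 3 * butterflies$ALT + rnorm(n, sd = 4)

final_fit <- lm(HK ~ ALT + PRECIP + MAXTEMP, data = butterflies)  # placeholder model

# The values given in the question; unused columns are simply ignored by predict()
new_site <- data.frame(ALT = 4, PRECIP = 65, MAXTEMP = 110, MINTEMP = -20)
hk_pred <- predict(final_fit, newdata = new_site)

# Same value by hand: intercept + sum of (slope * value) over the model's predictors
b <- coef(final_fit)
hk_manual <- b[["(Intercept)"]] + sum(b[-1] * unlist(new_site[names(b)[-1]]))
print(c(predict = unname(hk_pred), by_hand = hk_manual))

# Range of each predictor in the data the model was fitted on
print(sapply(butterflies[c("ALT", "PRECIP", "MAXTEMP", "MINTEMP")], range))
```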
In this assignment, you will need to carry out diagnostic tests before running logistic regression analyses, conduct the analyses, carry out diagnostic tests afterwards, and interpret the results. You will be provided with the data, which you should be able to bring into R.
As this was demonstrated in class, you are expected to know how to carry out the analyses asked of you with minimal assistance. If you are having difficulty, I urge you to review the PowerPoint slides, the recordings of the Webex sessions, and the example R code that has been made available to you. You may also use the forum on the myelearning site to interact with your peers and instructor if you are in dire need of assistance.
You have been provided with a dataset entitled "island.txt", which records the presence of a species (incidence) together with the area of the island (area), the degree of isolation (isolation), the quality of the habitat on the island (quality), an index for predators on the island (enemy), and an index for competitors on the island (competitor).
8) Using comments (lines beginning with #), describe what you might expect to find with respect to the presence of this species in these island habitats. How might these explanatory variables be expected to influence species presence?
9) Plot normal Q-Q plots of the explanatory variables. Do you see any problems with them? (3mks)
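A base-R sketch of the Q-Q plots, assuming the island data are in a data frame called island (for the real file, something like read.table("island.txt", header = TRUE); check the file's delimiter first). The data here are synthetic. The Shapiro-Wilk p-values at the end are an optional numerical complement to the plots, not something the question requires.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
explanatory <- c("area", "isolation", "quality", "enemy", "competitor")

old_par <- par(mfrow = c(2, 3))  # 5 panels on a 2 x 3 grid
for (v in explanatory) {
  qqnorm(island[[v]], main = paste("Normal Q-Q plot:", v))
  qqline(island[[v]])            # reference line through the first and third quartiles
}
par(old_par)

# Optional: Shapiro-Wilk normality test p-value for each variable
sw_p <- sapply(island[explanatory], function(x) shapiro.test(x)$p.value)
print(round(sw_p, 4))
```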
10) Investigate the explanatory variables to determine whether or not we have a problem with multicollinearity (5 mks).
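Two quick checks, sketched on synthetic stand-in data: a correlation matrix of the explanatory variables and a pairs() scatter-plot matrix, which shows each pairwise relationship visually.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
explanatory <- c("area", "isolation", "quality", "enemy", "competitor")

cor_island <- cor(island[explanatory])  # pairwise Pearson correlations
print(round(cor_island, 2))

# Scatter plot of every pair of explanatory variables
pairs(island[explanatory], main = "Explanatory variables")
```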
11) Run a logistic model with incidence as the dependent variable and area, isolation, quality, enemy, and competitor as the explanatory variables. Check the variance inflation factors for this model and interpret them. (5mks)
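A sketch of the logistic fit on synthetic stand-in data. The hypothetical helper vif_from_vcov() computes VIFs from the correlation matrix of the coefficient estimates, which for single-degree-of-freedom terms is the calculation car::vif() performs on a glm; with the real data, car::vif(full_glm) is the direct route.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
island$incidence <- rbinom(n, 1, plogis(0.8 * log(island$area) - 0.4 * (island$isolation - 6)))

# Logistic regression: binomial family with the default logit link
full_glm <- glm(incidence ~ area + isolation + quality + enemy + competitor,
                family = binomial, data = island)

# Hypothetical helper: diagonal of the inverse correlation matrix of the slopes
vif_from_vcov <- function(fit) {
  R <- cov2cor(vcov(fit)[-1, -1])  # drop the intercept row/column
  diag(solve(R))
}
print(round(vif_from_vcov(full_glm), 2))
```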
12) Run a summary of the model and interpret it in terms of which explanatory variables are significant and, for those that are significant, how their coefficients (log odds) change the dependent variable. (3mks)
13) Calculate the odds ratios for the model. Again, for each explanatory variable that was significant in the model, interpret its odds ratio in terms of how it affects the dependent variable. (3mks)
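The coefficients reported by summary() are on the log-odds scale; exponentiating them gives odds ratios. This sketch (synthetic data) uses confint.default(), which gives Wald intervals; confint() would give profile-likelihood intervals instead.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
island$incidence <- rbinom(n, 1, plogis(0.8 * log(island$area) - 0.4 * (island$isolation - 6)))
full_glm <- glm(incidence ~ area + isolation + quality + enemy + competitor,
                family = binomial, data = island)

print(coef(summary(full_glm)))           # estimates are changes in log odds per unit increase

odds_ratios <- exp(coef(full_glm))       # multiplicative change in the odds per unit increase
or_ci <- exp(confint.default(full_glm))  # back-transformed Wald 95% intervals
print(round(cbind(OR = odds_ratios, or_ci), 3))
```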
14) Determine whether or not the model is significant. (4mks)
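One standard way to test overall significance is a likelihood-ratio (chi-square) test of the fitted model against an intercept-only model, based on the drop in deviance. The sketch computes it by hand and then with anova() on the two nested models; the data are synthetic.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
island$incidence <- rbinom(n, 1, plogis(0.8 * log(island$area) - 0.4 * (island$isolation - 6)))
full_glm <- glm(incidence ~ area + isolation + quality + enemy + competitor,
                family = binomial, data = island)

# By hand: drop in deviance vs. the null model, referred to a chi-square distribution
lr_stat <- full_glm$null.deviance - full_glm$deviance
lr_df   <- full_glm$df.null - full_glm$df.residual
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(chisq = lr_stat, df = lr_df, p = p_value))

# The same comparison using anova() on nested models
null_glm <- glm(incidence ~ 1, family = binomial, data = island)
lr_table <- anova(null_glm, full_glm, test = "Chisq")
print(lr_table)
```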
15) Look at the influence measures of the model. Are there any data points we should be concerned about? (2mks)
16) Find the deviance residuals and the Pearson residuals for the model. Do any of these residual values correspond to what you observed in question 15? Plot histograms of both the deviance and Pearson residuals and comment on their shape. (4mks)
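For questions 15 and 16, a sketch on synthetic data: influence.measures() flags observations by DFBETAS, DFFIT, covariance ratio, Cook's distance and hat values, and residuals() returns either residual type. The |residual| > 2 screen is a common rule of thumb, not a formal test.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
island$incidence <- rbinom(n, 1, plogis(0.8 * log(island$area) - 0.4 * (island$isolation - 6)))
full_glm <- glm(incidence ~ area + isolation + quality + enemy + competitor,
                family = binomial, data = island)

# Influence measures; summary() prints only the flagged observations
infl <- influence.measures(full_glm)
summary(infl)
flagged <- which(apply(infl$is.inf, 1, any))

# Deviance and Pearson residuals, with side-by-side histograms
dev_res  <- residuals(full_glm, type = "deviance")
pear_res <- residuals(full_glm, type = "pearson")
old_par <- par(mfrow = c(1, 2))
hist(dev_res,  main = "Deviance residuals", xlab = "Deviance residual")
hist(pear_res, main = "Pearson residuals",  xlab = "Pearson residual")
par(old_par)

# Observations with |residual| > 2, for comparison with the influence measures
large_res <- union(which(abs(dev_res) > 2), which(abs(pear_res) > 2))
print(list(influential = flagged, large_residual = sort(large_res)))
```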
17) Given that some of the explanatory variables in question 12 did not come out as significant, how would you rerun this model (i.e. which variable/s would you include)? Run this model and interpret the results as you did before (questions 12 to 16). (16mks)
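update() refits a model with terms removed, and anova(..., test = "Chisq") compares the reduced model with the full one by the change in deviance. The terms dropped below are placeholders on synthetic data, not a recommendation.

```r
# Synthetic stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
island <- data.frame(area = rlnorm(n, 1, 0.8), isolation = runif(n, 2, 10),
                     quality = runif(n, 1, 10), enemy = runif(n, 0, 10),
                     competitor = runif(n, 0, 10))
island$incidence <- rbinom(n, 1, plogis(0.8 * log(island$area) - 0.4 * (island$isolation - 6)))
full_glm <- glm(incidence ~ area + isolation + quality + enemy + competitor,
                family = binomial, data = island)

# Placeholder reduced model: enemy and competitor removed for illustration only
reduced_glm <- update(full_glm, . ~ . - enemy - competitor)
print(summary(reduced_glm))

# Does removing these terms significantly worsen the fit?
print(anova(reduced_glm, full_glm, test = "Chisq"))
```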