JMP

Univariate profiling: Examine the shape of the distributions (histograms) and get summary statistics

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Question 1: Examining the data (30 pts)

Examine the HBAT-200 dataset (attached).

1. Stage 1: Graphical representation analysis

a. Univariate profiling: Examine the shape of the distributions (histograms) and get summary statistics (central tendency, dispersion, skewness, kurtosis, n missing, etc.) for each variable. (5 pts)

b. Bivariate profiling: Examine the relationships between variables using scatter plots. Treat X19 as the target variable and X6-X18 as the independent variables. Report the correlations between the variables and interpret the results. (5 pts)

c. Group differences: Use boxplots to check whether outliers are present in the groups and whether the groups differ from one another. Compare X18 (Delivery speed) across the levels of X1 (Customer type), X2 (Industry type), X3 (Firm size), X4 (Region), and X5 (Distribution system). (5 pts)
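
The Stage 1 steps can be sketched outside JMP as follows. This is only an illustrative sketch, not the expected solution: the arrays `x6`, `x18`, `x19`, and `x1` are synthetic stand-ins (the HBAT-200 file is not reproduced here), the helper names are hypothetical, and the skewness and kurtosis use simple moment formulas, so a statistics package that applies bias adjustments may report slightly different values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for HBAT-200 columns (assumed, not the real data).
n = 200
x6 = rng.normal(7.8, 1.4, n)               # a metric predictor
x19 = 0.5 * x6 + rng.normal(3.0, 0.8, n)   # the target variable
x18 = rng.normal(3.9, 0.7, n)              # variable compared across groups
x1 = rng.integers(1, 4, n)                 # grouping variable with levels 1, 2, 3
x6[[3, 17]] = np.nan                       # two artificial missing values


def describe(values):
    """Univariate summary that ignores missing (NaN) entries."""
    v = values[~np.isnan(values)]
    z = (v - v.mean()) / v.std(ddof=0)
    return {
        "n": int(v.size),
        "n_missing": int(np.isnan(values).sum()),
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "std": float(v.std(ddof=1)),
        "skewness": float(np.mean(z**3)),               # 0 for a symmetric shape
        "excess_kurtosis": float(np.mean(z**4) - 3.0),  # 0 for a normal shape
    }


def pearson_r(a, b):
    """Pearson correlation using only rows where both values are present."""
    keep = ~(np.isnan(a) | np.isnan(b))
    return float(np.corrcoef(a[keep], b[keep])[0, 1])


def iqr_outliers(values):
    """Flag points beyond 1.5 * IQR from the quartiles (the usual boxplot rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)


stats_x6 = describe(x6)
r_x6_x19 = pearson_r(x6, x19)
outliers_by_group = {
    int(g): int(iqr_outliers(x18[x1 == g]).sum()) for g in np.unique(x1)
}

print(stats_x6)
print("r(X6, X19) =", round(r_x6_x19, 3))
print("boxplot outliers of X18 per X1 level:", outliers_by_group)
```

The same three helpers would be applied to every variable, every predictor-target pair, and every grouping variable (X1 to X5) in turn.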

2. Stage 2: Assessing assumptions, including normality, homoscedasticity, linearity, and uncorrelated errors. You are expected to carry out the following analyses:

a. Normality: Run a statistical test of normality (Shapiro-Wilk) for each of the variables X6-X19. Also include normal probability plots. Interpret the results. (5 pts)

b. Homoscedasticity: Run a statistical test (Levene's test or Bartlett's test) to check whether the variance of each variable X6-X18 is equal across the groups of X1 (customer type) (Analyze – Fit Y by X). Interpret the results. (5 pts)

c. Linearity and uncorrelated errors: Fit a model with X6-X18 as the predictor variables and X19 as the response variable (Analyze – Fit Y by X). Examine the residual-by-predicted plot and interpret the results. (5 pts)
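
The Stage 2 checks can be sketched with SciPy and NumPy as shown below. This is a hedged illustration on synthetic data, not JMP output: the arrays are assumed stand-ins, `x7` is made deliberately skewed so that the normality test has something to reject, and the Durbin-Watson statistic is added here as one common numeric check for correlated errors (it is only meaningful when the rows have a natural order).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumed, not the real HBAT-200 columns).
n = 200
x1 = np.repeat([1, 2], n // 2)               # two customer-type groups
x6 = rng.normal(7.8, 1.4, n)                 # roughly normal variable
x7 = rng.exponential(2.0, n)                 # deliberately skewed variable
x19 = 2.0 + 0.6 * x6 + rng.normal(0, 1, n)   # response

# a. Normality: Shapiro-Wilk; a small p-value is evidence against normality.
w_x6, p_x6 = stats.shapiro(x6)
w_x7, p_x7 = stats.shapiro(x7)

# b. Homoscedasticity: equal variance of X6 across the X1 groups.
#    center="median" gives the robust (Brown-Forsythe) form of Levene's test.
lev_stat, lev_p = stats.levene(x6[x1 == 1], x6[x1 == 2], center="median")
bart_stat, bart_p = stats.bartlett(x6[x1 == 1], x6[x1 == 2])

# c. Linearity and uncorrelated errors: least-squares fit, then residuals.
X = np.column_stack([np.ones(n), x6])        # intercept plus one predictor
beta, *_ = np.linalg.lstsq(X, x19, rcond=None)
predicted = X @ beta
residuals = x19 - predicted

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"Shapiro-Wilk X6: W={w_x6:.3f}, p={p_x6:.3f}")
print(f"Shapiro-Wilk X7: W={w_x7:.3f}, p={p_x7:.3g}")
print(f"Levene p={lev_p:.3f}, Bartlett p={bart_p:.3f}")
print(f"Durbin-Watson={dw:.2f}")
```

Plotting `residuals` against `predicted` gives the residual-by-predicted plot; a patternless horizontal band is what linearity and constant variance would look like.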

Question 2: Exploratory Factor Analysis (35 pts)

Perform Exploratory Factor Analysis (EFA) using the Principal Component Analysis and Common Factor Analysis approaches on the HATCO dataset (attached).

You are expected to conduct a factor analysis of the HATCO dataset, covering the perceptions of HATCO on seven attributes (X1 to X7), using JMP, and to compile a data analysis report. Appropriate assumption checks and validation steps must be carried out to support your results. Use the latent root criterion (eigenvalues > 1) to decide the number of factors to retain.

Below are a few questions to be addressed in your data analysis report within the relevant data analysis steps:

1. Upon inspecting the correlation matrix, how many correlations are observed to be significant at the 0.01 level? (4 pts)

2. Provide the partial correlation matrix for all variables under consideration. What is meant by partial correlations (in general)? What variables have higher partial correlations? (4 pts)

3. Use the measure of sampling adequacy (MSA) to decide which variables need to be removed. (Use the KMOS R file or JMP for the MSA analysis.) (4 pts)

4. For the remaining variables in the HATCO dataset, how many factors can be retained using the scree test criterion? How many can be retained using the latent root criterion? (4 pts)

5. Considering the latent root criterion, set the number of factors to be retained for the HATCO dataset, and derive the factors using principal component analysis. Provide the communality of the variables. What does the communality of a variable represent? (5 pts)

6. Perform an orthogonal rotation (preferably Varimax) and provide the factor loading matrix. Which variables load significantly on which factors? How much variance is explained by each factor? (5 pts)

7. Considering the latent root criterion, set the number of factors to be retained for the HATCO dataset, and derive the factors using common factor analysis. (3 pts each = 9 pts)

a. Provide the communality of the variables. Interpret the results.

b. Provide the significance test results. Is the number of factors you used in the analysis significant? Interpret the test results.

c. Provide the factor loading matrix. Which variables load significantly on which factors? How much variance is explained by each factor?
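
The quantities requested in items 2 to 6 can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the data are synthetic (six variables generated from two latent factors, not the HATCO file), the function `varimax` is a hypothetical helper implementing raw varimax without Kaiser normalization, and only the principal component extraction is shown, so the common factor analysis of item 7 is not covered.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (assumed): 100 cases, 6 variables driven by 2 latent factors.
n = 100
factors = rng.normal(size=(n, 2))
true_loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                          [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
X = factors @ true_loadings.T + 0.5 * rng.normal(size=(n, 6))

R = np.corrcoef(X, rowvar=False)             # correlation matrix

# Partial correlations: each pair, controlling for all other variables.
P = np.linalg.inv(R)
d = np.sqrt(np.diag(P))
partial = -P / np.outer(d, d)
np.fill_diagonal(partial, 1.0)

# Measure of sampling adequacy (KMO): squared correlations relative to
# squared correlations plus squared partial correlations, off-diagonal only.
off = ~np.eye(R.shape[0], dtype=bool)
r2 = np.where(off, R ** 2, 0.0)
q2 = np.where(off, partial ** 2, 0.0)
msa_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
msa_overall = r2.sum() / (r2.sum() + q2.sum())

# Latent root criterion: keep components whose eigenvalue exceeds 1.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
n_factors = int(np.sum(eigvals > 1.0))

# Principal component loadings and communalities (row sums of squared loadings).
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
communalities = np.sum(loadings ** 2, axis=1)


def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a loading matrix (orthogonal rotation)."""
    p, k = L.shape
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        grad = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                           # product of orthogonal matrices
        if s.sum() < criterion * (1.0 + tol):
            break
        criterion = s.sum()
    return L @ T


rotated = varimax(loadings)
variance_explained = np.sum(rotated ** 2, axis=0) / R.shape[0]

print("eigenvalues:", np.round(eigvals, 3))
print("factors retained:", n_factors)
print("overall MSA:", round(float(msa_overall), 3))
print("communalities:", np.round(communalities, 3))
print("share of variance per rotated factor:", np.round(variance_explained, 3))
```

Because the rotation is orthogonal, each variable's communality is unchanged by it; only the split of explained variance between the factors shifts. Plotting `eigvals` against their rank gives the scree plot.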

Question 3: Multiple regression and Logistic regression analysis (35 pts)

Perform multiple regression analysis and logistic regression analysis on the HATCO dataset.

1. (20 pts) Conduct multiple regression analysis on the HATCO dataset to predict the product usage levels (X9) of the customers based on their perceptions of HATCO’s performance (X1 to X7) using SAS JMP, and compile a data analysis report. Use a stepwise regression approach for your analysis.

Given below are a few questions to be addressed in your data analysis report within the relevant data analysis stages.

a. Provide scatterplots for individual independent variables against the dependent variable. Do any of them indicate any nonlinearity between the dependent variable and the independent variables? (3 pts)

b. Explain what you understand by the term “heteroscedasticity”. Is it desirable or not? What can you say about the heteroscedasticity of each of the relationships (between the dependent variable and each independent variable) based on the scatter plots obtained in step a) above? (3 pts)

c. Plot the normal probability plots for each of the seven independent variables. What do these plots indicate? Are data transformations for any of the variables required? (3 pts)

d. Using the original variables without any data transformations, estimate the regression model with stepwise regression and answer the following questions. Set the p-value threshold for adding a variable to the regression equation to 0.05 and the p-value threshold for dropping a variable from the equation to 0.10. Provide the regression model generated by this analysis. What can you say about the strength of the relationship (in terms of R²) that can be detected with the significance level set at 0.05? Which variables are significant in the model? (10 pts)

e. What is the variance inflation factor (VIF) for each of the independent variables in the final regression equation? What do the VIFs signify? Interpret the results. (3 pts)

f. What percentage of total variation of X9 is explained by the variate? (3 pts)
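
Items e and f rest on two formulas: R² = 1 - SS_residual / SS_total, and VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. A minimal sketch on synthetic data follows; the predictors and coefficients are assumptions made for illustration, the helper names are hypothetical, and the stepwise selection itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic predictors (assumed): x2 is built to correlate with x1.
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 1.0 * x3 + rng.normal(size=n)


def r_squared(predictors, response):
    """Share of the response's variation explained by a least-squares fit."""
    A = np.column_stack([np.ones(len(response)), predictors])
    beta, *_ = np.linalg.lstsq(A, response, rcond=None)
    resid = response - A @ beta
    total = np.sum((response - response.mean()) ** 2)
    return float(1.0 - (resid @ resid) / total)


def vif(predictors):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    values = []
    for j in range(predictors.shape[1]):
        others = np.delete(predictors, j, axis=1)
        values.append(1.0 / (1.0 - r_squared(others, predictors[:, j])))
    return np.array(values)


r2 = r_squared(X, y)
vifs = vif(X)

print(f"R^2 = {r2:.3f} ({100 * r2:.1f}% of the variation in y explained)")
print("VIF per predictor:", np.round(vifs, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient's variance is inflated by multicollinearity. In this sketch `x1` and `x2` show elevated VIFs while `x3` stays near 1.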

2. (15 pts) Determine whether differences in the perceptions of HATCO’s performance (X1 to X7) exist between small and large firms (X8). To do so, perform logistic regression on this dataset. For validation purposes, use 60% of the data as the training set and 40% as the validation set.

Given below are a few questions to be addressed in your data analysis report within the relevant data analysis stages.

a. Run the logistic regression model. Is the model significant? Which variables are significant in this model? (10 pts)
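
The 60/40 split and the overall model test can be sketched as below. This is a rough illustration on synthetic data with two assumed predictors, not the HATCO analysis: the model is fitted by plain gradient ascent on the log-likelihood (JMP uses its own fitting routine), the helper names are hypothetical, and overall significance is judged with the likelihood-ratio statistic G = 2(LL_model - LL_null) against 5.991, the 0.05 chi-square critical value for 2 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (assumed): two predictors and a binary outcome.
n = 100
X = rng.normal(size=(n, 2))
true_logit = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# 60% training rows, 40% validation rows, assigned at random.
idx = rng.permutation(n)
n_train = round(0.6 * n)
train, valid = idx[:n_train], idx[n_train:]


def add_intercept(M):
    return np.column_stack([np.ones(len(M)), M])


def log_likelihood(A, target, beta):
    z = A @ beta
    return float(np.sum(target * z - np.logaddexp(0.0, z)))


def fit_logistic(A, target, lr=0.5, n_iter=5000):
    """Maximize the log-likelihood by gradient ascent (simple, not fast)."""
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        beta += lr * A.T @ (target - p) / len(target)
    return beta


A_train, A_valid = add_intercept(X[train]), add_intercept(X[valid])
beta = fit_logistic(A_train, y[train])

# Likelihood-ratio test of the whole model against an intercept-only model.
ll_model = log_likelihood(A_train, y[train], beta)
p_bar = y[train].mean()
ll_null = float(n_train * (p_bar * np.log(p_bar) + (1 - p_bar) * np.log(1 - p_bar)))
G = 2.0 * (ll_model - ll_null)
model_significant = G > 5.991                # chi-square, 2 df, alpha = 0.05

# Validation: classify at a 0.5 probability cutoff.
pred_valid = (A_valid @ beta > 0).astype(float)
accuracy = float(np.mean(pred_valid == y[valid]))

print("coefficients:", np.round(beta, 3))
print(f"G = {G:.2f}, significant at 0.05: {model_significant}")
print(f"validation accuracy = {accuracy:.2f}")
```

With seven predictors, as in the assignment, the likelihood-ratio statistic would be compared against a chi-square distribution with 7 degrees of freedom instead.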

