Part 3
The marine biology research team is satisfied with the Excel application you developed, which greatly helped them understand abalone across the country. Building on the analysis you did in Part 2, the scientists are keen to find underlying patterns in the abalone data. In other words, the research team wants to build mathematical models that reveal the fundamental relationships among the variables in the abalone data. As an expert in business analytics, you have the perfect skill set for this task.
To build a solid model, work through the following steps and finalize your model at the end.
Task 1. Prepare the dataset
First, you need to prepare the data for building the model. In classic data modeling tasks, you use only a portion of the data to train your model; this portion is called the training set. The rest of the data is used to evaluate the performance of your model; this portion is called the test set.
What you need to do:
a. Create a new Excel file called “Firstname_Lastname_DataModeling.xlsx”.
b. Name your current worksheet “Original Data”.
c. Copy the data in your “Personal Data” worksheet from your semester project Part 2 and paste it into the “Original Data” worksheet.
d. Create a new worksheet called “Training set”, then copy the first 2/3 of the data from “Original Data” and paste it there.
e. Create a new worksheet called “Test set”, then copy the remaining 1/3 of the data from “Original Data” and paste it there.
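Steps d and e can be sketched outside Excel as a sanity check on the row counts. This is a minimal illustration, not part of the required deliverable: the helper name `split_first_two_thirds` is hypothetical, the sample rows are placeholders, and rounding the cut point down when the row count is not divisible by 3 is an assumption the brief does not specify.

```python
def split_first_two_thirds(rows):
    """Return (training, test): the first 2/3 of the rows, then the rest.

    Assumption: when len(rows) is not divisible by 3, the cut point is
    rounded down, so the test set receives the extra rows.
    """
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]


# Placeholder rows standing in for the data rows of "Original Data"
# (header row excluded).
rows = list(range(1, 10))
training, test = split_first_two_thirds(rows)
print(len(training), len(test))  # prints: 6 3
```

The split is positional, matching the instruction to take the first 2/3 of the rows; no shuffling is applied.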
Task 2. Find relationships among variables in stacked data
Before modeling the data, you need a better understanding of the relationships among the variables. The research team has specified a set of numerical variables that they care about most, listed in the table below. In particular, the scientists are most interested in the rings of the abalone, since the ring count indicates the abalone's age.
Length  Diameter  Height  Whole_weight  Shucked_weight  Viscera_weight  Shell_weight  Rings
What you need to do:
a. Create a new worksheet called “Stacked data analysis”.
b. Use the “Training set”. Explore and create histograms for the variables listed above, then pick the 3 most interesting histograms and describe the characteristics of each.
c. Use the “Training set”. Create a box plot for Shucked_weight, Viscera_weight and Shell_weight, and describe the characteristics of each variable in the plot.
d. Use the “Training set”. Explore and create scatter plots for the variables listed above, then pick the 5 most interesting scatter plots and describe the characteristics of each.
e. Use the “Training set”. Calculate the correlation between every pair of the variables listed above. Identify the 5 most strongly correlated pairs of variables. Apply conditional formatting to your computed results to highlight these top 5 correlations.
f. Use scatter plots to illustrate the strongly correlated pairs of variables. Describe your findings.
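The pairwise correlation in step e can be cross-checked with a short script. The sketch below computes the Pearson correlation coefficient, the same quantity Excel's CORREL function returns, for every pair of columns and ranks the pairs. The helper names `pearson` and `top_correlated_pairs` are hypothetical, the sample values are invented for illustration and are not abalone measurements, and ranking by absolute value (so that strong negative correlations also count as strong) is an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def top_correlated_pairs(columns, k=5):
    """Return the k column pairs with the largest |correlation|."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Assumption: strength is measured by absolute value.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]


# Invented sample values for illustration only.
columns = {
    "Length": [0.45, 0.35, 0.53, 0.44, 0.33],
    "Height": [0.10, 0.09, 0.14, 0.13, 0.08],
    "Rings": [15, 7, 9, 10, 7],
}
for a, b, r in top_correlated_pairs(columns, k=2):
    print(f"{a} vs {b}: r = {r:.3f}")
```

With the 8 variables in the table there are 28 distinct pairs, so the top 5 are selected from 28 coefficients.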