INTRODUCTION
In this project, you will prepare data for analysis, a process known as data cleaning. You will explore various graphs and statistics to identify outliers, consider methods for handling missing data, such as imputation, and explore a basic use of principal component analysis (PCA) for data reduction of a set of variables.
To complete this assessment, you will use raw data from the industry of your choice and prepare the data set for analysis. You will also create visualizations and deliver a clean data set ready for exploratory analysis.
SCENARIO
For this task, you will use the medical_raw_data.csv Data Set file.
You will review the Data Dictionary to understand the needs of the company and to prepare to clean the data. In this assessment, you will analyze the .csv data file, also referred to as the data set.
Note: This assessment may require you to submit pictures, graphics, and/or diagrams. Each file must be an attachment no larger than 30 MB in size. Diagrams must be original and may be hand-drawn or drawn using a graphics program. Do not use CAD programs because attachments will be too large.
REQUIREMENTS
Part I: Research Question
A. Describe one question or decision that you will address using the data set. The summarized question or decision must be relevant to a realistic organizational need or situation.
B. Describe the variables in the data set and indicate the specific type of data being described. Use examples from the data set that support your claims.
Part II: Data-Cleaning Plan
Note: You must use R as the programming language for implementing your coding solutions, manipulating the data, and creating visual representations.
C. Explain the plan for cleaning the data by doing the following:
1. Propose a plan that includes the relevant techniques and specific steps needed to identify anomalies in the data set.
2. Justify your approach for assessing the quality of the data, including:
• characteristics of the data being assessed,
• the approach used to assess the quality.
3. Justify your selected programming language and any libraries and packages that will support the data-cleaning process.
4. Provide the code you will use to identify the anomalies in the data.
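The anomaly checks asked for in part C could be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not a required solution: the toy data frame and its column names (id, age, income) are hypothetical and are not taken from the Data Dictionary, and the 1.5 * IQR rule is only one common convention for flagging outliers.

```r
# Hypothetical toy data standing in for the real file; the column names
# are illustrative assumptions, not fields from the Data Dictionary.
df <- data.frame(
  id     = c(1, 2, 3, 4, 4, 6),
  age    = c(34, 51, NA, 47, 47, 29),
  income = c(42000, 39000, 45000, 41000, 41000, 400000)
)

# Missing values: count NA entries in each column.
missing_counts <- colSums(is.na(df))

# Duplicates: count rows that exactly repeat an earlier row.
duplicate_rows <- sum(duplicated(df))

# Outliers: flag values beyond 1.5 * IQR from the quartiles.
iqr_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}
income_outliers <- iqr_outliers(df$income)

print(missing_counts)
print(duplicate_rows)
print(df[income_outliers, ])
```

With the real file, the data frame would instead come from something like read.csv("medical_raw_data.csv"), and the same checks would be repeated for each relevant column.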
Part III: Data Cleaning
D. Summarize the data-cleaning process by doing the following:
1. Describe the findings, including all anomalies, from the implementation of the data-cleaning plan from part C.
2. Justify your methods for mitigating each type of discovered anomaly in the data set.
3. Summarize the outcome from the implementation of each data-cleaning step.
4. Provide the code used to mitigate anomalies.
5. Provide a copy of the cleaned data set.
6. Summarize the limitations of the data-cleaning process.
7. Discuss how the limitations in part D6 affect the analysis of the question or decision from part A.
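One way the mitigation steps in part D might look in base R is sketched below. The data frame and its columns (age, gender) are hypothetical, and median and mode imputation are assumptions chosen for illustration; other methods may suit a given variable better and would need their own justification.

```r
# Hypothetical toy data; column names are illustrative assumptions.
df <- data.frame(
  age    = c(34, 51, NA, 47, 47, 29),
  gender = c("F", "M", "F", NA, "F", "F"),
  stringsAsFactors = FALSE
)

cleaned <- df

# Numeric column: replace missing values with the median of observed values.
cleaned$age[is.na(cleaned$age)] <- median(cleaned$age, na.rm = TRUE)

# Categorical column: replace missing values with the most frequent category.
# table() ignores NA by default, so the mode is taken over observed values only.
mode_value <- names(which.max(table(cleaned$gender)))
cleaned$gender[is.na(cleaned$gender)] <- mode_value

# Confirm that no missing values remain before exporting.
print(colSums(is.na(cleaned)))
```

A cleaned copy could then be written out with write.csv(cleaned, "cleaned_data.csv", row.names = FALSE), where the output file name is again only a placeholder. Imputation of this kind shrinks the variance of the affected columns, which is one limitation worth noting in part D6.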
E. Apply principal component analysis (PCA) to identify the significant features of the data set by doing the following:
1. List the principal components in the data set.
2. Describe how you identified the principal components of the data set.
3. Describe how the organization can benefit from the results of the PCA.
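The PCA in part E could be approached along the following lines, using prcomp from base R's stats package. The simulated variables x1, x2, and x3 are hypothetical stand-ins for the numeric columns of the cleaned data set, and keeping components with an eigenvalue of at least 1 (the Kaiser criterion) is one common selection rule, assumed here for illustration.

```r
# Simulated numeric data; x1 and x2 are built to be strongly correlated,
# x3 is independent noise. All names are illustrative assumptions.
set.seed(42)
n <- 100
base <- rnorm(n)
num <- data.frame(
  x1 = base + rnorm(n, sd = 0.1),
  x2 = base + rnorm(n, sd = 0.1),
  x3 = rnorm(n)
)

# Center and scale so each variable contributes on the same footing.
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Loadings: how each original variable contributes to each component.
loadings <- pca$rotation

# Eigenvalues: the variance explained by each component.
eigenvalues <- pca$sdev^2

# Kaiser criterion: keep components whose eigenvalue is at least 1.
kept <- which(eigenvalues >= 1)

print(loadings)
print(eigenvalues)
print(kept)
```

Plotting the eigenvalues against the component number (a scree plot) is a common way to document how the components were identified for part E2.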
Part IV: Supporting Documents
F. Provide documentation of the error-free functionality of the code used to support the discovery of anomalies and the data-cleaning process, and summarize the programming environment.
G. Reference the web sources used to acquire segments of third-party code to support the application. Be sure the web sources are reliable.
H. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.