INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

INTRODUCTION

This project involves preparing data for analysis, a process known as data cleaning. You will explore various graphs and statistics to identify outliers, consider various methods for handling missing data (such as imputation), and explore a basic use of principal component analysis (PCA) to reduce a set of variables.

To complete this assessment, you will use raw data from the industry of your choice and prepare the data set for analysis. You will also create visualizations and deliver a clean data set ready for exploratory analysis.

 

SCENARIO

For this task, you will use the medical_raw_data.csv Data Set file.

 

You will review the Data Dictionary to understand the needs of the company and to prepare to clean the data. In this assessment, you will analyze the .csv data file, also referred to as the data set. 

Note: This assessment may require you to submit pictures, graphics, and/or diagrams. Each file must be an attachment no larger than 30 MB. Diagrams must be original and may be hand-drawn or drawn using a graphics program. Do not use CAD programs, because the resulting attachments will be too large.

 

REQUIREMENTS

 

Part I: Research Question

A. Describe one question or decision that you will address using the data set. The summarized question or decision must be relevant to a realistic organizational need or situation. 

 

B. Describe the variables in the data set and indicate the specific type of data being described. Use examples from the data set that support your claims. 

 

Part II: Data-Cleaning Plan

Note: You must use R as the programming language for implementing your coding solutions, manipulating the data, and creating visual representations. 

 

C. Explain the plan for cleaning the data by doing the following:

1. Propose a plan that includes the relevant techniques and specific steps needed to identify anomalies in the data set.

 

2. Justify your approach for assessing the quality of the data, including:

• characteristics of the data being assessed,

• the approach used to assess the quality.

 

3. Justify your selected programming language and any libraries and packages that will support the data-cleaning process.

 

4. Provide the code you will use to identify the anomalies in the data. 
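A minimal base-R sketch illustrating the kind of checks C1 and C4 ask for follows. It looks for exact duplicate rows, counts missing values per column, and flags outliers with the 1.5 × IQR rule. The data frame and its columns (`patient_id`, `age`, `income`) are hypothetical stand-ins, not the contents of medical_raw_data.csv. With the real file, the data would be loaded with `read.csv()` and the same checks applied to each relevant column.

```r
# Hypothetical stand-in for medical_raw_data.csv; with the real file you would
# start from something like: df <- read.csv("medical_raw_data.csv")
df <- data.frame(
  patient_id = c(1, 2, 3, 3, 5, 6),
  age        = c(34, 51, NA, NA, 47, 212),
  income     = c(42000, 38500, 51000, 51000, NA, 40200)
)

# 1. Exact duplicate rows (every column identical to an earlier row)
dup_rows <- which(duplicated(df))

# 2. Count of missing values in each column
na_counts <- colSums(is.na(df))

# 3. Outliers by the 1.5 * IQR rule, ignoring missing values
iqr_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  which(x < q[1] - fence | x > q[2] + fence)
}
age_outliers <- iqr_outliers(df$age)

print(dup_rows)     # row positions flagged as duplicates
print(na_counts)    # missing values per column
print(age_outliers) # row positions of outlying ages
```

The IQR rule is only one common choice. Z-scores, or a visual check such as `boxplot(df$age)`, would also fit the "graphs and statistics" approach described in the introduction.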

 

Part III: Data Cleaning

 

D. Summarize the data-cleaning process by doing the following:

1. Describe the findings, including all anomalies, from the implementation of the data-cleaning plan from part C.

 

2. Justify your methods for mitigating each type of anomaly discovered in the data set.

3. Summarize the outcome from the implementation of each data-cleaning step.

4. Provide the code used to mitigate anomalies.

 

5. Provide a copy of the cleaned data set.

 

6. Summarize the limitations of the data-cleaning process.

 

7. Discuss how the limitations in part D6 affect the analysis of the question or decision from part A.
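One possible mitigation pass for D2 and D4 is sketched below in base R. It drops exact duplicate rows, fills missing numeric values with the column median, and fills missing categorical values with the most frequent level. These are common choices, not the only defensible ones. The columns (`age`, `area`) and the output file name are hypothetical, and whichever methods are chosen still need the justification D2 asks for.

```r
# Hypothetical example data; not taken from medical_raw_data.csv
df <- data.frame(
  patient_id = c(1, 2, 3, 3, 5),
  age        = c(34, 51, NA, NA, 47),
  area       = c("Urban", NA, "Rural", "Rural", "Urban"),
  stringsAsFactors = FALSE
)

# Remove exact duplicate rows, keeping the first occurrence
clean <- df[!duplicated(df), ]

# Numeric column: median imputation (less sensitive to outliers than the mean)
clean$age[is.na(clean$age)] <- median(clean$age, na.rm = TRUE)

# Categorical column: mode imputation with the most frequent observed level
mode_level <- names(which.max(table(clean$area)))
clean$area[is.na(clean$area)] <- mode_level

# Confirm no missing values remain before exporting
stopifnot(!anyNA(clean))
# Hypothetical output file name for the cleaned copy requested in D5:
# write.csv(clean, "medical_clean.csv", row.names = FALSE)
```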

 

E. Apply principal component analysis (PCA) to identify the significant features of the data set by doing the following:

 

1. List the principal components in the data set.

2. Describe how you identified the principal components of the data set.

3. Describe how the organization can benefit from the results of the PCA.
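A minimal base-R sketch for E1 and E2 using `prcomp()` follows. It uses simulated numeric columns (hypothetical, not the real variables) and standardizes them so that each variable carries equal weight. It then prints the loadings and the proportion of variance each component explains. Finally, it applies the Kaiser criterion (retain components with an eigenvalue greater than 1) as one common retention rule. A scree plot of `pca$sdev^2` is a frequent alternative.

```r
# Simulated stand-in for the continuous variables of the real data set
set.seed(42)
n <- 200
x1 <- rnorm(n)
num <- data.frame(
  a = x1,
  b = x1 + rnorm(n, sd = 0.3),  # strongly correlated with a
  c = rnorm(n),
  d = rnorm(n)
)

# Center and scale so each variable contributes on the same footing
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Loadings: how strongly each original variable weights each component
print(round(pca$rotation, 2))

# Eigenvalues and the proportion of total variance each component explains
eigenvalues <- pca$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
print(round(prop_var, 3))

# Kaiser criterion: keep components whose eigenvalue exceeds 1
keep <- which(eigenvalues > 1)
print(keep)
```

Because `a` and `b` are built to move together, the first component should absorb most of their shared variance. The loadings show which original variables each retained component summarizes.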

 

Part IV. Supporting Documents

 

F. Provide documentation that demonstrates the error-free functionality of the code used to support the discovery of anomalies and the data-cleaning process, and summarize the programming environment.

 

G. Reference the web sources used to acquire segments of third-party code to support the application. Be sure the web sources are reliable. 

 

H. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized. 

 
