You are required to carry out the data science process on a public dataset of your choice

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Coursework 

Overview

You are required to carry out the data science process on a public dataset of your choice. Some relevant data sources were indicated in lectures.

The process should be carried out using RStudio and be reproducible. Your data science process should include (at least):

• Any data preparation required

• An exploratory analysis of your data

• A supervised learning experiment (regression or classification)

• Evaluation

• Presentation of your results

Data Set Selection

Although you have the freedom to choose a dataset, bear in mind the following constraints.

You need to know what the data represent, so you must have access to an adequate codebook or data dictionary.

Your main analysis must be a supervised learning experiment, either regression or classification. In order to carry out that analysis, you will need to form a rectangular multivariate dataset, i.e. a table of data where rows are instances and each column is a feature. In R terms, the data can be placed in a data frame.
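As a small illustration of the target format, the sketch below checks that a table really is rectangular. It uses the built-in `mtcars` data only because it ships with R; your own dataset and the variable names used here (`df`, `is_rectangular`, and so on) are placeholders.

```r
# mtcars ships with base R: each row is one car (an instance),
# each column is one measured feature.
df <- mtcars

# A rectangular dataset is a data frame whose columns all have
# the same length as the number of rows.
is_rectangular <- is.data.frame(df) && all(lengths(df) == nrow(df))

# Number of instances (rows) and features (columns).
dims <- dim(df)

# Missing values per feature; these usually drive data preparation.
missing_per_column <- colSums(is.na(df))

# Compact overview of column types and first values.
str(df)
```

Running `str()` on your own data early is a quick way to spot columns that were read in with the wrong type.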

If it takes a lot of effort to construct the data frame from the original data, this will be taken into account in marking, especially if significant extra skill is required. However, you should not allow this phase to dominate the project. In particular, it is very risky to embark on a project without having a clear process for generating the target format.

A suitable dataset is big enough to be interesting, but not so large that you struggle to store and process the data. For example, we were able to do some interesting things with ‘mtcars’, but a good ML experiment probably needs more instances (rows) and may use more features (columns).

You may not use a dataset that you have used or are currently using for coursework in other modules.

Exploratory Data Analysis

The typical contents of an exploratory data analysis are described in the relevant lecture. For a modest number of features, an analysis of every feature should be presented. If the number of features is larger, then you will need to be more selective.
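One possible starting point for a per-feature analysis is sketched below, again on `mtcars` as a stand-in. Treating `mpg` as the candidate target is an assumption made purely for illustration.

```r
df <- mtcars

# One row per feature: mean, standard deviation, minimum and maximum.
feature_summary <- t(sapply(df, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))
print(round(feature_summary, 2))

# Correlation of every feature with a candidate target (mpg here),
# sorted from most positive to most negative.
cor_with_mpg <- sort(cor(df)[, "mpg"], decreasing = TRUE)
print(round(cor_with_mpg, 2))
```

In a knitted report you would normally accompany these tables with plots, for example `hist(df$mpg)` for a single feature or `pairs(df[, 1:4])` for pairwise relationships.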

You may include some investigation related to the questions that your main experiment will seek to answer. This element will be larger for some projects than others and credit will again be given as appropriate. Once again, you are advised not to let this get out of hand and prevent you from spending enough time on your main analysis.

Supervised Learning Experiment

You should define a supervised learning problem, either classification or regression. To do so, you will need to declare one of your features to be the target label. You will also need to specify your performance metric(s), e.g. accuracy or MSE.
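The two metrics named above can be written in a line each. The sketch below also shows one way to declare a target label; the choice of `mpg` and the helper names `mse` and `accuracy` are illustrative assumptions, not part of the brief.

```r
# Declare the target label and treat the remaining columns as predictors.
target <- "mpg"
features <- setdiff(names(mtcars), target)

# Mean squared error, for regression problems.
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Accuracy (proportion of correct labels), for classification problems.
accuracy <- function(actual, predicted) mean(actual == predicted)

# Small hand-checkable examples.
example_mse <- mse(c(1, 2, 3), c(1, 2, 5))   # (0 + 0 + 4) / 3
example_acc <- accuracy(c("a", "b", "a", "b"),
                        c("a", "b", "b", "b"))  # 3 of 4 correct
```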

Choose at least two algorithms and run an experiment to compare their performance on the data. The experiment will normally involve cross-validation or bootstrapping to estimate performance metrics. If you choose to use a simple train-test split, you should explain why (it is possible that this is the best you can do for your specific experiment).
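A minimal sketch of such a comparison, using only base R, is given below. It estimates MSE by 5-fold cross-validation for linear regression and for a hand-written k-nearest-neighbours regressor. The feature subset, the fold count, `k = 3` and all function names are assumptions chosen for illustration; in a real project you would more likely use an established package for the second algorithm.

```r
set.seed(42)  # fixed seed so the fold assignment is reproducible

df <- mtcars[, c("mpg", "wt", "hp", "disp")]  # target first, then predictors
k_folds <- 5
fold_id <- sample(rep(seq_len(k_folds), length.out = nrow(df)))

mse <- function(actual, predicted) mean((actual - predicted)^2)

# Algorithm 1: ordinary linear regression.
fit_predict_lm <- function(train, test) {
  model <- lm(mpg ~ ., data = train)
  predict(model, newdata = test)
}

# Algorithm 2: k-nearest-neighbours regression, written by hand.
# Predictors are standardised using the training fold only.
fit_predict_knn <- function(train, test, k = 3) {
  x_train <- scale(train[, -1])
  x_test <- scale(test[, -1],
                  center = attr(x_train, "scaled:center"),
                  scale = attr(x_train, "scaled:scale"))
  apply(x_test, 1, function(row) {
    d <- sqrt(colSums((t(x_train) - row)^2))  # Euclidean distances
    mean(train$mpg[order(d)[seq_len(k)]])     # average of k nearest targets
  })
}

# Average the held-out MSE over all folds for a given algorithm.
cv_mse <- function(fit_predict) {
  fold_mse <- sapply(seq_len(k_folds), function(f) {
    train <- df[fold_id != f, ]
    test <- df[fold_id == f, ]
    mse(test$mpg, fit_predict(train, test))
  })
  mean(fold_mse)
}

results <- c(lm = cv_mse(fit_predict_lm), knn = cv_mse(fit_predict_knn))
print(results)
```

Both algorithms are scored on exactly the same folds, which keeps the comparison fair. With only 32 rows the estimates will be noisy, which is one reason the brief recommends a larger dataset.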

As well as comparing different algorithms, you should try to achieve optimum performance for your chosen algorithms. For example, you could perform feature selection or extraction before running the algorithm, or tune the parameters of the algorithm.
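Parameter tuning can be sketched as a grid search scored by cross-validation. The example below tunes the degree of a polynomial regression of `mpg` on `wt`; the model, the grid `1:4` and the names used are hypothetical choices made to keep the example short.

```r
set.seed(1)  # reproducible fold assignment

df <- mtcars
k_folds <- 5
fold_id <- sample(rep(seq_len(k_folds), length.out = nrow(df)))

# Cross-validated MSE for a polynomial of degree d in wt.
cv_mse_for_degree <- function(d) {
  fold_mse <- sapply(seq_len(k_folds), function(f) {
    train <- df[fold_id != f, ]
    test <- df[fold_id == f, ]
    model <- lm(mpg ~ poly(wt, d), data = train)
    mean((test$mpg - predict(model, newdata = test))^2)
  })
  mean(fold_mse)
}

# Evaluate every candidate value and keep the one with the lowest error.
degrees <- 1:4
cv_scores <- sapply(degrees, cv_mse_for_degree)
best_degree <- degrees[which.min(cv_scores)]

print(data.frame(degree = degrees, cv_mse = cv_scores))
```

Note that the score of the winning setting is an optimistic estimate of its true performance, because the same folds were used to choose it. Holding out a final test set, or using nested cross-validation, avoids this.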

Finally, if the model provides any insight into the data and the classification (or regression) task, you must report it and reflect on it.

Evaluation

This should include a discussion of performance as measured by your chosen metric(s). Other questions you should address include: What did you learn from your data and which algorithm was more effective? Can you explain why your methods were effective or ineffective? Which features were the most important predictors of the target label?
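For the question about important predictors, one rough approach for a linear model is to compare standardised coefficients, as sketched below. This is only one of several possible importance measures and it can mislead when features are strongly correlated; the feature subset shown is an arbitrary illustrative choice.

```r
# Standardise target and predictors so coefficients are on a common scale.
df <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp", "qsec")]))

model <- lm(mpg ~ ., data = df)

# Drop the intercept and rank predictors by absolute coefficient size.
std_coefs <- coef(model)[-1]
importance <- sort(abs(std_coefs), decreasing = TRUE)

print(round(importance, 3))
```

Tree-based models and permutation tests offer alternative importance measures, and it is worth checking whether different measures agree before drawing conclusions.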

Presentation of Results

You should produce a technical report explaining your process from start to finish. The report should be generated by ‘knitting’ a source document in R Markdown (.Rmd) or Sweave (.Rnw) format. The document should give enough information to allow the reader to reproduce your analysis exactly (typically by citing the data source and including all processing commands).

You should also produce a short report on your project for a non-technical audience. The intended audience may be the general public or a specific group with a presumed interest in the data. This should briefly explain what the data is and why your audience should be interested in it, as well as summarising your conclusions.
