R Programming

The aim of the project is to produce two different pieces of data analysis work. Therefore the two datasets you choose should be completely different from each other

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Project Information

Clarifications about the description

The aim of the project is to produce 2 different pieces of data analysis work. Therefore, the 2 datasets you choose should be completely different from each other although they can belong to the same domain if you intend to combine them in your analysis. Cases like (but not limited to):

One dataset being a derivative of the other,

Both being subsets of the same original dataset,

will inevitably lead to some duplication of parts of the work (literature, objectives, design analysis, code, etc.) for which points cannot be re-awarded.

The datasets are supposed to also be publicly available. There are numerous resources on the WWW with publicly available datasets (see “Sample data sources” section). If you prefer to use a dataset that is not readily publicly available (like data from your workplace), you must ensure that:

You have full permission to use this data in a piece of work that will make both the dataset and analysis thereof public and

The dataset does not contain any personally identifying information.

Basically, by using such a dataset you will be making it publicly available so you need to secure the dataset owner’s permission to do so in advance.

Structure and Rating Grid

Objectives and Literature Review

As with every piece of data analysis, you should ideally have a question or set of questions you expect your work to answer; these are your objectives. They will be graded for realism, imagination, ambition and clarity of expression.

Your objectives are inherently tied to the state of knowledge on the domain, which you can gauge via reviewing the domain literature. This should be presented as a synthesis of the referenced works, not a compilation of summaries. Your literature review should be properly (Harvard-style) referenced. It will be graded for quality, depth and extent.

Dataset description

Your chosen datasets should be included in their original form as ancillary files. If they are prohibitively large, you should include a well-chosen, representative subset. Where and how the datasets were located and downloaded should be clearly shown. They will be graded for richness, depth and interest factor.

The dataset description should encompass all columns and detail data types, ranges, special cases, etc. More focus should be given to the columns that will ultimately be used for the data analysis. All interesting and/or pertinent information about the dataset should be presented. The description will be graded for detail, structure, clarity, etc.
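As a starting point for such a description, a per-column summary can be generated in R. The following is only a minimal sketch: the helper name describe_columns and the tiny example_data frame are hypothetical and not part of the brief, and a real description would still need written commentary on special cases.

```r
# Hypothetical helper: one summary row per column of a data frame.
describe_columns <- function(df) {
  data.frame(
    column    = names(df),
    type      = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    n_unique  = vapply(df, function(col) length(unique(col)), integer(1)),
    # Ranges are reported for numeric columns only; others get NA.
    minimum   = vapply(df, function(col) {
      if (is.numeric(col)) as.numeric(min(col, na.rm = TRUE)) else NA_real_
    }, numeric(1)),
    maximum   = vapply(df, function(col) {
      if (is.numeric(col)) as.numeric(max(col, na.rm = TRUE)) else NA_real_
    }, numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up illustrative data, standing in for a real dataset.
example_data <- data.frame(
  id    = 1:4,
  score = c(2.5, NA, 4.0, 3.5),
  group = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

column_summary <- describe_columns(example_data)
print(column_summary)
```

The resulting table can be pasted into the report and then extended by hand with the meaning of each column.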

Analysis approach

Your data analysis should be designed in advance and the design documented via description and visual aids such as tables, flowcharts, and other appropriate schematics. Please note that screenshots of your code do not count as such in the general case and should be avoided unless there is a specific reason why they are appropriate.

There are 3 established approaches to data analysis[1]:

Cross Industry Standard Process for Data Mining (CRISP-DM)

Knowledge Discovery in Databases (KDD)

Sample, Explore, Modify, Model and Assess (SEMMA)

Of the three, CRISP-DM and KDD are the most generally implementable, whereas the design of SEMMA assumes the use of the Enterprise Miner software from SAS. You can find more information on them in the “Miscellaneous Resources” section. You are free to adopt any of these for your project or follow your own. No points will be awarded for following an established methodology.

Your data analysis process will be graded for robustness, adherence to commonly accepted standards and completeness (inclusion of some kind of testing process, result evaluation, etc.). The description thereof will be graded for detail, clarity and appropriate use of visual aids, as well as the quality and variety of the latter.
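One common way to include a testing process and result evaluation is to hold back part of the data and measure error on it. The sketch below assumes a simple linear model on simulated data; the brief does not prescribe this method, and the 80/20 split, the variable names and the choice of RMSE as the metric are all illustrative assumptions.

```r
# Simulated data (not a real dataset): y depends linearly on x plus noise.
set.seed(42)
n <- 200
sim <- data.frame(x = runif(n, min = 0, max = 10))
sim$y <- 3 + 2 * sim$x + rnorm(n, sd = 1)

# Hold out 20% of the rows for testing (the split ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train_set <- sim[train_idx, ]
test_set  <- sim[-train_idx, ]

# Fit on the training rows only, then evaluate on the unseen test rows.
fit <- lm(y ~ x, data = train_set)
predictions <- predict(fit, newdata = test_set)
rmse <- sqrt(mean((test_set$y - predictions)^2))

cat("Held-out RMSE:", rmse, "\n")
```

Reporting the held-out error alongside the fitted model is one way to document how the evaluation was carried out.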

Analysis results and presentation

The results of your analysis should go as far as possible towards reaching your previously stated objectives (i.e. answering the questions you were hoping to answer). Note that a robust finding that the dataset or the analysis is insufficient to reach a specific conclusion is not a failure but is, in fact, a positive result!

Your analysis will be graded for robustness, appropriate use of statistical methods, appropriate use of code, etc. The presentation of your analysis will be graded for clarity, depth of information, appropriate use of visual aids (graphs, plots, etc.) and quality thereof.

R code

You are required to use R to an extent that showcases your aptitude with the most important operations learned during the course (file I/O, control structures, functions, etc). A substantial amount of code is expected.
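To illustrate the kinds of operations listed above, here is a small sketch that combines a user-defined function, a for loop with an if condition, and file output followed by file input. The function name count_missing and the toy data frame are hypothetical, and the temporary directory is used only so that the example is self-contained.

```r
# Function + control structures: count missing values per column,
# keeping only the columns that actually contain missing values.
count_missing <- function(df) {
  counts <- integer(0)
  for (column_name in names(df)) {
    n_missing <- sum(is.na(df[[column_name]]))
    if (n_missing > 0) {
      counts[column_name] <- n_missing
    }
  }
  counts
}

# Made-up illustrative data.
toy <- data.frame(
  age  = c(31, NA, 45),
  city = c("Dublin", "Cork", NA),
  id   = 1:3,
  stringsAsFactors = FALSE
)
missing_counts <- count_missing(toy)

# File I/O: write the result to CSV and read it back.
output_path <- file.path(tempdir(), "missing_counts.csv")
write.csv(
  data.frame(column = names(missing_counts),
             n_missing = unname(missing_counts)),
  output_path,
  row.names = FALSE
)
round_trip <- read.csv(output_path, stringsAsFactors = FALSE)
print(round_trip)
```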

Your code will be graded for extent, quality and good use of coding conventions (comments, variable naming, etc.). Note that the use of libraries, while encouraged, will not be given extra marks.

Project report

Your ultimate project report, encompassing all the above, will be graded for structure, presentation and quality of its discussion of challenges. There is no requirement to structure your report as a scientific paper, though you are free to do so if you prefer.

Please do include your complete R code as an appendix. Obviously, your code does not count towards the word count requirement.

Deliverables

Your main deliverable will be the project report, which should be submitted via the Turnitin submission form, appropriately entitled “Project report submission”. Ideally it should be in PDF format, but MS-Word and other similar formats are also acceptable. If in doubt, ask.

Your ancillary files should be submitted via the “Project ancillary file submission” form as a zipped archive. It should contain at least the following resources:

Your R code as 1 or more .r file(s),

Your 2 datasets in the state they are input into your R code.

Please use relative filepaths in your code, such that unzipping the archive allows all code to be run without any need to edit the filepaths or move the datasets around.
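A minimal sketch of what relative filepaths might look like in practice is shown below. It assumes a hypothetical archive layout in which the datasets sit in a data folder next to the .r file(s) and the working directory is the root of the unzipped archive; the folder name, file name and helper name read_dataset are illustrative only.

```r
# Assumed (hypothetical) layout after unzipping:
#   analysis.r
#   data/dataset_one.csv
#   data/dataset_two.csv

# Build the path relative to the working directory and fail early,
# with a helpful message, if the file is not where it is expected.
read_dataset <- function(file_name, data_dir = "data") {
  path <- file.path(data_dir, file_name)
  if (!file.exists(path)) {
    stop("Expected file not found: ", path,
         " (working directory: ", getwd(), ")")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage, assuming the working directory is the unzipped archive root:
# dataset_one <- read_dataset("dataset_one.csv")
```

Because no absolute path such as a drive letter or home directory appears anywhere, the same code should run unchanged on another machine once the archive is unzipped.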

Other included files can (but are not required to) be intermediate dataset outputs, graphs/plots/etc as graphics, or whatever other artefacts you deem appropriate.

Referencing and plagiarism

Ideally, do lots of the former and none of the latter. In case it hasn’t already been drilled into your heads (which it should have been), any type of plagiarism makes NCI very, very sad and the penalties are severe. Please re-familiarise yourselves with the NCI policy on referencing and plagiarism. Please note that re-submitting your own work (either from the same module earlier or a different one) is still considered plagiarism.

The fact that we will be dealing with code doesn’t make much difference in the guidelines about plagiarism. While programmers work by re-using each other’s code constantly, in an academic environment you are still expected to cite any code you use that is not strictly your own. Luckily, code comments provide the perfect mechanism for referencing that code’s origins.

If you’re reusing a whole script or class, you should reference all the appropriate details (URL, original author, date, etc.) in a large comment block at the top of the file.

If you’re reusing a code snippet, you should reference its origin URL and author in inline comments just above the snippet.
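The two cases above could look something like the following in an .r file. This is only a sketch of one possible comment layout: the angle-bracket placeholders stand for details you would fill in yourself, the file name is hypothetical, and the function is a simple illustration rather than code taken from any real source.

```r
# ---------------------------------------------------------------
# File:            clean_data.r   (hypothetical file name)
# Adapted from:    <title of the original script>
# Original author: <author name>
# Source URL:      <URL>
# Date accessed:   <date>
# Changes made:    <brief description of your modifications>
# ---------------------------------------------------------------

rescale_to_unit <- function(x) {
  # Snippet adapted from: <URL>
  # Original author:      <author name>
  value_range <- range(x, na.rm = TRUE)
  (x - value_range[1]) / (value_range[2] - value_range[1])
}

rescaled <- rescale_to_unit(c(0, 5, 10))
print(rescaled)
```

The header block covers a whole reused script, while the two comment lines inside the function show the inline form for a single snippet.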
