Weka

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

The goal of this assignment is to gain practical experience in using Weka and applying it to storytelling with data. You can work on this assignment in teams of up to three students. The standard academic honesty rules apply.

First Deliverable – Dataset:

You will be selecting and downloading a dataset of your choice for the purpose of this assignment. Your selected dataset should satisfy the following criteria:

Contains at least 5 dimensions/features, including at least one categorical and one numerical dimension.

Contains a clear class label attribute (binary or multi-label).

Has a simple tabular structure (i.e., no time series, multimedia, etc.).

Is of reasonable size and contains at least 2,000 tuples.

While the assignment is open-ended, you are expected to select an interesting dataset that tells an interesting story. Many such datasets are available in public repositories such as UCI, Kaggle, and KDnuggets. Attached, please find some suggested datasets to select from.
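The selection criteria above can be checked mechanically before committing to a dataset. The following is a minimal sketch in plain Python, not part of the assignment itself. The function names, the file path passed to `load_csv`, and the choices to exclude the class label from the feature count and to treat an empty string or `?` as a missing value are all assumptions made for illustration.

```python
import csv


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def load_csv(path):
    """Read a CSV file with a header row (the path is supplied by the caller)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        return header, list(reader)


def check_dataset(header, rows, class_column, min_features=5, min_rows=2000):
    """Check the selection criteria on a table of strings.

    Assumptions (not from the assignment): the class label is not counted
    as a feature, and "" or "?" marks a missing value.
    """
    class_idx = header.index(class_column)
    feature_idxs = [i for i in range(len(header)) if i != class_idx]

    def column_is_numeric(i):
        # A column is numerical only if every non-missing value is a number.
        values = [row[i] for row in rows if row[i] not in ("", "?")]
        return bool(values) and all(is_number(v) for v in values)

    numeric_flags = [column_is_numeric(i) for i in feature_idxs]
    class_values = {row[class_idx] for row in rows}
    return {
        "enough_features": len(feature_idxs) >= min_features,
        "has_numerical": any(numeric_flags),
        "has_categorical": not all(numeric_flags),
        "has_class_label": len(class_values) >= 2,
        "enough_rows": len(rows) >= min_rows,
    }
```

A typical call would be `check_dataset(*load_csv("my_dataset.csv"), class_column="label")`, where both the file name and the column name are placeholders. Any criterion reported as `False` points to a reason to pick a different dataset.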

Second Deliverable – Video Presentation:

Data Exploration Tasks:

The name and source of the dataset.

A description of how the dataset was collected or created.

A summary of the purpose of each column in your dataset, including the class label.

An overview of the .arff data file, and how you created it (if needed).

Explain any data quality problems you might have faced, and how they were handled.

Provide a visual overview of all the attributes in your dataset.

Discuss the most distinctive categorical attribute, i.e., the one most highly correlated with the class label. Support your discussion with a visualization of that attribute.

Discuss the most distinctive numerical attribute, i.e., the one most highly correlated with the class label. Support your discussion with a discretized visualization of that attribute.

Identify and discuss one attribute that clearly has no impact on the class label. Support your discussion with a visualization of that attribute.

Data Analytics Tasks:

In the following tasks, always use the K-nearest neighbor classification algorithm.

Task 1. Using the default settings of K-nearest neighbor, report on the performance of your classifier (e.g., accuracy, precision, recall, etc.).

Task 2. Now try different values of K, and report on the obtained performance for those different values.

Task 3. In your opinion, what is the most suitable setting of K for your dataset?

Task 4. Given your answer to the previous task, try different settings for the split ratio and report on the obtained performance.

Task 5. Compare the performance of Task 4 to that of k-fold cross-validation.
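Tasks 1 to 5 are meant to be carried out in Weka, where the K-nearest neighbor classifier is IBk (default K = 1) and the Explorer offers both a percentage split and cross-validation. To make the underlying ideas concrete, here is a small from-scratch sketch in plain Python. It is an illustration under stated assumptions, not Weka's implementation: the data is a synthetic two-class toy set, distances are plain Euclidean without normalization, and the cross-validation result averages per-fold metrics rather than pooling predictions.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(train, test, k, positive):
    """Accuracy, plus precision and recall for the chosen positive class."""
    tp = fp = fn = correct = 0
    for features, actual in test:
        predicted = knn_predict(train, features, k)
        correct += predicted == actual
        tp += predicted == positive and actual == positive
        fp += predicted == positive and actual != positive
        fn += predicted != positive and actual == positive
    return {
        "accuracy": correct / len(test),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def percentage_split(data, train_ratio, seed=1):
    """Shuffle, then use the first train_ratio share for training."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def cross_validate(data, k, folds=10, positive="yes", seed=1):
    """k-fold cross-validation; averages the per-fold metrics."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    totals = Counter()
    for i in range(folds):
        test = shuffled[i::folds]
        train = [row for j, row in enumerate(shuffled) if j % folds != i]
        for name, value in evaluate(train, test, k, positive).items():
            totals[name] += value
    return {name: value / folds for name, value in totals.items()}


def make_toy_data(n_per_class=100, seed=0):
    """Two overlapping Gaussian clusters (synthetic, for illustration only)."""
    rng = random.Random(seed)
    data = []
    for label, centre in (("no", 0.0), ("yes", 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    for k in (1, 3, 5, 11):
        train, test = percentage_split(data, 0.66)
        print(k, evaluate(train, test, k, "yes"), cross_validate(data, k))
```

A single percentage split gives one estimate that depends on which tuples happen to land in the test set, while cross-validation uses every tuple for testing exactly once. That difference is the point of comparison in Task 5.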

In the following tasks, always use the decision tree classification algorithm.

Task 6. Using the default settings of the decision tree, report on the performance of your classifier (e.g., accuracy, precision, recall, etc.).

Task 7. Inspect the visualization of the obtained decision tree, and discuss: 1) the most distinctive features of your dataset, and 2) any interesting observations learned from the tree structure.

Task 8. Compare the observations obtained from Task 6 to your findings in the Data Exploration tasks of the assignment. That is, discuss how the features you identified in the exploration phase are similar to or different from the ones from Task 6.

Task 9. Adjust the decision tree parameters to allow overfitting, and compare to the results obtained in the previous task in terms of: 1) tree structure, and 2) classifier performance.

Task 10. In your opinion, what would be the best settings for the decision tree classifier for your dataset?
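When discussing the most distinctive features in Tasks 7 and 8, it helps to know why an attribute ends up near the root of the tree. Weka's J48 implements C4.5, which scores candidate splits with information gain and gain ratio. The sketch below computes both for categorical attributes on the small 14-row weather dataset that ships with Weka. It is an illustrative sketch, not J48 itself: the function names are hypothetical, and numeric attributes, pruning, and missing values are left out.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Classic weather data; the last column is the class label (play).
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(rows, attr_idx):
    """Reduction in class entropy after splitting on one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_idx]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


def gain_ratio(rows, attr_idx):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy([row[attr_idx] for row in rows])
    return information_gain(rows, attr_idx) / split_info if split_info else 0.0


if __name__ == "__main__":
    for i, name in enumerate(ATTRIBUTES):
        print(name, round(information_gain(WEATHER, i), 3),
              round(gain_ratio(WEATHER, i), 3))
```

An attribute with a high score here is one that would be expected to appear high in the tree, which gives a direct link back to the attributes singled out during data exploration. For Task 9, the J48 options that most directly control tree size are `unpruned`, `minNumObj`, and `confidenceFactor`. Turning pruning off and lowering `minNumObj` generally yields a larger tree that fits the training data more closely.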

 
