The goal of this assignment is to gain practical experience in using Weka and applying it to storytelling with data. You can work on this assignment in teams of up to three students. The standard academic honesty rules apply.
First Deliverable – Dataset:
You will be selecting and downloading a dataset of your choice for the purpose of this assignment. Your selected dataset should satisfy the following criteria:
• Contains at least 5 dimensions/features, including at least one categorical and one numerical dimension.
• Contains a clear class label attribute (binary or multi-class).
• Has a simple tabular structure (i.e., no time series, multimedia, etc.).
• Is of reasonable size and contains at least 2,000 tuples.
While the assignment is open-ended, you are expected to select an interesting dataset that tells an interesting story. Many such datasets are available in public repositories such as UCI, Kaggle, and KDnuggets. Attached, please find some suggested datasets to select from.
Second Deliverable – Video Presentation:
Data Exploration Tasks:
• The name and source of the dataset.
• A description of how the dataset was collected or created.
• A summary of the purpose of each column in your dataset, including the class label.
• An overview of the .ARFF data file and how you created it (if needed).
• Explain any data quality problems you faced and how you handled them.
• Provide a visual overview of all the attributes in your dataset.
• Discuss the most distinctive categorical attribute, i.e., the one most highly correlated with the class label. Support your discussion with a visualization of that attribute.
• Discuss the most distinctive numerical attribute, i.e., the one most highly correlated with the class label. Support your discussion with a discretized visualization of that attribute.
• Identify and discuss one attribute that clearly has no impact on the class label. Support your discussion with a visualization of that attribute.
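For the discretized visualization of a numerical attribute, the values are grouped into bins and the class labels are counted within each bin; a bin whose class mix differs sharply from the others signals a distinctive attribute, while near-identical mixes across bins suggest an attribute with no impact. The plain-Java sketch below illustrates the idea. It is not Weka's code: it assumes equal-width bins (one of the options of Weka's Discretize filter), and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: equal-width discretization plus a per-bin class count.
class DiscretizeSketch {

    // Maps each value to a bin index in [0, bins - 1] using equal-width intervals.
    static int[] equalWidthBins(double[] values, int bins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] binOf = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int b = width == 0 ? 0 : (int) ((values[i] - min) / width);
            binOf[i] = Math.min(b, bins - 1); // the maximum value belongs to the last bin
        }
        return binOf;
    }

    // Counts class labels within each bin: the table behind a stacked bar chart.
    static Map<Integer, Map<String, Integer>> classCountsPerBin(int[] binOf, String[] labels) {
        Map<Integer, Map<String, Integer>> table = new TreeMap<>();
        for (int i = 0; i < binOf.length; i++) {
            table.computeIfAbsent(binOf[i], b -> new TreeMap<>())
                 .merge(labels[i], 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up toy data: an "age"-like numeric attribute and a yes/no class label.
        double[] age = {21, 25, 33, 38, 47, 52, 61, 68};
        String[] label = {"no", "no", "no", "yes", "yes", "yes", "yes", "yes"};
        System.out.println(classCountsPerBin(equalWidthBins(age, 3), label));
    }
}
```

The number of bins is a free choice; too few bins hide the pattern, and too many leave each bin with only a handful of tuples.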
Data Analytics Tasks:
For the following tasks, always use the K-nearest neighbor (KNN) classification algorithm.
Task 1. Using the default settings of K-nearest neighbor, report on the performance of your classifier (e.g., accuracy, precision, recall, etc.).
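Weka reports precision and recall per class, alongside a confusion matrix. As a reminder of how those numbers relate, the minimal sketch below computes them from the four counts of a binary confusion matrix, treating one class value as "positive". The class name and the counts are made up for illustration.

```java
// Hypothetical helper: standard metrics from a binary confusion matrix,
// treating one class value as "positive".
class MetricsSketch {

    static double accuracy(int tp, int fp, int fn, int tn) {
        return (double) (tp + tn) / (tp + fp + fn + tn);
    }

    // Of the instances predicted positive, the fraction that really are positive.
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Of the truly positive instances, the fraction the classifier found.
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        int tp = 40, fp = 10, fn = 20, tn = 30;
        double p = precision(tp, fp), r = recall(tp, fn);
        System.out.printf("accuracy=%.3f precision=%.3f recall=%.3f F1=%.3f%n",
                accuracy(tp, fp, fn, tn), p, r, f1(p, r));
    }
}
```

With an imbalanced class label, accuracy alone can look good while recall for the minority class is poor, so it is worth reporting both.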
Task 2. Now try different values of K, and report on the obtained performance for those different values.
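To see why the value of K changes the prediction, the minimal sketch below classifies a query point by majority vote among its K nearest training points (Euclidean distance). It is a toy illustration with hypothetical names, not Weka's IBk; for example, IBk also normalizes attribute ranges by default, which this sketch skips.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal k-NN classifier (Euclidean distance, majority vote).
class KnnSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predicts the majority label among the k training points closest to `query`.
    // Ties are broken in favour of the label whose nearest member is closest.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(train[i], query)));

        Map<String, Integer> votes = new LinkedHashMap<>(); // keeps closest-first order
        for (int j = 0; j < Math.min(k, order.length); j++) {
            votes.merge(labels[order[j]], 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) { // strict '>' keeps the earlier (closer) label on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}};
        String[] labels = {"A", "A", "A", "B", "B"};
        double[] query = {4, 4};
        for (int k : new int[]{1, 3, 5}) {
            System.out.println("k=" + k + " -> " + classify(train, labels, query, k));
        }
    }
}
```

Here the query is labelled B for K = 1 and K = 3, but A for K = 5, because with every point voting the prediction collapses to the overall majority class. Very small K follows noise in the data; very large K ignores local structure.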
Task 3. In your opinion, what is the most suitable setting of K for your dataset?
Task 4. Given your answer to the previous task, try different settings for the split ratio and report on the obtained performance.
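In Weka's Explorer this corresponds to the "Percentage split" test option. The sketch below shows what a split ratio means in practice: the instances are shuffled with a fixed seed (so the split is reproducible), and the first fraction becomes the training set. The class name, the seed, and the ratios tried are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: shuffles instance indices and cuts them at a given train ratio.
class SplitSketch {

    // Returns two index lists: [0] = training set, [1] = test set.
    static List<List<Integer>> split(int n, double trainRatio, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // fixed seed => reproducible split
        int cut = (int) Math.round(n * trainRatio);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(idx.subList(0, cut)));
        parts.add(new ArrayList<>(idx.subList(cut, n)));
        return parts;
    }

    public static void main(String[] args) {
        int n = 2000; // the minimum dataset size required above
        for (double ratio : new double[]{0.5, 0.66, 0.8, 0.9}) {
            List<List<Integer>> parts = split(n, ratio, 1L);
            System.out.println(ratio + ": train=" + parts.get(0).size()
                    + ", test=" + parts.get(1).size());
        }
    }
}
```

A larger training share usually helps the classifier, but it leaves fewer test tuples, so the measured performance becomes noisier.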
Task 5. Compare the performance of Task 4 to that of cross-validation data partitioning (e.g., 10-fold cross-validation).
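With cross-validation, the data are divided into folds; each fold serves once as the test set while the remaining folds form the training set, and the results are averaged. For a nominal class, Weka stratifies the folds so that each keeps roughly the class proportions of the whole dataset. The sketch below shows one simple way to build stratified folds; it is an illustration with hypothetical names, and Weka's own fold assignment will differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper: assigns each instance to one of k folds, stratified by class.
class CrossValSketch {

    static int[] stratifiedFolds(String[] labels, int k, long seed) {
        // Group instance indices by class label.
        Map<String, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], c -> new ArrayList<>()).add(i);
        }
        // Shuffle each class, then deal its members round-robin across the folds,
        // so every fold gets roughly the same class proportions.
        int[] fold = new int[labels.length];
        Random rnd = new Random(seed);
        int next = 0;
        for (List<Integer> members : byClass.values()) {
            Collections.shuffle(members, rnd);
            for (int i : members) {
                fold[i] = next;
                next = (next + 1) % k;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        String[] labels = new String[30];
        for (int i = 0; i < 30; i++) labels[i] = i < 20 ? "yes" : "no";
        int[] fold = stratifiedFolds(labels, 10, 1L);
        // Each fold f is the test set once; the other k-1 folds form the training set.
        for (int f = 0; f < 10; f++) {
            int yes = 0, no = 0;
            for (int i = 0; i < labels.length; i++) {
                if (fold[i] == f) {
                    if (labels[i].equals("yes")) yes++; else no++;
                }
            }
            System.out.println("fold " + f + ": yes=" + yes + ", no=" + no);
        }
    }
}
```

Because every tuple is tested exactly once, cross-validation gives a more stable estimate than a single percentage split, at the cost of training the classifier k times.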
For the following tasks, always use the decision tree classification algorithm.
Task 6. Using the default settings of the decision tree, report on the performance of your classifier (e.g., accuracy, precision, recall, etc.).
Task 7. Inspect the visualization of the obtained decision tree, and discuss:
1) the most distinctive features of your dataset, and 2) any interesting observations learned from the tree structure.
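When reading the tree, keep in mind how the attribute at each node was chosen. Weka's J48 implements C4.5, which ranks candidate splits by gain ratio: the information gain about the class label, divided by the entropy of the attribute's own values. The attribute at the root is therefore a strong candidate for the "most distinctive feature". The sketch below computes gain ratio for nominal attributes on toy data; the names and data are hypothetical, and J48's handling of numeric attributes and missing values is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the split criterion used by C4.5-style trees.
class GainRatioSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Entropy in bits of a list of nominal values.
    static double entropy(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) counts.merge(v, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / values.size();
            h -= p * log2(p);
        }
        return h;
    }

    // Information gain: class entropy minus the weighted class entropy within each attribute value.
    static double infoGain(String[] attr, String[] labels) {
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < attr.length; i++) {
            groups.computeIfAbsent(attr[i], a -> new ArrayList<>()).add(labels[i]);
        }
        double remainder = 0.0;
        for (List<String> g : groups.values()) {
            remainder += (double) g.size() / labels.length * entropy(g);
        }
        return entropy(Arrays.asList(labels)) - remainder;
    }

    // Gain ratio: information gain divided by the split information (entropy of the
    // attribute itself), which penalises attributes with many distinct values.
    static double gainRatio(String[] attr, String[] labels) {
        double splitInfo = entropy(Arrays.asList(attr));
        return splitInfo == 0 ? 0.0 : infoGain(attr, labels) / splitInfo;
    }

    public static void main(String[] args) {
        String[] label = {"yes", "yes", "no", "no"};
        String[] color = {"red", "red", "blue", "blue"}; // perfectly predictive, 2 values
        String[] id = {"1", "2", "3", "4"};              // unique per row, like an ID column
        System.out.printf("color: gain=%.2f ratio=%.2f%n", infoGain(color, label), gainRatio(color, label));
        System.out.printf("id:    gain=%.2f ratio=%.2f%n", infoGain(id, label), gainRatio(id, label));
    }
}
```

Both attributes separate the toy classes perfectly (gain 1.0), but the ID-like attribute gets half the gain ratio because its split information is 2 bits. This is one reason why an identifier column should be removed before training rather than read as a distinctive feature.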
Task 8. Compare the observations obtained from Task 6 to your findings in the Data Exploration tasks of the assignment. That is, discuss how the features you identified in the exploration phase are similar to or different from the ones from Task 6.
Task 9. Adjust the decision tree parameters to allow overfitting, and compare to the results obtained in the previous task in terms of: 1) tree structure, and 2) classifier performance.
Task 10. In your opinion, what would be the best settings for the decision tree classifier for your dataset?