You are asked to use OpenRefine and Weka to conduct exploratory data mining on this data, and to produce a SHORT report about what you discover. Tasks and mark allocations are as follows:
1. Prepare and clean the data for analysis. At this stage, you are expected to undertake the following procedures: Understand Data and Prepare Data. At the preparation stage, you should clean the data and convert it from the XLSX format into the ARFF format accepted by Weka. This includes transforming data from one type to another so that particular algorithms can be used. For example, you might need to transform the values of an attribute from nominal to numerical in order to use a regression algorithm, or from numerical to nominal in order to use association algorithms. You are therefore expected to prepare more than one version of the data for analysis. (20%)
2. Analyse the data with appropriate techniques/algorithms, such as Classification, Regression, Association and Clustering algorithms. At this stage, you should discover some interesting patterns, for example: which kinds of applicants are likely to be safe to offer loans to? Do skilled residents have any advantage? Does an applicant’s credit history help? It is sufficient to use algorithms from any three of the four categories: Classification, Regression, Association and Clustering. These patterns should be represented by rules (about 6 rules from each of the algorithms used), supported by statistical information such as accuracy and coverage, if possible. (70%)
3. Based on your analysis in 2, summarise the overall findings in the data. Critically discuss and compare the algorithms used and draw an overall conclusion, with justification, about which techniques are most effective for making discoveries and gaining insights into the data. (10%)
Total (100%)
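As an illustration of the preparation step in task 1, the following is a minimal Python sketch of writing two ARFF versions of the same data: one with a numeric attribute and one with that attribute binned into nominal values. It assumes the rows have already been cleaned and exported (for example from OpenRefine) and that values contain no spaces or commas. The attribute names (`credit_history`, `amount`, `safe`), the 3000 threshold and the sample rows are hypothetical and are not taken from the assignment data. Weka's own filters can perform the same conversions; this only shows what the resulting files contain.

```python
def to_arff(relation, rows, nominal):
    """Render a list of dicts as ARFF text.

    Columns named in `nominal` are declared as nominal attributes listing
    their observed values; every other column is declared numeric.
    Missing values (None) are written as '?', the ARFF missing-value marker.
    """
    columns = list(rows[0].keys())
    lines = ["@relation " + relation, ""]
    for col in columns:
        if col in nominal:
            values = sorted({str(r[col]) for r in rows if r[col] is not None})
            lines.append("@attribute " + col + " {" + ",".join(values) + "}")
        else:
            lines.append("@attribute " + col + " numeric")
    lines += ["", "@data"]
    for r in rows:
        lines.append(",".join("?" if r[c] is None else str(r[c]) for c in columns))
    return "\n".join(lines) + "\n"


def bin_amount(value, threshold=3000):
    """Turn a numeric amount into a nominal label (hypothetical threshold)."""
    if value is None:
        return None
    return "low" if value < threshold else "high"


# Hypothetical toy rows, standing in for the cleaned spreadsheet data.
rows = [
    {"credit_history": "good", "amount": 1200, "safe": "yes"},
    {"credit_history": "poor", "amount": 5400, "safe": "no"},
    {"credit_history": "good", "amount": None, "safe": "yes"},
]

# Version 1: amount stays numeric (usable by regression algorithms).
numeric_version = to_arff("loans", rows, nominal={"credit_history", "safe"})

# Version 2: amount is binned to nominal (usable by association algorithms).
nominal_rows = [dict(r, amount=bin_amount(r["amount"])) for r in rows]
nominal_version = to_arff(
    "loans_nominal", nominal_rows, nominal={"credit_history", "amount", "safe"}
)

print(numeric_version)
print(nominal_version)
```

Keeping both versions side by side makes it easier to state in the report which version each algorithm was run on.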
Be aware that some of the algorithms accept only nominal (qualitative) attributes. As with the classic "beer and nappies" story from lectures, the results need to be interpreted in the light of common sense and basic knowledge of what the data is actually about. Also remember that the coverage and accuracy of the rules generated by each algorithm, where available, are important.
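To make the coverage and accuracy figures concrete, here is a small Python sketch that checks a single rule against a handful of rows. The rows and attribute names are hypothetical toy data, not the assignment data. Note that texts differ on the definition of coverage: some count every instance the rule's condition applies to, others count only the instances it predicts correctly (also called support). The sketch reports both counts, so use whichever definition the lectures give.

```python
def rule_stats(rows, antecedent, consequent):
    """Count how often a rule 'IF antecedent THEN consequent' applies and holds.

    applies  : rows matching every condition in the antecedent
    correct  : rows among those that also match the consequent
    accuracy : correct / applies (0.0 if the rule never applies)
    """
    applies = [r for r in rows if all(r.get(k) == v for k, v in antecedent.items())]
    correct = [r for r in applies if all(r.get(k) == v for k, v in consequent.items())]
    accuracy = len(correct) / len(applies) if applies else 0.0
    return {"applies": len(applies), "correct": len(correct), "accuracy": accuracy}


# Hypothetical toy rows for illustration only.
loans = [
    {"credit_history": "good", "skilled": "yes", "safe": "yes"},
    {"credit_history": "good", "skilled": "no", "safe": "yes"},
    {"credit_history": "good", "skilled": "no", "safe": "no"},
    {"credit_history": "poor", "skilled": "yes", "safe": "no"},
]

# Rule under test: IF credit_history = good THEN safe = yes
stats = rule_stats(loans, {"credit_history": "good"}, {"safe": "yes"})
print(stats)
```

On these four rows the rule applies to three instances and is right for two of them, so its accuracy is 2/3.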