You are required to undertake the following tasks in this project:
1. Business Understanding:
• Download the dataset assigned to you from the module Moodle site along with the data description file.
• Read the data description file to learn the nature of the data: what the data is about, where it comes from, which business context it is associated with, etc.
• Examine the dataset within its business context to identify meaningful problems that could potentially be addressed using analytics.
• Translate the business problems into appropriate data mining problems and tasks.
• You may also refer to any published articles relevant to the dataset.
2. Data Understanding:
• Perform initial data exploration to get to know the dataset: the total number of instances, the number of attributes (variables), the data type of each attribute, and the basic statistics of each attribute, including value range, average, standard deviation, skewness, kurtosis, and mode.
• Identify any data quality issues, such as missing values, outliers, extreme values, and imbalanced classes.
• Determine whether the dataset is appropriate for addressing the business problems identified in Task 1. If not, re-do Task 1.
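As a rough illustration of the exploration described in Task 2, the sketch below computes the listed statistics for a single numeric attribute and flags outliers with the 1.5 × IQR rule. It is written in Python purely for illustration (the project itself is to be carried out in SAS®). The function names `describe` and `iqr_outliers`, the use of `None` to mark a missing value, and the choice of the IQR rule are all assumptions, not part of the brief.

```python
import statistics


def describe(values):
    """Summarise one numeric attribute; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)  # population standard deviation
    # Skewness and excess kurtosis as standardised central moments.
    skew = sum((v - mean) ** 3 for v in present) / (n * sd ** 3)
    kurt = sum((v - mean) ** 4 for v in present) / (n * sd ** 4) - 3
    return {
        "count": n,
        "missing": len(values) - n,
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "std": sd,
        "skewness": skew,
        "excess_kurtosis": kurt,
        "mode": statistics.mode(present),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in present if v < low or v > high]


# Hypothetical attribute with one missing value and one extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 12, None, 95]
summary = describe(sample)
outliers = iqr_outliers(sample)
```

Running `describe` on every attribute gives a quick table of candidates for the data quality issues listed above; a large positive skewness, for instance, often points to extreme values worth inspecting.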
3. Data Preparation:
• Convert the dataset into SAS® format in order to carry out the required data mining tasks in SAS®.
• Choose appropriate methods for data pre-processing, which includes dealing with missing values, tackling outliers, extreme values, and imbalanced classes, changing data types, reducing dimensionality, and conducting data transformation and normalisation, etc., wherever appropriate.
• Determine which attributes should be used in your analysis and how each of them should be used.
• Divide the whole dataset into several subsets to be used for training, test and validation in predictive modelling.
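The partitioning step in Task 3 can be sketched as follows. This is a minimal Python illustration only (the project itself requires SAS®); the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for the example, not requirements of the brief.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows reproducibly and cut them into three disjoint subsets.

    The remaining share (1 - train - validation) becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_valid]
    testing = shuffled[n_train + n_valid:]
    return training, validating, testing


# Hypothetical dataset of 100 row identifiers.
training, validating, testing = split_dataset(range(100))
```

Fixing the seed keeps the partition reproducible, so that different models are compared on exactly the same rows.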
4. Modelling:
• Use the pre-processed dataset to perform the data mining tasks you have identified in Task 1.
• Choose appropriate techniques and algorithms for your analysis: two for predictive modelling, e.g., decision trees and regression, or decision trees and the nearest-neighbour algorithm, and two for descriptive modelling, e.g., basic statistical analysis and k-means clustering, or association analysis and k-means clustering.
• Determine appropriate settings of the algorithms to be applied, e.g., how many clusters to use in k-means clustering.
• Re-do data preparation in Task 3 if needed.
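One common way to decide how many clusters to use in k-means is to compare the within-cluster SSE for several values of k and look for the point where the improvement levels off. The sketch below is a small pure-Python version of Lloyd's algorithm, given only to illustrate the idea (the project itself requires SAS®). The helper names, the toy points, and the fixed iteration limit are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm; returns (centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sse_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3)}
```

For these toy points the SSE falls sharply from k = 1 to k = 2 and changes little afterwards, which is the pattern that would suggest two clusters.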
5. Evaluation:
• Evaluate the performance of the predictive models in terms of the various applicable measures, such as accuracy, SSE (sum of squared errors), generalisation ability, simplicity and cost.
• Provide an explicit and concise description of the descriptive and predictive models you have created. Examine and explain what patterns and insights have been identified.
• Discuss how the descriptive and predictive models created can be used to address the original business problems identified in Task 1.
• Summarise your main findings from the project.
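Two of the measures named in Task 5 are simple enough to show directly. The Python sketch below is illustrative only (the project itself requires SAS®); the function names and the toy labels and values are assumptions. It treats accuracy as the share of correctly classified instances, SSE as the sum of squared differences between actual and predicted values, and uses the gap between training and validation accuracy as one rough indicator of generalisation ability.

```python
def accuracy(actual, predicted):
    """Share of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def sse(actual, predicted):
    """Sum of squared errors between actual and predicted numeric values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def generalisation_gap(train_accuracy, validation_accuracy):
    """Training minus validation accuracy; a large gap suggests overfitting."""
    return train_accuracy - validation_accuracy


# Hypothetical classification and regression outputs.
acc = accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"])
err = sse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
gap = generalisation_gap(1.0, 0.75)
```

When classes are imbalanced, accuracy alone can be misleading, so it is worth reporting it alongside the other measures listed above.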
As a guide, aim for 2500-3000 words. The maximum word limit is 3000 words. If the total word limit is exceeded, it will affect the marks awarded to the project presentation.
Footnotes will not count towards word count totals but must only be used for referencing, not for the