
Create models of data (regression, neural networks, decision trees) to identify patterns relationships



This report presents the results of using the _____ data mining framework to build models that will help the management of a supermarket concentrate their resources on targeting customers who are most likely to purchase organic products. Three binary ______________ models were generated using ________ analysis, decision ______ and _______  ______. _____ and _____ were identified as the most important predictors. All three models were predictive and have a ________ level of performance. _____ was chosen as the champion model based on performance and on the technique's ability to provide a non-technical explanation of the model, with a lift value of _____ times greater than selecting customers at random. Several recommendations are made for improving data quality for the next cycle of data mining.


Business Problem

The business problem is to identify the customers that are most likely to _____ organic products in the supermarket. A data model will be built from the data set (organics.xls) collected during the supermarket incentive period. By identifying ______ who are ______ to purchase organic products, the company will be able to target its marketing efforts more effectively, which should result in more sales per unit of marketing and advertising spend.

Data Mining Representation

The business problem, the identification of customers who are ______ to ______ organic products, is a type of data mining representation known as a _____  _________ problem. The most suitable target variable is _____, which is identified as a binary variable. The remaining variables will be given the role _____ and are assigned the default measurement levels, with the exception of AFFL, which has been changed from interval to ordinal.

Methodology - data mining approach


The process to be adopted is the first flow in the virtuous cycle of data mining, which has four distinct steps:


1. Identify the business problem or opportunity.

2. Mining data to transform it into actionable information.

3. Acting on the information. This is outside the remit of the brief (a marketing initiative driven by the model, e.g. a targeted marketing offer).

4. Measuring the results of the marketing initiative. This is also outside the remit of the brief (measuring the profitability of a pilot marketing study that uses the model).


The data mining framework used in the generation of models for steps 1 and 2 of the virtuous cycle of data mining is based upon _____, Sample, Explore, Modify, Model and Assess. In the brief the data set has already been provided, so there is no need to sample or collect the data. However, limitations in data collection may become apparent during the data mining process; these are discussed in the recommendations section.




Explore the raw data as provided.  This will result in a brief overview of the variables so that they can be classified as qualitative data (binary, nominal, ordinal), or quantitative data (discrete, interval) with a brief description.

Establish a target variable (the variable that can be used to establish the required results).




Assess the data quality (what changes can be made to the data classifications/levels and model roles). If necessary, modify and transform the data: impute missing values and apply transformations to normalise the distributions of heavily skewed variables.
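As an illustration of the modify step, the sketch below shows one simple way to impute missing values and reduce right skew. It assumes median imputation and a log(1 + x) transform, neither of which is specified in the brief; the function names and the sample figures are hypothetical.

```python
import math
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def log_transform(values):
    """Apply log(1 + x) to compress a long right tail (requires x >= 0)."""
    return [math.log1p(v) for v in values]

# Hypothetical interval variable with two missing entries and one extreme value.
spend = [10.0, 12.0, None, 11.0, 500.0, None, 9.0]
filled = impute_median(spend)
transformed = log_transform(filled)
```

The median is used here rather than the mean because it is less affected by the extreme value; other imputation methods may suit a particular variable better.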




Create models of the data (regression, neural networks, decision trees) to identify patterns, relationships and parameters that can predict the target variable. Each model representation has its own advantages and disadvantages.
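To make the idea of a decision tree concrete, the following is a minimal sketch of a one-level tree (a decision stump) that picks the split on a single interval input minimising the weighted Gini impurity of a binary target. It is an illustration only, not the modelling tool used for this report; the input values, the target values and the function names are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(xs, ys):
    """Find the threshold on one input that minimises weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical interval input and binary purchase flag for eight customers.
score_input = [2, 3, 5, 6, 8, 9, 11, 12]
purchased = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_split(score_input, purchased)
```

The resulting rule ("input greater than the threshold suggests a purchaser") is the kind of non-technical explanation that makes decision trees easy to communicate.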




Assess the model performance in order to identify the champion model and investigate each model's limitations. If necessary, make changes to the data set and tune the models through a second cycle of data mining to improve on the previous results. The aim of the second cycle is to improve model reliability, robustness and performance, and to avoid overfitting.
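One common performance measure for this kind of targeting problem is lift: the response rate among the highest-scored customers divided by the response rate in the whole data set. The sketch below computes it under stated assumptions; the scores, outcomes, chosen fraction and the function name lift_at are hypothetical.

```python
def lift_at(scores, actuals, fraction=0.1):
    """Response rate in the top-scored fraction divided by the overall rate.

    Assumes at least one positive outcome, otherwise the base rate is zero.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(actual for _, actual in ranked[:n_top]) / n_top
    base_rate = sum(actuals) / len(actuals)
    return top_rate / base_rate

# Hypothetical model scores and actual outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_top_20 = lift_at(scores, actuals, fraction=0.2)
```

A lift above 1 means that targeting the top-scored group finds purchasers more often than selecting customers at random.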



Summarise the results from the models generated, arrive at a conclusion, and make recommendations for future data gathering by the business. This should ensure that future data mining results will be of benefit to the business and will refine the data mining process.







Data Exploration


A full metadata description of the data is provided in the appendix (tables 1 and 2); only important features and insights that have a bearing on the data mining process are described here. These insights are generated from an exploratory data analysis. The analysis has been divided into two sections: class variables, which are qualitative, and interval variables, which are quantitative.


From an analysis of the business problem, the target variable (the variable to be predicted) has already been identified as _____. It is noted that the ratio of organic purchasers to non-purchasers is _____, making this a difficult data mining problem.


An indication of data quality, given by the level of missing data, is also presented; any variable with more than 50% of its data missing is normally considered unsuitable for data mining. Variables that have missing data have a bearing on regression analysis and neural network models. _____, _____ and _____ have between 5 and 11% missing data; the remaining variables have less than 5% missing. Interval variables that are heavily skewed (skewness > 3) may be transformed towards normal distributions to comply with model assumptions.
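The two screening rules above (more than 50% missing, skewness greater than 3) can be sketched as follows. The column names and values are placeholders rather than fields from organics.xls, and the skewness shown is the simple population form; statistical packages often report a bias-adjusted version that differs slightly.

```python
import math

def missing_fraction(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = sum(observed) / n
    m2 = sum((v - mean) ** 2 for v in observed) / n
    m3 = sum((v - mean) ** 3 for v in observed) / n
    return m3 / math.pow(m2, 1.5)

# Hypothetical columns; the names are placeholders only.
columns = {
    "var_a": [1.0, 2.0, None, 4.0, 5.0, 3.0, 2.0, 4.0, 3.0, 3.0],
    "var_b": [None, None, None, None, None, None, 1.0, 2.0, 3.0, 4.0],
}
report = {name: missing_fraction(col) for name, col in columns.items()}
unsuitable = [name for name, frac in report.items() if frac > 0.5]
heavily_skewed = [name for name, col in columns.items() if abs(skewness(col)) > 3]
```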


Variables with outliers (extreme values) are also identified. These anomalous observations may be outside the scope of the models and may give indications for the predictive parameters. However, the nature of outliers is that they represent only a handful of observations, and insights from them may not be applicable to the majority of observations. The impact of outliers on modelling was found to be negligible: model performances were practically unaffected by the removal of outliers using the filter node. The variable _____ was heavily skewed and has the majority of extreme values.
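The filter settings used are not stated here, so the sketch below assumes one common rule: drop observations more than three standard deviations from the mean. The cut-off, the function name filter_outliers and the sample values are hypothetical.

```python
import statistics

def filter_outliers(values, max_sd=3.0):
    """Keep values within max_sd standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All values are identical, so there is nothing to filter.
        return list(values)
    return [v for v in values if abs(v - mean) <= max_sd * sd]

# Hypothetical values: twenty typical observations and one extreme one.
values = [10.0] * 10 + [12.0] * 10 + [200.0]
kept = filter_outliers(values)
```

Comparing model performance on the filtered and unfiltered data is one way to check whether the outliers matter.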


