1 Online retail sales prediction
In many businesses, identifying which customers will make a purchase (and when), is a critical exercise. This is true for both brick-and-mortar outlets and online stores.
The data provided in this assignment is website traffic data acquired from an online retailer and provides information on customer’s website site visit behavior. Customers may visit the store multiple times, on multiple days, with or without making a purchase.
Your goal is to predict how much sales revenue can be expected from each customer. The variable revenue lists the amount of money that a customer spends on a given visit. Your goal is to predict how much money a customer will spend, in total, across all visits to the website, during the allotted one-year time frame (August 2016 to August 2017).
More speciffically, you will need to predict a transformation of the aggregrate customer-level sales value based on the natural log. That is, if customer i has ki revenue transactions, then you should compute:
ki
custRevenuei = revenueij ∀i ∈ customers
j=1
And then transform this variable as follows:
targetRevenuei = ln(custRevenuei + 1) ∀i ∈ customers
You will be evaluated on how well you can predict the target revenue on a test data set available at the Kaggle.com website (see the Canvas assignment page for the private competition URL)
(a) (50 points) Preparation and modeling.
i. (10 points) Data understanding. Generate a Data Quality Report. Also, choose at least two meaningful visualizations and/or analyses and explain their relevance.
ii. (10 points) Data preparation. Choose two of the most critical data preparation actions you took and explain the reasoning for these actions.
iii. (20 points) Modeling. Build an OLS model and 3 or more regression variant models (these may include robust regression, PLS, PCR, ridge regression, LASSO, elasticnet, MARS, or SVR) and summarize their performance in a table (as shown in Table 1). Clearly state your resampling approach. Note: You may combine models, techniques, etc.
iv. (10 points) Debrief. For your best predictions, describe your approach, e.g., did you examine interactions? did you use any type of model stacking? what was your secret sauce? Did you have any problems during the modeling process? If so, how did you overcome those?
(b) (50 points) Competition modeling.
Upload your predictions to the Kaggle website and check the predictive performance on the “Public Leaderboard”
You may submit multiple times throughout the competition, however, there is a limit to the number of submissions per day.
Score is based on ranked performance on the “Private Leaderboard”; extra-credit is possible.
All modeling approaches covered in lecture can be used (OLS, robust methods, dimension reduction methods, penalized methods, MARS, SVR, PCA, LDA, k-nn, t-SNE, transformations, missing value imputations, etc.) To be fair, approaches not yet discussed in detail are not allowed (e.g., tree-based models, neural network based models, clustering, are not allowed at this time.)
You must outperform the benchmark model to receive any credit.
May the odds be ever in your favor.
CS 340 Milestone One Guidelines and Rubric Overview: For this assignment, you will implement the fundamental operations of create, read, update,
Retail Transaction Programming Project Project Requirements: Develop a program to emulate a purchase transaction at a retail store. This
7COM1028 Secure Systems Programming Referral Coursework: Secure
Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip
CS 340 Final Project Guidelines and Rubric Overview The final project will encompass developing a web service using a software stack and impleme
Sun | Mon | Tue | Wed | Thu | Fri | Sat |
---|---|---|---|---|---|---|
29 | 30 | 31 | 1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 | 16 | 17 | 18 |
19 | 20 | 21 | 22 | 23 | 24 | 25 |
26 | 27 | 28 | 29 | 30 | 31 | 1 |