Logistic Regression Using Excel
1. Background:
Logistic Regression (LR), also known as Logit Regression or the Logit Model, is a statistical estimation method. Given a set of data in which each data point is classified in a binary manner, the aim is to fit an exponential-based function that captures the patterns in the data and returns probabilities for future data points.
As an example, assume that you are given a set of data about students that consists of whether each student $i$ has passed the MN50642 Optimisation unit ($y_i$, with 0 meaning fail and 1 meaning pass), and how much time they have spent reading the textbook ($x_{i1}$), solving examples on paper ($x_{i2}$), and solving problems on a computer ($x_{i3}$).
LR makes use of the Logistic Function $f(x) = \frac{1}{1 + e^{-x}}$, a well-behaved function with useful mathematical properties, plotted in the figure below. Simply put, the Logistic Function is a continuous approximation of a step function, better known as an IF function. It has widespread uses in Machine Learning.
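The relationship between the Logistic Function and a step function can be checked numerically. The following is a minimal Python sketch, assuming the standard form 1 / (1 + e^(-x)); the names `logistic`, `step`, and `steep_logistic`, and the steepness parameter `k`, are illustrative and do not come from the brief.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so z is at most 1 and cannot overflow
    return z / (1.0 + z)


def step(x: float) -> float:
    """Step function: the spreadsheet-style test IF(x >= 0, 1, 0)."""
    return 1.0 if x >= 0 else 0.0


def steep_logistic(x: float, k: float) -> float:
    """Logistic function of k * x (k is a hypothetical steepness factor).

    The larger k is, the more closely the curve follows the step function
    away from x = 0.
    """
    return logistic(k * x)
```

Away from zero, `steep_logistic(x, k)` approaches `step(x)` as `k` grows, while staying smooth and differentiable everywhere, which is what makes it usable inside an optimisation model.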
The Logistic Function can be used as an estimator of the probability of an event since it returns values between 0 and 1. However, it is based on only one input, $x$. We would like to find a linear function of the three study times to replace the $x$ in the Logistic Function, of the form $\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}$. The modified Logistic Function $p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3})}}$ can then be used to estimate the probability that student $i$ will pass. Consequently, $1 - p_i$ is the estimated probability that student $i$ will fail.
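The coefficients of the linear function are chosen by solving the following model, where $y_i$ is the observed outcome and $p_i$ the estimated pass probability of student $i$:
(LR-P1) $\max_{\beta} \prod_{i} p_i^{y_i} (1 - p_i)^{1 - y_i}$
LR-P1 is a “maximum likelihood” model: it aims to maximise the probability of observing the pass / fail values in the data. Note that LR-P1 is never infeasible, and the objective function is log-concave (its logarithm is concave), so any local maximum is also a global maximum, making it easy to implement and solve.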
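To make the maximum likelihood objective concrete, here is a minimal Python sketch that evaluates it for a given set of coefficients. The toy student data (`X_TOY`, `Y_TOY`), the coefficient values, and all function names are made up for illustration; the sketch assumes one intercept plus one coefficient per study-time input.

```python
import math


def logistic(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pass_probability(beta, hours):
    """Estimated probability of passing: logistic of the linear function.

    beta[0] is the intercept; beta[1:] pairs up with the study-time inputs.
    """
    z = beta[0] + sum(b * h for b, h in zip(beta[1:], hours))
    return logistic(z)


def likelihood(beta, X, y):
    """Product over students of p if the student passed, else (1 - p)."""
    total = 1.0
    for hours, passed in zip(X, y):
        p = pass_probability(beta, hours)
        total *= p if passed == 1 else (1.0 - p)
    return total


# Hypothetical data: (textbook, paper, computer) hours and pass/fail labels.
X_TOY = [(1, 0, 0), (2, 1, 0), (4, 3, 2), (5, 4, 3)]
Y_TOY = [0, 0, 1, 1]
```

With all coefficients at zero every student gets probability 0.5, so `likelihood([0, 0, 0, 0], X_TOY, Y_TOY)` is 0.5 to the power of 4. A set of coefficients that separates the passing students from the failing ones, such as the illustrative `[-3, 0.5, 0.5, 0.5]`, gives a larger value; the model searches for the coefficients that make this value as large as possible.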
However, the objective function can cause problems for large data sets. Multiplying many numbers that are in the range (0,1) results in an objective function that is very small, which can cause numerical problems. An equivalent approach is to use a “log-transform”, that is, using a logarithm function on the objective function, transforming it to a sum. The resulting model is then:
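The numerical issue can be reproduced in a few lines. This is a small Python sketch, assuming for illustration that every data point contributes a probability of exactly 0.5; the function names are hypothetical.

```python
import math


def likelihood_product(probs):
    """Multiply the probabilities together, as a product-form objective does."""
    total = 1.0
    for p in probs:
        total *= p
    return total


def log_likelihood_sum(probs):
    """Sum the natural logarithms of the probabilities instead."""
    return sum(math.log(p) for p in probs)


small = [0.5] * 100      # illustrative: 100 data points
large = [0.5] * 10_000   # illustrative: 10,000 data points
```

In double-precision arithmetic `likelihood_product(small)` is tiny but still positive, whereas `likelihood_product(large)` underflows to exactly zero, leaving a solver nothing to improve on. `log_likelihood_sum(large)` remains an ordinary negative number of moderate size.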
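(LR-P2) $\max_{\beta} \sum_{i} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$
where $y_i$ is the observed outcome and $p_i$ the estimated probability for data point $i$. Note that the logarithm function can be of any base, but it is usually assumed to be base 2, $e$, or 10. The two models are mathematically equivalent, but one may be preferable to the other based on the data on hand.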
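Outside Excel, the log-transformed model can be solved with any method for smooth maximisation. The sketch below uses plain gradient ascent with the natural logarithm, which is a swapped-in technique and not the spreadsheet approach this brief asks for. The toy data, learning rate, and step count are assumptions chosen for illustration.

```python
import math


def logistic(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(beta, row):
    """Intercept beta[0] plus the weighted sum of the inputs."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def log_likelihood(beta, X, y):
    """Sum of log(p) for outcomes of 1 and log(1 - p) for outcomes of 0."""
    total = 0.0
    for row, label in zip(X, y):
        p = logistic(linear(beta, row))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total


def fit(X, y, learning_rate=0.01, steps=2000):
    """Gradient ascent on the log-likelihood, starting from all zeros."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, label in zip(X, y):
            error = label - logistic(linear(beta, row))
            grad[0] += error
            for j, v in enumerate(row, start=1):
                grad[j] += error * v
        beta = [b + learning_rate * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: three inputs per data point and a 0/1 outcome.
X_TOY = [(1, 0, 0), (2, 1, 0), (1, 2, 1), (4, 3, 2), (3, 1, 2), (5, 4, 3)]
Y_TOY = [0, 0, 1, 0, 1, 1]
```

Calling `fit(X_TOY, Y_TOY)` returns four coefficients whose log-likelihood is higher than that of the all-zero starting point. Changing the base of the logarithm only rescales the objective by a positive constant, so the maximising coefficients are the same for every base.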
2. The Problem:
You are working for a Mobile Network Provider called Hi5, which aims to provide high-speed 5G connections to its users. A brief analysis of the data for the last three months revealed that there is a significant amount of “churn”, that is, users switching over to other mobile networks. There is no established method of predicting who may churn.
Your manager (Ms Maeby Wright) decides to use her secret weapon, the business analyst she has recently recruited. She provides you with usage data of 100 customers for the last three months, 30 of whom have churned. The data consists of the hours of calls and GB of data used by each customer in each month. She tells you to find a method of predicting the probability that a customer will churn using the given set of data. You decide to use LR, and she agrees with you.
3. Key Questions from Ms Wright
1. Implement LR-P1 and LR-P2. Discuss the relative performances of these two models and pick one to use.
2. Comment on the quality of data you are given. What would ideal data look like?
3. Would your model still work if the data consisted of 1,000 customers? How about 10,000 customers?
4. Build a worksheet that will take the optimal solution of the LR to predict the probability of churn for a customer, given the customer’s usage data for the last three months.
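The prediction step in question 4 amounts to plugging one customer's usage into the fitted Logistic Function. The following Python sketch mirrors what such a worksheet would compute, assuming one intercept plus one coefficient for each of the six usage figures (hours of calls and GB of data for each of three months). The coefficient values in `EXAMPLE_BETA` are placeholders, not fitted results.

```python
import math


def logistic(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(beta, usage):
    """Estimated churn probability for one customer.

    beta  : intercept followed by one coefficient per usage figure
    usage : (calls_m1, data_m1, calls_m2, data_m2, calls_m3, data_m3),
            an assumed ordering of the six monthly figures
    """
    if len(beta) != len(usage) + 1:
        raise ValueError("expected one intercept plus one coefficient per input")
    z = beta[0] + sum(b * u for b, u in zip(beta[1:], usage))
    return logistic(z)


# Placeholder coefficients for illustration only; real values come from
# the optimal solution of the LR model.
EXAMPLE_BETA = [0.5, -0.02, -0.05, -0.02, -0.05, -0.03, -0.08]

heavy_user = (40, 20, 38, 22, 41, 19)  # hypothetical usage figures
light_user = (5, 1, 3, 0.5, 1, 0.2)    # hypothetical usage figures
```

With these placeholder coefficients, `churn_probability(EXAMPLE_BETA, light_user)` is higher than `churn_probability(EXAMPLE_BETA, heavy_user)`, because every usage coefficient is negative. Whether real data show the same direction depends entirely on the fitted solution.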