Steven leads a data science team at MegaTelCo, one of the largest telecommunication firms in the United States. MegaTelCo provides both wireless and internet services and has hundreds of millions of customers. The company has a major problem with customer retention in its telecommunication business: many customers leave, and it is getting increasingly difficult to acquire new ones. Because the telecommunication market is now saturated, its earlier rapid growth has tapered off, and communications companies are now engaged in battles to attract each other's customers while retaining their own. According to one report, annual churn rates for telecommunications companies average between 10% and 67% (Database Marketing Institute, 2008). Customer churn not only increases operating and advertising costs, but also reduces revenue and damages brand image. It has long been known that retaining existing customers is less expensive than acquiring new ones. In fact, a Canadian study found that it costs nearly 50 times less to retain a customer than to acquire one (Telecoms, 2018). Therefore, Steven and his team need to predict the probability that existing customers will leave the company and then send the results to the marketing team. Based on the results, the marketing team is going to design a customer retention program to retain the customers who are more likely to churn.
To achieve this goal, Steven and his team need to prepare a training dataset for developing data mining models. Steven checks the company's enterprise data warehouse and finds that it holds millions of records. Because processing that many records is very time-consuming, he decides to start with a portion of the data and writes SQL queries to obtain a random sample of 20,000 customer records from the data warehouse system. Next, noticing that the dataset has hundreds of attributes, Steven applies feature selection techniques so that his initial models include a small number of important attributes rather than all of them. He then cleans the data to resolve quality issues such as missing or extreme values. Finally, Steven obtains a cleaned dataset with 10 predictor attributes and one target attribute (i.e., the attribute of interest). Please find the variable definitions in the Excel file.
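The sampling and cleaning steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Steven's actual procedure: the case says he used SQL, whereas this sketch uses plain Python as a stand-in, and the records, the `customer_id` field, the `LEAVE` values, and the capping bounds are all synthetic. Only the attribute names `OVERAGE` and `LEAVE` come from the case.

```python
import random


def draw_sample(records, n, seed=0):
    """Return a simple random sample of n records, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def drop_missing(records, fields):
    """Keep only records in which every listed field has a value."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def cap_extremes(records, field, lower, upper):
    """Clamp a numeric field into [lower, upper] to tame extreme values."""
    return [{**r, field: min(max(r[field], lower), upper)} for r in records]


# Synthetic stand-in for the data warehouse (all values are made up).
warehouse = [
    {
        "customer_id": i,
        "OVERAGE": None if i % 10 == 0 else (i * 7) % 500,  # some missing values
        "LEAVE": "LEAVE" if i % 3 == 0 else "STAY",         # hypothetical labels
    }
    for i in range(1000)
]

# Sample first, then clean only the sampled records.
sample = draw_sample(warehouse, 200, seed=1)
complete = drop_missing(sample, ["OVERAGE", "LEAVE"])
cleaned = cap_extremes(complete, "OVERAGE", 0, 300)  # bounds are an assumption
```

Dropping incomplete records and capping are only two of several possible treatments; imputing missing values or removing outliers entirely would be equally defensible, depending on the data.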
Steven plans to first explore the data in Excel to gain a better understanding of the data and of the relationship between the other attributes and the target attribute. Then, Steven and his team will develop multiple decision tree models and choose an appropriate one to make predictions for 100 new customers.
1. Business Understanding, Data Understanding, and Data Preparation (46 points in total)
Please answer the following questions to help you better describe how analytics is applied in this case.
1.1. Analytics Orientation: Which of the following types of analytics is mainly involved in Steven's task in this case? [5 points]
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
1.2. Analytics Orientation: Which of the following is the main outcome of Steven's efforts in this case? [5 points]
Making sense
Making prediction
Making evaluation
Making decision
1.3. Which of the following nine common analytics tasks are explicitly mentioned in this case? Choose all that apply. [6 points]
Classification
Regression
Similarity Matching
Clustering
Co-occurrence Grouping
Profiling
Link Prediction
Data Reduction
Causal modeling
1.4. Please indicate whether each of the following statements is true or false by typing T or F (30 points: 3 points for each question).
The variable COLLEGE is a binominal variable.
The variable LEAVE is an ordinal variable.
The variable REPORTED_SATISFACTION is an ordinal variable.
The correlation coefficient between OVERAGE and LONG_CALLS_PER_MONTH is 0.77, indicating that customers' LONG_CALLS_PER_MONTH causes their OVERAGE.
The data this company has and its capability to extract useful knowledge from that data should be regarded as key strategic assets for the company.
If Steven wants to develop predictive models, he must judge the models based on both predictive performance and intelligibility.
When Steven's efforts help the company reduce customer churn and improve its service quality, this is an example of achieving Reputation in the PAIR model.
When Steven's efforts help the company identify in real time which customers are likely to churn, so that the marketing team can immediately offer those customers a retention program, this is an example of achieving Agility in the PAIR model.
If Steven uses CRISP-DM correctly, he will always get the desired results with only one iteration.
Steven extracts the data from the company's data warehouse, which usually stores corporate information and data from operational systems and a wide range of other data resources.