Explore different ways to improve the classification performance (accuracy or expected cost)

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Text Analytics

Assignment 3 Classification

This assignment gives you hands-on experience in building text classification models, applied to email spam filtering. The target variable indicates whether an email is spam (1) or non-spam (0). Follow the directions and answer the following questions.

Question 1 

Explore different ways to improve the classification performance (accuracy or expected cost). You can consider the following: 

1. Feature representation: compare three feature representations: binary vs. frequency vs. tf-idf.

2. Classifier: compare three classifiers of your choice, such as decision trees, neural nets, etc.

3. OPTIONAL: Feature selection: try different feature/attribute selection methods or parameters (extra credit).
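One way to set up the comparison in items 1 and 2 is sketched below in Python with scikit-learn, which the assignment does not prescribe. The toy emails, the roughly 67/33 split, and the three classifiers chosen (naive Bayes, decision tree, logistic regression) are all assumptions for illustration; substitute the course data set and your own classifiers.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the email data (1 = spam, 0 = non-spam).
spam = ["win cash prize now", "free prize claim now", "cheap loans win cash",
        "claim your free cash", "win a free holiday now", "cash prize waiting claim"]
ham = ["meeting moved to monday", "please review the report", "lunch at noon today",
       "report attached for review", "see you at the meeting", "notes from monday lunch"]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=0)

# Factories, so that every pipeline gets fresh, unfitted components.
representations = {
    "binary": lambda: CountVectorizer(binary=True),  # word present or absent
    "frequency": lambda: CountVectorizer(),          # raw term counts
    "tf-idf": lambda: TfidfVectorizer(),             # counts reweighted by rarity
}
classifiers = {
    "naive_bayes": lambda: MultinomialNB(),
    "decision_tree": lambda: DecisionTreeClassifier(random_state=0),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
}

results = {}
for rep_name, make_rep in representations.items():
    for clf_name, make_clf in classifiers.items():
        model = make_pipeline(make_rep(), make_clf())
        model.fit(X_train, y_train)
        results[(rep_name, clf_name)] = accuracy_score(y_test, model.predict(X_test))

for combo, acc in sorted(results.items()):
    print(combo, round(acc, 3))
```

Fitting the vectorizer inside the pipeline means the vocabulary and idf weights are learned from the training emails only, so the test emails do not leak into the features.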

Report the evaluation results of your model using a split into training and testing sets. Report the following:

1) Precision and Recall by Class

2) Confusion Matrix. 
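The two reported quantities can be computed by hand, as in the following minimal Python sketch. The `actual` and `predicted` lists are made-up values for illustration, and the convention that rows are actual classes and columns are predicted classes is an assumption; other tools may use the opposite orientation.

```python
def confusion_matrix(actual, predicted, classes=(0, 1)):
    """Return counts[a][p]: rows are actual classes, columns are predicted."""
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def precision_recall_by_class(counts):
    """Return {class: (precision, recall)} from a rows-are-actual matrix."""
    classes = list(counts)
    report = {}
    for c in classes:
        true_pos = counts[c][c]
        predicted_c = sum(counts[a][c] for a in classes)  # column total
        actual_c = sum(counts[c].values())                # row total
        precision = true_pos / predicted_c if predicted_c else 0.0
        recall = true_pos / actual_c if actual_c else 0.0
        report[c] = (precision, recall)
    return report


# Made-up labels for ten emails (1 = spam, 0 = non-spam).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
report = precision_recall_by_class(cm)
for c in (1, 0):
    precision, recall = report[c]
    print(f"class {c}: precision={precision:.3f} recall={recall:.3f}")
```

Precision for a class divides by the column total (everything predicted as that class), while recall divides by the row total (everything that actually belongs to it).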

Question 2

Calculate the total cost and the expected cost (per email) based on the confusion matrix you obtained in Question 1. Assume the cost of each email misclassified from Spam to Non-spam is 5, and from Non-spam to Spam is 100.

[Hint: be careful with the dimensions of the confusion matrix: which are the “actuals” and which are the “predictions”?]
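The cost calculation might look like the sketch below. It reads "Spam to Non-spam" as an email that is actually spam but predicted non-spam, and it assumes a matrix whose rows are actuals and whose columns are predictions; the counts themselves are invented for illustration.

```python
COST_SPAM_AS_NONSPAM = 5      # actual spam, predicted non-spam
COST_NONSPAM_AS_SPAM = 100    # actual non-spam, predicted spam

# Hypothetical confusion matrix: cm[actual][predicted], 1 = spam, 0 = non-spam.
cm = {
    1: {1: 3, 0: 1},
    0: {1: 1, 0: 5},
}


def total_and_expected_cost(cm):
    """Return (total cost, expected cost per email) for a rows-are-actual matrix."""
    total = (cm[1][0] * COST_SPAM_AS_NONSPAM
             + cm[0][1] * COST_NONSPAM_AS_SPAM)
    n_emails = sum(sum(row.values()) for row in cm.values())
    return total, total / n_emails


total_cost, expected_cost = total_and_expected_cost(cm)
print(total_cost, expected_cost)  # 105 10.5
```

If the matrix comes from a tool that puts predictions in rows, swap the two indices before applying the costs; otherwise the 5 and the 100 are charged to the wrong error type.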

Based on your observations, analyze which combination of feature representation and classifier is the best.

Question 3 (Extra credit)

Run 10-fold cross-validation instead of the split sample. Does your conclusion still hold? If the result is different, analyze the cause.
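A rough sketch of 10-fold cross-validation in plain Python follows. The fold assignment, the `fit_predict` callable, and the keyword rule used as a stand-in classifier are all hypothetical; in practice a library routine and a real classifier would replace them.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign example indices to k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validated_predictions(texts, labels, fit_predict, k=10, seed=0):
    """Return one out-of-fold prediction per example."""
    predictions = [None] * len(labels)
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        fold_preds = fit_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in test_idx],
        )
        for i, p in zip(test_idx, fold_preds):
            predictions[i] = p
    return predictions


def keyword_fit_predict(train_texts, train_labels, test_texts):
    """Stand-in classifier: flag words seen only in spam training emails."""
    spam_words, ham_words = set(), set()
    for text, y in zip(train_texts, train_labels):
        (spam_words if y == 1 else ham_words).update(text.split())
    spam_only = spam_words - ham_words
    return [1 if spam_only & set(t.split()) else 0 for t in test_texts]


# Made-up corpus: 10 spam and 10 non-spam emails.
texts = ([f"win cash prize spamword{i}" for i in range(10)]
         + [f"meeting report notes hamword{i}" for i in range(10)])
labels = [1] * 10 + [0] * 10

predictions = cross_validated_predictions(texts, labels, keyword_fit_predict, k=10)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)
```

Because every email is predicted exactly once while held out, the pooled predictions can be fed into the same confusion matrix and cost calculation as the single split, which makes the two evaluation schemes directly comparable.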
