1. Pete, owner of Pistol Pete’s Diamond Emporium, is investing in a diamond classification system because of his deteriorating eyesight. Pete buys and sells diamonds of varying quality: Low ($1,000-$3,000), Medium ($4,000-$7,000), and High ($8,000-$10,000). It is very important to Pete that the classifier labels his diamonds correctly, both so that his business stays profitable and so that his customers continue to trust him as a business owner.
Using the possible cost matrix values given below, fill out the cost matrix that most accurately reflects Pete’s needs for his diamond classifier model. After completing the cost matrix, justify your proposed cost matrix.
Cost matrix (Actual class vs. Predicted class):

| | High | Medium | Low |
| --- | --- | --- | --- |
| High | | | |
| Medium | | | |
| Low | | | |
2. You have been given a data set containing three discrete attributes and five continuous attributes. After carefully analyzing the problem and the available attributes, you decide that one of the continuous attributes, estimate, should be used as the class attribute for your classification problem. Describe how you could use the estimate attribute when performing classification.
3. When performing an unsupervised k-means clustering, it is sufficient to generate a single clustering. Do you agree with this statement? Why or why not?
4. You have performed an unsupervised k-means clustering on a data set with two attributes, and the results indicate a k value of 2. Later, a domain expert determines class values for each data instance, and there are four class values in total. Provide a possible explanation for why the unsupervised clustering disagrees with the domain expert on the value of k, and draw a sketch of the unsupervised clustering to go along with your explanation.
5. How many possible association rules can be generated from a transaction database containing 10 different items? If three of those items are infrequent, how many rules can be generated from all possible 2-itemsets, assuming those 2-itemsets are all frequent? Hint: recall how we used combinatorics to determine how many k-itemsets may be generated for a given number of items.
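The counting argument in the hint can be sketched in a few lines of Python. This is only an illustration on a small, made-up case of 4 items, and it assumes the usual textbook definition of a rule (a frequent itemset split into a non-empty antecedent and a non-empty consequent); the function names are hypothetical and the course may count rules differently.

```python
from math import comb


def count_k_itemsets(num_items: int, k: int) -> int:
    """Number of distinct k-itemsets that can be formed from num_items items."""
    return comb(num_items, k)


def count_rules_from_itemset(k: int) -> int:
    """Rules obtainable from one k-itemset.

    Assumption: every non-empty proper subset may serve as the antecedent,
    with the remaining items as the consequent, giving 2^k - 2 rules.
    """
    return 2 ** k - 2


def count_all_rules(num_items: int) -> int:
    """Total rules over all itemsets of size 2 or more (same assumption)."""
    return sum(
        count_k_itemsets(num_items, k) * count_rules_from_itemset(k)
        for k in range(2, num_items + 1)
    )


# Illustrative case with 4 items (not the 10 items from the question).
print(count_k_itemsets(4, 2))   # 6 possible 2-itemsets
print(count_all_rules(4))       # 50 possible rules
```

Under the same assumption, the sum agrees with the closed form 3^d - 2^(d+1) + 1 for d items, which gives a quick way to check a hand calculation.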
6. Run the Nearest Neighbor classifier with a k value of 7 and a Support Vector Machine with default values, using 10-fold cross-validation, on the diabetes data set (diabetes.arff in Assignment 3 on myCourses) in Weka. Fill in the confusion matrices for the models in the tables below and use the cost matrix to compute the cost for each model. Based upon the cost, which model should be selected and why?
Nearest Neighbor (k=7) Confusion Matrix
| | Tested Negative | Tested Positive |
| --- | --- | --- |
| Tested Negative | | |
| Tested Positive | | |
Support Vector Machine Confusion Matrix
| | Tested Negative | Tested Positive |
| --- | --- | --- |
| Tested Negative | | |
| Tested Positive | | |
Cost Matrix
| | Tested Negative | Tested Positive |
| --- | --- | --- |
| Tested Negative | 0 | 50 |
| Tested Positive | 100 | -1 |
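One way to compute a model's cost is to multiply each confusion-matrix count by the matching cost-matrix entry and sum the results. The sketch below assumes rows are actual classes and columns are predicted classes, in the order Tested Negative, Tested Positive, in both matrices. The counts in `example_confusion` are made-up placeholders, not Weka output, and `total_cost` is a hypothetical helper name.

```python
def total_cost(confusion, cost):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )


# Cost matrix from the table above (assumed rows = actual, columns = predicted).
COST = [
    [0, 50],     # actual Tested Negative
    [100, -1],   # actual Tested Positive
]

# Placeholder counts for illustration only; replace with the values Weka reports.
example_confusion = [
    [400, 100],  # actual Tested Negative
    [120, 148],  # actual Tested Positive
]

# 400*0 + 100*50 + 120*100 + 148*(-1) = 16852
print(total_cost(example_confusion, COST))
```

Running the same helper on each model's confusion matrix gives two totals to compare; the model with the lower total is the cheaper one under this cost matrix.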