Overview:
The purpose of this assignment is for you to gain experience with the KNN and PCA algorithms, as well as to assess your ability to construct a well-organized and well-documented notebook. In this assignment you will not receive a notebook template.
Upon submission, be sure that your notebook is well documented - this is required for code development and collaboration in industry, and is needed for grading your work. You will be heavily graded on the clarity of your notebook. How easy is it to understand the meaning of your computations? Do the names of your variables clearly communicate their meaning? Is your notebook documented completely? Document/Comment your code so that someone who didn’t know about your assignment would be able to follow what you were doing every step of the way.
At the beginning of your notebook, you must include an overview of your work. You should offer a detailed explanation of all steps taken in the notebook, and you should explain your process thoroughly. Give special attention to any deviations from the below outline - there may be steps you need to add!
You will be working with the mushroom data set. You should download the necessary files at the supplied link - I have also posted the materials to Brightspace (zip file called mushrooms), in case you have difficulties downloading the data for yourself. Begin by
inspecting all of the materials and understanding the data set (note that you will not use all of the supplied files).
Directions:
The following outlines, roughly, what you aim to accomplish in this assignment:
1. Import the Mushroom data set.
2. Use the KNN algorithm to impute missing values in the dataset. Note: You are not permitted to use the KNNImputer class from Scikit-learn. Instead, you must explicitly write the code to perform the imputation using the KNeighborsClassifier algorithm (use the default settings).
The first step is to think through what will be your feature data and what will be your response data for this imputation step. You will want to one-hot-encode your feature data and label encode your response data. Next, you should train your KNN model on those instances that don’t have missing values, then have the model make predictions for those instances that have missing values. This is how you will have the KNN model impute missing data.
When you have computed the missing values, create a data structure (i.e. a list) called missing_values that contains all of the imputed values (in terms of the original categorical/letter data and not the encoded numeric data) in order of increasing index from the original data set. You must then print the first 10 instances of missing_values to the screen so we can check your work. Finally, you will impute the missing values back into the original data set before continuing, so that the next step starts fresh with a complete data set in terms of the raw data values.
Graded Concept Question #1 (include a section in your notebook): Would it still be possible to train the KNN model if you one-hot encoded the response data instead? Why or why not?
3. Train a RandomForestClassifier as well as a LogisticRegression model to predict whether a mushroom is edible or poisonous given this data set of nominally-valued characteristics. Train the model on the feature data supplied to you after you’ve one-hot encoded it. You should label encode the response data.
CS 340 Milestone One Guidelines and Rubric Overview: For this assignment, you will implement the fundamental operations of create, read, update,
Retail Transaction Programming Project Project Requirements: Develop a program to emulate a purchase transaction at a retail store. This
7COM1028 Secure Systems Programming Referral Coursework: Secure
Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip
CS 340 Final Project Guidelines and Rubric Overview The final project will encompass developing a web service using a software stack and impleme