We will use the following dataset (“mini survey lung cancer.csv”) to predict lung cancer. It is a comma-separated file containing the variables GENDER, AGE, SMOKING, ANXIETY, CHRONIC DISEASE, ALCOHOL CONSUMING, SHORTNESS OF BREATH, CHEST PAIN, and LUNG_CANCER.
GENDER is coded M for male and F for female, AGE is an integer, and LUNG_CANCER is Yes/No. The remaining variables are coded 2 for Yes and 1 for No.
1. Load the file into your Python environment (5pts)
2. For the variables coded 2/1, replace every 2 with Yes and every 1 with No. (15pts)
3. Let’s categorize AGE as well. Assign all individuals younger than 65 to “Middle” and all individuals aged 65 or older to “Old”. (20pts)
4. Use a bar plot to show whether smokers have a higher probability of getting lung cancer.
a. Do the same for alcohol consumption. (20pts)
5. Randomly split the data into 20% testing and 80% training. (10pts)
6. Perform 5-fold cross-validation on the training data ONLY. (10pts)
7. Test your model on the 20% test data. (10pts)
8. Compare and discuss the results from cross-validation and your final test. How confident are you that the model will correctly predict new data? (10pts)
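Steps 2, 3, 5 and 6 above, plus the group proportions that the bar plots in step 4 would display, can be sketched with the Python standard library alone. This is a minimal illustration rather than a reference solution: the rows in SAMPLE_CSV are made up, the column names are taken from the description above and may not match the real file's header exactly, and the helper names (recode, cancer_rate_by, split_train_test, k_fold) are hypothetical. No model is fitted, since the assignment does not name one.

```python
import csv
import io
import random

# Made-up rows for illustration only; replace io.StringIO(SAMPLE_CSV) with
# open("mini survey lung cancer.csv", newline="") to read the real file.
# Column names are assumed from the assignment text.
SAMPLE_CSV = """GENDER,AGE,SMOKING,ANXIETY,CHRONIC DISEASE,ALCOHOL CONSUMING,SHORTNESS OF BREATH,CHEST PAIN,LUNG_CANCER
M,69,1,2,1,2,2,2,Yes
F,59,2,1,2,1,2,1,No
M,74,2,1,1,1,2,2,Yes
F,63,1,1,2,2,1,1,No
M,65,2,2,1,2,2,1,Yes
F,52,1,2,1,1,1,1,No
M,71,2,1,2,2,2,2,Yes
F,60,2,2,1,1,1,2,No
M,58,1,1,1,2,1,1,No
F,67,2,2,2,2,2,2,Yes
"""

BINARY_COLUMNS = [
    "SMOKING", "ANXIETY", "CHRONIC DISEASE", "ALCOHOL CONSUMING",
    "SHORTNESS OF BREATH", "CHEST PAIN",
]


def recode(rows):
    """Step 2: map 2/1 to Yes/No. Step 3: add an AGE_GROUP column."""
    recoded = []
    for row in rows:
        new_row = dict(row)
        for column in BINARY_COLUMNS:
            new_row[column] = {"2": "Yes", "1": "No"}[row[column].strip()]
        new_row["AGE_GROUP"] = "Middle" if int(row["AGE"]) < 65 else "Old"
        recoded.append(new_row)
    return recoded


def cancer_rate_by(rows, column):
    """Step 4: share of LUNG_CANCER == Yes within each level of `column`."""
    counts = {}
    for row in rows:
        total, positive = counts.get(row[column], (0, 0))
        has_cancer = row["LUNG_CANCER"].strip().lower() == "yes"
        counts[row[column]] = (total + 1, positive + int(has_cancer))
    return {level: pos / tot for level, (tot, pos) in counts.items()}


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Step 5: shuffle a copy of the rows, then hold out `test_fraction`."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(rows, k=5):
    """Step 6: yield (fit, validate) pairs; each row is validated once."""
    for i in range(k):
        validate = rows[i::k]
        fit = [row for j, row in enumerate(rows) if j % k != i]
        yield fit, validate


rows = recode(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
smoking_rates = cancer_rate_by(rows, "SMOKING")
alcohol_rates = cancer_rate_by(rows, "ALCOHOL CONSUMING")
train, test = split_train_test(rows)
folds = list(k_fold(train, k=5))  # built from the training rows only
```

The two rate dictionaries hold the bar heights for step 4. The held-out test rows are never passed to k_fold, which is what keeps the step 7 score an honest check on the cross-validation estimate. With pandas and scikit-learn installed, pandas.read_csv, sklearn.model_selection.train_test_split and sklearn.model_selection.KFold cover the same ground.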