Question 1
Prepare the ship data stored in "ship.csv" for future analytics tasks using functions and methods of NumPy and pandas, respectively.
(a) Design your own Python program to carry out the following tasks:
(i) Read in "ship.csv" as pandas DataFrame called "ship". Since there are 6 observations where MS and Y are "." to indicate that they are missing values, declare this character as missing values in your program accordingly.
(ii) Since the variable names of this dataset are rather short and do not really describe the nature of the variables, rename the ship types to "types", construction years to "c_years", operation periods to "o_periods", the aggregated months of service to "s_months", and the number of incidents to "incidents".
(iii) For better understanding of the data, compute the average service months and the average number of incidents for the cross-products of every category in types and operation periods. The averages should be rounded to the nearest integers. Store the resulting table to an object named "shipgroup".
(iv) Replace the missing values in the variable "s_months" and "incidents" by the respective means of the other ships that share the same type AND the same operation period. Add comments to elaborate your Python program as well.
(v) Construct a Python program to save the target variable "incidents" in a pandas DataFrame named "Y". (33 marks)
(b) Except for the months of service and number of incidents, all the other variables, including "types", "c_years", and "o_periods" are actually nominal and not interval/ratio.
(i) Perform an appropriate data type conversion for these variables so that they can be recognised as categorical variables. ANL252 Copyright © 2021 Singapore University of Social Sciences (SUSS) Page 5 of 6 ECA – July Semester 2021
(ii) Construct Python code to convert all categorical variables to dummy variables and save the result as a pandas DataFrame named "X".
(iii) Researchers suggest that the aggregated months of service of each ship must be scaled down due to its wide range of values. Perform a log-transformation of this variable in the DataFrame and name the transformed variable "log_s_months". The transformed variable should be attached to both DataFrames "X" and "ship". (14 marks)
(c) Normally, we shall split the DataFrame into training and testing datasets to evaluate the predictive power of the model. Study the dataset carefully and explain why it is not sensible to split the DataFrame here, and we shall use the entire dataset for training purpose instead. (8 marks)
(d) We shall now save the prepared DataFrame "ship" as a new csv text file called "ship_prepared.csv". Furthermore, we shall also create a database called "ship.db" and export the DataFrame to the database as tables. Write a Python program to carry out these two tasks. (15 marks)
CS 340 Milestone One Guidelines and Rubric Overview: For this assignment, you will implement the fundamental operations of create, read, update,
Retail Transaction Programming Project Project Requirements: Develop a program to emulate a purchase transaction at a retail store. This
7COM1028 Secure Systems Programming Referral Coursework: Secure
Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip
CS 340 Final Project Guidelines and Rubric Overview The final project will encompass developing a web service using a software stack and impleme