1. Project Introduction
1.1 What is health insurance?
Health insurance is a type of insurance that covers medical expenses such as hospitalization costs, the cost of medicines, and doctor consultation fees. The cost of treating an illness can put severe strain on accumulated savings, so finding an insurance plan compatible with your financial capacity is a must. With medical costs rising continuously, a sudden illness might force you to compromise on the quality of your children’s education or default on your home loan payments.1 Hence, the key elements in managing health insurance charges are planning the insurance in advance and understanding which factors enter the calculation of the charges and how to optimize them. By considering various factors, our project focuses on the computation of medical insurance charges.
1.2 Project Goal
The dataset used in our project is a medical insurance dataset obtained from Kaggle. The goal of the project is to identify the factors that help predict medical insurance charges and to assign a weight to each of these factors to obtain an equation. The variables in question are age, BMI (Body Mass Index), number of children, gender, smoking status, and location. In other words, a statistical model is fitted on these variables to predict the insurance cost incurred. The goal is not limited to predicting these charges; it also extends to assessing the reliability of the model and strengthening it by taking the significant data into account.
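As a rough illustration of what "assigning a weight to each factor to obtain an equation" could look like, the sketch below assumes a linear form. The report does not state the model type at this point, so the linear form and the symbols $\beta_j$, $x_j$, and $p$ are assumptions made only for illustration.

```latex
% Assumed linear form: \hat{y} is the predicted insurance charge,
% x_1, ..., x_p are the predictor variables, \beta_0 is an intercept,
% and \beta_j is the weight assigned to predictor x_j.
\[
  \hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\]
```

Under this assumption, a larger $|\beta_j|$ on a comparably scaled predictor indicates a factor that contributes more strongly to the predicted charge.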
1.3 Project Significance
Health insurance provides financial protection in case of serious health problems or accidents.2 The project aims to contribute to a better health care system, and predicting insurance charges is one of the most appropriate ways to do so.
2. Data Analysis
2.1 About the Dataset
The raw data consists of medical insurance records. It has 1338 rows and 6 columns of mixed data types: discrete variables, continuous variables, and categorical variables. Because the dataset contains categorical values, these are converted to dummy variables for the analysis. The final dataset comprises nine independent variables and one dependent variable. The features can be broadly classified into the following categories:
1. Medical information: age and BMI.
2. Family details: the number of children a person has.
3. Personal information: whether the person is a smoker or a non-smoker.
4. Cardinal directions: whether the person resides in the northeast, northwest, southeast, or southwest region.
2.2 Preliminary Data Processing
After conducting a preliminary investigation into each feature, we found that all the parameters are useful for our data analysis. For the categorical variables, we created dummy variables: two for gender (female and male) and four for location (northeast, northwest, southeast, and southwest). BMI refers to the Body Mass Index, which is derived from the mass and height of a person. In addition, the data comprises charges, age, location, smoking status, and the number of children. In total, there are 9 independent variables and 1 dependent variable.
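The dummy-variable step described above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the project: the column names, the helper `add_dummies`, and the sample rows are hypothetical, and the values are made up rather than taken from the dataset.

```python
# Made-up sample rows for illustration only; not values from the actual dataset.
rows = [
    {"age": 30, "bmi": 25.0, "children": 0, "gender": "female", "location": "southwest"},
    {"age": 45, "bmi": 31.5, "children": 2, "gender": "male", "location": "northeast"},
]

# Categorical columns and their levels, as listed in the text.
CATEGORIES = {
    "gender": ["female", "male"],
    "location": ["northeast", "northwest", "southeast", "southwest"],
}


def add_dummies(row, categories):
    """Return a copy of `row` with each categorical column replaced by 0/1 dummy columns."""
    # Keep the non-categorical columns unchanged.
    encoded = {key: value for key, value in row.items() if key not in categories}
    # Add one 0/1 column per level of each categorical column.
    for column, levels in categories.items():
        for level in levels:
            encoded[level] = 1 if row[column] == level else 0
    return encoded


encoded_rows = [add_dummies(row, CATEGORIES) for row in rows]
print(encoded_rows[0])
```

Each encoded row carries two gender columns and four location columns, exactly one of which is 1 within each group. In practice a library routine such as `pandas.get_dummies` performs the same transformation on a whole table.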