According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. Our goal for this project is to determine whether a patient is likely to get a stroke based on input parameters such as gender, age, various diseases, and smoking status. Specifically, we want to see how gender and age could help us predict stroke.
Based on our background information and prior knowledge of stroke, we had four main hypotheses:
1) Gender: women have more stroke events than men because of pregnancy, a longer life expectancy, and taking hormonal medication earlier in life.
2) Age: the older the person is, the greater the probability of a stroke (especially above 65).
3) Smoking: people who smoke have a greater probability of getting a stroke.
4) Lifestyle: most stroke risk factors are lifestyle related, including high blood pressure, smoking, diabetes, and high blood cholesterol levels.
Kaggle: We used Kaggle to find the dataset for our project because the dataset includes detailed attributes about patients and whether they had a stroke. We will use this data source to analyze, and hopefully find, a pattern in two attributes that might help predict a stroke before it happens.
We will use this dataset to predict whether a patient is likely to get a stroke based on input parameters such as gender, age, various diseases, and smoking status.
-Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners.
-The table below shows the patient information used in our analysis. Each row provides relevant information about one patient, with the corresponding input parameters such as gender, age, various diseases, and smoking status.
Before starting our analysis, we had to add some coding to our data set, because several of our parameters (such as gender and smoking status) are stored as words and must be converted to number values before the Excel functions can be used on them. To do so, we added 5 new columns with number values for gender, marital status, work type, residence type, and smoking status, as shown in the screenshot below.
The legend for the new columns is as follows:
Gender: 1=Male, 2=Female
Marriage status: 0=Not married, 1=Married
Work type: 1=Private sector, 2=Children, 3=Government sector, 4=Employed
Residence type: 1=Urban, 2=Rural
Smoking status: empty=Unknown, 1=Never smoked, 2=Smokes, 3=Formerly smoked
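The same coding step can also be sketched outside Excel. The following is a minimal Python sketch, not the method we used in the project. The column names (gender, ever_married, residence_type, smoking_status) and the text labels in the dictionaries are assumptions for illustration; the actual spellings in the dataset may differ. Work type follows the same pattern and is left out for brevity.

```python
# Hypothetical text labels: the real spellings in the dataset may differ.
GENDER_CODES = {"Male": 1, "Female": 2}
MARRIED_CODES = {"No": 0, "Yes": 1}
RESIDENCE_CODES = {"Urban": 1, "Rural": 2}
SMOKING_CODES = {"never smoked": 1, "smokes": 2, "formerly smoked": 3}


def encode(value, codes):
    """Return the numeric code for a text label, or None if it is missing or unknown."""
    if not value:
        return None
    return codes.get(value.strip())


def encode_row(row):
    """Return a copy of one patient record (a dict) with the coded columns added."""
    coded = dict(row)
    coded["gender_code"] = encode(row.get("gender"), GENDER_CODES)
    coded["married_code"] = encode(row.get("ever_married"), MARRIED_CODES)
    coded["residence_code"] = encode(row.get("residence_type"), RESIDENCE_CODES)
    # Labels outside the legend (for example "Unknown") are left empty (None).
    coded["smoking_code"] = encode(row.get("smoking_status"), SMOKING_CODES)
    return coded


# Made-up record, for illustration only.
patient = {
    "gender": "Female",
    "ever_married": "Yes",
    "residence_type": "Rural",
    "smoking_status": "Unknown",
}
print(encode_row(patient))
```

Keeping the legend in dictionaries means each code is defined in one place, and an unexpected label shows up as an empty value instead of silently receiving a wrong number.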
Stroke is one type of cardiovascular disease. There are two types of stroke: ischemic stroke and hemorrhagic stroke. The criterion used to define a stroke is that it occurs when a blood vessel that carries oxygen and nutrients to the brain is either blocked by a clot (ischemic) or bursts (hemorrhagic).
According to the World Health Organization, stroke is the second leading cause of death and the third leading cause of disability. The latest global mortality estimates show that 10.2% of total deaths were caused by stroke in 2016, compared with 9.9% in 2000. We use Tableau, an analytics tool, to perform the analysis on the dataset.
Descriptive analytics addresses the questions "What happened?" and "What is happening?". We define descriptive analytics as analytics that characterizes, summarizes, and organizes the features and properties of the data to facilitate understanding of the results and the underlying data. Our study aims to understand the nature of stroke relative to the criteria included in the dataset. Through this analysis, we hope to make readers reflect on their current health and lifestyle and on how close they are to, or how well protected they are against, a stroke.
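The data contains 2,115 males. The visualization labels the male group as 41.39% (2.01k). This percentage equals the male share of all records (2,115 out of 2,115 + 2,994 patients is about 41.4%), so it describes the gender split of the dataset rather than the proportion of males who had a stroke.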
1. Hypertension is the primary risk factor for stroke and is present in 10.50% of the male population. Average glucose level is the second risk factor, most often between 80 and 90 mg/dL.
2. Heart disease is the third risk factor and is present in 7.71% of the male population; it matters because blood clots cause stroke.
3. Smoking is the fourth risk factor. People who smoke (15.93%) are fewer in number than people who never smoked or formerly smoked.
The above risk factors are seen in the 50-60 age group, with a BMI of 25-30 kg/m², working in private firms.
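The data contains 2,994 females. The visualization labels the female group as 58.59% (2.85k). This percentage equals the female share of all records (2,994 out of 2,115 + 2,994 patients is about 58.6%), so it describes the gender split of the dataset rather than the proportion of females who had a stroke.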
1. Hypertension is the primary risk factor for stroke and is present in 9.22% of the female population. Average glucose level is the second risk factor, most often between 70 and 80 mg/dL.
2. Heart disease is the third risk factor and is present in 3.77% of the female population; it matters because blood clots cause stroke.
3. Smoking is the fourth risk factor. People who never smoked (15.10%) are fewer in number than people who smoke or formerly smoked.
The above risk factors are seen in the 40-50 age group, with a BMI of 20-25 kg/m², working in private firms.
Diagnostic analysis examines the underlying reasons for past results and is used to find links and patterns between variables. To do so, we segregated the patients into age groups and counted the number of strokes in each group, as seen below.
The pivot chart above divides the patients by age range for the further analysis that follows.
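The grouping behind such a pivot can be sketched in a few lines of Python. This is only an illustration of the idea, not the Excel pivot we built: the 10-year group width, the function names, and the sample records are all assumptions, and the records are made up.

```python
from collections import Counter


def age_group(age, width=10):
    """Return a label such as '50-59' for the age band that contains age."""
    low = int(age // width) * width
    return f"{low}-{low + width - 1}"


def strokes_by_age_group(records, width=10):
    """Map each age group to (number of patients, number of strokes).

    records is an iterable of (age, stroke) pairs, with stroke equal to 0 or 1.
    """
    totals = Counter()
    strokes = Counter()
    for age, stroke in records:
        group = age_group(age, width)
        totals[group] += 1
        strokes[group] += stroke
    # Sort the groups by their lower bound so the output reads youngest first.
    ordered = sorted(totals, key=lambda label: int(label.split("-")[0]))
    return {group: (totals[group], strokes[group]) for group in ordered}


# Made-up records, for illustration only.
sample = [(67, 1), (61, 0), (45, 0), (49, 1), (72, 1), (8, 0)]
for group, (patients, stroke_count) in strokes_by_age_group(sample).items():
    print(group, patients, stroke_count)
```

Reporting the number of patients next to the number of strokes makes it possible to compare stroke rates between groups of different sizes, not just raw counts.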
As seen above, we ran a regression analysis relating stroke to gender. The adjusted R squared value is 0.01124 (about 0.01), which is close to 0 on a scale where 1 would mean a perfect fit. This means the three independent variables, gender (0, 1), marital status, and smoking status, have a low ability to explain the dependent variable, which is stroke.
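For reference, adjusted R squared is derived from the ordinary R squared by penalizing the number of independent variables. The sketch below uses the standard formula; the function name is our own, and the input values in the example are placeholders, not the figures from our regression output.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_observations - 1) / degrees_of_freedom


# Placeholder inputs, for illustration only: R^2, sample size, 3 predictors.
print(adjusted_r_squared(0.5, 11, 3))
```

With thousands of observations and only three independent variables the penalty is tiny, so a low adjusted R squared reflects a low R squared rather than a penalty for model size.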
The above table, on the other hand, shows that females have more strokes than males. However, the difference between the two is not significant, so the data does not support the hypothesis that females are more likely to get strokes.
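One common way to check whether a difference between two groups is significant is a two-proportion z-test. The sketch below is an assumption about how such a check could be done, not the test we ran; the function name is our own, and the counts in the example are hypothetical placeholders rather than values from our table.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one event rate."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("pooled rate is 0 or 1, so the test is undefined")
    z = (rate_a - rate_b) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical placeholder counts: strokes and group size for each gender.
z, p_value = two_proportion_z_test(150, 3000, 100, 2000)
print(z, p_value)
```

A p-value above the chosen threshold (0.05 is a common choice) would mean the observed gap between the two groups could plausibly be due to chance.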
After building the pivot table, we added a line chart, which suggests that females are more prone to stroke than males: the line in the female chart is higher than the line in the male chart. Age also plays a major role. As people get older, the chance of having a stroke increases, especially for females, as shown in the chart provided.
We also built a pivot chart from the pivot table in order to explain it more clearly. In this chart, married people are shown in orange, unmarried people in yellow, and people with heart disease in gray. The chart shows that people who are not married are less prone to stroke than married people. As expected, people with heart disease have a higher percentage of strokes.
To sum up, using the data analytics tools covered during this semester, we were able to visualize the data. Based on the data, we conclude that female patients, people with heart disease, smokers, and older people are more prone to stroke than others.