Baltimore Crime Patterns Analysis
Abstract
Data mining changes how we view complex entities: by mining the data, relationships among those entities surface and provide crucial information. Our team believes that if data mining is applied properly to crime cases, it could improve crime prevention for the police department. Our goal for this assignment is therefore to mine the crime patterns in a town. The objectives of this report are to identify general patterns in how crimes are committed in Baltimore, including the times at which crimes are more frequent and the locations of crime hot spots. Equipped with this information and these patterns, we hope to assist the police department in preventing these crimes.
1. Introduction

First, we chose the area with a problem we wanted to tackle. Having decided to enter the crime prevention sector, we sourced a data set on crimes. Sourcing an appropriate dataset with relevant columns is important, because some datasets lack key information such as the time or type of crime. This report covers our data description, data cleaning, data reduction and data preprocessing, which leave us with a good set of 5,000 crime records to work with.
Following the method, we applied K-means and DBSCAN to obtain the results and graphs needed to analyse the important relationships among crimes. In the modelling stage, we used the Longitude and Latitude columns from the dataset to plot the areas with the highest crime rates, which allowed us to categorise the different neighborhoods by the amount of crime that happened in each. This report also breaks down the number of crimes by the weapon used to commit them, so that the police can take note when frisking suspicious individuals. From this analysis we identify the neighborhoods and the times of day where crime is most rampant, which helps the police department plan its manpower deployment.
1.1 - Business Scenario
Our business scenario is a police department engaging our data mining service to tackle the crime rate in its city. Our team provides recommendations and solutions to organisations looking to improve the efficiency of what they do, and this time we are working with the police department. The department has provided us with a dataset of crimes committed in its city from 2012 to 2017, described in the next paragraph. With this data, our team uses Weka and RStudio to better understand the relationships among these crimes.
2. Method

2.1 - K-means
K-means aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the centre of the cluster) (Wikipedia, n.d.). In this report, the K-means clustering algorithm is used to identify area groups that have not been clustered in the data set.
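As a minimal sketch, the K-means step on the spatial columns could look like the R code below. The data frame name crimes and the choice of k = 5 are illustrative assumptions; only the Longitude and Latitude columns come from the dataset.

    # K-means on the crime coordinates; k = 5 is an illustrative choice
    library(dplyr)
    library(ggplot2)

    coords <- crimes %>%
      select(Longitude, Latitude) %>%
      na.omit()

    set.seed(42)                           # reproducible cluster assignment
    km <- kmeans(coords, centers = 5, nstart = 25)
    coords$cluster <- factor(km$cluster)

    # Quick visual check of the resulting area groups
    ggplot(coords, aes(Longitude, Latitude, colour = cluster)) +
      geom_point(alpha = 0.4) +
      labs(title = "K-means clusters of crime locations")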
2.2 - DBSCAN

Density-based spatial clustering of applications with noise (DBSCAN) is a density-based, non-parametric clustering algorithm. It groups together points that are closely packed, marking as outliers the points that lie alone in low-density regions (Wikipedia, n.d.). It separates clusters of high density from clusters of low density (Lutins, 2017), so it can also be used to find noisy data.
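A corresponding DBSCAN sketch, assuming the dbscan package is available; the eps and minPts values are illustrative only and would need tuning on the real coordinates.

    # DBSCAN on the same coordinates; cluster 0 marks the noise points
    library(dbscan)

    xy <- na.omit(crimes[, c("Longitude", "Latitude")])
    db <- dbscan(xy, eps = 0.01, minPts = 20)   # eps in degrees, illustrative only

    table(db$cluster)                           # cluster sizes, 0 = outliers/noise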
2.3 - RStudio

For the weapon column, we count the crimes committed with each weapon and arrange the counts in descending order. A percentage column is then added, which lets us see the share of each weapon, shown in Table 1, more clearly.

Formula: Percentage = Count / 5000 × 100%, where 5000 is the total number of crimes
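A minimal sketch of this summarisation with dplyr, assuming the weapon column is named Weapon:

    # Weapon breakdown behind Table 1 (5000 = total crimes after reduction)
    library(dplyr)

    weapon_summary <- crimes %>%
      count(Weapon, sort = TRUE) %>%            # counts in descending order
      mutate(Percentage = n / 5000 * 100)       # share of the 5,000 crimes

    weapon_summary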
Next, we obtain the number of crimes committed in every month:
First, mutate the dataset to add a new column that extracts the month from CrimeDate. Then convert the month from integer to character. Finally, use ggplot to plot the bar chart in Figure 13, which displays the crime counts for every month.
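A sketch of these steps, assuming CrimeDate is stored as MM/DD/YYYY text:

    # Monthly crime counts behind Figure 13
    library(dplyr)
    library(ggplot2)

    monthly <- crimes %>%
      mutate(Month = substr(CrimeDate, 1, 2)) %>%        # extract the month
      mutate(Month = as.character(as.integer(Month)))    # integer -> character

    ggplot(monthly, aes(x = Month)) +
      geom_bar() +
      labs(title = "Crimes per month", x = "Month", y = "Number of crimes")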
We also obtain the data about crimes happening at different times of day:
Firstly, split the hour from the CrimeTime column and add a new column recording the hour in which each crime happened. Next, build a table of the crimes that happened in each hour by grouping, summarising and arranging the data to obtain Table 2; this gives the statistics of the data. To see the trend, we plot a line graph with ggplot, which shows how crime changes as the hour goes by; this is Figure 15.
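A sketch of the hour-of-day steps, assuming CrimeTime is an HH:MM:SS text column:

    # Hourly crime counts (Table 2) and trend line (Figure 15)
    library(dplyr)
    library(ggplot2)

    hourly <- crimes %>%
      mutate(Hour = as.integer(substr(CrimeTime, 1, 2))) %>%
      group_by(Hour) %>%
      summarise(Crimes = n()) %>%
      arrange(Hour)

    hourly                                     # counterpart of Table 2

    ggplot(hourly, aes(x = Hour, y = Crimes)) +
      geom_line() +
      labs(title = "Crimes by hour of day", x = "Hour", y = "Number of crimes")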
Additionally, we obtain the mean, median, minimum and maximum number of crimes that happened in the different hours. To compare the number of indoor and outdoor crimes, ggplot in RStudio was used to plot the bar chart shown in Figure 16.
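A sketch of these last two steps, reusing the hourly table from the previous sketch and assuming the indoor/outdoor flag is a column named Inside.Outside:

    library(dplyr)
    library(ggplot2)

    # Spread of the hourly counts
    hourly %>%
      summarise(Mean   = mean(Crimes),
                Median = median(Crimes),
                Min    = min(Crimes),
                Max    = max(Crimes))

    # Indoor vs outdoor comparison behind Figure 16
    ggplot(crimes, aes(x = Inside.Outside)) +
      geom_bar() +
      labs(title = "Indoor vs outdoor crimes", x = "Location type", y = "Number of crimes")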