Real use case of k-means clustering in the security domain…👨‍💻👩‍💻

What is K-means Clustering?

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups. The goal of the k-means algorithm is to find groups in the data, with the number of groups represented by the variable k.

Now let’s look at how K-means clustering being used in Security Domain.

………………………………….Crime Analysis………………………………

Crime analysis is defined as analytical processes which provides relevant information relative to crime patterns and trend correlations to assist personnel in planning the deployment of resources for the prevention and suppression of criminal activities.

The main objectives of crime analysis include:

  1. Extraction of crime patterns by analysis of available crime and criminal data.
  2. Prediction of crime based on spatial distribution of existing data and anticipation of crime rate using different data mining techniques
  3. Detection of crime

The procedure is given below:

  1. First we take crime dataset
  2. Filter dataset according to requirement and create new dataset which has attribute according to analysis to be done
  3. Open rapid miner tool and read excel file of crime dataset and apply “Replace Missing value operator” on it and execute operation
  4. Perform “Normalize operator” on resultant dataset and execute operation
  5. Perform k means clustering on resultant dataset formed after normalization and execute operation
  6. From plot view of result plot data between crimes and get required cluster
  7. Analysis can be done on cluster formed.

k-means: Algorithm K-means clustering is one of the method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.


  1. Initially, the number of clusters must be known let it be k
  2. The initial step is the choose a set of K instances as centres of the clusters.
  3. Next, the algorithm considers each instance and assigns it to the cluster which is closest.
  4. The cluster centroids are recalculated either after whole cycle of re-assignment or each instance assignment.
  5. This process is iterated. K means algorithm complexity is O(tkn), where n is instances, c is clusters, and t is iterations and relatively efficient . It often terminates at a local optimum. Its disadvantage is applicable only when mean is defined and need to specify c, the number of clusters, in advance. It unable to handle noisy data and outliers and not suitable to discover clusters with non-convex shapes

reference — Crime Analysis using K-Means Clustering — ResearchGate




Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Introduction to Reinforcement Learning (RL)

Reinforcement Learning

Music Generation using Deep Learning

Responsible Machine Learning

Unsupervised Machine Learning (KMeans Clustering) with Scikit-Learn

“Speech Recognition” Science-Research, February 2022, Week 1 — summary from Arxiv, Astrophysics…

Developing endpoint API for NLP Model

Classifying Subreddit ‘cars’ vs ‘teslamotors’ using Natural Language Processing (NLP)

Identify Customer Segmentation

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sourabh Mishra

Sourabh Mishra

More from Medium

The setup of a successful AI Chatbot

An illustration of Tusan Beach’s famous ‘Horse Head Drinking Water’ located in Sarawak, Malaysia

How has music changed? Diving into Spotify Data

Make Your Graphs Available For Visualization

Image Matching with Shopee