Understand the ins and outs of hierarchical clustering, and how it applies to marketing campaign analysis in the banking industry.
Imagine being a Data Scientist at a leading financial institution, and your task is to assist your team in categorizing existing clients into distinct profiles:low
, average
, medium
and platinum
for loan approval.
But, here is the catch:
There is no such historical label attached to these customers, so how do you proceed with the creation of these categories?
This is where clustering can help, an unsupervised machine-learning technique to group unlabeled data into similar categories.
Multiple clustering techniques exist, but this tutorial will focus more on the hierarchical clustering
approach.
It starts by providing an overview of what hierarchical clustering
is, before walking you through a step-by-step implementation in Python
using the popular Scipy
library.
Hierarchical clustering
is a technique for grouping data into a tree of clusters called dendrograms, representing the hierarchical relationship between the underlying clusters.
The hierarchical clustering algorithm relies on distance measures to form clusters, and it typically involves the following main steps:
- Computation of the distance matrix containing the distance between each pair of data points using a particular distance metric such as Euclidean distance, Manhattan distance, or cosine similarity
- Merge the two clusters that are the closest in distance
- Update the distance matrix with regard to the new clusters
- Repeat steps 1, 2, and 3 until all the…