AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM

This is the most commonly used form of hierarchical clustering. It groups objects into clusters based on how similar they are to one another. It is a bottom-up approach: each data point starts in its own cluster, and at each iteration the most similar clusters are merged, until all of the data points belong to a single root cluster. Agglomerative clustering is well suited to identifying small clusters. When the method finishes, the result can be displayed as a dendrogram, a tree diagram that makes the clusters easy to inspect.
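The bottom-up merging and the dendrogram can be seen directly with SciPy's hierarchical clustering module (SciPy is an assumption here; the original names only scikit-learn). In this minimal sketch, assuming six made-up 2-D points, each row of the linkage matrix records one merge, and the dendrogram layout is computed without drawing it so the example runs without matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical toy data: six 2-D points forming two loose groups.
X = np.array([
    [1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
    [8.0, 8.0], [8.5, 8.3], [7.8, 8.4],
])

# Bottom-up clustering: each row of Z is one merge, stored as
# [cluster_i, cluster_j, merge_distance, size_of_new_cluster].
Z = linkage(X, method="average", metric="euclidean")

# n points always take n - 1 merges to reach a single root cluster.
print(Z.shape)  # (5, 4)

# Compute the dendrogram layout only (no_plot=True); "ivl" is the leaf order.
tree = dendrogram(Z, no_plot=True, labels=list("ABCDEF"))
print(tree["ivl"])
```

Passing Z to dendrogram without no_plot=True (with matplotlib installed) draws the tree itself.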

Hierarchical clustering (also known as hierarchical cluster analysis, or HCA) is a cluster analysis method that builds a hierarchy of clusters. There are two strategies:

  • Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive: a "top-down" approach in which all observations start in the same cluster, and splits are performed recursively as one moves down the hierarchy.

In general, merges and splits are determined greedily. The results of hierarchical clustering are usually presented as a dendrogram.

Agglomerative Clustering

Initially, each data point is treated as an individual cluster. At each iteration, the most similar clusters are merged, until a single cluster (or a chosen number K of clusters) remains. The main advantage is that we don't need to specify the number of clusters in advance, but this comes at a price in performance: the naive algorithm runs in O(n³) time. In scikit-learn's implementation, specifying the number of clusters lets the algorithm stop building the tree early, which can reduce computation time. The algorithm steps are:

  1. Compute the proximity matrix.
  2. Treat each data point as its own cluster.
  3. Repeat: merge the two nearest clusters and update the proximity matrix, until only one cluster (or the requested K clusters) remains.

For example, suppose we have six data points: A, B, C, D, E, F. In the first step, we treat each of the six data points as a separate cluster, as illustrated in the graphic below.
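The three steps can be sketched from scratch with NumPy. This is a minimal illustration, not scikit-learn's implementation: it assumes single linkage (the distance between two clusters is the distance between their closest members), Euclidean distance, and hypothetical coordinates for the six points A to F. Each of the n - 1 merges scans the full n × n proximity matrix, which is where the O(n³) cost comes from.

```python
import numpy as np

def naive_agglomerative(points, labels):
    """Naive single-linkage agglomerative clustering (illustrative sketch)."""
    # Step 1: proximity matrix of pairwise Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # never merge a cluster with itself

    # Step 2: every data point starts as its own cluster.
    clusters = {i: [labels[i]] for i in range(len(points))}
    merges = []

    # Step 3: repeatedly merge the two nearest clusters.
    while len(clusters) > 1:
        i, j = sorted(int(k) for k in np.unravel_index(np.argmin(dist), dist.shape))
        merges.append((clusters[i], clusters[j], float(dist[i, j])))
        clusters[i] = clusters[i] + clusters.pop(j)  # merge cluster j into i

        # Single-linkage update: distance to the merged cluster is the minimum.
        row = np.minimum(dist[i, :], dist[j, :])
        dist[i, :] = row
        dist[:, i] = row
        dist[i, i] = np.inf
        # Deactivate cluster j so it is never selected again.
        dist[j, :] = np.inf
        dist[:, j] = np.inf
    return merges

# Hypothetical coordinates for the six points A..F.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0],
                   [4.0, 1.5], [10.0, 0.0], [10.0, 3.0]])
merges = naive_agglomerative(points, list("ABCDEF"))
for left, right, d in merges:
    print(f"{''.join(left)} + {''.join(right)} at {d:.2f}")
# A + B at 1.00
# C + D at 1.50
# E + F at 3.00
# AB + CD at 4.00
# ABCD + EF at 6.00
```

Stopping the loop once len(clusters) reaches K, instead of 1, gives a flat K-cluster result.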

THE MAIN CODE OF AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM

  • from sklearn.cluster import AgglomerativeClustering
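A minimal usage sketch of that class, assuming a small made-up array of six 2-D points: n_clusters=2 tells scikit-learn to stop merging at two clusters rather than one root, and linkage="ward" (the library's default) merges the pair of clusters that least increases the total within-cluster variance.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [8.0, 8.0], [8.5, 8.3], [7.8, 8.4]])

# Stop at two clusters; ward linkage is scikit-learn's default.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)  # one integer cluster label per row of X
print(labels)
```

The label numbers themselves are arbitrary; only which points share a label matters.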

Full Code for Implementing the Hierarchical Clustering Algorithm
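A minimal end-to-end sketch, assuming synthetic data from make_blobs rather than any particular dataset, and an arbitrary distance_threshold of 10.0. Setting n_clusters=None with a distance_threshold lets the number of clusters come out of the hierarchy instead of being fixed in advance; scikit-learn's default compute_full_tree="auto" builds the full tree in this case.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 150 points drawn around 3 centers (an assumption).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)

# No fixed cluster count: merging stops where the ward linkage distance
# would exceed the (arbitrarily chosen) threshold.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0,
                                linkage="ward")
labels = model.fit_predict(X)

print("clusters found:", model.n_clusters_)
# children_ lists the n_samples - 1 merges; distances_ holds their heights.
print(model.children_.shape, model.distances_.shape)
```

Raising the threshold yields fewer, larger clusters; lowering it yields more, smaller ones.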

Practicing the hierarchical clustering algorithm (download the dataset here):