The hierarchical clustering method performs standard bottom-up agglomerative hierarchical clustering of objects. You can either build a single hierarchy of all the items with one root node or, given a threshold parameter, cluster until no two clusters are within that distance of each other, which yields an array of hierarchies.
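To make the bottom-up procedure concrete, here is a minimal sketch of agglomerative clustering with an optional threshold. It is not clusterfck's implementation: the names agglomerate, itemsIn, and minItemDistance are hypothetical, the smallest item-to-item distance is used as the cluster distance purely for illustration, and the canonical value chosen for a merged cluster is a placeholder.

```javascript
// Minimal sketch of bottom-up agglomerative clustering.
// NOT clusterfck's implementation; all names here are hypothetical.

// Collect the original items stored at the leaves of a cluster.
function itemsIn(cluster) {
  if (!cluster.left && !cluster.right) {
    return [cluster.canonical];
  }
  return itemsIn(cluster.left).concat(itemsIn(cluster.right));
}

// Cluster distance used in this sketch: the smallest distance between
// an item of one cluster and an item of the other.
function minItemDistance(a, b, distance) {
  let best = Infinity;
  for (const x of itemsIn(a)) {
    for (const y of itemsIn(b)) {
      best = Math.min(best, distance(x, y));
    }
  }
  return best;
}

function agglomerate(items, distance, threshold = Infinity) {
  // Start with one single-item cluster per input item.
  let clusters = items.map((item) => ({ canonical: item, size: 1 }));

  while (clusters.length > 1) {
    // Find the closest pair of clusters.
    let best = { dist: Infinity, i: -1, j: -1 };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const dist = minItemDistance(clusters[i], clusters[j], distance);
        if (dist < best.dist) {
          best = { dist, i, j };
        }
      }
    }

    // Stop once every remaining pair is more than `threshold` apart.
    if (best.i < 0 || best.dist > threshold) {
      break;
    }

    // Merge the closest pair into a new parent cluster.
    const left = clusters[best.i];
    const right = clusters[best.j];
    const merged = {
      canonical: left.canonical, // placeholder representative (assumption)
      left,
      right,
      size: left.size + right.size,
    };
    clusters = clusters.filter((_, k) => k !== best.i && k !== best.j);
    clusters.push(merged);
  }
  return clusters;
}

const absDiff = (a, b) => Math.abs(a - b);
const points = [1, 2, 10, 11, 50];

const whole = agglomerate(points, absDiff);     // no threshold: one root
const forest = agglomerate(points, absDiff, 5); // threshold: several hierarchies
console.log(whole.length, forest.length);       // 1 3
```

Without a threshold the loop runs until a single root remains. With a threshold of 5, merging stops once the closest remaining pair (here 2 and 10, a distance of 8) is farther apart than the threshold.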
hcluster takes these arguments:
- items - The items to be clustered. The items can be anything, but should be arrays (vectors) if any of the predefined distance metrics are used.
- metric - The distance metric for measuring the distance between two items. It can be either a function that takes two items and returns a number representing the distance, or one of the predefined functions:
  - clusterfck.EUCLIDEAN_DISTANCE
  - clusterfck.MANHATTAN_DISTANCE
  - clusterfck.MAX_DISTANCE
- linkage - The linkage criterion for determining the distance between two clusters. It can be either a function that takes two items and returns an item that represents a merge, or one of the predefined linkage criteria:
  - clusterfck.AVERAGE_LINKAGE - the distance between two clusters is the average of the distances between the items in the clusters.
  - clusterfck.SINGLE_LINKAGE - the distance between two clusters is the smallest distance between an item from each cluster.
  - clusterfck.COMPLETE_LINKAGE - the distance between two clusters is the largest distance between an item from each cluster.
- threshold - An optional stopping criterion. When every pair of clusters is more than threshold apart, clustering stops and the current set of hierarchies is returned.
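The metric and linkage options above can be illustrated with plain functions. The following is a sketch only: the function names and signatures are hypothetical and do not reflect clusterfck's internals, and it assumes MAX_DISTANCE means the largest per-coordinate difference (Chebyshev distance). Each linkage function here takes two arrays of items plus a metric and returns a distance.

```javascript
// Illustrative re-implementations; names and signatures are hypothetical
// and are not clusterfck's internals.

// Euclidean distance: square root of the summed squared differences.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// Manhattan distance: sum of the absolute differences.
function manhattan(a, b) {
  return a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0);
}

// Maximum distance (assumed Chebyshev): largest absolute difference
// in any single coordinate.
function maxDistance(a, b) {
  return a.reduce((max, v, i) => Math.max(max, Math.abs(v - b[i])), 0);
}

// All pairwise distances between the items of two clusters.
function pairwise(clusterA, clusterB, metric) {
  const distances = [];
  for (const a of clusterA) {
    for (const b of clusterB) {
      distances.push(metric(a, b));
    }
  }
  return distances;
}

// Average linkage: mean of all pairwise distances.
function averageLinkage(clusterA, clusterB, metric) {
  const d = pairwise(clusterA, clusterB, metric);
  return d.reduce((sum, x) => sum + x, 0) / d.length;
}

// Single linkage: smallest pairwise distance.
function singleLinkage(clusterA, clusterB, metric) {
  return Math.min(...pairwise(clusterA, clusterB, metric));
}

// Complete linkage: largest pairwise distance.
function completeLinkage(clusterA, clusterB, metric) {
  return Math.max(...pairwise(clusterA, clusterB, metric));
}

const light = [[250, 255, 253], [255, 255, 240]];
const dark = [[20, 120, 102]];

console.log(manhattan(light[0], light[1]));   // 18
console.log(maxDistance(light[0], light[1])); // 13
console.log(singleLinkage(light, dark, euclidean) <= completeLinkage(light, dark, euclidean)); // true
```

For any pair of clusters, the single-linkage distance is never larger than the average-linkage distance, which in turn is never larger than the complete-linkage distance, so the choice of criterion changes which clusters merge first.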
Clustering returns an array of hierarchies of the original objects. Each merged cluster has a left and a right child cluster, and each leaf cluster has a canonical property holding the original item:
[
  {
    "canonical": [20, 120, 102],
    "size": 1
  },
  {
    "canonical": [250, 255, 253],
    "left": {
      "canonical": [250, 255, 253],
      "size": 1
    },
    "right": {
      "canonical": [255, 255, 240],
      "size": 1
    },
    "size": 2
  },
  {
    "canonical": [100, 54, 300],
    "size": 1
  }
]
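A result shaped like the one above can be flattened back into groups of original items by walking each hierarchy down to its leaves. The helper below is a hypothetical sketch, not part of clusterfck; it assumes only the left, right, and canonical properties shown in the example output.

```javascript
// Hypothetical helper (not part of clusterfck): gather the original
// items stored at the leaves of one hierarchy.
function collectItems(cluster) {
  if (!cluster.left && !cluster.right) {
    return [cluster.canonical]; // leaf: holds one original item
  }
  return [...collectItems(cluster.left), ...collectItems(cluster.right)];
}

// The example result from the text above.
const hierarchies = [
  { canonical: [20, 120, 102], size: 1 },
  {
    canonical: [250, 255, 253],
    left: { canonical: [250, 255, 253], size: 1 },
    right: { canonical: [255, 255, 240], size: 1 },
    size: 2,
  },
  { canonical: [100, 54, 300], size: 1 },
];

// One flat list of items per hierarchy.
const groups = hierarchies.map(collectItems);
console.log(JSON.stringify(groups));
// [[[20,120,102]],[[250,255,253],[255,255,240]],[[100,54,300]]]
```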
This webpage takes the colors from an image and clusters them based on their RGB values using different linkage criteria. You can see the code for this page on clusterfck's GitHub.