Fork me on GitHub

clusterfck is a JavaScript library for hierarchical clustering. Clustering is used to group similar items together. Hierarchical clustering in particular is used when a hierarchy of items is needed or when the number of clusters isn't known ahead of time. An example use, clustering similar colors based on their rgb values:

var clusterfck = require("clusterfck");

var colors = [[255, 255, 240],
              [20, 120, 102],
              [250, 255, 253],
              [100, 54, 300]];

var threshold = 14; // only combine two clusters with distance less than 14

var clusters = clusterfck.hcluster(colors, clusterfck.EUCLIDEAN_DISTANCE,
	clusterfck.AVERAGE_LINKAGE, threshold);
clusters will be an array of clusters. Each cluster is a hierarchy with left and right clusters. The leaf clusters will have a canonical property for the original item, in this case the vector of rgb values that represents the color.

Download

To use this in the browser, download the latest clusterfck.js. Performing the clustering on a Worker thread is recommended, as it can be an expensive operation.

To use this as a commonJS package or server-side, checkout or download the code from GitHub and install on node with npm:

cd clusterfck
npm install .

Hierarchical Clustering
The hierarchical clustering method performs a standard bottom-up agglomerative hierarchical clustering of objects. You can either build up a hierarchy of all the items with one root node, or given a threshold parameter cluster until there are no more clusters under a certain distance apart, getting an array of hierarchies.
API
var clusters = clusterfck.hcluster(items, metric, linkage, threshold)
hcluster takes these arguments:
  • items - The items to be clustered. The items can be anything, but should be arrays (vectors) if any of the predefined distance metrics are used.
  • metric - The distance metric for measuring the distance between two items. Can either be a function that takes two items and returns a float representing the distance, or it can be one of the pre-defined functions:
    • clusterfck.EUCLIDEAN_DISTANCE
    • clusterfck.MANHATTAN_DISTANCE
    • clusterfck.MAX_DISTANCE
  • linkage - The linkage criteria for determining the distance between two clusters. Can either be a function that takes two items and returns an item that represents a merge, or one of the predefined linkage criteria:
    • clusterfck.AVERAGE_LINKAGE - the distance between two clusters is an average of the differences between the items in the clusters.
    • clusterfck.SINGLE_LINKAGE - the distance between clusters is the smallest distance between an item from each cluster.
    • clusterfck.COMPLETE_LINKAGE - the distance between clusters is the largest distance between two items in the clusters.
  • threshold - The optional stopping criterion. When every cluster is more than threshold distance apart, clustering is stopped and the current set of hierarchies is returned.
Clustering returns an array of hierarchies of the original objects. Each hierarchy has a left and right cluster, and the leaf clusters have a canonical property with the original item:
[
  {"canonical":[20,120,102],
   "size":1
  },
  {"canonical":[250,255,253],
   "left":{
      "canonical":[250,255,253],
      "size":1
    },
   "right":{
      "canonical":[255,255,240],
      "size":1
    },
   "size":2
  },
  {"canonical":[100,54,300],
   "size":1
  }
]
Demo
This webpage takes the colors from an image and clusters them based on their rgb values using different linkage criteria. You can see the code for this page on clusterfck's github.