This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:
Clustering
Outlier Detection
Fast Nearest-Neighbor Search (using kd-trees)
The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python’s scikit-learn.
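As a rough illustration of the nearest-neighbor search mentioned above, the sketch below calls the package's kNN() function on the numeric iris columns. The value k = 5 and the variable names are assumptions chosen for this example only.

```r
library(dbscan)

# Numeric iris measurements as a 150 x 4 matrix (illustrative input)
x <- as.matrix(iris[, 1:4])

# Find the 5 nearest neighbors of every point; k = 5 is an arbitrary choice
nn <- kNN(x, k = 5)

# nn$id holds the neighbors' row indices, nn$dist the matching distances
head(nn$id)
head(nn$dist)
```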
Stable CRAN version: install from within R with install.packages("dbscan")
Current development version: Download package from AppVeyor or install from GitHub (needs devtools).
Load the package and use the numeric variables in the iris dataset
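A minimal sketch of this setup step follows. The variable name x is an assumption of this example; the first four columns of iris are its numeric measurements.

```r
library(dbscan)

# Load the built-in iris data and keep only the four numeric columns
data(iris)
x <- as.matrix(iris[, 1:4])

# Quick sanity check of the input
dim(x)
```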
Run DBSCAN
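A call along the following lines, using the parameters reported in the summary below, should print a summary of this kind. The variable names x and db are assumptions of this sketch.

```r
library(dbscan)

# Numeric iris measurements (illustrative input)
x <- as.matrix(iris[, 1:4])

# Cluster with a neighborhood radius of 0.4 and at least 4 points per core neighborhood
db <- dbscan(x, eps = 0.4, minPts = 4)

# Printing the object shows the parameters and cluster sizes; cluster 0 is noise
db
```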
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.
 0  1  2  3  4
25 47 38 36  4
Available fields: cluster, eps, minPts
Visualize results (noise is shown in black)
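One way to produce such a plot is a scatterplot matrix colored by cluster id. This is a sketch under the assumption that the clustering is stored in db. Adding 1 to the cluster ids maps noise (id 0) to the first palette color, which is black in R's default palette.

```r
library(dbscan)

# Numeric iris measurements and a DBSCAN clustering (illustrative setup)
x <- as.matrix(iris[, 1:4])
db <- dbscan(x, eps = 0.4, minPts = 4)

# Cluster ids start at 0 (noise), so shift by 1 to index the color palette
pairs(x, col = db$cluster + 1L)
```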
Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)
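A minimal sketch of this step is shown below. The argument that sets the neighborhood size may be named differently depending on the installed package version, so it is left at its default here. The variable name lof_scores is an assumption of this example.

```r
library(dbscan)

# Numeric iris measurements (illustrative input)
x <- as.matrix(iris[, 1:4])

# Local outlier factor for every point, using the default neighborhood size
lof_scores <- lof(x)

# Scale the plotting symbols by LOF so likely outliers appear as larger bubbles
pairs(x, cex = lof_scores)
```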
Run OPTICS
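A call of the following form, using the parameters reported in the summary below, should produce an ordering of this kind. The variable names x and opt are assumptions of this sketch.

```r
library(dbscan)

# Numeric iris measurements (illustrative input)
x <- as.matrix(iris[, 1:4])

# Compute the OPTICS ordering; eps = 1 only bounds the neighborhood search
opt <- optics(x, eps = 1, minPts = 4)

# Printing the object lists the parameters and available fields
opt
```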
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi
Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)
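This extraction can be sketched with extractDBSCAN() as follows, assuming an OPTICS ordering computed with eps = 1 and minPts = 4. The variable names opt and opt_db are assumptions of this example.

```r
library(dbscan)

# Numeric iris measurements and an OPTICS ordering (illustrative setup)
x <- as.matrix(iris[, 1:4])
opt <- optics(x, eps = 1, minPts = 4)

# Cut the reachability plot at eps_cl = 0.4 to obtain a DBSCAN-like clustering
opt_db <- extractDBSCAN(opt, eps_cl = 0.4)

# Reachability plot; the extracted clusters are drawn in color
plot(opt_db)
```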
Extract a hierarchical clustering using the Xi method (captures clusters of varying density)
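A sketch of the Xi extraction is shown below. The steepness threshold xi = 0.05 is an assumption chosen for illustration and is not stated in the text above; the variable names are likewise hypothetical.

```r
library(dbscan)

# Numeric iris measurements and an OPTICS ordering (illustrative setup)
x <- as.matrix(iris[, 1:4])
opt <- optics(x, eps = 1, minPts = 4)

# Extract nested clusters from steep areas of the reachability plot;
# xi = 0.05 is an assumed value, tune it for your data
opt_xi <- extractXi(opt, xi = 0.05)

# Reachability plot annotated with the extracted hierarchy
plot(opt_xi)
```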
Run HDBSCAN (captures stable clusters)
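A call of the following form, using the minPts value reported in the summary below, should produce a clustering of this kind. The variable names x and hdb are assumptions of this sketch.

```r
library(dbscan)

# Numeric iris measurements (illustrative input)
x <- as.matrix(iris[, 1:4])

# HDBSCAN needs only minPts; no eps has to be chosen
hdb <- hdbscan(x, minPts = 4)

# Printing the object shows cluster sizes and the available fields
hdb
```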
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.
  1   2
100  50
Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
Visualize the results as a simplified tree
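The simplified tree can be drawn with the plot method for HDBSCAN objects, roughly as sketched here. The variable names are assumptions of this example, and show_flat = TRUE is used to highlight the clusters selected for the flat solution.

```r
library(dbscan)

# Numeric iris measurements and an HDBSCAN clustering (illustrative setup)
x <- as.matrix(iris[, 1:4])
hdb <- hdbscan(x, minPts = 4)

# Draw the simplified cluster tree and mark the selected flat clusters
plot(hdb, show_flat = TRUE)
```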
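See how strongly each point belongs to its assigned cluster (points with a lower membership probability are drawn more transparently)
# Fade each point's cluster color by its HDBSCAN membership probability
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)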
The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework’s Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.