Skip to content

Fuzzy clustering

Where crisp k-means assigns each point to exactly one cluster, fuzzy clustering gives each point a graded membership in every cluster. fuzzytool.cluster works on plain NumPy arrays of shape (n_samples, n_features) and offers three algorithms plus validity metrics. Every algorithm takes a seed for reproducibility.

Function Norm / idea Good for
fuzzy_cmeans Euclidean, memberships sum to 1 spherical clusters
gustafson_kessel adaptive per-cluster Mahalanobis norm ellipsoidal clusters
possibilistic_cmeans typicalities (no sum-to-1) noisy data / outliers
import fuzzytool as fz
from fuzzytool import cluster
from fuzzytool.datasets import make_blobs

X = make_blobs(centers=((0, 0), (6, 6), (0, 6)), seed=0)

res = fz.fuzzy_cmeans(X, c=3, m=2.0, seed=0)
res.centers     # (3, 2) cluster prototypes
res.u           # (3, n) membership matrix (columns sum to 1)
res.labels      # hard assignment, argmax over u
res.n_iter      # iterations to converge

Validity metrics

How many clusters? Compare runs with internal validity indices:

cluster.partition_coefficient(res.u)   # in (1/c, 1]; higher = crisper
cluster.partition_entropy(res.u)       # in [0, log c); lower = crisper
cluster.xie_beni(X, res.centers, res.u)  # compactness/separation; lower = better

A common recipe is to sweep c and pick the value that maximizes the partition coefficient (or minimizes Xie-Beni).

Visualization

import matplotlib.pyplot as plt
from fuzzytool import viz

viz.plot_clusters(X, res)   # 2-D scatter; opacity encodes top membership
plt.show()

Fuzzy c-means on three blobs: points colored by cluster, centers marked, opacity encoding top membership