Improved K-Means Algorithm: Integrating Density Peaks and Adaptive K-Value for Mall Customer Segmentation

Customer segmentation is a core application of data mining in the retail industry. Traditional K-means clustering is widely used for this task because it is simple and computationally efficient, but it has three notable drawbacks: randomly chosen initial cluster centers often lead to local optima, the algorithm is highly sensitive to outliers, and the number of clusters K must be set manually from experience. Together these make its clustering results unstable. This paper proposes an improved K-means algorithm that addresses each drawback. It filters outliers with a two-layer mechanism that combines the Local Outlier Factor with a distance threshold. It determines the optimal K automatically from a multi-index system built on the Silhouette Coefficient, the Calinski-Harabasz (CH) index, and the Davies-Bouldin (DB) index. It selects initial centers via density peak clustering, and it uses a weighted Euclidean distance to improve cluster compactness. Experiments on the Mall Customer Segmentation dataset compare the proposed algorithm with traditional K-means, K-medoids, and DBSCAN. The proposed algorithm achieves a Silhouette Coefficient of 0.5821, a CH index of 1025.36, and a DB index of 0.5107, outperforming all three baselines on every metric and producing more reasonable and stable clusters. Applied to mall customer segmentation, the algorithm divides customers into 5 groups with distinct characteristics, giving malls data support for formulating differentiated marketing strategies.
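As a rough illustration of two of the steps named above (two-layer outlier filtering and multi-index selection of K), the following Python sketch uses scikit-learn. It is not the paper's implementation. The function names `filter_outliers` and `choose_k`, the mean-plus-three-standard-deviations distance threshold, and the rank-sum rule for combining the three indices are all assumptions made for illustration, since the abstract does not specify them. The sketch also falls back on scikit-learn's default k-means++ initialization and plain Euclidean distance, so density peak initialization and the weighted distance are not covered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.neighbors import LocalOutlierFactor


def filter_outliers(X, n_neighbors=20, z=3.0):
    """Two-layer filter: LOF first, then a distance-to-centroid threshold.

    The threshold (mean + z * std of distances) is an assumed rule,
    not one taken from the paper.
    """
    # Layer 1: LocalOutlierFactor labels inliers as 1 and outliers as -1.
    inlier = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X) == 1
    X1 = X[inlier]
    # Layer 2: drop points unusually far from the overall centroid.
    d = np.linalg.norm(X1 - X1.mean(axis=0), axis=1)
    return X1[d <= d.mean() + z * d.std()]


def choose_k(X, k_values=range(2, 9), seed=0):
    """Pick K by summing per-index ranks (an assumed voting rule)."""
    ks = list(k_values)
    rows = []
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        rows.append((
            silhouette_score(X, labels),          # higher is better
            calinski_harabasz_score(X, labels),   # higher is better
            -davies_bouldin_score(X, labels),     # lower is better, so negate
        ))
    scores = np.array(rows)
    # Double argsort turns each column into ranks (0 = worst for that index).
    ranks = scores.argsort(axis=0).argsort(axis=0)
    return ks[int(ranks.sum(axis=1).argmax())]
```

A typical call would be `k = choose_k(filter_outliers(X))` on a standardized feature matrix `X`, followed by a final K-means fit with that `k`.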
