Systems Engineering and Electronics

• SOFTWARE ALGORITHM AND SIMULATION •

Tag clustering algorithm LMMSK: improved K-means algorithm based on latent semantic analysis

Jing Yang and Jun Wang*   

  1. School of Economics and Management, Beihang University, Beijing 100191, China
  • Online: 2017-04-25   Published: 2010-01-03

Abstract:

With the wide adoption of Web 2.0 and social software, tag-related studies and applications are growing in number. Because users’ tagging is random and personalized, tag research continues to encounter obstacles in data space and semantics. First, the traditional K-means clustering algorithm is improved by using min-max similarity (MMS) to establish the initial centroids, yielding the MMSK-means clustering algorithm, whose superiority has been tested. Second, MMSK-means is combined with latent semantic analysis (LSA) to produce a new tag clustering algorithm, LMMSK. Finally, three tag clustering algorithms, namely MMSK-means, tag clustering based on LSA (the LSA-based algorithm), and LMMSK, are run in Matlab on a real tag-resource dataset obtained from the Delicious Social Bookmarking System, covering 2004 to 2009. LMMSK’s clustering result turns out to be the most effective and the most accurate. A better tag clustering algorithm is thus found for wider application of social tags in personalized search, topic identification, and knowledge community discovery. In addition, to allow a better comparison of clustering results, the clustering corresponding results matrix (CCR matrix) is proposed, which is expected to be an effective tool for capturing the evolution of the social tagging system.
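The abstract does not spell out how min-max similarity selects the initial centroids, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes cosine similarity between tag vectors, seeds with the least similar pair, and then repeatedly adds the tag whose maximum similarity to the seeds chosen so far is smallest. The function names (`cosine`, `min_max_seeds`, `kmeans_mms`) and the toy tag-resource counts are hypothetical. The LSA step of LMMSK is omitted; under this reading, the tag vectors would first be projected into a low-rank space by a truncated SVD and then passed to the same routine.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def min_max_seeds(vectors, k):
    """Assumed MMS seeding: requires 2 <= k <= len(vectors)."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Start with the least similar pair of tags.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    first, second = min(pairs, key=lambda p: sim[p[0]][p[1]])
    seeds = [first, second]
    # Add the tag whose largest similarity to any current seed is smallest.
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(sim[i][s] for s in seeds)))
    return seeds

def kmeans_mms(vectors, k, max_iter=20):
    """K-means on cosine similarity, initialized with min_max_seeds."""
    centroids = [list(vectors[i]) for i in min_max_seeds(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each tag to its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: rows are tags, columns are per-resource counts.
tags = ["python", "java", "code", "recipe", "cooking", "food"]
vectors = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 2],
    [0, 0, 1, 3],
]
labels = kmeans_mms(vectors, k=2)
clusters = {c: [t for t, lab in zip(tags, labels) if lab == c] for c in set(labels)}
```

On this toy data the programming tags and the cooking tags share no resources, so the seeding picks one tag from each group and the two clusters separate cleanly. Deterministic seeding of this kind removes the dependence on random initial centroids that plain K-means has.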