Detecting spam comments posted in micro-blogs using the self-extensible spam dictionary
IEEE Xplore: http://ieeexplore.ieee.org/document/7511605/
Abstract: The high popularity of Weibo has greatly enriched people's lives, allowing online users to share their feelings by posting comments. However, more and more spam comments are also being posted on users' blogs on this social media platform. In this paper, to effectively detect spam comments in Chinese micro-blogs, we introduce semantic analysis to construct a Self-Extensible Spam Dictionary that automatically expands itself when new words frequently emerge on micro-blogs. Semantic analysis also provides additional features that are beneficial for detecting spam comments. We further propose a Proportion-Weight Filter (PWF) model to detect two kinds of spam comments, advertising (AD) and vulgar comments, by filtering Weibo comments on their spam-weight and spam-proportion as measured against the Self-Extensible Spam Dictionary. Our experimental results demonstrate that when detecting a combination of both AD and vulgar spam comments, we achieve an average detection accuracy of 87.9%. For AD spam comments in particular, we achieve an average accuracy of 96.2%, which compares favorably with machine learning methods. Statistical analysis of the results verifies that the proposed methods identify spam comments effectively and with relatively high accuracy.
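The abstract does not say exactly how spam-weight and spam-proportion are computed or combined, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that comments are already segmented into words (Chinese text needs a word segmenter first), that each dictionary entry carries a category (`"AD"` or `"vulgar"`) and a weight, that spam-weight is the sum of the weights of dictionary hits, that spam-proportion is the share of tokens that are dictionary hits, and that a comment is flagged when either value reaches a threshold. The self-extension step uses plain frequency counting over already-flagged comments as a stand-in for the paper's semantic analysis. The names `SpamDictionary`, `extend`, `pwf_classify`, `min_count`, and both threshold values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SpamDictionary:
    # Hypothetical layout: word -> (category, weight), category is "AD" or "vulgar".
    entries: dict[str, tuple[str, float]] = field(default_factory=dict)

    def lookup(self, word: str):
        return self.entries.get(word)

    def extend(self, flagged_comments, category, min_count=3, weight=1.0):
        """Add unknown words that recur in comments already flagged as `category`.

        Frequency counting is an assumed stand-in for the paper's semantic analysis.
        Returns the newly added words in sorted order.
        """
        # Count each unknown word at most once per comment.
        counts = Counter(
            w for tokens in flagged_comments for w in set(tokens)
            if w not in self.entries
        )
        added = [w for w, n in counts.items() if n >= min_count]
        for w in added:
            self.entries[w] = (category, weight)
        return sorted(added)


def pwf_classify(tokens, dictionary, weight_threshold=2.0, proportion_threshold=0.3):
    """Return "AD", "vulgar", or "normal" for one pre-segmented comment.

    Assumed rule: flag the comment if spam-weight >= weight_threshold OR
    spam-proportion >= proportion_threshold; the label is the category that
    contributed the most weight.
    """
    if not tokens:
        return "normal"
    weight_by_cat = Counter()
    hits = 0
    for w in tokens:
        entry = dictionary.lookup(w)
        if entry is not None:
            cat, wt = entry
            weight_by_cat[cat] += wt
            hits += 1
    spam_weight = sum(weight_by_cat.values())
    spam_proportion = hits / len(tokens)
    if hits and (spam_weight >= weight_threshold or spam_proportion >= proportion_threshold):
        return weight_by_cat.most_common(1)[0][0]
    return "normal"


if __name__ == "__main__":
    # Hypothetical seed entries: "加微信" = "add me on WeChat", "代购" = "purchasing agent".
    d = SpamDictionary({"加微信": ("AD", 2.0), "代购": ("AD", 1.5)})
    print(pwf_classify(["快来", "领", "优惠券"], d))  # normal: no dictionary hits yet
    # "优惠券" ("coupon") recurs in three flagged AD comments, so it is learned.
    flagged = [["加微信", "优惠券"], ["优惠券", "返利"], ["优惠券", "代购"]]
    print(d.extend(flagged, "AD", min_count=3))     # ['优惠券']
    print(pwf_classify(["快来", "领", "优惠券"], d))  # AD: proportion 1/3 >= 0.3
```

Combining the two filters with OR lets a single heavily weighted word flag a long comment, while a high proportion of lightly weighted words can flag a short one; requiring both (AND) would be the more conservative alternative, and the abstract does not say which the PWF model uses.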
**Published in:** 2016 IEEE International Conference on Communications (ICC)