Academic paper: identifying spam comments on Weibo with a smart spam dictionary

Detecting spam comments posted in micro-blogs using the self-extensible spam dictionary

IEEE Xplore: http://ieeexplore.ieee.org/document/7511605/

Abstract: The high popularity of Weibo has greatly enriched people's lives, allowing online users to share their feelings through posting comments. However, more and more spam comments are also being posted in users' blogs on this social media. In this paper, in order to effectively detect spam comments in Chinese micro-blogs, we introduce semantic analysis to construct a Self-Extensible Spam Dictionary which automatically expands itself when new words emerge on the micro-blogs frequently. The use of semantic analysis can provide us with additional features which are beneficial to detecting spam comments. A Proportion-Weight Filter (PWF) model is also proposed to detect two kinds of spam comments (AD and vulgar comments), by filtering the spam-weight and the spam-proportion of the Weibo comments based on our Self-Extensible Spam Dictionary criteria. Our experimental results demonstrate that when detecting a combination of both AD and vulgar spam comments, we can achieve an average detection accuracy of 87.9%. Particularly for AD spam comments detection, we can achieve an average accuracy of 96.2%, which is preferable compared to when using machine learning methods. The statistical analysis of the results verifies that our proposed methods can identify the spam comments effectively and to relatively high degrees of accuracy.

**Published in:** Communications (ICC), 2016 IEEE International Conference on
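
To make the abstract's idea more concrete, here is a minimal sketch of what a proportion-and-weight filter over a spam dictionary could look like. It is not the paper's implementation: the abstract does not give the formulas, so everything here is an assumption for illustration. Specifically, the function names (`spam_scores`, `is_spam`, `extend_dictionary`), the definition of spam-weight as the summed dictionary weights of matched words, the definition of spam-proportion as the fraction of matched tokens, the thresholds, and the rule that both must be exceeded are all hypothetical. The dictionary extension step uses a plain frequency count as a stand-in for the semantic analysis the paper describes, and comments are assumed to be already tokenized.

```python
from collections import Counter


def spam_scores(tokens, spam_dict):
    """Return (spam_weight, spam_proportion) for one tokenized comment.

    Assumed definitions (not taken from the paper):
      spam_weight     = sum of dictionary weights of the matched tokens
      spam_proportion = matched tokens / all tokens
    """
    if not tokens:
        return 0.0, 0.0
    hits = [t for t in tokens if t in spam_dict]
    weight = sum(spam_dict[t] for t in hits)
    proportion = len(hits) / len(tokens)
    return weight, proportion


def is_spam(tokens, spam_dict, min_weight=1.0, min_proportion=0.3):
    """Flag a comment when both scores reach their (hypothetical) thresholds."""
    weight, proportion = spam_scores(tokens, spam_dict)
    return weight >= min_weight and proportion >= min_proportion


def extend_dictionary(spam_dict, flagged_comments, min_count=2, new_weight=0.5):
    """Return a copy of spam_dict with frequent unknown words added.

    Crude stand-in for the paper's semantic analysis: any unknown word that
    appears in at least min_count flagged comments is added with new_weight.
    """
    counts = Counter(
        word
        for tokens in flagged_comments
        for word in set(tokens)  # count each word once per comment
        if word not in spam_dict
    )
    extended = dict(spam_dict)
    for word, count in counts.items():
        if count >= min_count:
            extended[word] = new_weight
    return extended


# Toy usage with made-up data
spam_dict = {"discount": 1.0, "buy": 0.8}
ad_comment = ["buy", "discount", "now"]
normal_comment = ["nice", "photo", "today"]

print(is_spam(ad_comment, spam_dict))
print(is_spam(normal_comment, spam_dict))

flagged = [ad_comment, ["buy", "now", "discount", "link"]]
print(sorted(extend_dictionary(spam_dict, flagged)))
```

A frequency-only extension like this would also pick up common harmless words, which is presumably one reason the paper brings in semantic analysis instead.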
