智能技术学报

Article Details

Title: Research on Micro-blog New Word Recognition Based on SVM
Authors: Chaoting Xiao, Jianhou Gan, Bin Wen, Wei Zhang, Xiaochun Cao
Keywords: new word recognition; Natural Language Processing (NLP); enhanced mutual information; relative adjacency entropy; MapReduce; SVM
Abstract: New word discovery is an important task in Natural Language Processing (NLP). Because traditional mutual information performs poorly on multi-string candidates, we improve the traditional mutual information and adjacency entropy methods and propose enhanced mutual information and relative adjacency entropy. Because extracting multiple features from massive data is slow, we use the MapReduce parallel computing model to extract features such as enhanced mutual information, relative adjacency entropy, and background document frequency. The eight extracted features form the feature vector of each candidate word, and an SVM model is trained on the labelled corpus. Experiments show that the proposed method speeds up computation and shortens the time required by the whole recognition process. In addition, a comparison with existing methods shows that the F value reaches 86.98%.
Published in: 2017, Vol. 2, Issue 2
Funding:
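
The abstract names mutual information and adjacency entropy as the traditional measures that the authors improve on, but it does not give the formulas for the enhanced variants. As a point of reference only, the following is a minimal Python sketch of the traditional baselines: pointwise mutual information for a candidate string and the entropy of its left and right neighbouring characters. The function names, the use of the total character count as the normaliser, and the choice to take the minimum score over all binary splits of a longer candidate are assumptions made for illustration; they are not the paper's enhanced mutual information or relative adjacency entropy.

```python
import math
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character n-gram of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def pmi(candidate, counts, total_chars):
    """Baseline pointwise mutual information of a candidate string.

    Assumption: probabilities are approximated as count / total_chars, and
    for candidates longer than two characters the weakest binary split
    (minimum score) is reported. This is a common heuristic, not the
    enhanced measure proposed in the paper.
    """
    if len(candidate) < 2:
        raise ValueError("candidate must contain at least two characters")
    p_whole = counts[candidate] / total_chars
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


def entropy(counter):
    """Shannon entropy (in bits) of the distribution held in a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def adjacency_entropy(candidate, text):
    """Return (left_entropy, right_entropy) of the characters adjacent to
    every occurrence of candidate in text, overlapping matches included."""
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(candidate)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(candidate, start + 1)
    return entropy(left), entropy(right)


if __name__ == "__main__":
    corpus = "abxabyabz"  # toy corpus, purely illustrative
    counts = ngram_counts(corpus, max_n=2)
    print(pmi("ab", counts, len(corpus)))
    print(adjacency_entropy("ab", corpus))
```

A candidate that scores high on both measures is strongly cohesive internally and appears in varied contexts, which is the usual intuition behind using them as features for a classifier such as an SVM. In a MapReduce setting, the counting step in `ngram_counts` is the part that would typically be distributed across mappers and summed in reducers.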