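Recently I've been using Python's TextBlob module to analyze the sentiment of e-commerce reviews, and I ran into a few pitfalls along the way, which I'm recording here.
First, the official documentation: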
https://textblob.readthedocs.io/en/dev/
Following the examples in the docs, basic usage really is simple:
from textblob import TextBlob
train = [
('I love this sandwich.', 'pos'),
('this is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('this is my best work.', 'pos'),
("what an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('he is my sworn enemy!', 'neg'),
('my boss is horrible.', 'neg')
]
test = [
('the beer was good.', 'pos'),
('I do not enjoy my job', 'neg'),
("I ain't feeling dandy today.", 'neg'),
("I feel amazing!", 'pos'),
('Gary is a friend of mine.', 'pos'),
("I can't believe I'm doing this.", 'neg')
]
from textblob.classifiers import NaiveBayesClassifier
# Train a Naive Bayes classifier on the labeled sentences in `train`
cl = NaiveBayesClassifier(train)
print(cl.classify("This is an amazing library!"))  # pos
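This code comes straight from the official docs. It returns pos, i.e. the sentence "This is an amazing library!" is classified as positive.
But for processing a large amount of text, a handful of training sentences like this is far too crude. So the docs also offer a sentiment analyzer, NaiveBayesAnalyzer, which is trained on a corpus from NLTK (movie_reviews) that is likewise labeled as either negative or positive. The code is: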
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())
print(blob.sentiment)
# Output:
# Sentiment(classification='pos', p_pos=0.7996209910191279, p_neg=0.2003790089808724)
But running it raised an error telling me to run:
import nltk
nltk.download('movie_reviews')
Yet after running that, you may get this error:
[nltk_data] Error loading movie_reviews: <urlopen error [WinError
[nltk_data] 10054] An existing connection was forcibly closed by the remote host.>
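This means the data resource you are trying to download cannot be reached. Searching online, I found a workaround that loads the data manually: https://blog.csdn.net/qq_37891889/article/details/104418106
One addition: to open the downloader window that post refers to, run this in Python:
nltk.download()
The window pops up; then extract the downloaded resources into the corresponding directory (the download directory shown in that window).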
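The manual step can also be scripted. A minimal sketch, assuming you have already downloaded the corpus archive by hand (the file name movie_reviews.zip and the target directory are assumptions, as is the helper name install_corpus_zip); it extracts the archive into the corpora subfolder of an NLTK data directory and registers that directory with NLTK via nltk.data.path:

```python
import zipfile
from pathlib import Path

import nltk


def install_corpus_zip(zip_path, nltk_data_dir):
    """Extract a manually downloaded corpus archive into <nltk_data_dir>/corpora.

    Hypothetical helper: assumes the archive contains one top-level folder named
    after the corpus (e.g. movie_reviews/), so the data lands in
    <nltk_data_dir>/corpora/movie_reviews.
    """
    corpora_dir = Path(nltk_data_dir) / "corpora"
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
    # NLTK already searches its default data locations; add this one only if missing.
    if str(nltk_data_dir) not in nltk.data.path:
        nltk.data.path.append(str(nltk_data_dir))
    return corpora_dir
```

Usage would look like install_corpus_zip("movie_reviews.zip", "C:/nltk_data"), where the second argument should be the directory the downloader window reports.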
Then run
from nltk.book import *
to load the data,
after which the official code runs as expected.
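Before re-running the official code, it helps to check that NLTK can actually find the corpus. nltk.data.find raises LookupError when a resource is missing, and nltk.download, by default (raise_on_error=False), prints errors such as the WinError 10054 above and returns False instead of raising. A minimal sketch built on those two behaviors (both helper names are my own):

```python
import nltk


def has_nltk_resource(resource_path):
    """Return True if NLTK can locate resource_path, e.g. "corpora/movie_reviews"."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        return False


def ensure_nltk_resource(resource_path, package_id):
    """Download package_id only when resource_path is missing; return True if available.

    nltk.download() returns False on failure rather than raising, so its
    return value is checked explicitly here.
    """
    if has_nltk_resource(resource_path):
        return True
    if not nltk.download(package_id, quiet=True):
        return False
    return has_nltk_resource(resource_path)
```

For this post's case: ensure_nltk_resource("corpora/movie_reviews", "movie_reviews"). If it still returns False, fall back to the manual installation described above.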
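That said, NaiveBayesAnalyzer is quite slow. My guess is that some statement is being executed repeatedly, but unfortunately I couldn't find any documentation on it, so I'd suggest trying the other analyzer, PatternAnalyzer, instead.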