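After a long period of premeditation (read: procrastination), I've finally started learning Python for real! I worked through three tutorials: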
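1. Python basics: 《父与子的编程之旅》 (the Chinese edition of "Hello World! Computer Programming for Kids and Other Beginners"), an easy-to-follow introduction to Python.
2. Crawler basics: the Python crawler tutorial series by Cui Qingcai (崔庆才).
3. A handy library: the official Beautiful Soup 4.4.0 documentation.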
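Then I managed to write a crawler that can actually crawl (well, wriggle)! So excited! It's pretty clunky, but it finally works and I'm thrilled!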
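```python
# coding: utf-8
# Scrape the author, content, vote count and comment count from a chosen page
# of Qiushibaike's 24h hot list (relies on the page markup at the time of writing)
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; some sites reject the default one from requests
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

while True:
    page = input('Enter the Qiushibaike 24h page number to display (blank to quit): ').strip()
    if not page:
        break
    url = 'http://www.qiushibaike.com/hot/page/' + page
    html = requests.get(url, headers=headers)
    soup = BeautifulSoup(html.text, 'lxml')
    # All posts sit inside <div id="content-left" class="col1">
    content_left = soup.find('div', id='content-left', class_='col1')
    if content_left is None:
        print('Could not find the post list on this page.')
        continue
    authors = content_left.find_all('h2')
    contents = content_left.find_all('div', class_='content')
    votes = content_left.find_all('span', class_='stats-vote')
    comments = content_left.find_all('span', class_='stats-comments')
    # zip() stops at the shortest list instead of raising IndexError;
    # fields can still misalign if some post is missing one of them
    for author, content, vote, comment in zip(authors, contents, votes, comments):
        print(author.text.strip())
        print(content.text.strip())
        print(vote.text.strip())
        print(comment.text.strip())
        print('_____________________________________')
```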
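Since the live page can change or go offline, the parsing step can also be tried without any network access. Below is a minimal sketch, assuming a hypothetical HTML snippet that imitates the `content-left` / `content` / `stats-vote` / `stats-comments` structure the crawler relies on; `SAMPLE_HTML`, `parse_posts` and the `article` wrapper class are illustrative names, not part of the original script. It uses Python's built-in `html.parser` backend so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the crawler expects
SAMPLE_HTML = """
<div id="content-left" class="col1">
  <div class="article">
    <h2>Alice</h2>
    <div class="content">First joke</div>
    <span class="stats-vote">120 funny</span>
    <span class="stats-comments">8 comments</span>
  </div>
  <div class="article">
    <h2>Bob</h2>
    <div class="content">Second joke</div>
    <span class="stats-vote">45 funny</span>
    <span class="stats-comments">3 comments</span>
  </div>
</div>
"""


def parse_posts(html_text):
    """Return a list of (author, content, votes, comments) tuples."""
    soup = BeautifulSoup(html_text, 'html.parser')
    # Same lookup as the crawler: the container div matched by id and class
    content_left = soup.find('div', id='content-left', class_='col1')
    if content_left is None:
        return []
    authors = content_left.find_all('h2')
    contents = content_left.find_all('div', class_='content')
    votes = content_left.find_all('span', class_='stats-vote')
    comments = content_left.find_all('span', class_='stats-comments')
    # Pair the i-th author with the i-th content, vote and comment element
    return [
        (a.text.strip(), c.text.strip(), v.text.strip(), m.text.strip())
        for a, c, v, m in zip(authors, contents, votes, comments)
    ]


for post in parse_posts(SAMPLE_HTML):
    print(post)
```

Keeping parsing separate from fetching like this means the same function could be fed `requests.get(url, headers=headers).text` inside the real crawler loop, while the offline sample makes it easy to check the selectors.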