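Watching a movie keeps getting more of a hassle: streaming sites all demand a membership, and download sites make you sit through a barrage of junk ads first. So I wrote a crawler that fetches the download links for movie resources.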
1. BT Rabbit (比特兔) is the example here (in practice, BT sites are all much alike)
[Site](http://www.btrabbit.cc/)
2. Search for a movie, for example 守法公民 (Law Abiding Citizen), and the URL becomes "http://www.btrabbit.cc/search/守法公民.html"
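The browser shows the movie title as raw text in the address bar, but an HTTP client would normally send the non-ASCII characters percent-encoded. A minimal sketch of building the search URL explicitly, assuming the site accepts a UTF-8 percent-encoded title; `build_search_url` and `SEARCH_TEMPLATE` are hypothetical names used only for illustration:

```python
from urllib.parse import quote

# URL pattern taken from the search step above.
SEARCH_TEMPLATE = 'http://www.btrabbit.cc/search/%s.html'


def build_search_url(name):
    """Return the search URL for a movie title (hypothetical helper)."""
    # quote() encodes the text as UTF-8 and percent-escapes each byte.
    return SEARCH_TEMPLATE % quote(name)


print(build_search_url('守法公民'))
```

Encoding the title up front makes the request URL predictable and avoids surprises with titles that contain spaces or reserved characters.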
3. Right-click a result and choose Inspect (Chrome), then use Copy XPath to get the path to the download link directly
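To make the XPath step concrete, here is a rough sketch of pulling a title and a link out of one result block. The markup in `SAMPLE` is a hand-written, hypothetical fragment that only mirrors the class names used in the script; the real page may differ. To stay dependency-free it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, instead of lxml. Real pages are rarely well-formed XML, which is one reason the script itself relies on lxml's HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the link target is a placeholder, not a real resource.
SAMPLE = """
<html><body>
  <div class="search-item detail-width">
    <div class="item-title">
      <h3><a title="Example Movie 1080p" href="/detail/1.html">Example Movie</a></h3>
    </div>
    <div class="item-bar"><a href="magnet:?xt=urn:btih:EXAMPLE">Magnet link</a></div>
  </div>
</body></html>
"""


def first_result(markup):
    """Return (title, link) for the first complete result block, or None."""
    root = ET.fromstring(markup.strip())
    for item in root.findall(".//div[@class='search-item detail-width']"):
        title_link = item.find("div[@class='item-title']/h3/a")
        bar_link = item.find("div[@class='item-bar']/a")
        if title_link is None or bar_link is None:
            continue  # incomplete block, try the next one
        return title_link.get('title'), bar_link.get('href')
    return None


print(first_result(SAMPLE))
```

An XPath copied from Chrome is usually an absolute path tied to one element, so matching on class names as above tends to survive small layout changes better.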
4. Source code:
# -*- coding: utf-8 -*-
# Python 3 version: the reload(sys)/setdefaultencoding hack and unused imports are gone.
import requests
from lxml import html


def analyUrl(name):
    """Return [title + link text, download URL] for the first search result."""
    url = 'http://www.btrabbit.cc/search/%s.html' % name
    response = requests.get(url).content
    selector = html.fromstring(response)
    # Each search result sits in its own <div class="search-item detail-width">.
    items = selector.xpath('//div[@class="search-item detail-width"]')
    sourcelist = []
    for item in items:
        title = item.xpath('div[@class="item-title"]/h3/a/@title')
        downUrl = item.xpath('div[@class="item-bar"]/a/@href')
        if not title or not downUrl:
            continue  # skip results that lack a title or a download link
        nameStr = title[0]
        # Append the link text from the same result to the title, if present.
        detail = item.xpath('div[@class="item-bar"]/a/text()')
        if detail:
            nameStr = nameStr + detail[0]
        sourcelist.append(nameStr)
        sourcelist.append(downUrl[0])
        break  # only the first usable result is needed
    return sourcelist


def searchFH(name):
    # Join the title line and the download link with a newline.
    return '\n'.join(analyUrl(name))


if __name__ == '__main__':
    print(searchFH('守法公民'))
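When the search page contains no matching result blocks, `analyUrl` returns an empty list, so `searchFH` yields an empty string and the script prints a blank line. One possible way to make that case visible is sketched below; `describe_result` is a hypothetical helper, not part of the original script, and it assumes the flat `[title, link]` list shape that `analyUrl` returns.

```python
def describe_result(sourcelist):
    """Format analyUrl's [title, link] list, or report that nothing was found."""
    if len(sourcelist) < 2:
        return 'No download link found.'
    title, link = sourcelist[0], sourcelist[1]
    return 'Title: %s\nLink:  %s' % (title, link)


# Placeholder values standing in for a real search result.
print(describe_result(['Example Movie 1080p', 'magnet:?xt=urn:btih:EXAMPLE']))
print(describe_result([]))
```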
5. Done.