Common symbols
- .: matches any single character except the newline \n. Each dot matches exactly one character.
import re
a = 'xyxy123'
b1 = re.findall('x.',a)
b2 = re.findall('x..',a)
print(b2)
print(b1)
#['xyx']
#['xy', 'xy']
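The newline exception can be seen with a small sketch. The sample string below is made up for illustration; `re.DOTALL` is the standard flag that lets the dot match `\n` as well.

```python
import re

# Hypothetical sample text containing a newline right after the first 'x'.
text = 'x\ny xz'

# By default '.' does not match '\n', so 'x\n' is skipped.
default_matches = re.findall('x.', text)

# With re.DOTALL the dot matches every character, including '\n'.
dotall_matches = re.findall('x.', text, flags=re.DOTALL)

print(default_matches)  # ['xz']
print(dotall_matches)   # ['x\n', 'xz']
```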
- *: matches the preceding character zero or more times
a = 'xyxyxx123'
b1 = re.findall('x*',a)
print(b1)
#['x', '', 'x', '', 'xx', '', '', '', '']
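The empty strings in that result come from positions where `x*` matches zero characters. As a rough sketch, two common ways to avoid them are to use `+` (one or more times) or to filter the list afterwards:

```python
import re

a = 'xyxyxx123'

# 'x*' also succeeds with zero characters, producing empty strings.
star_matches = re.findall('x*', a)

# 'x+' requires at least one 'x', so no empty matches appear.
plus_matches = re.findall('x+', a)

# Alternatively, drop the empty strings from the 'x*' result.
non_empty = [m for m in star_matches if m]

print(plus_matches)  # ['x', 'x', 'xx']
print(non_empty)     # ['x', 'x', 'xx']
```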
- ?: matches the preceding character zero or one time
a = 'xyxyxx123'
b1 = re.findall('x?',a)
print(b1)
#['x', '', 'x', '', 'x', 'x', '', '', '', '']
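A more typical use of `?` is to make a single character optional inside a longer pattern. The words below are an invented example, not part of the original notes:

```python
import re

# Hypothetical sample: two valid spellings and one misspelling.
text = 'color colour colouur'

# 'u?' allows zero or one 'u' between 'colo' and 'r'.
matches = re.findall('colou?r', text)

print(matches)  # ['color', 'colour']
```

The third word is not matched because it contains two `u` characters, which is more than `?` permits.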
- .*: greedy matching (matches as much as possible)
sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'
f = re.findall('xx.*xx',sd)
print(f)
#['xxixxfhgjlkxxlovexx']
- .*?: non-greedy matching (matches as little as possible)
sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'
f = re.findall('xx.*?xx',sd)
print(f)
#['xxixx', 'xxlovexx']
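To see the two modes side by side, here is a minimal sketch on a made-up HTML-like string. The only difference between the two patterns is the extra `?` after `.*`:

```python
import re

# Hypothetical sample with two tagged fragments.
html = '<b>one</b><b>two</b>'

# Greedy: '.*' runs to the last '</b>' it can reach.
greedy = re.findall('<b>.*</b>', html)

# Non-greedy: '.*?' stops at the first '</b>' it finds.
lazy = re.findall('<b>.*?</b>', html)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```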
- (): only the content inside the parentheses is returned as the result
sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'
f = re.findall('xx(.*?)xx',sd)
print(f)
#['i', 'love']
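When a pattern has more than one pair of parentheses, `findall` returns a list of tuples, one item per group. The sketch below uses an invented `name=value` string; `(?:...)` is the standard non-capturing form for grouping without extracting.

```python
import re

# Hypothetical sample data.
log = 'alice=30;bob=25'

# Two capturing groups: each match becomes a (name, value) tuple.
pairs = re.findall(r'(\w+)=(\d+)', log)

# Non-capturing groups: findall returns the whole match instead.
whole = re.findall(r'(?:\w+)=(?:\d+)', log)

print(pairs)  # [('alice', '30'), ('bob', '25')]
print(whole)  # ['alice=30', 'bob=25']
```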
Common methods
1. findall(): finds all matches of the pattern and returns them as a list
**Usage for matching digits**
sd = 'xxx123hkhlk245'
r = re.findall(r'(\d+)',sd)  # raw string avoids the invalid-escape warning for \d
print(r)
#['123', '245']
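The results of `findall` are always strings, so numeric work needs a conversion step. A small sketch, also showing what happens without the `+`:

```python
import re

sd = 'xxx123hkhlk245'

# Without '+', each digit is matched on its own.
single_digits = re.findall(r'\d', sd)

# With '+', consecutive digits are grouped; int() converts them to numbers.
numbers = [int(n) for n in re.findall(r'\d+', sd)]

print(single_digits)  # ['1', '2', '3', '2', '4', '5']
print(numbers)        # [123, 245]
print(sum(numbers))   # 368
```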
2. search(): scans for the first match of the pattern and returns a match object (or None if nothing matches)
sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'
s = re.search('xx.*xxfhgjlkxx(.*?)xx',sd).group(1)
print(s)
#love
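Because `search()` returns `None` when nothing matches, calling `.group(1)` directly raises `AttributeError` in that case. One way to guard against it is sketched below; `first_between` is a hypothetical helper name, not part of the `re` module.

```python
import re

sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'

def first_between(text, marker='xx'):
    """Hypothetical helper: return the first text between two markers, or None."""
    # re.escape keeps the marker literal even if it contains regex symbols.
    pattern = re.escape(marker) + '(.*?)' + re.escape(marker)
    m = re.search(pattern, text)
    # Check for None before touching the match object.
    return m.group(1) if m is not None else None

print(first_between(sd))                 # i
print(first_between('no markers here'))  # None
```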
3. sub(): replaces every match of the pattern and returns the resulting string
sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'
r = re.sub('xx(.*?)xx','xx%dxx'%123,sd)
print(r)
#asdfxx123xxfhgjlkxx123xxewrxhiu
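Beyond a fixed replacement string, `sub()` also accepts a group reference such as `\g<1>`, a function, and a `count` limit. The sketch below reuses the same sample string; the variable names are only illustrative.

```python
import re

sd = 'asdfxxixxfhgjlkxxlovexxewrxhiu'

# \g<1> inserts the text captured by the first group.
bracketed = re.sub('xx(.*?)xx', r'[\g<1>]', sd)

# A function receives each match object and returns the replacement.
upper = re.sub('xx(.*?)xx', lambda m: 'xx' + m.group(1).upper() + 'xx', sd)

# count=1 replaces only the first match.
first_only = re.sub('xx(.*?)xx', 'xx123xx', sd, count=1)

print(bracketed)   # asdf[i]fhgjlk[love]ewrxhiu
print(upper)       # asdfxxIxxfhgjlkxxLOVExxewrxhiu
print(first_only)  # asdfxx123xxfhgjlkxxlovexxewrxhiu
```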
# Using sub to generate the URL of each page
total = 400
old_url = 'http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=100'
for page in range(0,total+1,50):
    url = re.sub(r'pn=\d+','pn=%d'%page,old_url)
    print(url)
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=0
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=50
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=100
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=150
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=200
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=250
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=300
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=350
#http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=400
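The same loop can be wrapped in a function so the list of page URLs is reusable. This is only a sketch: `page_urls` and its `step` parameter are hypothetical names, and the step of 50 is taken from the loop above.

```python
import re

def page_urls(base_url, total, step=50):
    """Hypothetical helper: return base_url with pn set to 0, step, ..., total."""
    return [
        re.sub(r'pn=\d+', 'pn=%d' % page, base_url)
        for page in range(0, total + 1, step)
    ]

old_url = 'http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=100'
urls = page_urls(old_url, 400)

print(len(urls))  # 9
print(urls[0])    # http://tieba.baidu.com/f?kw=%E6%9D%8E%E6%AF%85&ie=utf-8&pn=0
```

Note that this approach assumes the URL already contains a `pn=<digits>` parameter; if it does not, `re.sub` returns the URL unchanged.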