Learning Data Formats for Python Web Scraping (Converting Headers, Form Data, and Urlencoded Data into Dicts)

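While learning web scraping recently, I often had to copy headers and form data from the browser into Python. Data copied from IE separates each key and value with a tab ('\t'), while data copied from Chrome uses a colon (':'), so neither can be used directly as a dict. To make future coding easier, I wrote a small program to do the conversion.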

The Python standard library probably has something similar, but I couldn't find it. If you know of one, please let me know. Thanks!

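# -*- coding: utf-8 -*-
"""
@author: Cy
"""
def strtodict(inputstr, sep=':', linesep='\n'):
    """Convert copied key/value text into a dict.

    linesep: separator between entries, newline by default.
    sep: separator between key and value, colon by default.
    """
    if linesep != '\n':
        inputstr = inputstr.replace(linesep, '\n')
    strlist = RemoveEmptyLineInList(inputstr.split('\n'))
    strdicts = {}
    for line in strlist:
        # Split at the first separator only, so values that contain it
        # (e.g. 'Referer: http://...') are kept whole; a line without the
        # separator maps to an empty value instead of raising IndexError.
        key, _, value = line.partition(sep)
        # Strip the space after Chrome's colon and any stray '\r'
        strdicts[key.strip()] = value.strip()
    return strdicts


def RemoveEmptyLineInList(listObj):
    # Drop empty and whitespace-only lines
    return [val for val in listObj if val.strip()]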
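On the standard-library question above, one option for the Chrome-style (colon-separated) format is the `email` package's `HeaderParser`: HTTP request headers use the same `Name: value` line syntax as email headers. This is a minimal sketch under that assumption; `headers_to_dict` is a hypothetical helper name, it assumes the pasted text has only ordinary `Name: value` lines (no HTTP/2 pseudo-headers such as `:authority:`), and it does not handle IE's tab-separated format.

```python
from email.parser import HeaderParser


def headers_to_dict(raw):
    """Parse Chrome-style 'Name: value' header text into a dict (hypothetical helper)."""
    # The email parser treats a blank line as the end of the headers and a line
    # starting with whitespace as a continuation, so strip lines and drop blanks.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    msg = HeaderParser().parsestr('\n'.join(lines) + '\n')
    # msg.items() keeps every header; dict() keeps the last value of a repeated name.
    return dict(msg.items())


if __name__ == '__main__':
    copied = """
Host: www.example.com
User-Agent: Mozilla/5.0
Referer: http://www.example.com/page?id=1
"""
    print(headers_to_dict(copied))
```

The parser splits each line at the first colon only, so a value such as the `Referer` URL keeps its own colons.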

I also found that `urllib.parse.unquote()` from the `urllib` library can turn the urlencoded data copied from IE back into the original text.

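import urllib.parse
# Caveat: decoding before splitting goes wrong if a key or value contains an encoded '&' or '=' (%26, %3D)
tmppostdata = urllib.parse.unquote(urlencodedata)  # urlencodedata: the urlencoded string copied from the browser
postdata = strtodict(tmppostdata, sep='=', linesep='&')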
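For urlencoded form data, the standard library does have a ready-made parser: `urllib.parse.parse_qsl()` returns a list of `(key, value)` pairs, and `parse_qs()` groups repeated keys into lists. Both split on `&` and `=` first and only then decode each part, so an encoded `&` inside a value survives, and `+` becomes a space (which plain `unquote()` leaves alone). A small comparison sketch; the sample string is made up for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, unquote

# Hypothetical form data: 'q' contains an encoded '&' (%26), 'note' uses '+' for a space.
encoded = 'user=Cy&q=tom%26jerry&note=hello+world'

# Decoding the whole string first makes the '&' inside 'q' look like a separator.
print(unquote(encoded))         # user=Cy&q=tom&jerry&note=hello+world

# Splitting first, then decoding each key and value, keeps the pairs intact.
print(dict(parse_qsl(encoded)))  # {'user': 'Cy', 'q': 'tom&jerry', 'note': 'hello world'}
print(parse_qs(encoded))         # {'user': ['Cy'], 'q': ['tom&jerry'], 'note': ['hello world']}
```

By default both functions drop pairs with an empty value such as `a=`; pass `keep_blank_values=True` to keep them.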

On September 4, 2016, I also came across a more concise version written by someone else:

# maxsplit=1 keeps any '=' inside a value; keys and values stay percent-encoded
dict(item.split('=', 1) for item in url_encode_data.split('&'))
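A quick check of why the `maxsplit` argument of `str.split()` matters here: a value can contain `=` once the data has been decoded with `unquote()`, or in hand-written query strings that leave it unencoded (base64 padding is a common case). The sample string below is hypothetical.

```python
# Hypothetical form data: the 'token' value ends in base64 padding ('==').
data = 'user=Cy&token=YQ=='
items = data.split('&')

print([item.split('=') for item in items])     # [['user', 'Cy'], ['token', 'YQ', '', '']]
print([item.split('=', 1) for item in items])  # [['user', 'Cy'], ['token', 'YQ==']]

# dict() needs exactly two elements per item, so only the maxsplit=1 version works.
try:
    dict(item.split('=') for item in items)
except ValueError as exc:
    print('plain split fails:', exc)
print(dict(item.split('=', 1) for item in items))  # {'user': 'Cy', 'token': 'YQ=='}
```

Neither form decodes `%XX` escapes or `+`; that still takes `unquote()` or `unquote_plus()` on each key and value afterwards.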

Last edited: 2017.12.04 01:14:07
