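Recently, while writing some code, I ran into a puzzling list index error: IndexError: list index out of range. I stared at the code that raised it for a long time and still couldn't spot the problem.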
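So I had to trace through it line by line the hard way, and noticed something interesting: in the original CSV file, every row of data was followed by a blank line. When the file is read back, csv.reader returns each blank line as an empty list, and indexing into an empty row raises IndexError.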
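A minimal sketch of that failure, assuming the reading code indexed each row directly (the original reading code isn't shown). The file contents are placeholders passed to csv.reader as a list of lines, and read_urls is a hypothetical helper showing a guard that skips empty rows.

```python
import csv

# Placeholder contents of the broken file: every data row is followed by a blank line.
lines = ["天津,https://example.com/tj\r\n", "\r\n", "北京,https://example.com/bj\r\n", "\r\n"]

# csv.reader yields an empty list for each blank line.
rows = list(csv.reader(lines))
print(rows)

try:
    # Naive indexing, as the crashing code presumably did.
    urls = [row[1] for row in rows]
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range

def read_urls(rows):
    """Hypothetical helper: collect the second column, skipping blank rows."""
    urls = []
    for row in rows:
        if not row:  # a blank line comes back as [], so row[1] would raise IndexError
            continue
        urls.append(row[1])
    return urls

print(read_urls(rows))
```

The guard only hides the symptom; the real fix is to stop the blank lines from being written in the first place.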
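In the end I found a classic explanation on Stack Overflow. Python 3 draws a strict line between str and bytes, whereas in Python 2 some functions let you mix the two. So when writing rows with writerow in Python 3, don't open the file in 'wb' mode; use plain 'w' mode and pass newline=''.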
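The str/bytes split is easy to observe: in Python 3, csv.writer hands str to the file's write() method, so a stream opened in binary mode rejects it. This sketch uses io.BytesIO as a stand-in for a file opened with 'wb' and io.StringIO for a text-mode file, so nothing touches the disk; try_write is a hypothetical helper.

```python
import csv
import io

def try_write(buffer):
    """Hypothetical helper: write one row with csv.writer and report the outcome."""
    try:
        csv.writer(buffer).writerow(["天津", "https://example.com/tj"])
        return "ok"
    except TypeError as exc:
        # Binary streams accept only bytes-like objects, but csv.writer passes str.
        return f"TypeError: {exc}"

# io.BytesIO behaves like a file opened with "wb".
print(try_write(io.BytesIO()))
# io.StringIO behaves like a text-mode file opened with newline="".
print(try_write(io.StringIO(newline="")))
```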
In Python 2.X, it was required to open the csvfile with 'b' because the csv module does its own line termination handling.
In Python 3.X, the csv module still does its own line termination handling, but still needs to know an encoding for Unicode strings. The correct way to open a csv file for writing is:
```python
outputfile=open("out.csv",'w',encoding='utf8',newline='')
```
encoding can be whatever you require, but newline='' suppresses text mode newline handling. On Windows, failing to do this will write \r\r\n file line endings instead of the correct \r\n. This is mentioned in the 3.X csv.reader documentation only, but csv.writer requires it as well.
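The \r\r\n effect can be reproduced on any OS. csv.writer already ends each row with \r\n (its default lineterminator), and Windows text mode then translates the \n into \r\n. This sketch assumes that passing newline="\r\n" to open() is a faithful stand-in for that Windows translation; write_and_inspect is a hypothetical helper and the file lives in a temporary directory.

```python
import csv
import os
import tempfile

def write_and_inspect(newline):
    """Write two rows, then return the raw bytes and the rows csv.reader sees."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.csv")
        # newline="\r\n" turns every "\n" written into "\r\n", as Windows text mode does;
        # newline="" passes the writer's "\r\n" through untouched.
        with open(path, "w", encoding="utf8", newline=newline) as f:
            writer = csv.writer(f)
            writer.writerow(["a", "b"])
            writer.writerow(["c", "d"])
        with open(path, "rb") as f:
            raw = f.read()
        # Read back in text mode with newline="", as the csv docs recommend.
        with open(path, encoding="utf8", newline="") as f:
            rows = list(csv.reader(f))
    return raw, rows

print(write_and_inspect("\r\n"))  # (b'a,b\r\r\nc,d\r\r\n', [['a', 'b'], [], ['c', 'd'], []])
print(write_and_inspect(""))      # (b'a,b\r\nc,d\r\n', [['a', 'b'], ['c', 'd']])
```

The doubled carriage return is exactly what shows up as an extra blank row when the file is read back.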
So I changed the code that writes the CSV file to the following, and it then ran successfully:
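```python
import csv

# Save each city as one row in citys.csv
# (city_tags is the list of <a> tags obtained earlier in the script)
with open("./citys.csv", "w", newline='') as f:
    writ = csv.writer(f)
    for city_tag in city_tags:
        # Get the href link of the <a> tag
        city_url = city_tag.get("href")
        # Get the text of the <a> tag, e.g. 天津 (Tianjin)
        city_name = city_tag.get_text()
        writ.writerow((city_name, city_url))
```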
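To check the fix without the scraping step, here is a self-contained sketch that writes placeholder (name, url) pairs in place of the <a> tags and reads them back. save_cities and load_cities are hypothetical helpers, and encoding="utf8" is passed explicitly, as the quoted answer suggests, so the Chinese city names don't depend on the platform's default encoding.

```python
import csv
import os
import tempfile

def save_cities(path, cities):
    """Write (name, url) pairs to a CSV file, one row per city."""
    # newline="" stops text mode from turning the writer's "\r\n" into "\r\r\n" on Windows.
    with open(path, "w", encoding="utf8", newline="") as f:
        writer = csv.writer(f)
        for name, url in cities:
            writer.writerow((name, url))

def load_cities(path):
    """Read the rows back as (name, url) tuples."""
    with open(path, encoding="utf8", newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

# Placeholder data standing in for the scraped <a> tags.
cities = [("天津", "https://example.com/tianjin"), ("北京", "https://example.com/beijing")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "citys.csv")
    save_cities(path, cities)
    loaded = load_cities(path)

print(loaded == cities)  # True: no blank rows came back
```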
After running it, the file came out normal, with no blank lines between rows.
Noted here for future reference.