Data processing notes:
1. Make sensible use of parallel workers: instead of the multiprocessing module, use the concurrent.futures module, which by default starts as many worker processes as the machine has CPUs, speeding up data processing.
Reference: https://zhuanlan.zhihu.com/p/40637138
import glob
import cv2
import concurrent.futures

def resize_img(img_path):
    # read the image, resize it to 256x256, overwrite the original file, and return the path for logging
    img = cv2.imread(img_path)
    img = cv2.resize(img, (256, 256))
    cv2.imwrite(img_path, img)
    return img_path

img_files = glob.glob(dst_path + '/*.jpg')
with concurrent.futures.ProcessPoolExecutor() as executor:
    for image_file, resized_file in zip(img_files, executor.map(resize_img, img_files)):
        print(f"A thumbnail for {image_file} was saved as {resized_file}")  # optional log line; resize_img returns the saved path so it prints correctly
My machine has 12 logical CPUs, so 12 worker processes were started.
To check the number of logical CPUs (Linux):
cat /proc/cpuinfo | grep "processor" | wc -l
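The same check can be done from Python, as a minimal sketch: os.cpu_count() reports the logical CPU count, which is also the default number of workers ProcessPoolExecutor starts.
import os
print(os.cpu_count())  # 12 on this machine; ProcessPoolExecutor uses this as its default max_workers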
2. Reading CSV data
Python has a built-in csv package that can read it directly.
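For reference, a minimal sketch with the csv module (assuming train_label.csv holds name,label rows with no header):
import csv
with open('train_label.csv', newline='') as f:
    rows = list(csv.reader(f))
print(len(rows), rows[0])  # number of rows and the first (name, label) pair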
Later I switched to pandas for reading; it lets you work on the data directly (e.g. selecting subsets), which is a bit more convenient.
import pandas as pd

df = pd.read_csv('train_label.csv', names=['name', 'label'])
print(df.shape[0])                 # number of rows
p_data = df[df['label'] == 1]      # rows whose label is 1
n_data = df[df['label'] == 0]      # rows whose label is 0
res = pd.concat([p_data, n_data], axis=0, ignore_index=True)  # drop the original index and renumber
print(res.shape)
print(p_data.shape, n_data.shape)
print(df.iat[0, 0])                # element at row 0, column 0
print(df.iat[0, 1])                # element at row 0, column 1
ds = df.sample()                   # randomly sample one row
print(ds.iat[0, 0])                # element at row 0, column 0 of the sampled row
train_df = df[0:100]               # first 100 rows
mini_df = pd.read_csv('mining_label.csv', names=['name', 'label'])
total = pd.concat([train_df, mini_df], axis=0)
print(total.iat[100, 0])           # .iat is positional, so position 100 is the first row of mini_df
print(total.iat[200, 0])           # assumes mini_df has at least 101 rows
3. Switching the pip source
Append the -i flag to the pip install command:
pip install <package> -i https://pypi.tuna.tsinghua.edu.cn/simple
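To make the mirror the default instead of passing -i every time, recent versions of pip support a config command:
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple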
4. Reading a video's frame rate and frame count
import cv2
cap = cv2.VideoCapture(path)
if cap.isOpened():
    print(cap.get(cv2.CAP_PROP_FPS))          # frames per second
    print(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames
cap.release()
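A quick follow-up: the two values together give the approximate video length in seconds (a minimal sketch, assuming path points to a readable video with a constant frame rate):
import cv2
cap = cv2.VideoCapture(path)
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
cap.release()
print(frame_count / fps if fps > 0 else 0.0)  # approximate duration in seconds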