Preface
I recently picked up a laptop with an NVIDIA MX250 GPU during the 618 sale. Since I wanted to learn CUDA, I deliberately chose a machine with a discrete GPU; the MX250 is only an entry-level card, but it is good enough for studying. GPUs compute in parallel and therefore have a clear advantage over CPUs for deep learning training and inference (even an entry-level GPU still beats a CPU at this workload), so I benchmarked several common CNN models with PyTorch on both the Intel Core i7-10510U and the NVIDIA MX250. Result: the entry-level MX250 delivers roughly 2~3x the inference performance of the i7-10510U.
Requirements
- Ubuntu 18.04.4
- PyTorch 1.5.1 (with CUDA)
- torchvision
- CUDA 11.0
Devices
- NVIDIA MX250 GPU
- Intel Core i7-10510U CPU (4 cores, 8 threads)
Deep Learning CNN Model Benchmark
Model List
- AlexNet
- ResNet-50
- ResNet-18
- ResNet-101
- MobileNet-v2
- SqueezeNet1-1
Test Method
- warn_up = 3: warm-up runs before timing, so the noisier initial measurements do not skew the result
- loops = 10: each model is run 10 times and the average time is reported
- Each model is tested on both the CPU and the GPU. When timing on the GPU, remember that CUDA kernels execute asynchronously, so the code must call torch.cuda.synchronize() before reading the clock (see the sketch below).
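The sketch below (not part of the original benchmark) illustrates the pitfall: without a synchronize, the timer only captures the kernel launch, not the actual computation. It assumes a CUDA-capable PyTorch build.

import time
import torch

# Minimal illustration of why torch.cuda.synchronize() matters when timing GPU work.
x = torch.rand(4096, 4096, device='cuda')

start = time.time()
y = x @ x                      # kernel is queued; the call returns immediately
no_sync = (time.time() - start) * 1000

start = time.time()
y = x @ x
torch.cuda.synchronize()       # block until the kernel actually finishes
with_sync = (time.time() - start) * 1000

print(f'without sync: {no_sync:.3f} ms, with sync: {with_sync:.3f} ms')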
Test Results
- CPU
==========AlexNet==========
Avg time:30.015921592712402 ms
==========ResNet-50==========
Avg time:80.41181564331055 ms
==========ResNet-18==========
Avg time:31.624460220336914 ms
==========ResNet-101==========
Avg time:124.81389045715332 ms
==========MobileNet-v2==========
Avg time:18.62039566040039 ms
==========SqueezeNet1-1==========
Avg time:15.979170799255371 ms
- MX250 GPU
==========AlexNet==========
Avg time:10.455155372619629 ms
==========ResNet-50==========
Avg time:28.374290466308594 ms
==========ResNet-18==========
Avg time:11.450338363647461 ms
==========ResNet-101==========
Avg time:51.11570358276367 ms
==========MobileNet-v2==========
Avg time:6.742191314697266 ms
==========SqueezeNet1-1==========
Avg time:3.6443233489990234 ms
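To tie these numbers back to the headline claim, the speedup is simply the CPU average divided by the GPU average. A small sketch using the averages reported above (rounded to two decimals):

# Speedup = CPU avg time / GPU avg time, values copied from the output above (ms).
cpu = {'AlexNet': 30.02, 'ResNet-50': 80.41, 'ResNet-18': 31.62,
       'ResNet-101': 124.81, 'MobileNet-v2': 18.62, 'SqueezeNet1-1': 15.98}
gpu = {'AlexNet': 10.46, 'ResNet-50': 28.37, 'ResNet-18': 11.45,
       'ResNet-101': 51.12, 'MobileNet-v2': 6.74, 'SqueezeNet1-1': 3.64}

for name in cpu:
    print(f'{name}: {cpu[name] / gpu[name]:.2f}x')
# Roughly 2.4x~2.9x for most models; SqueezeNet1-1 reaches about 4.4x.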
Test Code
import time

import numpy as np
import torch
import torchvision.models as models


def test_on_device(model, dump_inputs, warn_up, loops, device_type):
    """Run warn_up + loops forward passes and return the average time (ms) of the timed loops."""
    if device_type == 'cuda':
        assert torch.cuda.is_available()
    device = torch.device(device_type)
    model.to(device)
    model.eval()
    dump_inputs = dump_inputs.to(device)
    with torch.no_grad():
        executions = []
        for i in range(warn_up + loops):
            if device_type == 'cuda':
                torch.cuda.synchronize()  # make sure pending GPU work is done before starting the timer
            start = time.time()
            _ = model(dump_inputs)
            if device_type == 'cuda':
                torch.cuda.synchronize()  # CUDA is asynchronous: wait for the forward pass to finish
            end = time.time()
            executions.append((end - start) * 1000)  # ms
    # Discard the warm-up iterations when averaging
    return np.mean(executions[warn_up:])


if __name__ == "__main__":
    model_list = {
        'AlexNet': models.alexnet(),
        'ResNet-50': models.resnet50(),
        'ResNet-18': models.resnet18(),
        'ResNet-101': models.resnet101(),
        'MobileNet-v2': models.mobilenet_v2(),
        'SqueezeNet1-1': models.squeezenet1_1(),
    }
    batch_size = 1
    for name, model in model_list.items():
        print('=' * 10 + f'{name}' + '=' * 10)
        # Use device_type='cpu' to reproduce the CPU numbers above
        avg_time = test_on_device(model=model,
                                  dump_inputs=torch.rand(batch_size, 3, 224, 224),
                                  warn_up=3, loops=10, device_type='cuda')
        print(f'Avg time:{avg_time} ms')
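As a side note that goes beyond the original script, PyTorch also provides CUDA events, which measure elapsed time on the GPU itself rather than from the Python side. A minimal sketch, reusing the model and dump_inputs names from the script above and assuming both are already on the cuda device:

import torch

# GPU-side timing with CUDA events; elapsed_time() returns milliseconds.
start_evt = torch.cuda.Event(enable_timing=True)
end_evt = torch.cuda.Event(enable_timing=True)

start_evt.record()
_ = model(dump_inputs)        # forward pass being timed
end_evt.record()
torch.cuda.synchronize()      # wait so that elapsed_time() is valid
print(f'GPU time: {start_evt.elapsed_time(end_evt):.3f} ms')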