Preface
I recently picked up a laptop with an NVIDIA MX250 GPU during the 618 sale. Since I wanted to learn CUDA, I deliberately chose a machine with a discrete GPU; the MX250 is only an entry-level card, but it is good enough for studying. GPUs compute in parallel and therefore have a clear advantage over CPUs for deep learning training and inference (even an entry-level GPU still beats a CPU at this workload), so I benchmarked several common CNN models with PyTorch on both the Intel Core i7-10510U and the NVIDIA MX250. Result: the entry-level MX250 delivers roughly 2~3x the inference performance of the i7-10510U.
Requirements
- Ubuntu 18.04.4
- PyTorch 1.5.1 (with CUDA)
- torchvision
- CUDA 11.0
Devices
- NVIDIA MX250 GPU
- Intel Core i7-10510U CPU (4 cores, 8 threads)
Deep Learning CNN Model Benchmark
Model List
- AlexNet
- ResNet-50
- ResNet-18
- ResNet-101
- MobileNet-v2
- SqueezeNet1-1
Test Method
- warn_up = 3: warm-up runs before timing, so the noisier initial measurements do not skew the result
- loops = 10: each model is run 10 times and the average time is reported
- Each model is tested on both the CPU and the GPU. When timing on the GPU, remember that CUDA kernels execute asynchronously, so the code must call torch.cuda.synchronize() before reading the clock (see the sketch below).
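The sketch below (not part of the original benchmark) illustrates the pitfall: without a synchronize, the timer only captures the kernel launch, not the actual computation. It assumes a CUDA-capable PyTorch build.

import time
import torch

# Minimal illustration of why torch.cuda.synchronize() matters when timing GPU work.
x = torch.rand(4096, 4096, device='cuda')

start = time.time()
y = x @ x                      # kernel is queued; the call returns immediately
no_sync = (time.time() - start) * 1000

start = time.time()
y = x @ x
torch.cuda.synchronize()       # block until the kernel actually finishes
with_sync = (time.time() - start) * 1000

print(f'without sync: {no_sync:.3f} ms, with sync: {with_sync:.3f} ms')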
Test Results
- CPU
==========AlexNet==========
Avg time:30.015921592712402 ms
==========ResNet-50==========
Avg time:80.41181564331055 ms
==========ResNet-18==========
Avg time:31.624460220336914 ms
==========ResNet-101==========
Avg time:124.81389045715332 ms
==========MobileNet-v2==========
Avg time:18.62039566040039 ms
==========SqueezeNet1-1==========
Avg time:15.979170799255371 ms
- MX250 GPU
==========AlexNet==========
Avg time:10.455155372619629 ms
==========ResNet-50==========
Avg time:28.374290466308594 ms
==========ResNet-18==========
Avg time:11.450338363647461 ms
==========ResNet-101==========
Avg time:51.11570358276367 ms
==========MobileNet-v2==========
Avg time:6.742191314697266 ms
==========SqueezeNet1-1==========
Avg time:3.6443233489990234 ms
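To tie these numbers back to the headline claim, the speedup is simply the CPU average divided by the GPU average. A small sketch using the averages reported above (rounded to two decimals):

# Speedup = CPU avg time / GPU avg time, values copied from the output above (ms).
cpu = {'AlexNet': 30.02, 'ResNet-50': 80.41, 'ResNet-18': 31.62,
       'ResNet-101': 124.81, 'MobileNet-v2': 18.62, 'SqueezeNet1-1': 15.98}
gpu = {'AlexNet': 10.46, 'ResNet-50': 28.37, 'ResNet-18': 11.45,
       'ResNet-101': 51.12, 'MobileNet-v2': 6.74, 'SqueezeNet1-1': 3.64}

for name in cpu:
    print(f'{name}: {cpu[name] / gpu[name]:.2f}x')
# Roughly 2.4x~2.9x for most models; SqueezeNet1-1 reaches about 4.4x.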
Test Code
import time

import numpy as np
import torch
import torchvision.models as models


def test_on_device(model, dump_inputs, warn_up, loops, device_type):
    """Run warn_up + loops forward passes and return the average time (ms) of the timed loops."""
    if device_type == 'cuda':
        assert torch.cuda.is_available()
    device = torch.device(device_type)
    model.to(device)
    model.eval()
    dump_inputs = dump_inputs.to(device)
    with torch.no_grad():
        executions = []
        for i in range(warn_up + loops):
            if device_type == 'cuda':
                torch.cuda.synchronize()  # make sure pending GPU work is done before starting the timer
            start = time.time()
            _ = model(dump_inputs)
            if device_type == 'cuda':
                torch.cuda.synchronize()  # CUDA is asynchronous: wait for the forward pass to finish
            end = time.time()
            executions.append((end - start) * 1000)  # ms
    # Discard the warm-up iterations when averaging
    return np.mean(executions[warn_up:])


if __name__ == "__main__":
    model_list = {
        'AlexNet': models.alexnet(),
        'ResNet-50': models.resnet50(),
        'ResNet-18': models.resnet18(),
        'ResNet-101': models.resnet101(),
        'MobileNet-v2': models.mobilenet_v2(),
        'SqueezeNet1-1': models.squeezenet1_1(),
    }
    batch_size = 1
    for name, model in model_list.items():
        print('=' * 10 + f'{name}' + '=' * 10)
        # Use device_type='cpu' to reproduce the CPU numbers above
        avg_time = test_on_device(model=model,
                                  dump_inputs=torch.rand(batch_size, 3, 224, 224),
                                  warn_up=3, loops=10, device_type='cuda')
        print(f'Avg time:{avg_time} ms')
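As a side note that goes beyond the original script, PyTorch also provides CUDA events, which measure elapsed time on the GPU itself rather than from the Python side. A minimal sketch, reusing the model and dump_inputs names from the script above and assuming both are already on the cuda device:

import torch

# GPU-side timing with CUDA events; elapsed_time() returns milliseconds.
start_evt = torch.cuda.Event(enable_timing=True)
end_evt = torch.cuda.Event(enable_timing=True)

start_evt.record()
_ = model(dump_inputs)        # forward pass being timed
end_evt.record()
torch.cuda.synchronize()      # wait so that elapsed_time() is valid
print(f'GPU time: {start_evt.elapsed_time(end_evt):.3f} ms')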