Embracing the Transformer: battling it out with the Transformers

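Lately I have been going head to head with transformers. I tried the DETR object detection project and also worked on machine translation in NLP. Transformers arguably show a trend toward unifying NLP and CV, although convolutional neural networks still keep their own strengths and place; at present the more common route is to fuse transformers with CNNs and apply them to more fields. Then Microsoft Research Asia released the Swin Transformer, which computes self-attention inside local shifted windows so that the cost grows linearly with image size, making transformer backbones efficient enough for dense tasks such as detection and segmentation. It is really from Swin Transformer onward that transformers showed practical value in CV, somewhat like what YOLOv3 meant for object detection.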

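Before that, it is worth understanding where the transformer came from, namely NLP. Here I record notes from PaddlePaddle's Vision Transformer course to get a feel for it.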


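In a sense the name "transformer" is very well chosen, just like the Transformers robots. As we know, the earliest Transformers could not actually transform; later, by scanning the features of cars, airplanes and so on and converting them into their own structure and transformation modes, they became Transformers, and in this way could turn from any form they had seen into robot form. This is exactly the core idea of the transformer.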

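Of course, the transformer first appeared in NLP, and in recent years transformer-based models have come to dominate the top ten of the major natural language processing benchmark leaderboards.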


Starting in 2020, transformers also began heating up in the image domain.


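In image classification, object detection, image segmentation and related areas, transformer-based models have essentially swept the leaderboards.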

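Taking object detection as an example of transformers in vision: the original DETR showed that the approach is feasible, Deformable DETR and Anchor DETR showed that it is effective, and Swin went further in demonstrating the generality of transformers across domains, especially vision and multimodal tasks. Just like the Transformers, an encoder-decoder can map data from any modality into another one, which is what gives the model its generality.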

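Transformers are also better at mapping features from different modalities into a unified space, which is closer to how humans learn and gives transferability. When we see an apple, look at a painting or photo of one, and hear a spoken description, we learn what an apple is from several angles at once and map it into a conceptual space together with its extensions.
Multilingual translation in NLP is similar: rather than learning a conversion between every pair of languages, it is better to map into a shared semantic space and then translate out into any language. People often learn this way too, which is why after learning several foreign languages everything starts to connect and new ones come faster.
So both NLP and CV are gradually shifting toward large-scale pretraining plus transfer fine-tuning: downstream tasks are solved by fine-tuning a pretrained model instead of building one model per task as before.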
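The multilingual argument can be made concrete with a little arithmetic. A minimal sketch, assuming the simplified view of one dedicated model per ordered language pair versus one encoder into and one decoder out of a shared semantic space per language; the function names are illustrative only:

```python
def pairwise_models(n_languages: int) -> int:
    # one dedicated model for every ordered (source, target) language pair
    return n_languages * (n_languages - 1)


def shared_space_modules(n_languages: int) -> int:
    # one encoder into the shared space plus one decoder out of it, per language
    return 2 * n_languages


for n in (2, 5, 10, 50):
    print(n, pairwise_models(n), shared_space_modules(n))
```

The pairwise count grows quadratically while the shared-space count grows linearly, so the shared space pays off quickly once more than a handful of languages are involved.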

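Learning method matters. On paper the transformer involves much less theory than the many architectures that came before (DETR, for instance), but mastering it and tuning it is very hard, and today's models usually have enormous numbers of parameters. The core principle is "coding is all you need": practice first.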


Who am I?
Where am I?
What should I do?


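When learning neural networks, the data structures are also the first thing to get familiar with: see clearly how the data is transformed at each step.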
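As an example of tracking how data changes shape, here is a sketch of the first Swin steps in NumPy, assuming Swin-T settings (224x224 RGB input, patch size 4, embedding dim 96, window size 7) and a random matrix standing in for the learned linear embedding. The official code does patch partition plus linear embedding in a single strided Conv2d; they are written out separately here so each shape is visible.

```python
import numpy as np

# Assumed Swin-T settings: 224x224 RGB input, patch size 4, embed dim 96, window size 7
B, H, W, C_IN = 1, 224, 224, 3
PATCH, EMBED_DIM, WINDOW = 4, 96, 7


def patch_partition(img, patch):
    # (B, H, W, C) -> (B, H/p, W/p, p*p*C): each non-overlapping p x p patch becomes one token
    b, h, w, c = img.shape
    x = img.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, h // patch, w // patch, patch * patch * c)


def window_partition(x, ws):
    # (B, H, W, C) -> (num_windows * B, ws, ws, C): self-attention runs inside each window
    b, h, w, c = x.shape
    x = x.reshape(b, h // ws, ws, w // ws, ws, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, ws, ws, c)


rng = np.random.default_rng(0)
img = rng.standard_normal((B, H, W, C_IN)).astype(np.float32)
patches = patch_partition(img, PATCH)                  # (1, 56, 56, 48)
proj = rng.standard_normal((PATCH * PATCH * C_IN, EMBED_DIM)).astype(np.float32)
tokens = patches @ proj                                # linear embedding -> (1, 56, 56, 96)
windows = window_partition(tokens, WINDOW)             # 8 x 8 windows -> (64, 7, 7, 96)
print(img.shape, patches.shape, tokens.shape, windows.shape)
```

A 224x224 image becomes a 56x56 grid of 96-dim tokens, and attention is computed in 64 windows of 49 tokens each instead of over all 3136 tokens at once.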

Enough talk; let's set up Swin and try it out first.

Deploying and training STOD (Swin-Transformer-Object-Detection)

Source code: https://github.com/SwinTransformer/Swin-Transformer-Object-Detection
Paper: https://arxiv.org/pdf/2103.14030.pdf

Environment setup

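Microsoft Research Asia built the Swin detection code on top of the mmdetection object detection library, so first build an mmdetection image.
Use the Dockerfile in the official docker folder to build the image yihui8776/mmdetection:v1: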

ARG PYTORCH="1.6.0"
ARG CUDA="10.1"
ARG CUDNN="7"

FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel

ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"

RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install MMCV
#RUN pip install  --default-timeout=200  mmcv-full==1.3.9  -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
RUN pip install  --default-timeout=2000  mmcv-full==1.0.5  -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html --trusted-host download.openmmlab.com

#COPY mmcv_full-1.3.9-cp37-cp37m-manylinux1_x86_64.whl  /workspace
#RUN pip install /workspace/mmcv_full-1.3.9-cp37-cp37m-manylinux1_x86_64.whl

# Install MMDetection
RUN conda clean --all
RUN git clone https://github.com/open-mmlab/mmdetection.git /mmdetection
WORKDIR /mmdetection
ENV FORCE_CUDA="1"
RUN pip install -r requirements/build.txt
RUN pip install --no-cache-dir -e . -i https://pypi.douban.com/simple

Then build the image yihui8776/swindetr:v0.1 on top of it:

FROM  yihui8776/mmdetection:v1

LABEL maintainer="yihui8776 <wangyaohui8776@sina.com>"

RUN apt-get update && apt-get install -y vim openssh-server && \
  apt-get clean && \
  rm -rf /var/lib/apt/lists/*

# SSH Server
RUN sed -i 's/^\(PermitRootLogin\).*/\1 yes/g' /etc/ssh/sshd_config && \
    sed -i 's/^PermitEmptyPasswords .*/PermitEmptyPasswords yes/g' /etc/ssh/sshd_config && \
    echo 'root:ai1234' > /tmp/passwd && \
    chpasswd < /tmp/passwd && \
    rm -rf /tmp/passwd


RUN pip install jupyter -i https://pypi.doubanio.com/simple

COPY . /workspace
COPY run_jupyter.sh /
RUN chmod +x  /run_jupyter.sh


WORKDIR /workspace

EXPOSE 22

EXPOSE 8888

CMD ["/run_jupyter.sh", "--allow-root"]

Run the container:

docker run --gpus '"device=1,2,3"' -itd --shm-size 12G -v /media/nizhengqi/sdf/wyh/data:/workspace/data -v /media/nizhengqi/sdf/wyh/Swin-Transformer-Object-Detection:/workspace -p 8890:8888 -p 2223:22 --name swdetr yihui8776/swindetr:v0.1

Inside the container, build and install apex:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -r requirements.txt
python setup.py install --cpp_ext
If testing the mmdet build fails with AttributeError: module 'pycocotools' has no attribute '__version__', replace pycocotools with mmpycocotools:
pip uninstall pycocotools
pip install mmpycocotools
Test with:
python demo/image_demo.py demo/demo.jpg configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py mask_rcnn_swin_tiny_patch4_window7.pth
The pretrained model files are available from the GitHub repository.

Training on your own dataset

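Data conversion
Training mainly uses Mask R-CNN, so the annotations have to be converted to COCO format with a segmentation field added to each annotation. Otherwise the conversion is basically the same as for DETR (VOC to COCO); for a box-only dataset the segmentation is just the box rectangle: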

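import numpy as np


def make_annotation(xmin, ymin, xmax, ymax, image_id, category_id, bnd_id):
    # hypothetical wrapper so the snippet runs on its own; in voc2coco.py this code
    # sits inside the loop over each <object> of an image
    o_width = xmax - xmin
    o_height = ymax - ymin
    # rectangle corners listed in order around the box; the order
    # [xmin,ymin],[xmax,ymin],[xmin,ymax],[xmax,ymax] would trace a self-intersecting "bow tie"
    points = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    seg = [np.asarray(points).flatten().tolist()]
    ann = {
        "area": o_width * o_height,
        "iscrowd": 0,
        "image_id": image_id,
        "bbox": [xmin, ymin, o_width, o_height],  # COCO boxes are [x, y, width, height]
        "category_id": category_id,
        "id": bnd_id,
        "ignore": 0,
        "segmentation": seg,
    }
    return ann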
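The voc2coco.py script itself is not shown. As a rough picture of the whole conversion, here is a minimal sketch assuming standard VOC tags (filename, size, object/name, bndbox) and category ids numbered from 1 in class-list order; voc_to_coco is a hypothetical name, not the author's script:

```python
import json
import xml.etree.ElementTree as ET


def voc_to_coco(xml_docs, class_names):
    """Convert VOC XML strings into one COCO-style dict (boxes plus rectangle polygons)."""
    categories = [{"id": i + 1, "name": n, "supercategory": "none"} for i, n in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    bnd_id = 1
    for image_id, doc in enumerate(xml_docs, start=1):
        root = ET.fromstring(doc)
        size = root.find("size")
        images.append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name not in name_to_id:
                continue  # skip labels that are not in the class list
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
            w, h = xmax - xmin, ymax - ymin
            annotations.append({
                "area": w * h,
                "iscrowd": 0,
                "image_id": image_id,
                "bbox": [xmin, ymin, w, h],
                "category_id": name_to_id[name],
                "id": bnd_id,
                "ignore": 0,
                # the box as a polygon, corners in order around the rectangle
                "segmentation": [[xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]],
            })
            bnd_id += 1
    return {"images": images, "annotations": annotations, "categories": categories}


sample = """<annotation><filename>0001.jpg</filename>
<size><width>640</width><height>480</height><depth>3</depth></size>
<object><name>hat</name><bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>70</ymax></bndbox></object>
</annotation>"""
coco = voc_to_coco([sample], ["hat", "person"])
print(json.dumps(coco["annotations"][0]))
```

Writing the result with json.dump to annotations/instances_train2017.json gives the file layout the dataset config below expects.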

On the host the dataset is at /media/nizhengqi/sdf/wyh/data/safehat, which is data/safehat inside the container:
python voc2coco.py xml/xml_train.py annotations/instances_train2017.json
python voc2coco.py xml/xml_val.py annotations/instances_val2017.json
The images likewise go in data/safehat/train2017 and data/safehat/val2017.
Then update the dataset paths in configs/_base_/datasets/coco_detection.py accordingly and adjust samples_per_gpu and workers_per_gpu:

# Change the dataset type and path
dataset_type = 'CocoDataset'
data_root = '/home/coco/'  # set this to your own data root, e.g. data/safehat/ inside the container

# Change img_size and similar parameters; lower them if you hit CUDA out of memory
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    # originally 1333*800
    # dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Resize', img_scale=(416, 416), keep_ratio=True),
    # remaining transforms unchanged
]

# Change the batch size
data = dict(
    samples_per_gpu=1,  # samples per GPU; total batch size = number of GPUs * samples_per_gpu
    workers_per_gpu=1,  # data-loading workers per GPU
    # train shown as an example; val and test get the same changes
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',  # annotation file
        img_prefix=data_root + 'train2017/',  # training image directory
        pipeline=train_pipeline),
)
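Why a smaller img_scale helps with CUDA out of memory: with keep_ratio=True the image is scaled by a single factor so that it fits inside the given scale. A sketch of that rule, written from my reading of mmcv's rescale_size (an assumption worth checking against your mmcv version), applied to a hypothetical 1920x1080 frame:

```python
def rescale_size(w, h, img_scale):
    """Output (w, h) for Resize(img_scale=..., keep_ratio=True), following mmcv's keep-ratio rule."""
    long_edge, short_edge = max(img_scale), min(img_scale)
    # largest factor that keeps the long side <= long_edge and the short side <= short_edge
    scale = min(long_edge / max(h, w), short_edge / min(h, w))
    return int(w * scale + 0.5), int(h * scale + 0.5)


for img_scale in [(1333, 800), (416, 416)]:
    w, h = rescale_size(1920, 1080, img_scale)
    print(img_scale, (w, h), w * h)
```

For this frame, (1333, 800) gives 1333x750 while (416, 416) gives 416x234, roughly a tenth of the pixels going into the backbone.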

Modify the weight file
The main change is setting the class count to your own number of classes (cat changeclass.py):

import torch

# model_path = "E:/workspace/Swin-Transformer-Object-Detection/checkpoints/mask_rcnn_swin_tiny_patch4_window7.pth"
model_save_dir = "./"

pretrained_weights = torch.load('mask_rcnn_swin_tiny_patch4_window7.pth', map_location='cpu')
# pretrained_weights = torch.load('E:/workspace/Swin-Transformer-Object-Detection/checkpoints/cascade_mask_rcnn_swin_small_patch4_window7.pth', map_location='cpu')
num_class = 35  # actual number of classes in your dataset (background not included)

state_dict = pretrained_weights['state_dict']
# classification head: one logit per class plus one for background
state_dict['roi_head.bbox_head.fc_cls.weight'].resize_(num_class + 1, 1024)
state_dict['roi_head.bbox_head.fc_cls.bias'].resize_(num_class + 1)
# class-specific box regression: 4 deltas per class
state_dict['roi_head.bbox_head.fc_reg.weight'].resize_(num_class * 4, 1024)
state_dict['roi_head.bbox_head.fc_reg.bias'].resize_(num_class * 4)
# mask head: one mask logit channel per class
state_dict['roi_head.mask_head.conv_logits.weight'].resize_(num_class, 256, 1, 1)
state_dict['roi_head.mask_head.conv_logits.bias'].resize_(num_class)

torch.save(pretrained_weights, "{}/mask_rcnn_swin_{}.pth".format(model_save_dir, num_class))
# torch.save(pretrained_weights, "{}/cascade_mask_rcnn_swin_{}.pth".format(model_save_dir, num_class))
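resize_() on a contiguous tensor that shrinks simply keeps the leading rows, so for fc_cls the new background row is just whatever class row happens to land last. Since mmdet 2.x puts the background logit at the last index of fc_cls, a variant that keeps the original background row may be a slightly better starting point. A sketch with NumPy arrays standing in for the tensors; shrink_heads and the fake COCO-sized weights are illustrative assumptions, with head widths 1024 and 256 taken from the script above:

```python
import numpy as np

# Head keys from changeclass.py
CLS_KEYS = ('roi_head.bbox_head.fc_cls.weight', 'roi_head.bbox_head.fc_cls.bias')
REG_KEYS = ('roi_head.bbox_head.fc_reg.weight', 'roi_head.bbox_head.fc_reg.bias')
MASK_KEYS = ('roi_head.mask_head.conv_logits.weight', 'roi_head.mask_head.conv_logits.bias')


def shrink_heads(state_dict, num_classes):
    """Return a copy whose class-dependent head weights are cut down to num_classes."""
    out = dict(state_dict)
    for key in CLS_KEYS:
        w = state_dict[key]
        # first num_classes foreground rows plus the original background row (the last one)
        out[key] = np.concatenate([w[:num_classes], w[-1:]], axis=0)
    for key in REG_KEYS:
        # class-specific regression: 4 deltas per class
        out[key] = state_dict[key][:num_classes * 4].copy()
    for key in MASK_KEYS:
        # one mask logit channel per foreground class
        out[key] = state_dict[key][:num_classes].copy()
    return out


# fake COCO-sized heads (80 classes) to check the resulting shapes
coco = {
    CLS_KEYS[0]: np.zeros((81, 1024)), CLS_KEYS[1]: np.arange(81.0),
    REG_KEYS[0]: np.zeros((320, 1024)), REG_KEYS[1]: np.zeros(320),
    MASK_KEYS[0]: np.zeros((80, 256, 1, 1)), MASK_KEYS[1]: np.zeros(80),
}
small = shrink_heads(coco, 35)
for key, value in small.items():
    print(key, value.shape)
```

With real torch tensors the same slicing works on pretrained_weights['state_dict'][key], using torch.cat in place of np.concatenate.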

Change num_classes in configs/_base_/models/mask_rcnn_swin_fpn.py in both places (bbox_head and mask_head) to your class count, here 35.
Change interval and load_from in configs/_base_/default_runtime.py:

checkpoint_config = dict(interval=1)  # save a checkpoint after every epoch
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]

dist_params = dict(backend='nccl')
log_level = 'INFO'
#load_from = None
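load_from = "mask_rcnn_swin_35.pth"   # weights produced by changeclass.py, used as the starting point (backbone plus the resized heads)
resume_from = None  # checkpoint to resume an interrupted run from (also restores the epoch and optimizer state)
workflow = [('train', 1)]   # only run the training phase, one epoch per cycle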

To adjust the number of epochs, the learning rate and so on, edit max_epochs and lr in configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py.
Change its _base_ to use coco_detection:
_base_ = [
    '../_base_/models/mask_rcnn_swin_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]

data = dict(train=dict(pipeline=train_pipeline))

optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
                 paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                 'relative_position_bias_table': dict(decay_mult=0.),
                                                 'norm': dict(decay_mult=0.)}))
lr_config = dict(step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
# Comment out the block below if you do not train with fp16
# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
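Two things in this config are easy to miss. _delete_=True makes the new optimizer dict replace the SGD optimizer from schedule_1x.py instead of being merged into it (otherwise SGD's momentum would be passed on to AdamW, which does not accept it), and step=[27, 33] divides the learning rate by 10 twice. A simplified sketch of both, assuming mmcv 1.x semantics: merge is a reduced version of the config merge, step_lr is the step policy with the default gamma=0.1 and warmup ignored, and both names are hypothetical.

```python
import copy


def merge(child, base):
    """Simplified view of how a child config dict is merged into its _base_ dict."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and not value.get('_delete_', False) and isinstance(merged.get(key), dict):
            merged[key] = merge(value, merged[key])  # recurse: only the given keys change
        else:
            if isinstance(value, dict):
                value = {k: v for k, v in value.items() if k != '_delete_'}
            merged[key] = copy.deepcopy(value)  # replace the whole entry
    return merged


def step_lr(epoch, base_lr=1e-4, steps=(27, 33), gamma=0.1):
    """LR for a 0-based epoch under lr_config=dict(step=[27, 33]); warmup is ignored."""
    drops = sum(epoch >= s for s in steps)
    return base_lr * gamma ** drops


base = {'optimizer': dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)}
child = {'optimizer': dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05)}
print(merge(child, base)['optimizer'])  # AdamW settings only, no leftover 'momentum'
print([step_lr(e) for e in (0, 26, 27, 32, 33, 35)])
```

So with max_epochs=36 the first 27 epochs run at 1e-4, the next 6 at 1e-5 and the last 3 at 1e-6; lr_config itself is merged normally, so the step list replaces schedule_1x's while its warmup settings are inherited.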

Change the labels in mmdet/core/evaluation/class_names.py and mmdet/datasets/coco.py:


    # mmdet/datasets/coco.py, inside class CocoDataset
    CLASSES = ('hat',
        'person',
        'hand',
        'insulating_gloves',
        'workclothes_clothes',
        'workclothes_trousers',
        'winter_clothes',
        'winter_trousers',
        'vest',
        'noworkclothes_clothes',
        'noworkclothes_trousers',
        'roll_workclothes',
        'roll_shirts',
        'roll_noworkclothes',
        'shorts',
        'safteybelt',
        'work_men',
        'stranger_men',
        'down',
        'smoking',
        'big_smoking',
        'height',
        'noheight',
        'holes',
        'fence',
        'oxygen_vertically',
        'oxygen_horizontally',
        'single_ladder',
        'double_ladder',
        'fire',
        'gas_tank',
        'extinguisher',
        'groundrod',
        'big_smoking',
        'bottle')

# mmdet/core/evaluation/class_names.py
def coco_classes():
    return [
        'hat',
        'person',
        'hand',
        'insulating_gloves',
        'workclothes_clothes',
        'workclothes_trousers',
        'winter_clothes',
        'winter_trousers',
        'vest',
        'noworkclothes_clothes',
        'noworkclothes_trousers',
        'roll_workclothes',
        'roll_shirts',
        'roll_noworkclothes',
        'shorts',
        'safteybelt',
        'work_men',
        'stranger_men',
        'down',
        'smoking',
        'big_smoking',
        'height',
        'noheight',
        'holes',
        'fence',
        'oxygen_vertically',
        'oxygen_horizontally',
        'single_ladder',
        'double_ladder',
        'fire',
        'gas_tank',
        'extinguisher',
        'groundrod',
        'big_smoking',
        'bottle'
    ]

After these changes you must rebuild and reinstall with python setup.py install;
otherwise you get the error "AssertionError: The num_classes (20) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 80) in RepeatDataset".
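The reason reinstalling matters: after python setup.py install, Python imports mmdet from the installed copy in site-packages rather than from the source tree you just edited, and the image built above also contains an editable install of /mmdetection, so more than one mmdet may be around. A small sketch for checking which copy is actually imported; module_location is a hypothetical helper built on importlib.util.find_spec:

```python
import importlib.util
import os


def module_location(name):
    """Directory Python would import `name` from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


print(module_location('json'))
# inside the container, compare module_location('mmdet') with the tree you edited
```

If module_location('mmdet') does not point at the directory containing your modified coco.py and class_names.py, the edits are not being used.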

Training

Once all the changes are made, start training:
python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
To train on the single GPU with id 3:
python ./tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --gpu-ids 3
To train on multiple GPUs (here 4):
tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 4
Training logs and weights are saved in "Swin-Transformer-Object-Detection-master/work_dirs/".
Testing

python tools/test.py configs/swin/mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py mask_rcnn_swin_small_patch4_window7.pth --eval segm

Output demo: writes one txt file per image, one line per detection in the format cls x1 y1 x2 y2

import os
from argparse import ArgumentParser

import numpy as np
from tqdm import tqdm
from mmdet.apis import inference_detector, init_detector


def main():
    parser = ArgumentParser()
    parser.add_argument('--img-path', default='/data/wj/test/', help='Directory of test images')
    parser.add_argument('--config', default='../work_dirs/cascade_rcnn_x101_64x4d_fpn_20e_coco/cascade_rcnn_x101_64x4d_fpn_20e_coco.py', help='Config file')
    parser.add_argument('--checkpoint', default='../work_dirs/cascade_rcnn_x101_64x4d_fpn_20e_coco/latest.pth', help='Checkpoint file')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()
    imgs_path = args.img_path
    save_path = '../output/'
    os.makedirs(save_path, exist_ok=True)  # create the output directory if it is missing

    # build the model from a config file and a checkpoint file
    model = init_detector(args.config, args.checkpoint, device=args.device)
    for img_name in tqdm(os.listdir(imgs_path)):
        img = os.path.join(imgs_path, img_name)
        result = inference_detector(model, img)
        # Mask R-CNN style models return (bbox_result, segm_result); keep only the boxes
        if isinstance(result, tuple):
            result = result[0]
        # one (N, 5) array [x1, y1, x2, y2, score] per class
        bboxes = np.vstack(result)
        labels = np.concatenate([
            np.full(bbox.shape[0], i, dtype=np.int32)
            for i, bbox in enumerate(result)
        ])
        if args.score_thr > 0:
            assert bboxes.shape[1] == 5
            inds = bboxes[:, -1] > args.score_thr
            bboxes = bboxes[inds, :]
            labels = labels[inds]
        # one txt per image (left empty if nothing is detected), overwritten on each run
        txt_path = os.path.join(save_path, os.path.splitext(img_name)[0] + '.txt')
        with open(txt_path, 'w') as f:
            for bbox, label in zip(bboxes, labels):
                x1, y1, x2, y2 = bbox[:4].astype(np.int32)
                f.write("{} {} {} {} {}\n".format(label, x1, y1, x2, y2))


if __name__ == '__main__':
    main()
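The post-processing part of this script does not need mmdet or a GPU, so it can be checked on its own. A minimal sketch of the same logic, fed with fake detections in mmdet 2.x's per-class bbox format; detections_to_lines is a hypothetical helper, and like the script it truncates coordinates to integers:

```python
import numpy as np


def detections_to_lines(bbox_result, score_thr=0.3):
    """Turn a per-class list of (N, 5) arrays into 'cls x1 y1 x2 y2' lines."""
    lines = []
    for label, dets in enumerate(bbox_result):
        for x1, y1, x2, y2, score in dets:
            # same rule as the script: a threshold <= 0 keeps everything
            if score_thr <= 0 or score > score_thr:
                lines.append('{} {} {} {} {}'.format(label, int(x1), int(y1), int(x2), int(y2)))
    return lines


fake = [
    np.array([[10.2, 20.7, 50.9, 80.1, 0.95]]),  # class 0 ('hat')
    np.zeros((0, 5)),                             # class 1: no detections
    np.array([[5.0, 5.0, 15.0, 15.0, 0.10]]),    # class 2: below the default threshold
]
print(detections_to_lines(fake))
```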

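Common problem: when training a detector with the mmdetection 2.x framework, an IndexError: list index out of range is very likely a class-count problem. Check whether the classes in the CLASSES variable in mmdetection/mmdet/datasets/coco.py are correct,

whether the classes returned by coco_classes() in mmdetection/mmdet/core/evaluation/class_names.py are correct,

and whether num_classes in mmdetection/configs/_base_/models/mask_rcnn_swin_fpn.py is correct. In short, check them repeatedly, and rebuild at the end; online tutorials often leave this last step out.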
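Since the same mistake has to be caught in three places, a small stdlib-only sketch can do the comparison. It assumes you paste in the model config text and the two class lists; check_classes is a hypothetical helper, and the regex only finds literal num_classes=<int> assignments. Note that the coco_classes() list above contains 'big_smoking' twice (entries 21 and 34), which is worth checking against the dataset's real category names.

```python
import re
from collections import Counter


def check_classes(config_text, classes, coco_classes):
    """Return a list of problems; an empty list means the three places agree."""
    problems = []
    dupes = [name for name, n in Counter(classes).items() if n > 1]
    if dupes:
        problems.append('duplicate class names: {}'.format(dupes))
    if list(classes) != list(coco_classes):
        problems.append('CLASSES and coco_classes() differ')
    for value in re.findall(r'num_classes\s*=\s*(\d+)', config_text):
        if int(value) != len(classes):
            problems.append('num_classes={} but {} class names'.format(value, len(classes)))
    return problems


cfg = "bbox_head=dict(num_classes=4), mask_head=dict(num_classes=80)"
names = ['hat', 'person', 'big_smoking', 'big_smoking']
print(check_classes(cfg, names, names))
```

Here it reports the duplicated name and the mask head that was left at 80 classes.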

Reference
https://mp.weixin.qq.com/s?__biz=MzU4NTY4Mzg1Mw==&mid=2247503352&idx=1&sn=577b46fbb1d63f4c7dc322184be80ae0&chksm=fd844b1acaf3c20ca041e00028cb3c32143b743720b63a68704f9408548ce9509e91fc64dce8&token=523858561&lang=zh_CN#rd

Last edited: 2022.01.05 15:51:14
