mmaction2 Deployment
We first deploy and test on Windows.
conda create -n mmaction2 --clone openmmlab
pip install -r requirements/build.txt
pip install -v -e .
Note: the mmcv-full version must be lower than 1.4.2.
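To double-check the environment, the installed versions can be printed (a minimal sketch; it assumes both packages expose a __version__ attribute, which they normally do):
# Quick sanity check of the installed versions.
import mmcv
import mmaction

print('mmcv-full:', mmcv.__version__)      # expected to be < 1.4.2 here
print('mmaction2:', mmaction.__version__)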
Test
import torch
from mmaction.apis import init_recognizer, inference_recognizer
config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
device = 'cuda:0' # or 'cpu'
device = torch.device(device)
model = init_recognizer(config_file, device=device)
# inference the demo video
inference_recognizer(model, 'demo/demo.mp4')
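The returned results can be mapped to readable class names with a label map, in the same (label index, score) format used later in this note. A minimal sketch, assuming the Kinetics-400 label map shipped with the repo; note that without a checkpoint passed to init_recognizer the scores are not meaningful:
# Map the returned (label index, score) pairs to class names.
labels = [x.strip() for x in open('tools/data/kinetics/label_map_k400.txt')]
results = inference_recognizer(model, 'demo/demo.mp4')
for idx, score in results:
    print(f'{labels[idx]}: {score:.4f}')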
Dataset Preparation
Something-Something is a large labeled dataset of humans performing actions with everyday objects, covering 174 action classes. The main difference between Something-V1 and Something-V2 is that V2 contains more videos, growing from 108,499 in V1 to 220,847. V2 download link: https://pan.baidu.com/s/1NCqL7JVoFZO6D131zGls-A
Extraction code: 07ka
For splitting the dataset, the split scripts released by the TSM authors are recommended; they make it easy to generate the training, validation, and test splits from the original csv/label files:
https://github.com/mit-han-lab/temporal-shift-module/tree/master/tools
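If you prefer not to use those scripts, equivalent mmaction2-style lists can be generated directly from the official annotation files. A minimal sketch, assuming the official V2 JSON annotations (something-something-v2-train.json / -validation.json with 'id' and 'template' fields, and labels.json mapping each template to a class index) and one <id>.webm per entry; if your copy ships csv files instead, adapt the parsing accordingly:
# Hedged sketch: build "name label" video lists from the SSv2 JSON annotations.
import json

with open('labels.json') as f:
    label2idx = json.load(f)  # assumed: {"Approaching something with your camera": "0", ...}

for split, out in [('something-something-v2-train.json', 'sthv2_train_list_videos.txt'),
                   ('something-something-v2-validation.json', 'sthv2_val_list_videos.txt')]:
    with open(split) as f, open(out, 'w') as w:
        for item in json.load(f):
            template = item['template'].replace('[', '').replace(']', '')
            w.write(f"{item['id']}.webm {label2idx[template]}\n")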
Concatenate and extract the dataset archives
cat 20bn-something-something-v2-?? | tar zx
Install ffmpeg
Download a static build locally
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
Extract it
tar -xvf ffmpeg-git-amd64-static.tar.xz
cd ffmpeg-git-20220302-amd64-static/
Use a script to batch-convert the videos into frames. It calls ffmpeg internally; the parameters can be adjusted in the cmd string.
from __future__ import print_function, division
import os
import sys
import subprocess
def class_process(dir_path, dst_dir_path):
    class_path = dir_path
    if not os.path.isdir(class_path):
        return
    dst_class_path = dst_dir_path
    if not os.path.exists(dst_class_path):
        os.mkdir(dst_class_path)

    for file_name in os.listdir(class_path):
        if '.webm' not in file_name:
            continue
        name, ext = os.path.splitext(file_name)
        dst_directory_path = os.path.join(dst_class_path, name)
        video_file_path = os.path.join(class_path, file_name)
        try:
            if os.path.exists(dst_directory_path):
                # incomplete extraction: remove and redo it
                if not os.path.exists(os.path.join(dst_directory_path, '000001.jpg')):
                    subprocess.call('rm -r \"{}\"'.format(dst_directory_path), shell=True)
                    print('remove {}'.format(dst_directory_path))
                    os.mkdir(dst_directory_path)
                else:
                    continue
            else:
                os.mkdir(dst_directory_path)
        except:
            print(dst_directory_path)
            continue
        # call ffmpeg to split the video into frames
        cmd = 'ffmpeg -i \"{}\" -vf scale=-1:240 \"{}/%06d.jpg\"'.format(video_file_path, dst_directory_path)
        print(cmd)
        # run the command
        subprocess.call(cmd, shell=True)
        print('\n')


if __name__ == "__main__":
    print("HELLO")
    dir_path = sys.argv[1]
    dst_dir_path = sys.argv[2]
    count = 0
    # note: since all SSv2 videos sit in one flat folder, class_process already
    # walks every file, so one call would suffice; this loop simply repeats it
    for class_name in os.listdir(dir_path):
        print(count)
        count = count + 1
        class_process(dir_path, dst_dir_path)
python video_jpg_ucf101_hmdb51.py /mnt/e/BaiduNetdiskDownload/somethingV2/20bn-something-something-v2/ /mnt/e/workspace/mmaction2/data/somethingv2/
This took roughly 4 to 5 days to run.
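The loop above extracts one video at a time. If that is too slow, the per-video ffmpeg calls can be run in parallel with a process pool; a hedged sketch (extract_one is a hypothetical helper mirroring the per-video logic above, not part of the original script):
# Hypothetical speed-up: run several ffmpeg extractions in parallel.
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor


def extract_one(video_file_path, dst_directory_path):
    os.makedirs(dst_directory_path, exist_ok=True)
    cmd = 'ffmpeg -i "{}" -vf scale=-1:240 "{}/%06d.jpg"'.format(
        video_file_path, dst_directory_path)
    subprocess.call(cmd, shell=True)


def extract_all(src_dir, dst_dir, workers=8):
    tasks = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for file_name in os.listdir(src_dir):
            if not file_name.endswith('.webm'):
                continue
            name = os.path.splitext(file_name)[0]
            tasks.append(pool.submit(extract_one,
                                     os.path.join(src_dir, file_name),
                                     os.path.join(dst_dir, name)))
        for t in tasks:
            t.result()  # re-raise any worker error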
Training on the sthv2 data
_base_ = ['../../_base_/default_runtime.py']
# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        attention_type='divided_space_time',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(type='TimeSformerHead', num_classes=174, in_channels=768),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips='prob'))
# dataset settings
# use the video format directly
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
# alternative: use the extracted rawframe format instead
# dataset_type = 'RawframeDataset'
# data_root = 'data/sthv2/rawframes'
# data_root_val = 'data/sthv2/rawframes'
# ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
# ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
# ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),  # video input, so a decoder has to be initialised first
    # frame sampling: take one clip of 8 frames along the temporal axis,
    # with an interval of 30 frames between sampled frames
    dict(type='SampleFrames', clip_len=8, frame_interval=30, num_clips=1),
    # num_clips=N means the video is sampled N times and the N results are
    # ensembled at test time; here a single clip is used
    dict(type='DecordDecode'),  # decode the sampled video frames
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),  # adjust the output shape
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),  # unify the data format
    dict(type='ToTensor', keys=['imgs', 'label'])  # convert to PyTorch tensors
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=30,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=30,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(
    videos_per_gpu=2,  # 2 videos per GPU, i.e. the per-GPU batch size
    workers_per_gpu=2,  # 2 dataloader workers per GPU
    test_dataloader=dict(videos_per_gpu=1),
    # train / val / test dataset configurations
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# evaluation metrics
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
# optimizer used for training
optimizer = dict(
    type='SGD',
    lr=0.005 / 8 / 4,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            # decay_mult=0.0 disables weight decay for the class token and the
            # positional / temporal embeddings; the backbone itself is still trained
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    weight_decay=1e-4,
    nesterov=True)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy (learning rate decay schedule)
# lr_config = dict(policy='CosineAnnealing', min_lr=0)
lr_config = dict(policy='step', step=[5, 10])
total_epochs = 15
# runtime settings
checkpoint_config = dict(interval=1)  # save a checkpoint every epoch
work_dir = './work_dirs/timesformer_divST_8x32x1_ssv2'
The learning rate is scaled with the number of GPUs and the batch size: the original setting was 8 GPUs with a batch size of 8 per GPU, while here it is 1 GPU with a batch size of 2, hence lr = 0.005 / 8 / 4.
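The same arithmetic written out, as a trivial sketch of the linear lr scaling rule:
# Linear scaling of the learning rate with the total batch size.
base_lr = 0.005      # reference lr for 8 GPUs x 8 videos per GPU = 64 videos per step
base_batch = 8 * 8
my_batch = 1 * 2     # 1 GPU x videos_per_gpu=2
lr = base_lr * my_batch / base_batch
print(lr)            # 0.005 / 8 / 4 = 0.00015625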
Here the video-format data is used and trained on directly.
Train and validate from scratch
python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --gpus 0
If training on Windows without a GPU, set --gpus to 0.
A random seed can also be fixed:
python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --validate --seed 0 --deterministic
Resume training from a checkpoint
python tools/train.py work_dirs/timesformer_divST_8x32x1_ssv2/timesformer_divST_8x32x1_ssv2.py --work-dir work_dirs/timesformer_divST_8x32x1_ssv2 --gpus 0 --resume-from work_dirs/timesformer_divST_8x32x1_ssv2/epoch_9.pth
Validation / testing
python tools/test.py configs/recognition/timesformer/timesformer_divST_8x32x1_ssv2.py work_dirs/timesformer_divST_8x32x1_ssv2/epoch_6.pth --eval top_k_accuracy mean_class_accuracy --out result6.json
The predictions are saved to result6.json via --out.
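The dumped file can then be inspected offline; a hedged sketch, assuming result6.json stores one score vector per test video in annotation order:
# Hedged sketch: load the dumped test scores and print each video's top-1 class index.
import json
import numpy as np

scores = json.load(open('result6.json'))  # assumed: list of per-video score vectors
for i, s in enumerate(scores[:5]):
    print(i, int(np.argmax(s)), max(s))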
Real-time inference from a webcam
python .\demo\webcam_demo.py .\work_dirs\timesformer_divST_8x32x1_ssv2\timesformer_divST_8x32x1_ssv2.py .\work_dirs\timesformer_divST_8x32x1_ssv2\epoch_15.pth .\tools\data\sthv2\label_map.txt --average-size 5 --threshold 0.2
Custom dataset
Using the tiny dataset as an example: there are only two classes, with 30 videos for training and 10 for validation/testing. Again, the videos are used directly for training.
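For the VideoDataset format used here, each line of the annotation file is simply "<relative video path> <label index>". A minimal sketch for generating such a list (the file names below are illustrative placeholders, not taken from the actual dataset):
# Build a VideoDataset annotation list, one "<video> <label>" per line.
video_label_pairs = [('video_a.mp4', 0), ('video_b.mp4', 1)]  # illustrative entries
with open('data/kinetics400_tiny/kinetics_tiny_train_video.txt', 'w') as f:
    for video, label in video_label_pairs:
        f.write(f'{video} {label}\n')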
TSN test
import os.path as osp
from mmaction.datasets import build_dataset
from mmaction.models import build_model
from mmaction.apis import train_model
import mmcv
from mmcv import Config
cfg = Config.fromfile('./configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py')
from mmcv.runner import set_random_seed
# Modify dataset type and path
cfg.dataset_type = 'VideoDataset'
cfg.data_root = 'data/kinetics400_tiny/train/'
cfg.data_root_val = 'data/kinetics400_tiny/val/'
cfg.ann_file_train = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.ann_file_val = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.ann_file_test = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
#cfg.data.videos_per_gpu=1
#cfg.data.workers_per_gpu=1
cfg.data.test.type = 'VideoDataset'
cfg.data.test.ann_file = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.data.test.data_prefix = 'data/kinetics400_tiny/val/'
cfg.data.train.type = 'VideoDataset'
cfg.data.train.ann_file = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.data.train.data_prefix = 'data/kinetics400_tiny/train/'
cfg.data.val.type = 'VideoDataset'
cfg.data.val.ann_file = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.data.val.data_prefix = 'data/kinetics400_tiny/val/'
# The flag is used to determine whether it is omnisource training
cfg.setdefault('omnisource', False)
# Modify num classes of the model in cls_head
cfg.model.cls_head.num_classes = 2
# We can use the pre-trained TSN model
cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'
# Set up working dir to save files and logs.
cfg.work_dir = './test'
# The original learning rate (LR) is set for 8-GPU training with a large batch size.
# Divide it by 8 for the single GPU and by another 16 to match the reduced batch size.
cfg.data.videos_per_gpu = cfg.data.videos_per_gpu // 16
cfg.optimizer.lr = cfg.optimizer.lr / 8 / 16
cfg.total_epochs = 10
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 5
# We can set the log print interval to reduce the number of times the log is printed
cfg.log_config.interval = 5
# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
# Save the best
cfg.evaluation.save_best='auto'
# Build the dataset
datasets = [build_dataset(cfg.data.train)]
# Build the recognizer
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_model(model, datasets, cfg, distributed=False, validate=True)
from mmaction.apis import single_gpu_test
from mmaction.datasets import build_dataloader
from mmcv.parallel import MMDataParallel
# Build a test dataloader
dataset = build_dataset(cfg.data.test, dict(test_mode=True))
data_loader = build_dataloader(
    dataset,
    videos_per_gpu=1,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=False,
    shuffle=False)
model = MMDataParallel(model, device_ids=[0])
outputs = single_gpu_test(model, data_loader)
eval_config = cfg.evaluation
eval_config.pop('interval')
eval_res = dataset.evaluate(outputs, **eval_config)
for name, val in eval_res.items():
    print(f'{name}: {val:.04f}')
Again, adjust the lr according to the number of GPUs and videos_per_gpu.
The point here was mainly to verify that the TSN pipeline runs end to end.
Next comes the main part: training with TimeSformer.
timesformer_divST_8x32x1_15e_kinetics_tiny.py
_base_ = ['../../_base_/runtimetiny.py']
# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        attention_type='divided_space_time',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(type='TimeSformerHead', num_classes=2, in_channels=768),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips='prob'))
# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400_tiny/train'
data_root_val = 'data/kinetics400_tiny/val'
ann_file_train = 'data/kinetics400_tiny/kinetics_tiny_train_video.txt'
ann_file_val = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
ann_file_test = 'data/kinetics400_tiny/kinetics_tiny_val_video.txt'
img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(
    videos_per_gpu=2,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.005 / 8,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    weight_decay=1e-4,
    nesterov=True)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[5, 8])
total_epochs = 10
# runtime settings
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny'
python tools/train.py configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics_tiny.py --gpus 0
Inference test
cat tinyinfer.py
from mmaction.apis import inference_recognizer, init_recognizer
import os
# Choose to use a config and initialize the recognizer
config = 'configs/recognition/timesformer/timesformer_divST_8x32x1_15e_kinetics_tiny.py'
# Setup a checkpoint file to load
checkpoint = 'work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/epoch_10.pth'
# Initialize the recognizer
model = init_recognizer(config, checkpoint, device='cuda:0')
# Use the recognizer to do inference
label = 'tools/data/kinetics/label_map_k2.txt'
labels = open(label).readlines()
labels = [x.strip() for x in labels]
path = 'data/kinetics400_tiny/val'
for root, dirs, names in os.walk(path):
    for name in names:
        ext = os.path.splitext(name)[1]
        if ext == '.mp4':
            video = os.path.join(root, name)
            results = inference_recognizer(model, video)
            # labels = open(label).readlines()
            # labels = [x.strip() for x in labels]
            results = [(labels[k[0]], k[1]) for k in results]
            print(name)
            for result in results:
                print(f'{result[0]}: ', result[1])
A self-defined label file: 0 is climbing a rope, 1 is blowing glass.
cat tools/data/kinetics/label_map_k2.txt
climbing a rope
blowing glass
This prints, for each video, the predicted probability of the two classes.
Logs
python tools/analysis/analyze_logs.py plot_curve work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json --keys top1_acc --out acc1.pdf
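If finer control over the plot is needed, the .log.json file can also be parsed directly; each line is a JSON dict, and the hedged sketch below assumes validation records carry a 'mode' of 'val' plus 'epoch' and 'top1_acc' fields:
# Hedged sketch: parse the training log and plot validation top-1 accuracy.
import json
import matplotlib.pyplot as plt

epochs, acc = [], []
with open('work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json') as f:
    for line in f:
        rec = json.loads(line)
        if rec.get('mode') == 'val' and 'top1_acc' in rec:
            epochs.append(rec['epoch'])
            acc.append(rec['top1_acc'])
plt.plot(epochs, acc, marker='o')
plt.xlabel('epoch')
plt.ylabel('top1_acc')
plt.savefig('acc1_manual.pdf')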
Log analysis
root@83c3d6970b59:/workspace# python tools/analysis/analyze_logs.py cal_train_time work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json
-----Analyze train time of work_dirs/timesformer_divST_8x32x1_15e_kinetics_tiny/20220403_010309.log.json-----
slowest epoch 5, average time is 0.8540
fastest epoch 4, average time is 0.8354
time std over epochs is 0.0063
average iter time: 0.8425 s/iter
Model complexity analysis
tools/analysis/get_flops.py is a script adapted from the flops-counter.pytorch library; it computes the FLOPs and parameter count of a model for a given input shape.
python tools/analysis/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
Other deployment-related tools
Model conversion
- Export MMAction2 models to ONNX format (experimental)
The tools/deployment/pytorch2onnx.py script converts a model to ONNX format. It can also compare the outputs of the PyTorch model and the ONNX model to verify that they match. This feature depends on onnx and onnxruntime; install them first with pip install onnx onnxruntime. Note that the --softmax option appends a Softmax layer to the recognizer so that the predictions fall within [0, 1].
For action recognition models, run:
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --shape $SHAPE --verify
For temporal action detection (localization) models, run:
python tools/deployment/pytorch2onnx.py $CONFIG_PATH $CHECKPOINT_PATH --is-localizer --shape $SHAPE --verify
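After export, the ONNX model can be loaded with onnxruntime to sanity-check its input and output shapes; a minimal sketch, where recognizer.onnx is just a placeholder for whatever output path you passed to the export script:
# Inspect and run the exported ONNX recognizer with onnxruntime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('recognizer.onnx')  # placeholder path
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)  # check the expected input layout first

# Replace dynamic dimensions with 1 just to build a dummy tensor of the right rank.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)
scores = sess.run(None, {inp.name: dummy})[0]
print(scores.shape)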
- Publish a model
tools/deployment/publish_model.py
The script prepares a model for publishing. It mainly:
(1) converts the model weights to CPU tensors, (2) removes the optimizer state, and (3) computes the hash of the checkpoint file and appends it to the filename.
python tools/deployment/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
For example,
python tools/deployment/publish_model.py work_dirs/tsn_r50_1x1x3_100e_kinetics400_rgb/latest.pth tsn_r50_1x1x3_100e_kinetics400_rgb.pth
The final output filename will be tsn_r50_1x1x3_100e_kinetics400_rgb-{hash id}.pth.
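Conceptually those three steps look roughly like the following (a hedged sketch, not the actual script):
# Rough sketch of what publishing does: strip optimizer state, save CPU weights,
# and append a short hash of the resulting file to its name.
import hashlib
import shutil
import torch

ckpt = torch.load('latest.pth', map_location='cpu')  # (1) load weights on CPU
ckpt.pop('optimizer', None)                          # (2) drop optimizer state
torch.save(ckpt, 'published.pth')

sha = hashlib.sha256(open('published.pth', 'rb').read()).hexdigest()[:8]  # (3) hash
shutil.move('published.pth', f'tsn_r50_1x1x3_100e_kinetics400_rgb-{sha}.pth')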
- Metric evaluation
tools/analysis/eval_metric.py
The script computes a given evaluation metric from a config file and the corresponding result file. The result file is generated by tools/test.py (via --out ${RESULT_FILE}) and stores the model's predictions on the specified dataset.
python tools/analysis/eval_metric.py ${CONFIG_FILE} ${RESULT_FILE} [--eval ${EVAL_METRICS}] [--cfg-options ${CFG_OPTIONS}] [--eval-options ${EVAL_OPTIONS}]
- Print the full config
tools/analysis/print_config.py
The script parses all input arguments and prints the complete config.
python tools/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}]
- Check videos
The tools/analysis/check_videos.py script uses the specified video decoder to iterate over every sample of the video dataset in the given config, looking for invalid video files (corrupted or missing) and writing their paths to an output file. Note that after removing invalid videos, the video file list needs to be regenerated.
python tools/analysis/check_videos.py ${CONFIG} [-h] [--options OPTIONS [OPTIONS ...]] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--output-file OUTPUT_FILE] [--split SPLIT] [--decoder ]