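Reading the Cast layer
The word cast here means mapping a value from one type to another. Anyone who has learned C++ will know its various casts; in short, it is data type conversion.
As usual, let's go straight to the code, starting with the constructor: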
Cast::Cast()
{
    one_blob_only = true;
    support_inplace = false;
    support_packing = true;
}
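This time there is a flag we have not met before, support_packing. Common sense suggests it means the data is packed before being computed on; we will verify this against the code. In summary, the constructor declares:
- single input, single output
- no in-place computation
- packing is supported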
Next, the parameter loading function:
int Cast::load_param(const ParamDict& pd)
{
    type_from = pd.get(0, 0);
    type_to = pd.get(1, 0);
    return 0;
}
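There are two parameters:
- type_from: the source element type
- type_to: the target element type
The author documents the type codes in a comment in the header file: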
// element type
// 0 = auto
// 1 = float32
// 2 = float16
// 3 = int8
// 4 = bfloat16
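The type codes above each imply a fixed byte width per scalar, which the inference function relies on when it sizes the output. A minimal sketch of that mapping follows. The enum and the `scalar_bytes` function are hypothetical names introduced here for illustration; they are not part of the source, and returning 0 for `auto` is an illustrative choice rather than library behaviour.

```cpp
#include <cstddef>

// Hypothetical names for the integer codes listed in the header comment.
enum ElementType
{
    ELEM_AUTO = 0,
    ELEM_FLOAT32 = 1,
    ELEM_FLOAT16 = 2,
    ELEM_INT8 = 3,
    ELEM_BFLOAT16 = 4
};

// Bytes occupied by one scalar of the given type.
// Returns 0 for "auto" and for unknown codes (an assumption made for this sketch).
inline std::size_t scalar_bytes(int type)
{
    switch (type)
    {
    case ELEM_FLOAT32:
        return 4;
    case ELEM_FLOAT16:
        return 2;
    case ELEM_INT8:
        return 1;
    case ELEM_BFLOAT16:
        return 2;
    default:
        return 0;
    }
}
```

Both 16-bit formats have the same width, so a float16 blob and a bfloat16 blob of the same shape occupy the same amount of memory.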
Now the inference function. As before, the analysis is written in the comments:
int Cast::forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const
{
    // If the source type equals the target type, simply assign the input to the output
    if (type_from == type_to)
    {
        top_blob = bottom_blob;
        return 0;
    }

    int w = bottom_blob.w;
    int h = bottom_blob.h;
    int channels = bottom_blob.c;
    int dims = bottom_blob.dims;
    size_t elemsize = bottom_blob.elemsize;
    int elempack = bottom_blob.elempack;

    size_t out_elemsize = elemsize;
    // Determine the element size of the output from the target type. The unit is bytes:
    // bytes per scalar times the number of packed scalars (elempack) gives the output element size.
    if (type_to == 1)
    {
        // float32
        out_elemsize = 4 * elempack;
    }
    else if (type_to == 2)
    {
        // float16
        out_elemsize = 2 * elempack;
    }
    else if (type_to == 3)
    {
        // int8
        out_elemsize = elempack;
    }
    else if (type_to == 4)
    {
        // bfloat16
        out_elemsize = 2 * elempack;
    }

    // Allocate memory for the output blob
    if (dims == 1)
    {
        top_blob.create(w, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 2)
    {
        top_blob.create(w, h, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 3)
    {
        top_blob.create(w, h, channels, out_elemsize, elempack, opt.blob_allocator);
    }
    if (top_blob.empty())
        return -100;

    // Number of scalars per channel, counting every packed lane
    int size = w * h * elempack;

    // float32 -> float16
    if (type_from == 1 && type_to == 2)
    {
        // OpenMP directive: run the channel loop on multiple threads
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const float* ptr = bottom_blob.channel(q);
            unsigned short* outptr = top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                outptr[i] = float32_to_float16(ptr[i]);
            }
        }
    }

    // float16 -> float32
    if (type_from == 2 && type_to == 1)
    {
        // OpenMP directive: run the channel loop on multiple threads
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const unsigned short* ptr = bottom_blob.channel(q);
            float* outptr = top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                outptr[i] = float16_to_float32(ptr[i]);
            }
        }
    }

    // int8 -> float32
    if (type_from == 3 && type_to == 1)
    {
        // OpenMP directive: run the channel loop on multiple threads
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const signed char* ptr = bottom_blob.channel(q);
            float* outptr = top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                outptr[i] = (float)ptr[i];
            }
        }
    }

    // float32 -> bfloat16
    if (type_from == 1 && type_to == 4)
    {
        // OpenMP directive: run the channel loop on multiple threads
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const float* ptr = bottom_blob.channel(q);
            unsigned short* outptr = top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                outptr[i] = float32_to_bfloat16(ptr[i]);
            }
        }
    }

    // bfloat16 -> float32
    if (type_from == 4 && type_to == 1)
    {
        // OpenMP directive: run the channel loop on multiple threads
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const unsigned short* ptr = bottom_blob.channel(q);
            float* outptr = top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                outptr[i] = bfloat16_to_float32(ptr[i]);
            }
        }
    }

    // TODO more cast type

    return 0;
}
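The loops above also answer the earlier question about packing: each channel is walked as one flat run of `w * h * elempack` scalars, so the conversion is purely element-wise and never needs to know which lane a scalar belongs to. A minimal sketch of that idea for the int8 to float32 case follows, using `std::vector` in place of the library's blob type. The function name `cast_channel_int8_to_float32` is hypothetical, and the extra bound on `src.size()` is a safety check added for this sketch, not something the source does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: convert one channel of int8 data to float32.
// The channel holds w * h packed elements, each made of elempack scalar lanes,
// so it is treated as w * h * elempack scalars stored contiguously.
inline std::vector<float> cast_channel_int8_to_float32(const std::vector<signed char>& src,
                                                       int w, int h, int elempack)
{
    const std::size_t size = static_cast<std::size_t>(w) * h * elempack;
    std::vector<float> dst(size);

    // Element-wise conversion; lane order is preserved because indices map one to one.
    for (std::size_t i = 0; i < size && i < src.size(); i++)
    {
        dst[i] = static_cast<float>(src[i]);
    }
    return dst;
}
```

With `w = 2`, `h = 1` and `elempack = 4`, the function converts 8 scalars, which is the same count the `size` variable in the forward function would give for one channel.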
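Overall, the cast operation is quite simple: it is just type conversion. The implementation details of the conversion helpers themselves are not covered here, and they are not the focus of this series.
Below is the content of the PR: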
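For readers who want a feel for what such a conversion helper can look like, here is a minimal sketch of the float32 and bfloat16 pair. It assumes IEEE 754 binary32 `float` and converts by plain truncation, keeping only the upper 16 bits. The names `f32_to_bf16_truncate` and `bf16_to_f32` are hypothetical and are not the helpers called in the source, whose actual rounding behaviour may differ.

```cpp
#include <cstdint>
#include <cstring>

// Sketch: float32 -> bfloat16 by keeping the upper 16 bits (truncation, no rounding).
// Assumes float is IEEE 754 binary32.
inline std::uint16_t f32_to_bf16_truncate(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits)); // reinterpret the float's bit pattern
    return static_cast<std::uint16_t>(bits >> 16);
}

// Sketch: bfloat16 -> float32 by placing the 16 bits in the upper half
// and filling the lower 16 mantissa bits with zeros.
inline float bf16_to_f32(std::uint16_t value)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Because bfloat16 keeps the sign bit and all 8 exponent bits of float32, values such as 1.0 or -2.5 that need only a few mantissa bits survive the round trip exactly, while other values lose their low mantissa bits.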
cast
y = cast(x)
- one_blob_only
- support_packing
| param id | name | type | default |
|---|---|---|---|
| 0 | type_from | int | 0 |
| 1 | type_to | int | 0 |
Element type:
- 0 = auto
- 1 = float32
- 2 = float16
- 3 = int8
- 4 = bfloat16