2021-03-08 Counting Model Parameters in the Speech-Transformer Project

Load the trained and saved model checkpoint

import torch

pthfile = r'/home/user1/Project/Speech-Transformer/egs/aishell/exp/train_m1_n6_in80_elayer6_head8_k64_v64_model512_inner2048_drop0.1_pe5000_emb512_dlayer6_share1_ls0.1_epoch150_shuffle1_bs16_bf30000_mli800_mlo150_k0.2_warm4000/final.pth.tar'
net = torch.load(pthfile, map_location=torch.device('cpu'))  # the checkpoint was saved on a GPU; map_location loads it onto the CPU when no GPU is available

print(type(net))  # the type is dict
print(len(net))   # length 22, i.e. 22 key-value pairs
Output:

<class 'dict'>
22

List all keys

for k in net.keys():
    print(k) 
# the 22 keys:
# LFR_m  LFR_n    d_input    n_layers_enc    n_head    d_k    d_v
# d_model    d_inner    dropout    pe_maxlen    sos_id    eos_id
# vocab_size    d_word_vec    n_layers_dec    tgt_emb_prj_weight_sharing
# state_dict    optim_dict    epoch    tr_loss    cv_loss
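Besides looping over the keys, the small scalar hyperparameters can be separated from the large entries. A minimal sketch (the `summarize_checkpoint` helper is hypothetical, not part of the project; it works on any dict shaped like `net`):

```python
# Sketch: split a checkpoint dict into scalar hyperparameters and large entries.
def summarize_checkpoint(net):
    config, large = {}, []
    for k, v in net.items():
        if isinstance(v, (int, float, bool, str)):
            config[k] = v          # e.g. d_model, n_head, epoch, tr_loss
        else:
            large.append(k)        # e.g. state_dict, optim_dict
    return config, large

# Tiny illustration with a mock checkpoint; the same call applies to the real `net`:
config, large = summarize_checkpoint({'d_model': 512, 'epoch': 118,
                                      'state_dict': {}, 'optim_dict': {}})
print(config)   # {'d_model': 512, 'epoch': 118}
print(large)    # ['state_dict', 'optim_dict']
```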

Print the epoch at which the checkpoint was saved

print(net["epoch"])
>>> 118

Counting model parameters: method (1)

Count the number of parameters contained in state_dict

psum = 0
for key, value in net["state_dict"].items():
    print(key)
    print(value.size())
    print(value.numel())
    psum += value.numel()
    # print(key, value.size(), sep=" ")
print(psum)
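Before walking through the full per-tensor listing, the same totals can be grouped by top-level module. A sketch (the `count_by_prefix` helper is an illustrative assumption; feed it `{k: v.numel() for k, v in net["state_dict"].items()}` on the real checkpoint):

```python
from collections import defaultdict

def count_by_prefix(numel_by_name):
    """Sum element counts by the first dotted component of each key,
    e.g. 'encoder' / 'decoder'."""
    totals = defaultdict(int)
    for name, n in numel_by_name.items():
        totals[name.split('.')[0]] += n
    return dict(totals)

# Tiny illustration with three entries taken from the listing below:
subtotals = count_by_prefix({'encoder.linear_in.weight': 40960,
                             'encoder.linear_in.bias': 512,
                             'decoder.tgt_word_prj.weight': 2167296})
print(subtotals)   # {'encoder': 41472, 'decoder': 2167296}
```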

Output:

encoder.linear_in.weight
torch.Size([512, 80])
40960
encoder.linear_in.bias
torch.Size([512])
512
encoder.layer_norm_in.weight
torch.Size([512])
512
encoder.layer_norm_in.bias
torch.Size([512])
512
encoder.positional_encoding.pe
torch.Size([1, 5000, 512])
2560000
encoder.layer_stack.0.slf_attn.w_qs.weight
torch.Size([512, 512])
262144
encoder.layer_stack.0.slf_attn.w_qs.bias
torch.Size([512])
512
encoder.layer_stack.0.slf_attn.w_ks.weight
torch.Size([512, 512])
262144
encoder.layer_stack.0.slf_attn.w_ks.bias
torch.Size([512])
512
encoder.layer_stack.0.slf_attn.w_vs.weight
torch.Size([512, 512])
262144
encoder.layer_stack.0.slf_attn.w_vs.bias
torch.Size([512])
512
encoder.layer_stack.0.slf_attn.layer_norm.weight
torch.Size([512])
512
encoder.layer_stack.0.slf_attn.layer_norm.bias
torch.Size([512])
512
encoder.layer_stack.0.slf_attn.fc.weight
torch.Size([512, 512])
262144
encoder.layer_stack.0.slf_attn.fc.bias
torch.Size([512])
512
encoder.layer_stack.0.pos_ffn.w_1.weight
torch.Size([2048, 512])
1048576
encoder.layer_stack.0.pos_ffn.w_1.bias
torch.Size([2048])
2048
encoder.layer_stack.0.pos_ffn.w_2.weight
torch.Size([512, 2048])
1048576
encoder.layer_stack.0.pos_ffn.w_2.bias
torch.Size([512])
512
encoder.layer_stack.0.pos_ffn.layer_norm.weight
torch.Size([512])
512
encoder.layer_stack.0.pos_ffn.layer_norm.bias
torch.Size([512])
512
encoder.layer_stack.1 through encoder.layer_stack.5: identical parameter names and shapes to encoder.layer_stack.0 (3,152,384 elements per layer)
decoder.tgt_word_emb.weight
torch.Size([4233, 512])
2167296
decoder.positional_encoding.pe
torch.Size([1, 5000, 512])
2560000
decoder.layer_stack.0.slf_attn.w_qs.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.slf_attn.w_qs.bias
torch.Size([512])
512
decoder.layer_stack.0.slf_attn.w_ks.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.slf_attn.w_ks.bias
torch.Size([512])
512
decoder.layer_stack.0.slf_attn.w_vs.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.slf_attn.w_vs.bias
torch.Size([512])
512
decoder.layer_stack.0.slf_attn.layer_norm.weight
torch.Size([512])
512
decoder.layer_stack.0.slf_attn.layer_norm.bias
torch.Size([512])
512
decoder.layer_stack.0.slf_attn.fc.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.slf_attn.fc.bias
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.w_qs.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.enc_attn.w_qs.bias
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.w_ks.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.enc_attn.w_ks.bias
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.w_vs.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.enc_attn.w_vs.bias
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.layer_norm.weight
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.layer_norm.bias
torch.Size([512])
512
decoder.layer_stack.0.enc_attn.fc.weight
torch.Size([512, 512])
262144
decoder.layer_stack.0.enc_attn.fc.bias
torch.Size([512])
512
decoder.layer_stack.0.pos_ffn.w_1.weight
torch.Size([2048, 512])
1048576
decoder.layer_stack.0.pos_ffn.w_1.bias
torch.Size([2048])
2048
decoder.layer_stack.0.pos_ffn.w_2.weight
torch.Size([512, 2048])
1048576
decoder.layer_stack.0.pos_ffn.w_2.bias
torch.Size([512])
512
decoder.layer_stack.0.pos_ffn.layer_norm.weight
torch.Size([512])
512
decoder.layer_stack.0.pos_ffn.layer_norm.bias
torch.Size([512])
512
decoder.layer_stack.1 through decoder.layer_stack.5: identical parameter names and shapes to decoder.layer_stack.0 (4,204,032 elements per layer)
decoder.tgt_word_prj.weight
torch.Size([4233, 512])
2167296
53635584
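The total of 53,635,584 can be cross-checked analytically from the hyperparameters stored in the checkpoint (d_model = 512, d_inner = 2048, d_input = 80, 6 encoder and 6 decoder layers, vocab_size = 4233, pe_maxlen = 5000). A sketch of that arithmetic, following the shapes in the listing above:

```python
d_model, d_inner, d_input, vocab, pe_maxlen, n_layers = 512, 2048, 80, 4233, 5000, 6

lin = lambda i, o: i * o + o                      # weight + bias of a Linear(i, o)
attn = 4 * lin(d_model, d_model) + 2 * d_model    # w_qs/w_ks/w_vs/fc + layer_norm
ffn = lin(d_model, d_inner) + lin(d_inner, d_model) + 2 * d_model

enc_layer = attn + ffn                            # 3,152,384
dec_layer = 2 * attn + ffn                        # 4,204,032 (slf_attn + enc_attn)

encoder = (lin(d_input, d_model) + 2 * d_model    # linear_in + layer_norm_in
           + pe_maxlen * d_model                  # positional_encoding.pe
           + n_layers * enc_layer)
decoder = (vocab * d_model                        # tgt_word_emb
           + pe_maxlen * d_model                  # positional_encoding.pe
           + n_layers * dec_layer
           + vocab * d_model)                     # tgt_word_prj
print(encoder + decoder)                          # 53635584
```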

Print the values inside optim_dict

for key, value in net["optim_dict"].items():
    print(key)
    print(type(value))
Output:

state
<class 'dict'>
param_groups
<class 'list'>

Print param_groups inside optim_dict

groups = net["optim_dict"]["param_groups"]
print(groups)
print(len(groups))

Output:

[{'lr': 2.9553222446278096e-05, 'betas': (0.9, 0.98), 'eps': 1e-09, 'weight_decay': 0, 'amsgrad': False, 'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256]}]
1
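The learning rate in the single param group is consistent with the Noam schedule that the experiment name suggests (the "k0.2_warm4000" part presumably means k = 0.2 and warmup = 4000). Assuming lr = k · d_model^-0.5 · min(step^-0.5, step · warmup^-1.5), the training step at save time can be backed out; this is an inference from the saved lr, not something stored in the checkpoint:

```python
# Assumption: Noam learning-rate schedule with k = 0.2, d_model = 512
# (suggested by "k0.2_warm4000" in the experiment directory name).
k, d_model = 0.2, 512
lr = 2.9553222446278096e-05          # groups[0]['lr'] from the checkpoint

# Past warmup, lr = k * d_model**-0.5 * step**-0.5, so:
step = (k * d_model ** -0.5 / lr) ** 2
print(round(step))                   # roughly 9e4 optimizer updates
```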

Print state inside optim_dict

state = net["optim_dict"]["state"]
print(len(state)) 
for key, value in state.items():
    print(key, type(value), sep=" ")

Output:

257
0 <class 'dict'>
1 <class 'dict'>
2 <class 'dict'>
...
255 <class 'dict'>
256 <class 'dict'>
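The 257 state entries correspond one-to-one to the 257 trainable tensors the optimizer updates. For Adam, each entry holds exp_avg and exp_avg_sq tensors of the same shape as the parameter, so the optimizer state takes roughly twice the parameter memory. A rough estimate, assuming fp32 and the 46,348,288 trainable-element count found by method (2) below:

```python
# Rough memory estimate for Adam's optimizer state (exp_avg + exp_avg_sq),
# assuming fp32 storage and 46,348,288 trainable elements.
n_trainable = 46_348_288
bytes_per_float = 4
adam_state_bytes = 2 * n_trainable * bytes_per_float
print(f"{adam_state_bytes / 2**20:.1f} MiB")   # ≈ 353.6 MiB
```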

Counting model parameters: method (2)

Load the checkpoint's state_dict into the model

model.load_state_dict(net['state_dict'])  # `model` must first be instantiated with the hyperparameters stored in the checkpoint

Count the parameters

num_params = 0
for param in model.parameters():
    num_params += param.numel()
print(num_params / 1e6)
print(num_params)

Output:

46.348288
46348288

Summary:

Method (2) gives the exact number of trainable model parameters; method (1) also counts state_dict entries that are not trainable parameters.
Method (1): 53,635,584 parameters
Method (2): 46,348,288 parameters
Difference: 7,287,296 parameters
Comparing the two, the difference is exactly the following entries:

encoder.positional_encoding.pe
torch.Size([1, 5000, 512])
2560000
decoder.tgt_word_emb.weight
torch.Size([4233, 512])
2167296
decoder.positional_encoding.pe
torch.Size([1, 5000, 512])
2560000

The two positional_encoding.pe tables are registered buffers rather than nn.Parameter objects, and because tgt_emb_prj_weight_sharing = 1 the decoder embedding weight is tied to decoder.tgt_word_prj.weight, so model.parameters() counts that tensor only once.
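The difference can be confirmed numerically, and the same idea gives a way to filter the non-trainable entries out of state_dict directly. A sketch using the totals above:

```python
method1, method2 = 53_635_584, 46_348_288
non_param = 2_560_000 + 2_167_296 + 2_560_000   # two pe buffers + tied embedding
print(method1 - method2 == non_param)           # True

# Filtering idea for the real checkpoint: skip buffers and the tied embedding.
excluded = {'encoder.positional_encoding.pe',
            'decoder.positional_encoding.pe',
            'decoder.tgt_word_emb.weight'}
# trainable = sum(v.numel() for k, v in net['state_dict'].items() if k not in excluded)
```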