
Forward Inference: Merging Conv and BN Layers

This article explains how to merge Conv and BN layers for forward inference, with concrete code examples and a walkthrough of the steps and tricks involved. https://www.b2bchain.cn/?p=22776


Contents

  • Preface
  • Merging the Conv and BN Layers
    • Why merge
    • The math behind the merge
    • Caffe implementation (Python, embedded)
    • Darknet implementation (C/C++, server-side)

Today I'm finally writing up a post I've long meant to publish; more overdue posts will follow.

Preface

Why do this, and why merge the layers of a Caffe model? It came out of optimizing forward inference on HiSilicon 35XX series development boards.
I trained a YOLO model with darknet and converted it to a caffemodel (I have previously posted the darknet-to-caffemodel conversion code; the write-up will be fleshed out later, but the code is up so it can be used right away), then used RuyiStudio to convert that to a .wk model. Along the way I wondered whether operator fusion could speed up forward inference, which led to this article; the same idea carries over to other industrial applications.
Note: the Python code targets Python 3.x.

Merging the Conv and BN Layers

Why merge

When training deep models, a BN layer speeds up convergence and helps control overfitting; it is usually placed after a convolution layer (usually, not always: in some cases, such as DenseNet, I place the BN layer before the Conv layer). By normalizing the activations, BN effectively mitigates vanishing and exploding gradients. But while BN helps during training, at forward-inference time it adds extra per-layer computation and occupies extra memory or GPU memory. Since many network architectures (ResNet, MobileNet, Xception, ShuffleNet, etc.) use BN, it is worth folding the BN parameters into the convolution layer to speed up forward inference and reduce storage.

The math behind the merge

[Figure: derivation of the Conv+BN merge formulas]
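The derivation in the original post is an image; reconstructed here in standard notation (this is the well-known BN-folding identity, not taken verbatim from the lost figure). For output channel $i$, let $W_i, b_i$ be the conv parameters, $\mu_i, \sigma_i^2$ the BN running statistics, and $\gamma_i, \beta_i$ the Scale parameters:

```latex
% Conv followed by BN(+Scale), per output channel i:
y_i = W_i * x + b_i,
\qquad
\hat{y}_i = \gamma_i \, \frac{y_i - \mu_i}{\sqrt{\sigma_i^2 + \varepsilon}} + \beta_i .
% Substituting y_i shows the result is again a plain convolution:
\hat{y}_i = W_i' * x + b_i',
\qquad
W_i' = \frac{\gamma_i}{\sqrt{\sigma_i^2 + \varepsilon}}\, W_i,
\qquad
b_i' = \frac{\gamma_i \,(b_i - \mu_i)}{\sqrt{\sigma_i^2 + \varepsilon}} + \beta_i .
```

After replacing $(W_i, b_i)$ with $(W_i', b_i')$, the BN and Scale layers compute the identity and can simply be deleted.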

Caffe implementation (Python, embedded)

This Python script folds the weights of the BN and Scale layers into the convolution layers, improving forward-inference performance.
Note that after merging, the BN and Scale layers must be removed from the network definition file.
This is mainly intended for embedded devices.
The merge script, using Caffe's Python interface:

```python
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# python3.x

import numpy as np
import sys
sys.path.append(r"/home/XXX/caffe/python")  # path to your caffe Python interface
import os
import os.path as osp
import google.protobuf as pb
import google.protobuf.text_format
from argparse import ArgumentParser
import caffe

caffe.set_mode_cpu()

def load_and_fill_biases(src_model, src_weights, dst_model, dst_weights):
    with open(src_model) as f:
        model = caffe.proto.caffe_pb2.NetParameter()
        pb.text_format.Merge(f.read(), model)

    for i, layer in enumerate(model.layer):
        if layer.type == 'Convolution':  # or layer.type == 'Scale':
            # Add a bias term if the conv layer has none
            if layer.convolution_param.bias_term == False:
                layer.convolution_param.bias_term = True
                layer.convolution_param.bias_filler.type = 'constant'
                layer.convolution_param.bias_filler.value = 0.0

    with open(dst_model, 'w') as f:
        f.write(pb.text_format.MessageToString(model))

    caffe.set_mode_cpu()
    net_src = caffe.Net(src_model, src_weights, caffe.TEST)
    net_dst = caffe.Net(dst_model, caffe.TEST)
    for key in net_src.params.keys():
        for i in range(len(net_src.params[key])):
            net_dst.params[key][i].data[:] = net_src.params[key][i].data[:]

    if dst_weights is not None:
        # Store params
        pass

    return net_dst

def merge_conv_and_bn(net, i_conv, i_bn, i_scale):
    # This is based on Kyeheyon's work
    assert(i_conv != None)
    assert(i_bn != None)

    def copy_double(data):
        return np.array(data, copy=True, dtype=np.double)

    key_conv = net._layer_names[i_conv]
    key_bn = net._layer_names[i_bn]
    key_scale = net._layer_names[i_scale] if i_scale else None

    # Copy the BN statistics
    bn_mean = copy_double(net.params[key_bn][0].data)
    bn_variance = copy_double(net.params[key_bn][1].data)
    num_bn_samples = copy_double(net.params[key_bn][2].data)

    # and invalidate the BN layer
    net.params[key_bn][0].data[:] = 0
    net.params[key_bn][1].data[:] = 1
    net.params[key_bn][2].data[:] = 1

    if num_bn_samples[0] == 0:
        num_bn_samples[0] = 1

    # if net.params.has_key(key_scale):  # python2
    if key_scale in net.params:
        print('Combine {:s} + {:s} + {:s}'.format(key_conv, key_bn, key_scale))
        scale_weight = copy_double(net.params[key_scale][0].data)
        scale_bias = copy_double(net.params[key_scale][1].data)
        net.params[key_scale][0].data[:] = 1
        net.params[key_scale][1].data[:] = 0
    else:
        print('Combine {:s} + {:s}'.format(key_conv, key_bn))
        scale_weight = 1
        scale_bias = 0

    weight = copy_double(net.params[key_conv][0].data)
    bias = copy_double(net.params[key_conv][1].data)

    alpha = scale_weight / np.sqrt(bn_variance / num_bn_samples[0] + 1e-5)
    net.params[key_conv][1].data[:] = bias * alpha + (scale_bias - (bn_mean / num_bn_samples[0]) * alpha)
    for i in range(len(alpha)):
        net.params[key_conv][0].data[i] = weight[i] * alpha[i]

def merge_batchnorms_in_net(net):
    # for each BN layer
    for i, layer in enumerate(net.layers):
        if layer.type != 'BatchNorm':
            continue

        l_name = net._layer_names[i]

        l_bottom = net.bottom_names[l_name]
        assert(len(l_bottom) == 1)
        l_bottom = l_bottom[0]
        l_top = net.top_names[l_name]
        assert(len(l_top) == 1)
        l_top = l_top[0]

        can_be_absorbed = True

        # Search all (bottom) layers
        for j in range(i - 1, -1, -1):
            tops_of_j = net.top_names[net._layer_names[j]]
            if l_bottom in tops_of_j:
                if net.layers[j].type not in ['Convolution', 'InnerProduct']:
                    can_be_absorbed = False
                else:
                    # There must be only one layer
                    conv_ind = j
                    break

        if not can_be_absorbed:
            continue

        # find the following Scale
        scale_ind = None
        for j in range(i + 1, len(net.layers)):
            bottoms_of_j = net.bottom_names[net._layer_names[j]]
            if l_top in bottoms_of_j:
                if scale_ind:
                    # Followed by two or more layers
                    scale_ind = None
                    break

                if net.layers[j].type in ['Scale']:
                    scale_ind = j

                    top_of_j = net.top_names[net._layer_names[j]][0]
                    if top_of_j == bottoms_of_j[0]:
                        # In-place => Can be merged
                        break
                else:
                    # Followed by a layer which is not 'Scale'
                    scale_ind = None
                    break

        merge_conv_and_bn(net, conv_ind, i, scale_ind)

    return net

def process_model(net, src_model, dst_model, func_loop, func_finally):
    with open(src_model) as f:
        model = caffe.proto.caffe_pb2.NetParameter()
        pb.text_format.Merge(f.read(), model)

    for i, layer in enumerate(model.layer):
        for func in func_loop:      # map() is lazy in python3, so loop explicitly
            func(layer, net, model, i)

    for func in func_finally:
        func(net, model)

    with open(dst_model, 'w') as f:
        f.write(pb.text_format.MessageToString(model))

# Functions to remove (redundant) BN and Scale layers
to_delete_empty = []
def pick_empty_layers(layer, net, model, i):
    if layer.type not in ['BatchNorm', 'Scale']:
        return

    bottom = layer.bottom[0]
    top = layer.top[0]

    if (bottom != top):
        # Not supported yet
        return

    if layer.type == 'BatchNorm':
        zero_mean = np.all(net.params[layer.name][0].data == 0)
        one_var = np.all(net.params[layer.name][1].data == 1)

        if zero_mean and one_var:
            print('Delete layer: {}'.format(layer.name))
            to_delete_empty.append(layer)

    if layer.type == 'Scale':
        no_scaling = np.all(net.params[layer.name][0].data == 1)
        zero_bias = np.all(net.params[layer.name][1].data == 0)

        if no_scaling and zero_bias:
            print('Delete layer: {}'.format(layer.name))
            to_delete_empty.append(layer)

def remove_empty_layers(net, model):
    for layer in to_delete_empty:   # again, avoid lazy map() in python3
        model.layer.remove(layer)

# A function to add 'engine: CAFFE' param into 1x1 convolutions
def set_engine_caffe(layer, net, model, i):
    if layer.type == 'Convolution':
        if (layer.convolution_param.kernel_size == 1
                or (layer.convolution_param.kernel_h == layer.convolution_param.kernel_w == 1)):
            layer.convolution_param.engine = dict(layer.convolution_param.Engine.items())['CAFFE']

def main():
    # Set default output file names
    if args.output_model is None:
        file_name = osp.splitext(args.model)[0]
        args.output_model = file_name + '_inference.prototxt'
    if args.output_weights is None:
        file_name = osp.splitext(args.weights)[0]
        args.output_weights = file_name + '_inference.caffemodel'

    net = load_and_fill_biases(args.model, args.weights, args.model + '.temp.pt', None)
    net = merge_batchnorms_in_net(net)

    process_model(net, args.model + '.temp.pt', args.output_model,
                  [pick_empty_layers, set_engine_caffe],
                  [remove_empty_layers])

    # Store params
    net.save(args.output_weights)

if __name__ == '__main__':
    parser = ArgumentParser(
        description="Generate Batch Normalized model for inference")
    parser.add_argument('--model', default="MobileNetYOLO_deploy.prototxt", help="The net definition prototxt")
    parser.add_argument('--weights', default="MobileNetYOLO_deploy.caffemodel", help="The weights caffemodel")
    parser.add_argument('--output_model')
    parser.add_argument('--output_weights')
    args = parser.parse_args()
    main()
```
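As a quick sanity check of the arithmetic the script applies (a minimal numpy sketch, independent of Caffe; all names here are mine, and a 1x1 convolution on a single pixel stands in for a full conv layer):

```python
import numpy as np

def fold_bn(W, b, mean, var, gamma, beta, eps=1e-5):
    # Fold BN(y) = gamma*(y-mean)/sqrt(var+eps) + beta into conv weights/bias.
    alpha = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_f = W * alpha[:, None, None, None]        # scale each output filter
    b_f = (b - mean) * alpha + beta             # shift the bias
    return W_f, b_f

# Random conv + BN, then check conv(x; W_f, b_f) == BN(conv(x; W, b))
rng = np.random.default_rng(0)
cout, cin, k = 4, 3, 1
W = rng.standard_normal((cout, cin, k, k))
b = rng.standard_normal(cout)
mean, var = rng.standard_normal(cout), rng.random(cout) + 0.5
gamma, beta = rng.standard_normal(cout), rng.standard_normal(cout)

x = rng.standard_normal((cin,))                 # one pixel of input
y = W[:, :, 0, 0] @ x + b                       # conv output per channel
bn = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta

W_f, b_f = fold_bn(W, b, mean, var, gamma, beta)
y_f = W_f[:, :, 0, 0] @ x + b_f
assert np.allclose(bn, y_f)                     # identical outputs
```

The same identity is what `merge_conv_and_bn` computes, except that Caffe's BatchNorm stores unnormalized statistics, hence the extra division by `num_bn_samples` there.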

Later I plan to combine the two model conversions into a single script, and add code for testing the converted model.

Darknet implementation (C/C++, server-side)

This part is mainly intended for the server side.

  1. Code changes
    To save the merged parameters, add the following to parser.c:
```c
// save convolutional weights with BN folded in
void save_convolutional_weights_nobn(layer l, FILE *fp)
{
    if(l.binary){
        //save_convolutional_weights_binary(l, fp);
        //return;
    }
#ifdef GPU
    if(gpu_index >= 0){
        pull_convolutional_layer(l);
    }
#endif
    int num = l.nweights;
    //fwrite(l.biases, sizeof(float), l.n, fp);
    /*if (l.batch_normalize){
        fwrite(l.scales, sizeof(float), l.n, fp);
        fwrite(l.rolling_mean, sizeof(float), l.n, fp);
        fwrite(l.rolling_variance, sizeof(float), l.n, fp);
    }*/
    if (l.batch_normalize) {
        for (int j = 0; j < l.n; j++) {
            l.biases[j] = l.biases[j] - l.scales[j] * l.rolling_mean[j] / (sqrt(l.rolling_variance[j]) + 0.000001f);
            for (int k = 0; k < l.size*l.size*l.c; k++) {
                l.weights[j*l.size*l.size*l.c + k] = l.scales[j] * l.weights[j*l.size*l.size*l.c + k] / (sqrt(l.rolling_variance[j]) + 0.000001f);
            }
        }
    }
    fwrite(l.biases, sizeof(float), l.n, fp);
    fwrite(l.weights, sizeof(float), num, fp);
}
```

At inference time, load the modified .weights file; add this to parser.c as well:

```c
void load_convolutional_weights_nobn(layer l, FILE *fp)
{
    if(l.binary){
        //load_convolutional_weights_binary(l, fp);
        //return;
    }
    if(l.numload) l.n = l.numload;
    int num = l.c/l.groups*l.n*l.size*l.size;
    fread(l.biases, sizeof(float), l.n, fp);
    //fprintf(stderr, "Loading l.biases num:%d,size:%d*%d\n", l.n, l.n, sizeof(float));
    fread(l.weights, sizeof(float), num, fp);
    //fprintf(stderr, "Loading weights num:%d,size:%d*%d\n", num, num, sizeof(float));
    if(l.c == 3) scal_cpu(num, 1./256, l.weights, 1);
    if (l.flipped) {
        transpose_matrix(l.weights, l.c*l.size*l.size, l.n);
    }
    if (l.binary) binarize_weights(l.weights, l.n, l.c*l.size*l.size, l.weights);
}
```
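In the merged file, each conv layer's payload is just the biases followed by the weights, matching the two fwrite calls in the save function. A hedged Python sketch of reading one conv layer back (the function name is mine; n, c, and size must come from the cfg, and the darknet file header must already have been consumed):

```python
import numpy as np

def read_conv_nobn(f, n, c, size):
    # Mirrors load_convolutional_weights_nobn: n biases, then
    # n*c*size*size weights, all float32, in that order.
    biases = np.fromfile(f, dtype=np.float32, count=n)
    weights = np.fromfile(f, dtype=np.float32, count=n * c * size * size)
    return biases, weights.reshape(n, c, size, size)
```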

Then add the command-line plumbing in detector.c:

```c
//detector.c
//add code inside void run_detector(int argc, char **argv)
void run_detector(int argc, char **argv)
{
    ......
    if(0==strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, outfile, fullscreen);
    else if(0==strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear);
    else if(0==strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "valid2")) validate_detector_flip(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "recall")) validate_detector_recall(cfg, weights);
    else if(0==strcmp(argv[2], "demo")) {
        list *options = read_data_cfg(datacfg);
        int classes = option_find_int(options, "classes", 20);
        char *name_list = option_find_str(options, "names", "data/names.list");
        char **names = get_labels(name_list);
        demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, avg, hier_thresh, width, height, fps, fullscreen);
    }
    //add here
    else if(0==strcmp(argv[2], "combineBN")) test_detector_comBN(datacfg, cfg, weights, filename, weightname, thresh, hier_thresh, outfile, fullscreen);
}

//add the test_detector_comBN function
void test_detector_comBN(char *datacfg, char *cfgfile, char *weightfile, char *filename, char *weightname, float thresh, float hier_thresh, char *outfile, int fullscreen)
{
    list *options = read_data_cfg(datacfg);
    char *name_list = option_find_str(options, "names", "data/names.list");
    char **names = get_labels(name_list);

    image **alphabet = load_alphabet();
    network *net = load_network(cfgfile, weightfile, 0);
    // save the merged (BN-free) parameters
    save_weights_nobn(net, weightname);
}
```

Those are the main additions and changes; rebuild to regenerate the executable.
2. Build
On a Linux system:

```shell
cd darknet/
make all
```

On Windows, simply rebuild darknet; I won't go into detail here.
3. Generate the BN-free weights and network definition files
Run the combineBN command:

```shell
./darknet detector combineBN cfg/2024.data cfg/yolov3.cfg yolov3.weights data/1.jpg yolo_inference_nobn.weights
```

Here the .data, cfg, and weights files are your own; the merged weights are written to yolo_inference_nobn.weights.
Note that you must also edit the cfg file, changing every batch_normalize=1 to batch_normalize=0; in other words, batch normalization is removed.
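The cfg edit can be scripted; a minimal sketch (a helper of my own, not part of darknet, assuming the flag is always written exactly as batch_normalize=1):

```python
def disable_batchnorm(cfg_text):
    # The merged weights carry no BN parameters, so every [convolutional]
    # section must stop declaring batch_normalize.
    return cfg_text.replace("batch_normalize=1", "batch_normalize=0")

# Usage sketch:
# with open("cfg/yolov3.cfg") as f:
#     txt = f.read()
# with open("cfg/yolov3_nobn.cfg", "w") as f:
#     f.write(disable_batchnorm(txt))
```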
4. Run inference with the merged weights:

```shell
./darknet detector test cfg/2024.data cfg/yolov3_nobn.cfg yolo_inference_nobn.weights data/1.jpg
```

The results come out as expected; the output image is omitted here.
