A typical convolutional neural network (CNN) consists of convolution layers, batch normalization (BN) layers, activation layers, and pooling layers. Pooling is straightforward, so I will not dwell on it here. This post takes the opportunity to explain in detail how the other three layers work, and how to set convolution layer parameters by hand.
Convolution layer

Countless blog posts cover convolution, most of them at great length, but the operation itself is quite simple; see the figure below.
The figure shows a [5, 5, 1] input image, a 3×3 kernel, stride 1 (the kernel slides one pixel at a time), padding 0 (padding 1 would add a ring of zeros around the image), and bias 0. As a first approximation, convolution can be understood as extracting image features.
[Note] When the input has multiple channels, a single kernel carries one set of weights w per channel; each channel is convolved separately and the per-channel results are summed.
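As a quick aside (my own addition, not from the figure): the output size follows (W - K + 2P) / S + 1, so the 5×5 input above with a 3×3 kernel, stride 1, and no padding yields a 3×3 output. A minimal sketch:

```python
def conv_out_size(w, k, s=1, p=0):
    # output width = (input width - kernel width + 2 * padding) // stride + 1
    return (w - k + 2 * p) // s + 1

print(conv_out_size(5, 3, s=1, p=0))  # 3 -> the 5x5 input produces a 3x3 feature map
```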
What do the values in a convolution kernel actually mean?
It turns out the kernel values are the weights w, and the bias is b; in other words, y = w * x + b.
How can we inspect a convolution layer's weights and bias?
The code is as follows:
```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, padding=0)
print('weight: ', conv.weight)
print('bias: ', conv.bias)

# output
weight:  Parameter containing:
tensor([[[[ 0.0409, -0.1187, -0.1277],
          [ 0.1090,  0.1126, -0.1540],
          [ 0.0520,  0.0716,  0.0857]],

         [[ 0.1192,  0.0912,  0.0131],
          [-0.0120,  0.0832,  0.0190],
          [ 0.0125,  0.0831,  0.1276]],

         [[-0.1231,  0.1494, -0.0117],
          [ 0.0709,  0.1686, -0.1689],
          [-0.1288,  0.0996,  0.0310]]]], requires_grad=True)
bias:  Parameter containing:
tensor([0.1709], requires_grad=True)
```
Where do these values come from?
Running this snippet repeatedly produces different values every time, so w and b are evidently initialized at random.
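A quick way to confirm this (a sketch of my own): fix the random seed, and the "random" values become reproducible across runs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fix the RNG; now every run produces identical initial weights
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
print(conv.weight[0, 0])  # the same 3x3 slice on every run
```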
Can w and b be set manually?
Of course. The code is as follows.
```python
import torch
import torch.nn as nn
import numpy as np

conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, padding=0)
ones = torch.Tensor(np.ones([1, 3, 3, 3]))  # a 3x3 kernel with 3 channels, matching the input channels
conv.weight = torch.nn.Parameter(ones)
conv.bias = torch.nn.Parameter(torch.Tensor([1]))
print('weight: ', conv.weight)
print('bias: ', conv.bias)

# output
weight:  Parameter containing:
tensor([[[[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],

         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],

         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]]]], requires_grad=True)
bias:  Parameter containing:
tensor([1.], requires_grad=True)
```
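An equivalent alternative (my own sketch, not from the original): keep the existing Parameter objects and overwrite their values in place under torch.no_grad(), which avoids replacing the parameters an optimizer may already hold references to.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, padding=0)
with torch.no_grad():        # suspend autograd while editing parameters in place
    conv.weight.fill_(1.0)   # set every kernel weight to 1
    conv.bias.fill_(1.0)     # set the bias to 1
print(conv.weight)
print(conv.bias)
```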
That works. Can we now feed in real data and run an actual convolution?
The code is as follows.
```python
import torch
import torch.nn as nn
import numpy as np

data = torch.tensor([[
    [[1,1,1],[10,1,1],[1,1,1],[1,1,1],[1,1,1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
    [[3,3,3],[30,3,3],[3,3,3],[3,3,3],[3,3,3]],
    [[4,4,4],[40,4,4],[4,4,4],[4,4,4],[4,4,4]],
    [[5,5,5],[50,5,5],[5,5,5],[5,5,5],[5,5,5]]
]]).float()
print(data.shape)
data = data.permute(0, 3, 1, 2)  # reshape the input from (1, 5, 5, 3) to (1, 3, 5, 5)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        ones = torch.Tensor(np.ones([1, 3, 3, 3]))  # 3x3 kernel, 3 channels to match the input
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=1,
                               kernel_size=3, stride=1, padding=0, bias=False)
        self.conv1.weight = torch.nn.Parameter(ones)              # custom weight
        self.conv1.bias = torch.nn.Parameter(torch.Tensor([1]))   # custom bias

    def forward(self, x):
        out = self.conv1(x)
        return out

print(data)
net = CNN()
print(net(data))

# input
tensor([[[[ 1., 10.,  1.,  1.,  1.],
          [ 2., 20.,  2.,  2.,  2.],
          [ 3., 30.,  3.,  3.,  3.],
          [ 4., 40.,  4.,  4.,  4.],
          [ 5., 50.,  5.,  5.,  5.]],

         [[ 1.,  1.,  1.,  1.,  1.],
          [ 2.,  2.,  2.,  2.,  2.],
          [ 3.,  3.,  3.,  3.,  3.],
          [ 4.,  4.,  4.,  4.,  4.],
          [ 5.,  5.,  5.,  5.,  5.]],

         [[ 1.,  1.,  1.,  1.,  1.],
          [ 2.,  2.,  2.,  2.,  2.],
          [ 3.,  3.,  3.,  3.,  3.],
          [ 4.,  4.,  4.,  4.,  4.],
          [ 5.,  5.,  5.,  5.,  5.]]]])

# output
tensor([[[[109., 109.,  55.],
          [163., 163.,  82.],
          [217., 217., 109.]]]], grad_fn=<...>)
```
Let us verify the top-left value, 109, which also checks the convolution arithmetic itself. The kernel weights were all set to 1, so:
The top-left convolution result is:
First channel: 1 + 10 + 1 + 2 + 20 + 2 + 3 + 30 + 3 = 72
Second channel: 1 + 1 + 1 + 2 + 2 + 2 + 3 + 3 + 3 = 18
Third channel: 1 + 1 + 1 + 2 + 2 + 2 + 3 + 3 + 3 = 18
72 + 18 + 18 = 108; adding the bias (set to 1) gives 109.
This confirms that convolution layer parameters can be customized, and that the convolution arithmetic checks out.
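The same check can be scripted (a sketch of my own, reusing the permuted `data` tensor from the example above):

```python
patch = data[0, :, 0:3, 0:3]  # the top-left 3x3 window across all 3 channels
print(patch.sum() + 1)        # tensor(109.) -- the kernel is all ones, so conv = patch sum + bias
```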
Batch Normalization Layer

There are plenty of blog posts about BN as well; it too is quite simple.
For a batch of input data, BN computes the mean and variance over each channel, normalizes with them, and then applies a learned scale γ and shift β:

$$y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

γ is initialized to 1 and β to 0; both are updated continuously during training.
The verification code follows.
First, the official implementation.
```python
import torch
import torch.nn as nn
import numpy as np

input = torch.tensor([[
    [[1,1,1],[10,1,1],[1,1,1],[1,1,1],[1,1,1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
    [[3,3,3],[30,3,3],[3,3,3],[3,3,3],[3,3,3]],
    [[4,4,4],[40,4,4],[4,4,4],[4,4,4],[4,4,4]],
    [[5,5,5],[50,5,5],[5,5,5],[5,5,5],[5,5,5]]
]]).float()
input = input.permute(0, 3, 1, 2)  # (1, 5, 5, 3) -> (1, 3, 5, 5)

m = nn.BatchNorm2d(3)
output = m(input)
print(m.weight)
print(m.bias)
print(output)

# output
Parameter containing:
tensor([1., 1., 1.], requires_grad=True)
Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
tensor([[[[-0.5883,  0.1272, -0.5883, -0.5883, -0.5883],
          [-0.5088,  0.9221, -0.5088, -0.5088, -0.5088],
          [-0.4293,  1.7171, -0.4293, -0.4293, -0.4293],
          [-0.3498,  2.5121, -0.3498, -0.3498, -0.3498],
          [-0.2703,  3.3070, -0.2703, -0.2703, -0.2703]],

         [[-1.4142, -1.4142, -1.4142, -1.4142, -1.4142],
          [-0.7071, -0.7071, -0.7071, -0.7071, -0.7071],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.7071,  0.7071,  0.7071,  0.7071,  0.7071],
          [ 1.4142,  1.4142,  1.4142,  1.4142,  1.4142]],

         [[-1.4142, -1.4142, -1.4142, -1.4142, -1.4142],
          [-0.7071, -0.7071, -0.7071, -0.7071, -0.7071],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.7071,  0.7071,  0.7071,  0.7071,  0.7071],
          [ 1.4142,  1.4142,  1.4142,  1.4142,  1.4142]]]],
       grad_fn=<...>)
```
As expected, γ is initially 1 and β is 0.
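One more point worth knowing (my own addition): besides γ and β, BatchNorm2d also tracks running statistics, which replace the per-batch statistics at inference time. A minimal sketch:

```python
import torch
import torch.nn as nn

m = nn.BatchNorm2d(3)
x = torch.randn(4, 3, 5, 5)
m(x)                    # one training-mode forward pass updates the running stats
print(m.running_mean)   # exponential moving average of batch means
print(m.running_var)    # exponential moving average of batch variances
m.eval()                # in eval mode, these running stats are used instead of batch stats
```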
Next, implementing the formula by hand:
```python
import torch
import torch.nn as nn
import numpy as np

input = torch.tensor([[
    [[1,1,1],[10,1,1],[1,1,1],[1,1,1],[1,1,1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
    [[3,3,3],[30,3,3],[3,3,3],[3,3,3],[3,3,3]],
    [[4,4,4],[40,4,4],[4,4,4],[4,4,4],[4,4,4]],
    [[5,5,5],[50,5,5],[5,5,5],[5,5,5],[5,5,5]]
]]).float()
input = input.permute(0, 3, 1, 2)  # (1, 5, 5, 3) -> (1, 3, 5, 5)

batch, channel, w, h = input.shape
for c in range(channel):
    data = input[:, c, :, :]
    mean = data.mean()
    var = data.var(unbiased=False)  # population variance -- no Bessel's correction
    input[:, c, :, :] = (input[:, c, :, :] - mean) / (np.sqrt(var + 1e-5))
print(input)

# output
tensor([[[[-0.5883,  0.1272, -0.5883, -0.5883, -0.5883],
          [-0.5088,  0.9221, -0.5088, -0.5088, -0.5088],
          [-0.4293,  1.7171, -0.4293, -0.4293, -0.4293],
          [-0.3498,  2.5121, -0.3498, -0.3498, -0.3498],
          [-0.2703,  3.3070, -0.2703, -0.2703, -0.2703]],

         [[-1.4142, -1.4142, -1.4142, -1.4142, -1.4142],
          [-0.7071, -0.7071, -0.7071, -0.7071, -0.7071],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.7071,  0.7071,  0.7071,  0.7071,  0.7071],
          [ 1.4142,  1.4142,  1.4142,  1.4142,  1.4142]],

         [[-1.4142, -1.4142, -1.4142, -1.4142, -1.4142],
          [-0.7071, -0.7071, -0.7071, -0.7071, -0.7071],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.7071,  0.7071,  0.7071,  0.7071,  0.7071],
          [ 1.4142,  1.4142,  1.4142,  1.4142,  1.4142]]]])
```
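Incidentally, the per-channel loop can be vectorized (a sketch of my own, equivalent to the loop above; here `x` stands for a fresh copy of the input, since the loop normalizes in place):

```python
import torch

# x is assumed to be the un-normalized (1, 3, 5, 5) input tensor from above
mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # per-channel mean, shape (1, 3, 1, 1)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)   # per-channel population variance
print((x - mean) / torch.sqrt(var + 1e-5))                 # matches the loop's output
```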
The loop version's output is identical to the official one, which confirms the formula. One caveat when computing the variance of a tensor yourself:
```python
var = data.var(unbiased=False)
```
unbiased=False is essential, i.e., Bessel's correction must be disabled. With Bessel's correction the variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
whereas the ordinary (population) variance is

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The Bessel-corrected variance comes out slightly larger than the population value. (It took me a solid half hour of debugging to track this down.)
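A tiny demonstration of the difference (my own sketch):

```python
import torch

x = torch.tensor([1., 2., 3., 4.])
print(x.var(unbiased=True))   # tensor(1.6667): divides by n-1 (Bessel's correction)
print(x.var(unbiased=False))  # tensor(1.2500): divides by n, what BatchNorm uses
```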
Activation function

Activation functions are easy to understand: they map the BN output through a nonlinearity. Here we use ReLU, defined as $f(x) = \max(0, x)$, as the example.
The official implementation:
```python
import torch
import torch.nn as nn

input = torch.tensor([[
    [[-1,-1,-1],[-10,-1,-1],[-1,-1,-1],[-1,-1,-1],[-1,-1,-1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
]]).float()

relu = nn.ReLU()
print(relu(input))
```
Our own implementation:
```python
import torch
import torch.nn as nn

input = torch.tensor([[
    [[-1,-1,-1],[-10,-1,-1],[-1,-1,-1],[-1,-1,-1],[-1,-1,-1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
]]).float()

def relu(input):
    return torch.maximum(input, torch.tensor(0.0))

print(relu(input))
```
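Other equivalent one-liners exist (my own sketch), for example clamping at zero:

```python
import torch

def relu_clamp(x):
    return x.clamp(min=0)  # identical to max(x, 0) elementwise

print(relu_clamp(torch.tensor([-10., -1., 0., 2., 20.])))  # tensor([ 0.,  0.,  0.,  2., 20.])
```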
Implementing the whole process

Finally, putting all of the layers together. The code is as follows.
```python
import torch
import torch.nn as nn
import numpy as np
from torchvision import transforms

data1 = np.array([
    [[1,1,1],[10,1,1],[1,1,1],[1,1,1],[1,1,1]],
    [[2,2,2],[20,2,2],[2,2,2],[2,2,2],[2,2,2]],
    [[3,3,3],[30,3,3],[3,3,3],[3,3,3],[3,3,3]],
    [[4,4,4],[40,4,4],[4,4,4],[4,4,4],[4,4,4]],
    [[5,5,5],[50,5,5],[5,5,5],[5,5,5],[5,5,5]]
], dtype='uint8')
data2 = np.ones((5, 5, 3), dtype='uint8')
print(data1)
print(data2)
print(data1.shape)
print(data2.shape)

# data preprocessing -- see my earlier post; ToTensor scales uint8 values into [0, 1]
data1 = transforms.ToTensor()(data1)
data2 = transforms.ToTensor()(data2)
data1 = torch.unsqueeze(data1, 0)
data2 = torch.unsqueeze(data2, 0)
input = torch.cat((data1, data2), 0)  # a batch of two images, shape (2, 3, 5, 5)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        ones = torch.Tensor(np.ones([1, 3, 3, 3]))
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=1,
                               kernel_size=3, stride=1, padding=0, bias=False)
        self.conv1.weight = torch.nn.Parameter(ones)
        self.conv1.bias = torch.nn.Parameter(torch.Tensor([1]))
        self.bn1 = nn.BatchNorm2d(1)
        self.relu = nn.ReLU()

    def forward(self, x):
        print(x)             # input
        out = self.conv1(x)
        print(out)           # after convolution
        out = self.bn1(out)
        print(out)           # after batch normalization
        out = self.relu(out)
        print(out)           # after the activation function
        return out

net = CNN()
net(input)

# input
tensor([[[[0.0039, 0.0392, 0.0039, 0.0039, 0.0039],
          [0.0078, 0.0784, 0.0078, 0.0078, 0.0078],
          [0.0118, 0.1176, 0.0118, 0.0118, 0.0118],
          [0.0157, 0.1569, 0.0157, 0.0157, 0.0157],
          [0.0196, 0.1961, 0.0196, 0.0196, 0.0196]],

         [[0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0078, 0.0078, 0.0078, 0.0078, 0.0078],
          [0.0118, 0.0118, 0.0118, 0.0118, 0.0118],
          [0.0157, 0.0157, 0.0157, 0.0157, 0.0157],
          [0.0196, 0.0196, 0.0196, 0.0196, 0.0196]],

         [[0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0078, 0.0078, 0.0078, 0.0078, 0.0078],
          [0.0118, 0.0118, 0.0118, 0.0118, 0.0118],
          [0.0157, 0.0157, 0.0157, 0.0157, 0.0157],
          [0.0196, 0.0196, 0.0196, 0.0196, 0.0196]]],

        [[[0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039]],

         [[0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039]],

         [[0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039],
          [0.0039, 0.0039, 0.0039, 0.0039, 0.0039]]]])

# after convolution
tensor([[[[1.4235, 1.4235, 1.2118],
          [1.6353, 1.6353, 1.3176],
          [1.8471, 1.8471, 1.4235]]],

        [[[1.1059, 1.1059, 1.1059],
          [1.1059, 1.1059, 1.1059],
          [1.1059, 1.1059, 1.1059]]]], grad_fn=<...>)

# after batch normalization
tensor([[[[ 4.0822e-01,  4.0822e-01, -4.0822e-01],
          [ 1.2247e+00,  1.2247e+00, -2.7576e-06],
          [ 2.0411e+00,  2.0411e+00,  4.0822e-01]]],

        [[[-8.1644e-01, -8.1644e-01, -8.1644e-01],
          [-8.1644e-01, -8.1644e-01, -8.1644e-01],
          [-8.1644e-01, -8.1644e-01, -8.1644e-01]]]], grad_fn=<...>)

# after the activation function
tensor([[[[0.4082, 0.4082, 0.0000],
          [1.2247, 1.2247, 0.0000],
          [2.0411, 2.0411, 0.4082]]],

        [[[0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000]]]], grad_fn=<...>)
```

That covers the convolution, BN, and activation layers. If you want to learn about data preprocessing or the cross-entropy loss (CE Loss), see the other posts on my homepage. Questions are welcome.