Handwritten MNIST digit recognition (best validation accuracy: 99.8%)
Environment
- python 3.8.13
- pytorch 1.12.1+cu113
I. Imports and GPU setup
In:
%matplotlib inline
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchinfo import summary

# Use the CUDA device if one is available, otherwise fall back to the CPU
# If torch._C._cuda_getDeviceCount() > 0 but CUDA is still not picked up, a restart sometimes fixes it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
Out:
cuda
torchvision.datasets.MNIST parameters
Parameters:
- root (string): root directory of the dataset; if 'MNIST/raw/train-images-idx3-ubyte' and 'MNIST/raw/t10k-images-idx3-ubyte' already exist there, the download is skipped.
- train (bool, optional): if True, build the dataset from 'train-images-idx3-ubyte', otherwise from 't10k-images-idx3-ubyte'.
- download (bool, optional): if True, download the dataset from the internet and put it under the root directory. If the dataset is already downloaded, it is not downloaded again.
- transform (callable, optional): a function/transform that takes a PIL image and returns a transformed version, e.g. `transforms.RandomCrop`.
- target_transform (callable, optional): a function/transform that takes the target and transforms it.
In VS Code, Ctrl + left-click a symbol to jump to its source.
In:
# Set the data root and build the training and test sets (download if missing)
data_root = '365data'
training_data = datasets.MNIST(root=data_root, train=True, download=True, transform=ToTensor())
test_data = datasets.MNIST(root=data_root, train=False, download=True, transform=ToTensor())
torch.utils.data.DataLoader parameters
Parameter descriptions taken straight from the source (only a subset):
- dataset: the data source.
- batch_size: splits the dataset into batches of batch_size samples; the gradient is averaged over each batch, and memory use is reduced. A multiple of 32 is a common recommendation (CUDA core counts are multiples of 2).
- shuffle: whether to reshuffle the data at every epoch.
DataLoader(dataset: Dataset, batch_size: int | None = 1, shuffle: bool | None = None)
Args:
    dataset (Dataset): dataset from which to load the data.
    batch_size (int, optional): how many samples per batch to load (default: ``1``).
    shuffle (bool, optional): set to ``True`` to have the data reshuffled at every epoch (default: ``False``).
    sampler (Sampler or Iterable, optional): defines the strategy to draw samples from the dataset.
        Can be any ``Iterable`` with ``__len__`` implemented. If specified, :attr:`shuffle` must not be specified.
In:
batch_size = 64

train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
In:
for X, y in test_dataloader:
    print("Shape of X [BatchSize, Channel, Height, Width]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Out:
Shape of X [BatchSize, Channel, Height, Width]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
In:
figure = plt.figure(figsize=(10, 4))
cols, rows = 5, 2
for i in range(1, cols * rows + 1):
    # torch.randint returns a tensor of shape `size` with values in [0, high); .item() extracts the integer
    idx = torch.randint(len(test_data), size=(1,)).item()
    img, label = test_data[idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
Out:
II. Define the model
Key points
We define the model by subclassing nn.Module and calling super().__init__() in __init__ to inherit its attributes. Every nn.Module subclass must implement a forward method that describes how the input data flows through the network.
The network structure I use here (diagram made with an online drawing site):
In:
num_classes = 10

class Model(nn.Module):
    """A LeNet-style model"""
    def __init__(self) -> None:
        super().__init__()
        # self.flatten = nn.Flatten()
        self.conv_relu_stack = nn.Sequential(
            # Convolution
            nn.Conv2d(1, 32, kernel_size=3),
            # Activation
            nn.ReLU(),
            # Pooling
            nn.MaxPool2d(2),
            # Convolution
            nn.Conv2d(32, 64, kernel_size=3),
            # Activation
            nn.ReLU(),
            # Pooling
            nn.MaxPool2d(2)
        )
        self.dense_relu_stack = nn.Sequential(
            # Flatten to a vector for the fully connected layers
            nn.Flatten(),
            # Linear layers
            nn.Linear(5 * 5 * 64, 128),
            nn.Dropout(0.3),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.Dropout(0.3),
            nn.ReLU(),
            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        # x = self.flatten(x)
        x = self.conv_relu_stack(x)
        x = self.dense_relu_stack(x)
        # x = F.log_softmax(x, dim=1)  # not needed with CrossEntropyLoss
        return x
In:
model = Model().to(device=device)
print(model)
Out:
Model(
  (conv_relu_stack): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_relu_stack): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=1600, out_features=128, bias=True)
    (2): Dropout(p=0.3, inplace=False)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): Dropout(p=0.3, inplace=False)
    (6): ReLU()
    (7): Linear(in_features=64, out_features=10, bias=True)
  )
)
1. CNN formulas (parameter count and compute cost)
Compute cost
Input feature map $f=(B,H,W,C)$, convolution kernel $kernel=(K,S,C,O)$
- B: batch size
- H, W, C: height, width and channel count of the input feature map
- K, S: kernel size and stride
- O: number of output channels
1. A single convolution multiplies the $k \cdot k \cdot c$ elements of one kernel with the corresponding positions of the input feature map, which takes $k \cdot k \cdot c$ multiplications; summing the resulting $k \cdot k \cdot c$ values takes $k \cdot k \cdot c - 1$ additions.
So one convolution needs this many multiply-add operations:
$$2 \cdot c \cdot k^2 - 1$$
2. The step above only produces a single element of one channel of the output feature map. The size of a single output channel is
$$N_h=\left\lfloor\frac{H+P_h-K}{S}\right\rfloor+1,\qquad N_w=\left\lfloor\frac{W+P_w-K}{S}\right\rfloor+1$$
where $P_h$ and $P_w$ are the total padding in the h and w directions. Each output element needs one convolution, so one channel of the final feature map takes
$$(2 \cdot c \cdot k^2-1)\cdot(N_h \cdot N_w)$$
operations.
3. One channel needs the operations from step 2; the final feature map has O channels and the batch size is B.
The total compute cost is therefore:
$$B \cdot (2 \cdot c \cdot k^2-1)\cdot(N_h \cdot N_w)\cdot O$$
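As a quick sanity check on these formulas, here is a small sketch (my addition, not part of the original notebook) that evaluates the output size and multiply-add count for the two convolution layers of the model defined above:

def conv_output_size(in_size, k, s=1, p=0):
    # N = floor((size + P - K) / S) + 1, with P the total padding along that axis
    return (in_size + p - k) // s + 1

def conv_flops(h, w, c, k, s, o, batch=1):
    # B * (2*c*k^2 - 1) * N_h * N_w * O
    n_h, n_w = conv_output_size(h, k, s), conv_output_size(w, k, s)
    return batch * (2 * c * k**2 - 1) * n_h * n_w * o

# conv1: 1x28x28 input, 3x3 kernel, 32 output channels -> 26x26 output
print(conv_output_size(28, 3), conv_flops(28, 28, 1, 3, 1, 32))
# conv2: 32x13x13 input (after 2x2 max pooling), 3x3 kernel, 64 output channels -> 11x11 output
print(conv_output_size(13, 3), conv_flops(13, 13, 32, 3, 1, 64))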
Parameter count
The parameter count of a CNN does not depend on the feature-map size; it depends only on the kernel size, the bias, and BN.
For a kernel $=(K,S,C,O)$, the weights contribute $K \cdot K \cdot C \cdot O$ parameters and the biases contribute another $O$.
The total parameter count is therefore:
$$(K^2 \cdot C + 1) \cdot O$$
2. Hand-computing the parameter count
#   name       size                          parameters
--  ---------  ----------------------------  ------------------------
0   input      1x28x28                       0
1   conv2d1    (28-(3-1))=26 -> 32x26x26     (3*3*1+1)*32 = 320
2   maxpool1   32x13x13                      0
3   conv2d2    (13-(3-1))=11 -> 64x11x11     (3*3*32+1)*64 = 18,496
4   maxpool2   64x5x5                        0
5   dense      128                           (64*5*5+1)*128 = 204,928
6   dense      64                            (128+1)*64 = 8,256
7   output     10                            (64+1)*10 = 650
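The hand count can also be checked directly against the model's own tensors; the short sketch below (my addition) relies only on the model defined earlier:

# Count parameters per layer from the actual weight and bias tensors
for name, module in model.named_modules():
    params = sum(p.numel() for p in module.parameters(recurse=False))
    if params > 0:
        print(f"{name:30s} {params:>10,d}")
print("total:", sum(p.numel() for p in model.parameters()))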
In:
summary(model)
Out:
=================================================================
Layer (type:depth-idx) Param #
=================================================================
Model --
├─Sequential: 1-1 --
│ └─Conv2d: 2-1 320
│ └─ReLU: 2-2 --
│ └─MaxPool2d: 2-3 --
│ └─Conv2d: 2-4 18,496
│ └─ReLU: 2-5 --
│ └─MaxPool2d: 2-6 --
├─Sequential: 1-2 --
│ └─Flatten: 2-7 --
│ └─Linear: 2-8 204,928
│ └─Dropout: 2-9 --
│ └─ReLU: 2-10 --
│ └─Linear: 2-11 8,256
│ └─Dropout: 2-12 --
│ └─ReLU: 2-13 --
│ └─Linear: 2-14 650
=================================================================
Total params: 232,650
Trainable params: 232,650
Non-trainable params: 0
=================================================================
III. Train the model
1. Set the hyperparameters
In:
loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss; it applies softmax internally, so the model needs no softmax layer
learning_rate = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
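The comment above notes that nn.CrossEntropyLoss expects raw logits. As a small illustration (my addition, not from the original notebook), CrossEntropyLoss is equivalent to log_softmax followed by NLLLoss, which is why the model ends with a plain Linear layer:

logits = torch.randn(4, num_classes)            # raw model outputs, no softmax applied
targets = torch.randint(0, num_classes, (4,))   # random class labels
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(ce.item(), nll.item())                    # the two values match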
2. Training function
- model.train() puts the model in training mode: layers such as Dropout and BatchNorm behave as they do during training, and the parameters can be updated.
- model.eval() puts the model in evaluation mode: Dropout is disabled and BatchNorm uses its running statistics; the parameters are left unchanged.
- optimizer.zero_grad() zeroes the gradients.
- loss.backward() computes the gradients.
- optimizer.step() updates the weights from the gradients.
A short sketch of the train/eval difference follows this list.
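A minimal sketch of that difference, using the already-imported modules (my addition): Dropout zeroes roughly 30% of the activations in train() mode but becomes the identity in eval() mode.

drop = nn.Dropout(0.3)
x = torch.ones(1, 8)
drop.train()
print(drop(x))   # ~30% of the entries are zeroed, the rest are scaled by 1/0.7
drop.eval()
print(drop(x))   # Dropout is a no-op: the tensor of ones comes back unchanged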
In:
def train(dataloader, model, loss_fn, optimizer):
    model.train()
    size = len(dataloader.dataset)
    batchs = len(dataloader)
    train_acc, train_loss = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute the loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()  # zero the gradients
        loss.backward()        # compute the gradients
        optimizer.step()       # update the weights from the gradients

        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

        # if batch % 100 == 0:
        #     loss, current = loss.item(), batch * len(X)
        #     print(f"loss: {loss:>7f} [{current:5d}/{size:>5d}]")

    train_acc /= size
    train_loss /= batchs
    return train_acc, train_loss
In:
def validate(dataloader, model):
    model.eval()
    size = len(dataloader.dataset)
    batchs = len(dataloader)
    validate_acc, validate_loss = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            validate_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
            validate_loss += loss_fn(pred, y).item()
    validate_acc /= size
    validate_loss /= batchs
    return validate_acc, validate_loss
3. Run the training loop
In:
epochs = 50
res = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}

for i in range(epochs):
    epoch_train_acc, epoch_train_loss = train(train_dataloader, model, loss_fn, optimizer)
    epoch_validate_acc, epoch_validate_loss = validate(test_dataloader, model)
    print(f'Epoch:{i:3d}, Train_acc:{epoch_train_acc:.4f}, Train_loss:{epoch_train_loss:.4f}, ')
    res["train_acc"].append(epoch_train_acc)
    res["train_loss"].append(epoch_train_loss)
    res["val_acc"].append(epoch_validate_acc)
    res["val_loss"].append(epoch_validate_loss)
Out:
Epoch:  0, Train_acc:0.897, Train_loss:0.329, 
Epoch:  1, Train_acc:0.972, Train_loss:0.098, 
Epoch:  2, Train_acc:0.980, Train_loss:0.073, 
---------------- 45 lines omitted ----------------
Epoch: 48, Train_acc:0.998, Train_loss:0.005, 
Epoch: 49, Train_acc:0.998, Train_loss:0.006, 
IV. Visualize the results
In:
# Use the SimHei font, render minus signs correctly, and set dpi to 100
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.dpi'] = 100

epoch_range = range(epochs)

plt.figure(figsize=(12, 3))

plt.subplot(1, 2, 1)
plt.plot(epoch_range, res['train_acc'], label='Training Accuracy')
plt.plot(epoch_range, res['val_acc'], label='Validation Accuracy')
plt.legend(loc="lower right")
plt.title("Training and Validation Accuracy")

plt.subplot(1, 2, 2)
plt.plot(epoch_range, res['train_loss'], label='Training Loss')
plt.plot(epoch_range, res['val_loss'], label='Validation Loss')
plt.legend(loc="upper right")
plt.title('Training and Validation Loss')

plt.show()
Out:
In:
print("最终准确率",res["val_acc"][-1])
Out:
Final accuracy 0.9998833333333333
In:
torch.save(model,'p1/p1.pth')
In:
model_read = torch.load('p1/p1.pth')
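torch.save(model, ...) pickles the whole model object. A common alternative, sketched below under the assumption of a hypothetical path 'p1/p1_state.pth' (my addition, not part of the original notebook), is to save only the state_dict and rebuild the architecture before loading:

# Save only the parameters
torch.save(model.state_dict(), 'p1/p1_state.pth')
# Rebuild the architecture, then restore the weights
model_restored = Model().to(device)
model_restored.load_state_dict(torch.load('p1/p1_state.pth', map_location=device))
model_restored.eval()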
In:
# idx still holds the index sampled in the plotting loop above, so this shows that same test image
img, label = test_data[idx]
plt.title(label)
plt.axis("off")
plt.imshow(img.squeeze(), cmap="gray")
plt.show()
Out:
In:
model.eval()
torch.argmax(model(img.reshape(1,1,28,28).to(device)))
Out:
tensor(4, device='cuda:0')