Computational Complexity Analysis of CNN and LSTM

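Preface: Today, while working on an edge-computing deployment, I profiled the model on the NPU and found that almost all of the compute time went to the LSTM. The model uses a Bi-LSTM, which accounted for 98% of the runtime, while the CNN layers took very little time. That made me wonder why the LSTM is so expensive.

First, the experimental setup: the input is a vibration signal of length 1024 with a single channel, and the batch size is 1.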

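I. Computational complexity of a CNN:

Let the kernel size be K x K, the number of input channels C_in, the number of output channels C_out, and the output feature-map size W x H (equal to the input size when padding keeps the size unchanged).

Cost of the convolution: O(K*K * C_in * C_out * W * H), since each of the C_out * W * H output values needs K * K * C_in multiply-accumulate operations.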
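To make the formula concrete, here is a small Python helper that counts multiply-accumulate operations (MACs) for one convolution layer. It is a sketch under stated assumptions: stride 1, no dilation or grouping, bias ignored, and W x H taken as the output size. The name `conv2d_macs` is hypothetical, not a library function.

```python
def conv2d_macs(k: int, c_in: int, c_out: int, w: int, h: int) -> int:
    """Multiply-accumulate count of one K x K convolution layer.

    Assumes stride 1, no dilation/groups, bias ignored, and that w x h is
    the output feature-map size (equal to the input size with 'same' padding).
    """
    # each of the c_out * w * h output values needs k * k * c_in multiply-adds
    return k * k * c_in * c_out * w * h


# example: a 3x3 conv, 16 -> 32 channels, on a 64x64 feature map
print(conv2d_macs(3, 16, 32, 64, 64))  # 18874368
```

The count grows linearly in every factor, so doubling the channels on both sides quadruples the cost.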

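Example: my first convolutional layer has 1 input channel and 32 output channels and a kernel of size 1*3 (a 1D kernel of length k = 3). To keep the output length equal to the input length, padding = (k-1)/2 = 1.

Input shape: 1*1*1024 (batch size, channels, length)

Output shape: 1*32*1024

Complexity: 1*32*3*1024 = 98304 multiply-accumulates (C_in * C_out * k * length)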
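The worked example can be checked with a short Python sketch. It assumes stride 1 and dilation 1 and ignores the bias term; `conv1d_out_len` and `conv1d_macs` are hypothetical helper names, not PyTorch functions. The output-length formula is the standard one for a 1D convolution.

```python
def conv1d_out_len(length: int, k: int, padding: int, stride: int = 1) -> int:
    # standard output-length formula for a 1D convolution (dilation 1)
    return (length + 2 * padding - k) // stride + 1


def conv1d_macs(k: int, c_in: int, c_out: int, out_len: int) -> int:
    # each of the c_out * out_len outputs needs k * c_in multiply-adds (bias ignored)
    return k * c_in * c_out * out_len


k = 3
padding = (k - 1) // 2  # = 1, keeps the length unchanged for odd k
out_len = conv1d_out_len(1024, k, padding)
print(out_len)                          # 1024
print(conv1d_macs(k, 1, 32, out_len))   # 98304
```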

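II. Computational complexity of an LSTM:

Assume the LSTM hidden size is H, the input size is I, and the number of time steps is T:

Each time step costs O(I * H + H^2): the input-to-hidden and hidden-to-hidden matrix products dominate, while the activation functions and element-wise gate operations are only O(H). More precisely, an LSTM cell has four gates (input, forget, cell candidate, output), each needing one I x H and one H x H product, so one step takes 4 * (I * H + H * H) multiply-accumulates; the constant 4 disappears in big-O notation.

Total LSTM complexity: O(T * (I * H + H*H))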
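To see where the I * H and H * H terms and the hidden factor 4 come from, here is a toy, pure-Python LSTM cell that counts the multiplications in its matrix-vector products. This is an illustrative sketch, not an NPU or PyTorch implementation; the helper names and the toy sizes (input 8, hidden 16, 5 steps) are made up for the example, and the O(H) element-wise gate operations are deliberately not counted.

```python
import math
import random


def matvec(w, x, counter):
    """Dense matrix-vector product; adds the number of multiplies to counter[0]."""
    counter[0] += len(w) * len(x)
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, params, counter):
    """One LSTM time step. params maps gate name -> (W_ih, W_hh, bias)."""
    pre = {}
    for gate, (w_ih, w_hh, b) in params.items():
        # every gate needs one (H x I) and one (H x H) matrix-vector product
        wx = matvec(w_ih, x, counter)
        wh = matvec(w_hh, h, counter)
        pre[gate] = [p + q + r for p, q, r in zip(wx, wh, b)]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    g = [math.tanh(v) for v in pre["g"]]
    o = [sigmoid(v) for v in pre["o"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


input_size, hidden_size, steps = 8, 16, 5  # toy sizes
random.seed(0)
params = {
    name: (rand_mat(hidden_size, input_size), rand_mat(hidden_size, hidden_size), [0.0] * hidden_size)
    for name in "ifgo"
}
h, c = [0.0] * hidden_size, [0.0] * hidden_size
counter = [0]
for _ in range(steps):
    x = [random.uniform(-1, 1) for _ in range(input_size)]
    # h from this step is needed before the next step can start,
    # so the time steps cannot be computed in parallel
    h, c = lstm_step(x, h, c, params, counter)

print(counter[0], steps * 4 * (input_size * hidden_size + hidden_size ** 2))  # 7680 7680
```

The loop also shows a structural difference from the convolution: the recurrence forces the time steps to run one after another.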

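Example: the input size is the number of channels output by the last CNN layer, 128; the hidden size is set to 128; and the number of time steps is the sequence length after the three MaxPool1d layers, 1024 / 8 = 128.

Complexity: 128*(128*128+128*128) = 4194304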
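The same arithmetic in Python. The first value reproduces the figure above; the next two are an extension under stated assumptions: a factor 4 for the four gates, and a factor 2 for a bidirectional layer (the LSTM output shape [32, 128, 256] in the network dump, twice the hidden size of 128, is consistent with a bidirectional LSTM). Biases are ignored.

```python
T, I, H = 128, 128, 128  # time steps, input size, hidden size

per_formula = T * (I * H + H * H)   # the O(...) expression without constants
with_gates = 4 * per_formula        # four gates per step (i, f, g, o)
bidirectional = 2 * with_gates      # forward + backward direction

print(per_formula)    # 4194304
print(with_gates)     # 16777216
print(bidirectional)  # 33554432
```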

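Ratio to the first convolutional layer: 4194304 / (32*3*1024) = 4194304 / 98304 ≈ 42.7, so the LSTM costs about 43 times as much as the first convolutional layer.

Because this is a bidirectional LSTM, the cost doubles: 43*2 ≈ 86 times, which matches expectations. In the actual measurement the LSTM takes an even larger share of the time. My guess is that the NPU is better optimized for convolutions; in addition, each LSTM time step depends on the previous hidden state, so the steps must run one after another. Below is the complete network structure (this shape dump was captured with batch size 32):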

Layer: CNN_LSTM_Model | Input shapes: [torch.Size([32, 1, 1024])] | Output shape: torch.Size([32, 10])
Layer: Conv1d | Input shapes: [torch.Size([32, 1, 1024])] | Output shape: torch.Size([32, 32, 1024])
Layer: ReLU | Input shapes: [torch.Size([32, 32, 1024])] | Output shape: torch.Size([32, 32, 1024])
Layer: Conv1d | Input shapes: [torch.Size([32, 32, 1024])] | Output shape: torch.Size([32, 32, 1024])
Layer: ReLU | Input shapes: [torch.Size([32, 32, 1024])] | Output shape: torch.Size([32, 32, 1024])
Layer: MaxPool1d | Input shapes: [torch.Size([32, 32, 1024])] | Output shape: torch.Size([32, 32, 512])
Layer: Conv1d | Input shapes: [torch.Size([32, 32, 512])] | Output shape: torch.Size([32, 64, 512])
Layer: ReLU | Input shapes: [torch.Size([32, 64, 512])] | Output shape: torch.Size([32, 64, 512])
Layer: MaxPool1d | Input shapes: [torch.Size([32, 64, 512])] | Output shape: torch.Size([32, 64, 256])
Layer: Conv1d | Input shapes: [torch.Size([32, 64, 256])] | Output shape: torch.Size([32, 128, 256])
Layer: ReLU | Input shapes: [torch.Size([32, 128, 256])] | Output shape: torch.Size([32, 128, 256])
Layer: MaxPool1d | Input shapes: [torch.Size([32, 128, 256])] | Output shape: torch.Size([32, 128, 128])
Layer: Sequential | Input shapes: [torch.Size([32, 1, 1024])] | Output shape: torch.Size([32, 128, 128])
Layer: LSTM | Input shapes: [torch.Size([32, 128, 128]), <class 'tuple'>] | Output shapes: [torch.Size([32, 128, 256]), <class 'tuple'>]
Layer: Linear | Input shapes: [torch.Size([32, 128, 256])] | Output shape: torch.Size([32, 128, 256])
Layer: Attention | Input shapes: [torch.Size([32, 128]), torch.Size([32, 128, 256])] | Output shape: torch.Size([32, 1, 128])
Layer: LayerNorm | Input shapes: [torch.Size([32, 256])] | Output shape: torch.Size([32, 256])
Layer: ResidualConnection | Input shapes: [torch.Size([32, 256]), <class 'function'>] | Output shape: torch.Size([32, 256])
Layer: Linear | Input shapes: [torch.Size([32, 256])] | Output shape: torch.Size([32, 500])
Layer: ReLU | Input shapes: [torch.Size([32, 500])] | Output shape: torch.Size([32, 500])
Layer: Dropout | Input shapes: [torch.Size([32, 500])] | Output shape: torch.Size([32, 500])
Layer: Linear | Input shapes: [torch.Size([32, 500])] | Output shape: torch.Size([32, 10])
Layer: Sequential | Input shapes: [torch.Size([32, 256])] | Output shape: torch.Size([32, 10])
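The comparison above uses only the first Conv1d, but the structure dump contains four Conv1d layers. The sketch below tallies per-sample MACs for all four plus the LSTM. It rests on assumptions that are not stated in the post: every Conv1d uses kernel size 3 with padding 1 (the dump shows lengths preserved, which is consistent with but does not prove this), the LSTM is bidirectional with four gates, and biases, ReLU, MaxPool, Linear, Attention and LayerNorm are ignored. The helper names are hypothetical.

```python
# (kernel, c_in, c_out, output length) for the four Conv1d layers in the dump;
# kernel size 3 is an ASSUMPTION for every layer (only the first is stated)
conv_layers = [
    (3, 1, 32, 1024),
    (3, 32, 32, 1024),
    (3, 32, 64, 512),
    (3, 64, 128, 256),
]


def conv1d_macs(k, c_in, c_out, out_len):
    # multiply-accumulates per sample, bias ignored
    return k * c_in * c_out * out_len


def lstm_macs(t, i, h, bidirectional=True):
    # four gates, each with an (H x I) and an (H x H) product per time step
    per_direction = 4 * t * (i * h + h * h)
    return 2 * per_direction if bidirectional else per_direction


cnn_total = sum(conv1d_macs(*layer) for layer in conv_layers)
lstm_total = lstm_macs(t=128, i=128, h=128)
share = lstm_total / (cnn_total + lstm_total)

print(cnn_total)                                   # 12681216
print(lstm_total)                                  # 33554432
print(f"LSTM share of counted MACs: {share:.1%}")  # LSTM share of counted MACs: 72.6%
```

Under these assumptions the LSTM accounts for roughly 73% of the counted MACs, below the measured 98% share of NPU time. MAC counts alone do not predict latency; the gap is consistent with the guess that the NPU runs convolutions more efficiently than the sequential LSTM recurrence.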