Verilog carry-look-ahead adder

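Compared with an ordinary adder, a carry-look-ahead adder adds look-ahead carry logic. This reduces the delay caused by passing the carry signal from bit to bit. See the 4-bit adder in the figure below (from Xia Yuwen, Digital Logic Design).

In a serial (ripple-carry) adder, the carry out of each 1-bit adder is fed into the next one. Each bit can only be computed after the bit below it has finished, so the overall delay is long. A carry-look-ahead adder instead uses carry logic that derives the final carry-out directly from the input operands and the initial carry-in. It does not wait for the intermediate carries to ripple through.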
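The look-ahead idea can be written out with the usual generate/propagate notation. Here "+" is OR and juxtaposition is AND. The carry c_{i+1} out of bit i corresponds to C[i] in the Verilog code below, and c_0 is the carry-in Ci. Expanding the recurrence gives, for the 4-bit case:

```latex
\begin{aligned}
g_i &= x_i y_i, \qquad p_i = x_i \oplus y_i, \qquad s_i = p_i \oplus c_i, \qquad c_{i+1} = g_i + p_i c_i \\
c_1 &= g_0 + p_0 c_0 \\
c_2 &= g_1 + p_1 g_0 + p_1 p_0 c_0 \\
c_3 &= g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0 \\
c_4 &= g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0
\end{aligned}
```

After expansion, every carry depends only on g, p and c_0. Each one is a two-level AND-OR expression, so all carries become available after the same fixed delay instead of one after another. For the carries alone, p_i = x_i + y_i works as well, because the case x_i = y_i = 1 is already covered by g_i.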

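1. Dataflow modeling of a 4-bit adder

Simply write the addition as C = X + Y. Quartus then chooses the adder implementation itself, typically mapping it onto the device's dedicated carry-chain logic rather than general-purpose gates. This version uses the fewest logic resources.

1.1 Code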

module block(input [3:0] X, Y, output [3:0] sum, output C);
  assign {C, sum} = X + Y;
endmodule

// testbench file
`timescale 1ns/1ns
`define clk_period 10       // 10 ns period = 100 MHz
module test_adder_tb();
  reg clk;
  reg [3:0] X, Y;
  wire [3:0] sum;
  wire C;

  block block1(.X(X), .Y(Y), .sum(sum), .C(C));

  initial clk = 1'b1;
  always #(`clk_period/2) clk = ~clk;

  initial begin
    X = 4'd5; Y = 4'd6;
    #(`clk_period*10);
    X = 4'd6; Y = 4'd7;
    #(`clk_period*10);
    $stop;
  end
endmodule
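The `block` module above fixes the width at 4 bits and has no carry-in. The same dataflow style extends to any width by making the width a parameter. This is a minimal sketch: the module name `block_n`, the parameter `N` and the extra `Ci` input are hypothetical additions, not part of the original design.

```verilog
// Hypothetical parameterized version of `block`: same dataflow style,
// width set by N (default 4), plus a carry-in that `block` does not have.
module block_n #(parameter N = 4) (
  input  [N-1:0] X, Y,
  input          Ci,    // carry-in (assumption: added for generality)
  output [N-1:0] sum,
  output         C      // carry-out
);
  // The LHS is N+1 bits wide, so X + Y + Ci is evaluated in N+1 bits
  // and the top bit is the carry-out. The synthesizer still decides
  // how the addition is actually built.
  assign {C, sum} = X + Y + Ci;
endmodule
```

For example, instantiating it as `block_n #(.N(16))` gives a 16-bit adder in the same style. That version can be compared with the hand-written 16-bit look-ahead adder in the summary.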

1.2 Resource usage

1.3 RTL synthesis

2. Structural (gate-level) modeling of a 4-bit carry-look-ahead adder

2.1 Code

module ahead_adder_4bit(input [3:0] X, Y, input Ci, output wire [3:0] sum, output wire Co);
  wire [3:0] P, G, C;
  assign P = X ^ Y;
  assign G = X & Y;
  ahead_gene_circuit ahead_gene_circuit(P, G, Ci, C);
  assign Co = C[3];
  assign sum[3] = C[2] ^ P[3];
  assign sum[2] = C[1] ^ P[2];
  assign sum[1] = C[0] ^ P[1];
  assign sum[0] = Ci ^ P[0];
endmodule

module ahead_gene_circuit(input [3:0] P, G, input Ci, output [3:0] C);
  assign C[0] = G[0] | (P[0] & Ci);
  assign C[1] = G[1] | (P[1] & C[0]);
  assign C[2] = G[2] | (P[2] & C[1]);
  assign C[3] = G[3] | (P[3] & C[2]);
endmodule
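As written, `ahead_gene_circuit` computes each C[i] from C[i-1]. In the source it is therefore still a chain of AND-OR stages, much like a ripple carry; the synthesizer may flatten it, but the code does not. Below is a minimal sketch of the fully expanded form, in which every carry depends only on P, G and Ci. The module name `ahead_gene_circuit_flat` is hypothetical. Its ports match `ahead_gene_circuit`, so it could be swapped in.

```verilog
// Hypothetical drop-in replacement for ahead_gene_circuit: each carry is a
// flat AND-OR of P, G and Ci, with no dependence on lower carries.
// C[i] is the carry out of bit i, the same convention as the original.
module ahead_gene_circuit_flat(input [3:0] P, G, input Ci, output [3:0] C);
  assign C[0] = G[0] | (P[0] & Ci);
  assign C[1] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & Ci);
  assign C[2] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
              | (P[2] & P[1] & P[0] & Ci);
  assign C[3] = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
              | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & Ci);
endmodule
```

These are the same expansions that the 16-bit version in the summary uses inside its `ahead_adder4` task.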

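2.2 Resource usage

2.3 RTL synthesis

3. Behavioral modeling of a 4-bit serial (ripple-carry) adder

A 1-bit full adder is modeled from its truth table, and four of them are chained in series to form a 4-bit ripple-carry adder. This version uses more resources.

3.1 Code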

module full_adder(
  input A, B, Ci,
  output reg S, Co);
  always @(*) begin
    case ({A, B, Ci})
      3'b000: begin S = 0; Co = 0; end
      3'b001: begin S = 1; Co = 0; end
      3'b010: begin S = 1; Co = 0; end
      3'b011: begin S = 0; Co = 1; end
      3'b100: begin S = 1; Co = 0; end
      3'b101: begin S = 0; Co = 1; end
      3'b110: begin S = 0; Co = 1; end
      3'b111: begin S = 1; Co = 1; end
    endcase
  end
endmodule

module _4adder(
  input [3:0] A, B,
  input Ci,
  output [3:0] S,
  output Co);
  wire C0, C1, C2;
  full_adder U0(A[0], B[0], Ci, S[0], C0);
  full_adder U1(A[1], B[1], C0, S[1], C1);
  full_adder U2(A[2], B[2], C1, S[2], C2);
  full_adder U3(A[3], B[3], C2, S[3], Co);
endmodule
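The `case` statement above encodes the full-adder truth table row by row. The same function can also be written as its standard Boolean equations: S = A ⊕ B ⊕ Ci and Co = AB + Ci(A ⊕ B). The carry is the generate term AB plus the propagate term A ⊕ B times the carry-in, which is the same recurrence the look-ahead adder expands. This is a minimal sketch; the module name `full_adder_eq` is hypothetical, and its port order matches `full_adder`, so it could be swapped into `_4adder`.

```verilog
// Hypothetical equation-form equivalent of the truth-table full_adder.
module full_adder_eq(input A, B, Ci, output S, Co);
  assign S  = A ^ B ^ Ci;               // sum: odd number of ones
  assign Co = (A & B) | (Ci & (A ^ B)); // carry: generate | propagate & carry-in
endmodule
```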

3.2 Resource usage

3.3 RTL synthesis

Summary:

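1. In simulation, the outputs showed no delay at all. Is it that ModelSim can only simulate the logical behavior and has no way to show the adder's delay?

2. Writing the addition directly with + in the code and building it from gates give different synthesis results and different resource usage.

3. For adder delay, see 夏宇闻复习笔记第10章：简单的组合逻辑模块&加法器&乘法器 (Xia Yuwen review notes, chapter 10: simple combinational logic modules, adders and multipliers) on coin's CSDN blog. It illustrates clearly how delay accumulates in a 4-bit adder built from full adders. The first full adder contributes three gate delays, and each of the remaining three bits adds two. The total is 3 + 3 × 2 = 9 gate delays. See the figure below: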


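A 4-bit carry-look-ahead adder, by contrast, has a total delay of only 4 gate delays. The intermediate generate signals G and propagate signals P are all produced in parallel in one gate delay. Each carry is then a two-level AND-OR of P, G and the carry-in, so the output carries are ready after 3 gate delays. One more gate delay for the sum XOR gives 4 in total. The expressions are shown in the figure below:

4. The code for a 16-bit carry-look-ahead adder is as follows: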

module ahead_adder_16bit(
  input [15:0] A,
  input [15:0] B,
  input CIN,
  output reg [15:0] S,
  output reg cout
);
  wire [15:0] G = A & B;
  wire [15:0] P = A | B;
  reg [3:0] cin;     // look-ahead carries into bits 4, 8, 12, and out of bit 15
  reg cout1;         // must be a reg: it is a task output assigned inside always

  task ahead_adder4;
    input cin;
    input [3:0] A;
    input [3:0] B;
    input [3:0] G;
    input [3:0] P;
    output reg [3:0] S;
    output reg cout;
    reg [3:0] C;
    begin
      C[0] = G[0] | (cin&P[0]);
      C[1] = G[1] | (P[1]&G[0]) | (P[1]&P[0]&cin);
      C[2] = G[2] | (P[2]&G[1]) | (P[2]&P[1]&G[0]) | (P[2]&P[1]&P[0]&cin);
      C[3] = G[3] | (P[3]&G[2]) | (P[3]&P[2]&G[1]) | (P[3]&P[2]&P[1]&G[0]) | (P[3]&P[2]&P[1]&P[0]&cin);
      S[0] = A[0]^B[0]^cin;
      S[1] = A[1]^B[1]^C[0];
      S[2] = A[2]^B[2]^C[1];
      S[3] = A[3]^B[3]^C[2];
      cout = C[3];
    end
  endtask

  task ahead_carry;
    input cin;
    input [15:0] G;
    input [15:0] P;
    output reg [3:0] cout;
    reg [3:0] G2;
    reg [3:0] P2;
    begin
      G2[0] = G[3] | P[3]&G[2] | P[3]&P[2]&G[1] | P[3]&P[2]&P[1]&G[0];
      G2[1] = G[7] | P[7]&G[6] | P[7]&P[6]&G[5] | P[7]&P[6]&P[5]&G[4];
      G2[2] = G[11] | P[11]&G[10] | P[11]&P[10]&G[9] | P[11]&P[10]&P[9]&G[8];
      G2[3] = G[15] | P[15]&G[14] | P[15]&P[14]&G[13] | P[15]&P[14]&P[13]&G[12];
      P2[0] = P[3]&P[2]&P[1]&P[0];
      P2[1] = P[7]&P[6]&P[5]&P[4];
      P2[2] = P[11]&P[10]&P[9]&P[8];
      P2[3] = P[15]&P[14]&P[13]&P[12];
      cout[0] = G2[0] | (cin&P2[0]);
      cout[1] = G2[1] | (P2[1]&G2[0]) | (P2[1]&P2[0]&cin);
      cout[2] = G2[2] | (P2[2]&G2[1]) | (P2[2]&P2[1]&G2[0]) | (P2[2]&P2[1]&P2[0]&cin);
      cout[3] = G2[3] | (P2[3]&G2[2]) | (P2[3]&P2[2]&G2[1]) | (P2[3]&P2[2]&P2[1]&G2[0]) | (P2[3]&P2[2]&P2[1]&P2[0]&cin);
    end
  endtask

  always @(*)
  begin
    ahead_carry(CIN, G[15:0], P[15:0], cin[3:0]);
    // The block carries were already computed by ahead_carry, so cout1 is not used
    ahead_adder4(CIN, A[3:0], B[3:0], G[3:0], P[3:0], S[3:0], cout1);
    ahead_adder4(cin[0], A[7:4], B[7:4], G[7:4], P[7:4], S[7:4], cout1);
    ahead_adder4(cin[1], A[11:8], B[11:8], G[11:8], P[11:8], S[11:8], cout1);
    // This cout drives the module's carry-out; it equals cin[3] from
    // ahead_carry, so cout = cin[3] would work as well
    ahead_adder4(cin[2], A[15:12], B[15:12], G[15:12], P[15:12], S[15:12], cout);
  end
endmodule

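An adder written this way carries no delay information at the RTL level. At synthesis, the tool analyzes the source and picks one of the many basic circuit structures in its library, mapping the code onto a typical adder circuit. When higher speed is required, the tool may choose a faster (usually larger) structure. It does not insert pipeline registers on its own, because that would change how many clock cycles the result takes; pipelining has to be written into the RTL. So once the logic is done, we still need to run place-and-route with timing constraints to obtain the actual circuit delays, and then simulate again. Only the actual delays determine the maximum operating frequency of this operation.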
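The first point of the summary has a simple explanation: a zero-delay RTL description contains no delay for the simulator to show. One way to see the 9-versus-4 gate-delay difference from point 3 without place-and-route is to give every gate an explicit unit delay with `#1`. This is a minimal sketch, and the unit delays are an idealized assumption: every gate costs exactly 1 ns, including XOR and multi-input AND/OR. That matches the counting in point 3, not real device timing. The module names `rca4_delay`, `cla4_delay` and `delay_tb` are hypothetical.

```verilog
`timescale 1ns/1ns
// Idealized unit-delay models (assumption: every gate level costs 1 ns).

// Ripple-carry: four full adders in a chain; c[i] is the carry into bit i.
module rca4_delay(input [3:0] A, B, input Ci, output [3:0] S, output Co);
  wire [3:0] p, g, t;
  wire [4:0] c;
  assign c[0] = Ci;
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : fa
      assign #1 p[i]   = A[i] ^ B[i];
      assign #1 g[i]   = A[i] & B[i];
      assign #1 t[i]   = p[i] & c[i];
      assign #1 c[i+1] = g[i] | t[i];
      assign #1 S[i]   = p[i] ^ c[i];
    end
  endgenerate
  assign Co = c[4];
endmodule

// Look-ahead: P/G in one level, each carry as a flat AND-OR (2 levels), sum XOR.
module cla4_delay(input [3:0] A, B, input Ci, output [3:0] S, output Co);
  wire [3:0] p, g;
  wire [4:1] c;                      // c[i]: carry into bit i; c[4] = carry-out
  assign #1 p = A ^ B;
  assign #1 g = A & B;
  assign #2 c[1] = g[0] | (p[0] & Ci);
  assign #2 c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & Ci);
  assign #2 c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & Ci);
  assign #2 c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
                 | (p[3] & p[2] & p[1] & p[0] & Ci);
  assign #1 S = p ^ {c[3:1], Ci};
  assign Co = c[4];
endmodule

// Applies the worst case (carry travels through all four bits) and records
// the time of the last change on {Co, S} for each adder.
module delay_tb;
  reg [3:0] A, B;
  reg Ci;
  wire [3:0] S_r, S_c;
  wire Co_r, Co_c;
  time t0, t_r, t_c;
  rca4_delay u_r(A, B, Ci, S_r, Co_r);
  cla4_delay u_c(A, B, Ci, S_c, Co_c);
  always @(Co_r or S_r) t_r = $time;
  always @(Co_c or S_c) t_c = $time;
  initial begin
    A = 4'b0000; B = 4'b0000; Ci = 1'b0;
    #20;                             // let both adders settle
    t0 = $time;
    A = 4'b1111; Ci = 1'b1;          // 1111 + 0000 + 1 = 1_0000
    #20;
    $display("ripple settles after %0d ns, look-ahead after %0d ns", t_r - t0, t_c - t0);
  end
endmodule
```

Under these unit-delay assumptions, the testbench should report 9 ns for the ripple-carry version and 4 ns for the look-ahead version. Real post-fit delays come only from a timing simulation of the placed-and-routed netlist.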