1
World of Integrated Circuits
Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
  • PLD: PAL, PLA, PML
  • FPGA: LUT (Look-Up Table), MUX, Gates
6
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
10
CPLD Summary
• Constant delay
• Shallow logic
• Great for combinatorial logic, but not for sequential logic
• Fewer than 5000 gates
• Marginal radiation tolerance due to erasure at ~20 krad
• Can suffer SEGR (single-event gate rupture) during programming
11
What is an FPGA?
[Figure: FPGA floor plan with Configurable Logic Blocks, I/O Blocks, and Block RAMs]
12
Other FPGA Advantages
• The manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
• Mistakes not detected at design time have a large impact on development time and cost
• FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  • reconfigurable computing
13
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor
Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.
14
Xilinx
• Primary products: FPGAs and the associated CAD software
• Main headquarters in San Jose, CA
• Fabless* Semiconductor and Software Company
  • UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  • Seiko Epson (Japan)
  • TSMC (Taiwan)
[Figure labels: Programmable Logic Devices; ISE Alliance and Foundation Series Design Software]
15
Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  • Virtex (0.22µm)
  • Virtex-E, Virtex-EM (0.18µm)
  • Virtex-II, Virtex-II PRO (0.13µm)
• Low Cost Family
  • Spartan/XL – derived from XC4000
  • Spartan-II – derived from Virtex
  • Spartan-IIE – derived from Virtex-E
  • Spartan-3
17
Basic Spartan-II FPGA Block Diagram
18
CLB Structure
[Figure: a CLB drawn as two SLICE blocks. Each slice shows two Look-Up Tables (inputs G1-G4 and F1-F4), Carry & Control Logic, and two D flip-flops; signals: CIN, COUT, F5IN, BY, SR, CLK, CE; outputs X, XB, Y, YB]
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
19
CLB Slice
[Figure: one SLICE with two Look-Up Tables (inputs G1-G4 and F1-F4), Carry & Control Logic, and two D flip-flops; signals: CIN, COUT, F5IN, BY, SR, CLK, CE; outputs X, XB, Y, YB]
20
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: a LUT with inputs x1, x2, x3, x4 and output y, shown holding either a function of all four inputs or a gate on x1 and x2 only]
x1 x2 x3 x4 | y (4-input function) | y (gate on x1, x2)
0  0  0  0  | 0                    | 1
0  0  0  1  | 1                    | 1
0  0  1  0  | 0                    | 1
0  0  1  1  | 0                    | 1
0  1  0  0  | 0                    | 1
0  1  0  1  | 1                    | 1
0  1  1  0  | 0                    | 1
0  1  1  1  | 1                    | 1
1  0  0  0  | 0                    | 1
1  0  0  1  | 1                    | 1
1  0  1  0  | 0                    | 1
1  0  1  1  | 0                    | 1
1  1  0  0  | 1                    | 0
1  1  0  1  | 1                    | 0
1  1  1  0  | 0                    | 0
1  1  1  1  | 0                    | 0
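A 4-input LUT can be pictured as a 16-entry constant table indexed by its inputs. The sketch below is a behavioral model only, not a vendor primitive; the entity name lut4_model and the generic INIT are hypothetical names chosen for illustration. The default table contents follow the first 4-input function tabulated on this slide, with x1 taken as the most significant index bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 4-input look-up table.
entity lut4_model is
  generic (
    -- Table contents listed left to right for inputs "0000" up to "1111".
    INIT : std_logic_vector(0 to 15) := "0100010101001100"
  );
  port (
    x1, x2, x3, x4 : in  std_logic;
    y              : out std_logic
  );
end lut4_model;

architecture behavioral of lut4_model is
  signal address : std_logic_vector(3 downto 0);
begin
  -- The four inputs form the address into the table (x1 is the MSB).
  address <= x1 & x2 & x3 & x4;
  -- The output is simply the stored bit at that address.
  y <= INIT(to_integer(unsigned(address)));
end behavioral;
```

Changing only the INIT string changes the implemented function; for example, "1111111111110000" would give the second column of the table (a gate that depends on x1 and x2 only).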
21
CLB Slice Structure
• Each slice contains two sets of the following:
  • Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  • Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  • Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control
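As an illustration of the storage element options listed above, the following is a minimal sketch of a flip-flop with an asynchronous reset and a clock enable. The entity name storage_element and its port names are assumptions made for this example; other combinations (set instead of reset, synchronous control, latch mode) would be described in a similar way.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: D flip-flop with asynchronous reset and clock enable.
entity storage_element is
  port (
    clk, ce, reset, d : in  std_logic;
    q                 : out std_logic
  );
end storage_element;

architecture behavioral of storage_element is
begin
  process (clk, reset)
  begin
    if reset = '1' then
      -- Asynchronous control: takes effect without waiting for a clock edge.
      q <= '0';
    elsif rising_edge(clk) then
      -- The clock enable gates the update of the stored value.
      if ce = '1' then
        q <= d;
      end if;
    end if;
  end process;
end behavioral;
```

Moving the reset test inside the rising_edge branch would turn it into a synchronous reset.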
23
Distributed RAM
• CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements Single and Dual-Ports
  • Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read
[Figure: RAM primitives built from LUTs]
• One LUT = RAM16X1S (inputs D, WE, WCLK, A0-A3; output O)
• Two LUTs = RAM32X1S (inputs D, WE, WCLK, A0-A4; output O)
• or RAM16X2S (inputs D0, D1, WE, WCLK, A0-A3; outputs O0, O1)
• or RAM16X1D (inputs D, WE, WCLK, A0-A3, DPRA0-DPRA3; outputs SPO, DPO)
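The behavior described above (16x1 storage, synchronous write, asynchronous read) can be sketched behaviorally as follows. This is an illustrative model, not the vendor's RAM16X1S primitive; the entity name ram16x1_model is hypothetical, and whether a synthesis tool maps such code onto LUT-based RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 16 x 1 single-port RAM.
entity ram16x1_model is
  port (
    wclk : in  std_logic;
    we   : in  std_logic;
    a    : in  std_logic_vector(3 downto 0);
    d    : in  std_logic;
    o    : out std_logic
  );
end ram16x1_model;

architecture behavioral of ram16x1_model is
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Synchronous write: the addressed bit changes only on a clock edge.
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        mem(to_integer(unsigned(a))) <= d;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address without a clock.
  o <= mem(to_integer(unsigned(a)));
end behavioral;
```

Registering the output o in a clocked process would model the synchronous read that uses the accompanying flip-flop.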
24
Shift Register
• Each LUT can be configured as shift register
  • Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
[Figure: a LUT drawn as a chain of D flip-flops with clock enable (CE); ports IN, CE, CLK, DEPTH[3:0], OUT]
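A dynamically addressable shift register of this kind can be sketched as a 16-bit shift chain with a selectable tap. The entity name srl16_model and the port names din and dout are assumptions for this example (IN and OUT are reserved words in VHDL); in this model a depth value of n gives a delay of n + 1 clock cycles, so the range is 1 to 16 cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a LUT-based 16-stage shift register.
entity srl16_model is
  port (
    clk   : in  std_logic;
    ce    : in  std_logic;
    din   : in  std_logic;
    depth : in  std_logic_vector(3 downto 0);
    dout  : out std_logic
  );
end srl16_model;

architecture behavioral of srl16_model is
  signal taps : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Serial in: a new bit enters at position 0 on every enabled clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        taps <= taps(14 downto 0) & din;
      end if;
    end if;
  end process;

  -- Serial out: the depth input selects which stage is observed.
  dout <= taps(to_integer(unsigned(depth)));
end behavioral;
```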
25
Carry & Control Logic
[Figure: the SLICE diagram again, showing the two Carry & Control Logic blocks between the Look-Up Tables and the flip-flops, with the carry signals CIN and COUT]
26
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry chain running from LSB to MSB through dedicated Carry Logic and Routing]
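Designers usually do not instantiate the carry logic by hand. A minimal sketch, assuming a synthesis tool that recognizes arithmetic operators (which is typical, but tool-dependent), is to describe the adder with "+" and let the tool decide how to use the carry resources. The entity name adder8 and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit adder with carry in and carry out.
entity adder8 is
  port (
    a, b : in  std_logic_vector(7 downto 0);
    cin  : in  std_logic;
    sum  : out std_logic_vector(7 downto 0);
    cout : out std_logic
  );
end adder8;

architecture behavioral of adder8 is
  signal carry_in : unsigned(0 downto 0);
  signal result   : unsigned(8 downto 0);
begin
  -- Wrap the single carry-in bit into a 1-bit vector so it can be added.
  carry_in(0) <= cin;

  -- Extend both operands by one bit so the carry out is not lost.
  result <= resize(unsigned(a), 9) + resize(unsigned(b), 9) + carry_in;

  sum  <= std_logic_vector(result(7 downto 0));
  cout <= result(8);
end behavioral;
```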
27
Block RAM
[Figure: Spartan-II True Dual-Port Block RAM with Port A and Port B]
• Most efficient memory implementation
  • Dedicated blocks of memory
• Ideal for most memory requirements
  • 4 to 14 memory blocks
    • 4096 bits per block
  • Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
28
Spartan-II Block RAM Amounts
29
Block RAM Port Aspect Ratios
Organization  Data width  Address range
4k x 1        1           0-4095
2k x 2        2           0-2047
1k x 4        4           0-1023
512 x 8       8           0-511
256 x 16      16          0-255
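Every aspect ratio above holds the same 4096 bits; only the split between address bits and data bits changes. The sketch below is a generic synchronous single-port RAM whose default generics give the 512 x 8 organization. The entity name ram_sp and the generics ADDR_BITS and DATA_BITS are hypothetical, and whether a tool places this description into a block RAM depends on the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM; defaults give 2**9 x 8 = 4096 bits.
entity ram_sp is
  generic (
    ADDR_BITS : natural := 9;
    DATA_BITS : natural := 8
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_BITS - 1 downto 0);
    din  : in  std_logic_vector(DATA_BITS - 1 downto 0);
    dout : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end ram_sp;

architecture behavioral of ram_sp is
  type ram_type is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);
  signal ram : ram_type := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read: the output is updated on the clock edge.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end behavioral;
```

Setting ADDR_BITS to 12 and DATA_BITS to 1 would give the 4k x 1 organization with the same total capacity.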
30
Basic I/O Block Structure
[Figure: IOB with three flip-flops (D, EC, Q, SR), one each on the Three-State Control, Output Path, and Input Path, each with its own FF Enable; signals: Three-State, Output, Clock, Set/Reset, Direct Input, Registered Input]
31
IOB Functionality
• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
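The bidirectional, registered, three-state behavior listed above can be sketched as follows. This is an illustrative model with hypothetical names (iob_model, pad, out_data, out_en, in_direct, in_reg); whether the registers actually end up inside the IOB is decided by the implementation tools and their constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of a bidirectional pin with registered paths.
entity iob_model is
  port (
    clk       : in    std_logic;
    out_data  : in    std_logic;  -- value coming from internal logic
    out_en    : in    std_logic;  -- '1' drives the pad, '0' releases it
    in_direct : out   std_logic;  -- unregistered input path
    in_reg    : out   std_logic;  -- registered input path
    pad       : inout std_logic   -- package pin
  );
end iob_model;

architecture behavioral of iob_model is
  signal out_q : std_logic := '0';
  signal en_q  : std_logic := '0';
begin
  -- Registers on the output, three-state control, and input paths.
  process (clk)
  begin
    if rising_edge(clk) then
      out_q  <= out_data;
      en_q   <= out_en;
      in_reg <= pad;
    end if;
  end process;

  -- The output is forced into high impedance when not enabled.
  pad <= out_q when en_q = '1' else 'Z';

  in_direct <= pad;
end behavioral;
```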
32
Routing Resources
[Figure: array of CLBs interconnected through PSMs (Programmable Switch Matrix)]
33
Spartan-II FPGA Family Members
35
Virtex-II 1.5V Architecture
[Figure: Virtex-II floor plan with Configurable Logic Blocks, I/O Blocks, and columns of Block RAMs paired with 18 x 18 Multipliers]
36
Virtex-II 1.5V
Device    CLB Array  Slices  Maximum I/O  BlockRAM (18kb)  Multiplier Blocks  Distributed RAM bits
XC2V40    8x8        256     88           4                4                  8,192
XC2V80    16x8       512     120          8                8                  16,384
XC2V250   24x16      1,536   200          24               24                 49,152
XC2V500   32x24      3,072   264          32               32                 98,304
XC2V1000  40x32      5,120   432          40               40                 163,840
XC2V1500  48x40      7,680   528          48               48                 245,760
XC2V2000  56x48      10,752  624          56               56                 344,064
XC2V3000  64x56      14,336  720          96               96                 458,752
XC2V4000  80x72      23,040  912          120              120                737,280
XC2V6000  96x88      33,792  1,104        144              144                1,081,344
XC2V8000  112x104    46,592  1,108        168              168                1,490,944
37
Virtex-II Block SelectRAM
• Virtex-II BRAM is 18 kbits
• Additional “parity” bits available in selected configurations
[Figure: dual-port block. Port A: WEA, ENA, SSRA, CLKA, ADDRA[# : 0], DIA[# : 0], DIPA[# : 0], DOA[# : 0], DOPA[# : 0]. Port B: WEB, ENB, RSTB, CLKB, ADDRB[# : 0], DIB[# : 0], DIPB[# : 0], DOB[# : 0], DOPB[# : 0]]
Width  Depth   Address  Data    Parity
1      16,384  [13:0]   [0]     N/A
2      8,192   [12:0]   [1:0]   N/A
4      4,096   [11:0]   [3:0]   N/A
9      2,048   [10:0]   [7:0]   [0]
18     1,024   [9:0]    [15:0]  [1:0]
36     512     [8:0]    [31:0]  [3:0]
38
FPGA Nomenclature
43
LUT
• Add a flip-flop and you're done.
George Mason University
FPGA Tools
45
Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
VHDL description (Your Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
    port(
        clock, reset, encr_decr: in std_logic;
        data_input: in std_logic_vector(31 downto 0);
        data_output: out std_logic_vector(31 downto 0);
        out_full: in std_logic;
        key_input: in std_logic_vector(31 downto 0);
        key_read: out std_logic  -- no semicolon after the last port
    );
end RC5_core;  -- the closing name must match the entity name
Functional simulation
Synthesis
Post-synthesis simulation
46
Design process (2)
Implementation
Configuration
Timing simulation
On chip testing
47
Simulation Tools
Many others…
50
Synthesis Tools
… and others
51
Levels of design description
Algorithmic level
Register Transfer Level (level of description most suitable for synthesis)
Logic (gate) level
Circuit (transistor) level
Physical (layout) level
53
Design Process for ASICs (1)
• VHDL code → VHDL simulator: functional verification
• VHDL code + library of standard cells → Logic Synthesis → Netlist
• Estimates available at this stage: speed without routing, area without routing
54
Design Process (2)
• Netlist + library of standard cells → Placing & routing → Layout
• Results available at this stage: area with routing, speed with routing
55
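Logic Synthesis
VHDL description → Circuit netlist
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Entity reconstructed from the names used in the architecture below
-- (the port list is an assumption; the slide shows only the architecture).
entity MLU is
    port(
        NEG_A, NEG_B, NEG_Y: in STD_LOGIC;
        L1, L0: in STD_LOGIC;
        A, B: in STD_LOGIC;
        Y: out STD_LOGIC
    );
end MLU;

architecture MLU_DATAFLOW of MLU is
    signal A1: STD_LOGIC;
    signal B1: STD_LOGIC;
    signal Y1: STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    signal L: STD_LOGIC_VECTOR(1 downto 0);
begin
    -- Optional inversion of the inputs and of the output
    A1 <= A when (NEG_A = '0') else not A;
    B1 <= B when (NEG_B = '0') else not B;
    Y  <= Y1 when (NEG_Y = '0') else not Y1;

    -- The four candidate logic functions
    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;

    -- Named selector: a bare (L1 & L0) has an ambiguous type for some tools
    L <= L1 & L0;
    with L select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
end MLU_DATAFLOW;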
56
Features of synthesis tools
• Interpret RTL code
• Produce synthesized circuit netlist in a standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to EDIF netlist
57
Implementation
• After synthesis the entire implementation process is performed by FPGA vendor tools
59
Translation
• From Synthesis: circuit netlist in EDIF (Electronic Design Interchange Format) and timing constraints in NCF (Native Constraint File)
• From the Constraint Editor: UCF (User Constraint File)
• Translation output: NGD (Native Generic Database file)
60
Sample UCF File
#
# Constraints generated by Synplify Pro 7.3.3, Build 039R
#
# Period Constraints
#Begin clock constraints
#End clock constraints
# Output Constraints
# Input Constraints
# Location Constraints
# End of generated constraints
NET "clock" LOC = "P88";
NET "control(0)" LOC = "P50";
NET "control(1)" LOC = "P48";
NET "control(2)" LOC = "P42";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
NET "segments(1)" LOC = "P39";
NET "segments(2)" LOC = "P62";
NET "segments(3)" LOC = "P60";
NET "segments(4)" LOC = "P46";
NET "segments(5)" LOC = "P57";
NET "segments(6)" LOC = "P49";
61
Pin Assignment
[Figure: ports of the LAB2 design (CLOCK, CONTROL(0)-CONTROL(2), RESET, SEGMENTS(0)-SEGMENTS(6)) connected to FPGA pins P39, P42, P46, P48, P49, P50, P57, P60, P62, P67, P88, P93]
62
Constraints Editor
63
Circuit netlist
64
Mapping
[Figure: circuit netlist grouped into LUT1-LUT5 and flip-flops FF1, FF2]
65
Placing
[Figure: CLB slices placed within the FPGA]
67
Routing
[Figure: programmable connections within the FPGA]
69
Static Timing Analyzer
• Performs static analysis of the circuit performance
• Reports critical paths with all sources of delays
• Determines maximum clock frequency
70
Static Timing Analysis
• Critical Path: the longest path from outputs of registers to inputs of registers
[Figure: register (D, Q) → logic with delay $t_{P\,logic}$ → register (D, Q); signals in, clk, out]
$t_{Critical} = t_{P\,FF} + t_{P\,logic} + t_{S\,FF}$
71
Static Timing Analysis
• Min. Clock Period = Length of The Critical Path
• Max. Clock Frequency = 1 / Min. Clock Period
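As a worked example of the two relations above, assume (purely for illustration; these delays are not taken from any data sheet) a flip-flop propagation delay of 1 ns, a logic delay of 8 ns, and a flip-flop setup time of 1 ns:

```latex
\begin{align*}
t_{Critical} &= t_{P\,FF} + t_{P\,logic} + t_{S\,FF}
              = 1\ \text{ns} + 8\ \text{ns} + 1\ \text{ns} = 10\ \text{ns} \\
f_{max} &= \frac{1}{t_{Critical}} = \frac{1}{10\ \text{ns}} = 100\ \text{MHz}
\end{align*}
```

With these assumed values, shortening the logic delay is the only change that would raise the clock frequency noticeably, since it dominates the critical path.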
72
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
  • This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
74
Projects 1, 2
Optimization Criteria
• Maximum ratio: Throughput / Circuit Area
or
• Minimum product: Latency · Circuit Area
76
Primary timing parameters
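• Latency: time to process a single block of data
[Figure: Circuit with input Xi and output Yi]
• Throughput: number of bits processed in a unit of time
[Figure: Circuit with inputs Xi, Xi+1, Xi+2 and outputs Yi, Yi+1, Yi+2]
$\text{Throughput} = \dfrac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$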
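A short worked example of the throughput formula, using assumed numbers chosen only for illustration: a 64-bit block and a latency of 640 ns, first with one block in the circuit at a time and then with four blocks processed simultaneously.

```latex
\begin{align*}
\text{Throughput}_{1} &= \frac{64\ \text{bits} \cdot 1}{640\ \text{ns}}
                       = 0.1\ \text{bit/ns} = 100\ \text{Mbit/s} \\
\text{Throughput}_{4} &= \frac{64\ \text{bits} \cdot 4}{640\ \text{ns}}
                       = 0.4\ \text{bit/ns} = 400\ \text{Mbit/s}
\end{align*}
```

Under these assumptions the latency of each block is unchanged, while the throughput scales with the number of blocks in flight.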

Introduction to ASIC Design and VLSI Design
