FPGA Architecture and CPLD Design, April 2012

FPGA Architecture

Presentation Overview
• Available choices for the digital designer
• FPGA: a detailed look
• Interconnection framework
  – FPGAs and CPLDs
• Field programmability and programming technologies
  – SRAM, anti-fuse, EPROM and EEPROM
• Design steps
• Commercially available devices
  – Xilinx XC4000
  – Altera MAX 7000

Fixed Versus Programmable Logic 

The circuits in a fixed logic device are permanent: they perform one function or set of functions, and once manufactured they cannot be changed.



Programmable logic devices (PLDs) are standard, off-the-shelf parts that offer customers a wide range of logic capacity, features, speed, and voltage characteristics - and these devices can be changed at any time to perform any number of functions.

Classifications
• PLA: a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable
• PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane
• SPLD: refers to any type of Simple PLD, usually either a PLA or PAL
• CPLD: a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip
• FPGA: a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity

Definitions
• Field Programmable Device (FPD): a general term that refers to any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs.
• Programming of such a device often involves placing the chip into a special programming unit, but some chips can also be configured “in-system”.
• Another name for FPDs is programmable logic devices (PLDs).

Designer’s Choice 

The digital designer has various options
• SSI (small scale integrated circuits) or MSI (medium scale integrated circuits) components
  – Difficulties arise as design size increases
  – Interconnections grow with complexity, resulting in a prolonged testing phase
• Simple programmable logic devices
  – PALs (programmable array logic)
  – PLAs (programmable logic array)
  – Architecture not scalable; power consumption and delays play an important role in extending the architecture to complex designs
  – Implementation of larger designs leads to the same difficulty as with discrete components

Simple Programmable Logic Devices
• Simple two-level structure
  – PAL and PLA
• Allow high-speed implementations of circuits
• Drawback
  – Only small logic circuits with a modest number of product terms
  – Interconnection structure grows impractically large as the number of product terms increases
• MPGAs

PLA
[Figure: PLA with a programmable AND plane and a programmable OR plane. Inputs X, Y, Z; product terms YZ, XZ, XYZ, XY; outputs XY+YZ and XZ+XYZ.]
[Figure: PLA with programmable AND and OR planes, inputs X and Y, outputs O1 to O4. Legend: a programmable node is either un-programmed, connected, or disconnected.]

PAL
[Figure: PAL with a programmable AND plane (inputs X, Y) and a fixed OR plane driving outputs O1 to O4.]

PAL with Logic Expanders
[Figure: PAL with a programmable AND plane, a fixed OR plane, and logic expanders.]

PLA vs. PAL
• PLAs are more flexible than PALs since both the AND and OR planes are programmable in PLAs.
• Because both AND and OR planes are programmable, PLAs are expensive to fabricate and have a large propagation delay.
• By using fixed OR gates, PALs are cheaper and faster than PLAs.
• Logic expanders increase the flexibility of PALs, but result in significant propagation delay.
• PALs usually contain D flip-flops connected to the outputs of the OR gates to implement sequential circuits.
• PLAs and PALs are usually referred to as SPLDs.
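The two-plane structure can be made concrete with a small software model. The sketch below is only an illustration: it uses the product terms from the PLA figure (YZ, XZ, XYZ, XY) and its two outputs (XY+YZ and XZ+XYZ), while the output names F1 and F2 and the dictionary layout are assumptions made for this example.

```python
from itertools import product

# AND plane: each product term lists the literals it connects to.
# True = uncomplemented input; inputs absent from a term are left unconnected.
AND_PLANE = {
    "YZ":  {"Y": True, "Z": True},
    "XZ":  {"X": True, "Z": True},
    "XYZ": {"X": True, "Y": True, "Z": True},
    "XY":  {"X": True, "Y": True},
}

# OR plane: which product terms each output sums (names F1/F2 are hypothetical).
OR_PLANE = {
    "F1": ["XY", "YZ"],   # F1 = XY + YZ
    "F2": ["XZ", "XYZ"],  # F2 = XZ + XYZ
}

def evaluate_pla(inputs):
    """Evaluate the AND plane, then OR the selected product terms."""
    terms = {
        name: all(inputs[var] == polarity for var, polarity in literals.items())
        for name, literals in AND_PLANE.items()
    }
    return {out: any(terms[t] for t in names) for out, names in OR_PLANE.items()}

for x, y, z in product([False, True], repeat=3):
    print(int(x), int(y), int(z), evaluate_pla({"X": x, "Y": y, "Z": z}))
```

In this model a PAL would correspond to keeping AND_PLANE editable while treating OR_PLANE as fixed at manufacture.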

Programmable Logic   



• Programmable digital integrated circuit
• Standard off-the-shelf parts
• Desired functionality is implemented by configuring on-chip logic blocks and interconnections
• Advantages (compared to an ASIC):
  – Low development costs
  – Short development cycle
  – Device can (usually) be reprogrammed
• Types of programmable logic:
  – Complex PLDs (CPLD)
  – Field Programmable Gate Arrays (FPGA)

A Generic CPLD Structure

Multiple PLDs can be combined on a single chip by using programmable interconnect structures. These devices are called CPLDs.

CPLD Architecture and Examples

PLD Sum of Products
Programmable AND array followed by fixed fan-in OR gates
[Figure: inputs A, B, C feed the AND plane through programmable switches or fuses; outputs f1 and f2 are each a sum of product terms of A, B, C.]

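PLD - Macrocell
Can implement combinational or sequential logic
[Figure: inputs A, B, C feed the AND plane; the OR output f1 drives a D flip-flop (Clock, Q); a MUX controlled by Select chooses between the direct and the registered signal; Enable controls the output.]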

CPLD Structure
Integration of several PLD blocks with a programmable interconnect on a single chip
[Figure: PLD blocks, each with its own I/O block, arranged around a central interconnection matrix.]

High Density Logic Overview 

High-Density or Complex PLDs (HDPLD or CPLD)
• Large logic building blocks
• PLD-like architectures
• Centralized interconnect
• Fast, predictable performance
• Good at “wide gating” functions
  – State machines
  – Counters

Altera MAX CPLD
[Figure: Altera MAX chip: I/O cells and logic array blocks (LABs) joined by a chip-wide interconnect. Each LAB contains a local array (LA) and 16 macrocells.]

CPLD Example - Altera MAX7000
[Figure: EPM7000 Series block diagram]
• MAX 7000 architecture includes
  – Logic array blocks (LABs)
  – Macrocells
  – Expander product terms (shareable and parallel)
  – Programmable interconnect array (PIA)
  – I/O control blocks
• Performance
  – Achieved by linking high-performance, flexible LABs
• Programmable interconnect array (PIA)
  – Global bus fed by all dedicated inputs, I/O pins, and macrocells



EPM7000 Series Device Macrocell
[Figure: EPM7000 Series device macrocell]

Macrocell
• Can be configured for combinational or sequential logic operation
• Logic array
  – Combinational logic is implemented in the logic array
  – Provides five product terms per macrocell
• Product-term select matrix
  – Allocates product terms as primary logic inputs to the OR and XOR gates (combinational logic)
  – Allocates product terms as secondary inputs to the register's clear, preset, clock, and clock enable control functions
• Programmable register

Logic expanders
• Each LAB has 16 shareable expanders
• Shareable expanders: inverted product terms fed back into the logic array
• Parallel expanders: product terms borrowed from adjacent macrocells
• Software optimizes resource utilization

Registered functions
• The macrocell flip-flop can be configured for D, T, JK, or SR operation, so the most efficient flip-flop operation can be selected for each registered function
• Programmable clock control: flip-flops can be clocked in different modes
  – By a global clock signal: fastest clock-to-output performance
  – By a global clock signal and an active-high clock enable: provides an enable on each flip-flop with the fast clock-to-output performance of the global clock
  – By an array clock implemented with a product term: the flip-flop is clocked by signals from buried macrocells or I/O pins
• Flip-flops support asynchronous preset and clear functions
  – The product-term select matrix provides product terms to control these functions
  – Control signals are active high; active-low control can be obtained by inverting the signal within the logic array
• Device power-up: the flip-flop is cleared upon power-up
• I/O pins: fast input path to the macrocell, bypassing the PIA and combinational logic

Complex logic functions
• Each macrocell provides five product terms
• Most logic functions can be designed using five product terms
• Another macrocell can be used to supply the required logic resources
• Expander product terms: shareable and parallel expander product terms provide additional product terms to any macrocell within the same LAB
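The macrocell behaviour described above (five local product terms, extra terms supplied by expanders, an optional register that is cleared at power-up) can be sketched as a toy model. This is an illustrative simplification, not Altera's implementation: the class name, method names, and the dictionary encoding of product terms are all assumptions made for this example.

```python
class Macrocell:
    """Toy macrocell: OR of product terms, optionally registered."""

    MAX_LOCAL_TERMS = 5  # five product terms per macrocell, as stated above

    def __init__(self, product_terms, expander_terms=(), registered=False):
        if len(product_terms) > self.MAX_LOCAL_TERMS:
            raise ValueError("a macrocell provides only five product terms")
        # Expander terms model product terms borrowed from elsewhere in the LAB.
        self.terms = list(product_terms) + list(expander_terms)
        self.registered = registered
        self.q = False  # flip-flop is cleared upon power-up

    def combinational(self, inputs):
        # Each term is {signal_name: required_value}; the terms are ORed.
        return any(
            all(inputs[name] == value for name, value in term.items())
            for term in self.terms
        )

    def clock(self, inputs):
        """Apply one clock edge and return the macrocell output."""
        d = self.combinational(inputs)
        if self.registered:
            self.q = d
            return self.q
        return d


cell = Macrocell([{"a": True, "b": True}, {"c": False}], registered=True)
print(cell.q)                                          # False (cleared at power-up)
print(cell.clock({"a": True, "b": True, "c": True}))   # True
print(cell.clock({"a": False, "b": True, "c": True}))  # False
```

A function needing a sixth product term has to be built with `expander_terms`, which mirrors the role of the shareable and parallel expanders.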


Performance
• CPLDs have wide fan-in
• A single logic level allows high frequency and low latency
• Very small functions burn logic (a whole macrocell is consumed)

Design Methodologies 

Custom ICs are created using unique masks for all layers during the manufacturing process
• Highly skilled and competent designers
• Lengthy development time
• High cost of design and testing

Mask Programmable Gate Arrays

  

• Generic masks for all layers except metallization
• Generic masks: array of modular functional blocks
• Modules of transistors: rows separated by fixed-width channels
• Designer's expertise is less critical
• Shorter development time and low development costs
• Channel-less gate arrays: sea-of-gates

Standard Cells

  

• Modules or standard cells are picked from the database, placed in rows, and interconnected
• Placement and routing are done automatically (removing the designers from the physical design process)
• Designs are less efficient in size and performance

Gate arrays
• A highly standardized means to implement digital integrated circuit designs
• Manufactured as regular arrays of patterned blocks of transistors which can be interconnected to form logic elements such as gates, flip-flops and multiplexers
• The manufacturer can pre-produce gate array wafers without interconnections in high volume. These are then configured in an additional process step in the factory
• Once a customer provides a definition of the logic block interconnections, one or more layers of metal are added to form these connections
• Collectively known as MPGAs (Mask-Programmable Gate Arrays)
  – Sea-of-gates structures: added metal interconnects have to be placed over particular transistors, rendering them unusable
  – Regular gate arrays: blank routing space is provided at regular intervals in the transistor array
• As process technologies advance and sizes get smaller, it is becoming increasingly expensive to configure such devices

Masked Programmable Logic Devices 

MPGA
• Rows of transistors with specified interconnections
  – Within the rows: to implement basic logic gates
  – Between the rows: to connect basic gates together
• I/O circuitry
• Predefined mask layers, except the final metal layers, are supplied by the manufacturer
• Metal layers are customized to implement the desired circuit

MPGA
• Drawbacks
  – Large NRE cost: need to generate metal mask layers and manufacture the chip
  – More time to market
• Advantage
  – The general structure allows much larger circuits to be implemented, owing to the scalable interconnection structure, which scales proportionally with the logic

Field Programmable Logic Devices
• FPGAs combine
  – The programmability of a PLD
  – The scalable interconnection structure of an MPGA

Designer’s Choice
• Quest for high capacity; two choices available
• MPGA (Masked Programmable Logic Devices)
  – Customized during fabrication
  – Expensive at low volume
  – Prolonged time-to-market and high financial risk
• FPGA (Field Programmable Logic Devices)
  – Customized by the end user
  – Implements multi-level logic functions
  – Fast time-to-market and low risk

Designer’s Choice: FPGAs vs MPGA
• Disadvantages of FPGAs
  – Low speed of operation: programmable switches add significant resistance and capacitance to the connections between logic blocks
  – Low logic density: programmable switches and programming circuitry require more area than an MPGA to implement the same amount of logic circuitry, so fewer chips fit on each wafer

Standard Cell-based Design
[Figure: standard-cell layout methodology: rows of cells (logic cells and feedthrough cells) separated by routing channels, alongside functional modules (RAM, multiplier, …).]
Routing channel requirements are reduced by the presence of more interconnect layers.

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels.]

Why FPGAs? 

Advantages of FPGAs
• Replacement of SSI and MSI chips
• Availability of parts off the shelf
• Rapid turnaround
• Low risk
• Re-programmability

Limitations
• PLDs will operate faster than FPGAs for the same design implemented in both
• For FPGAs the circuit delay depends on the design implementation tools
• Less dense and operate at lower speed when compared to conventional gate arrays

FPGA – A Quick Look   

• Two-dimensional array of customizable logic blocks placed in an interconnect array
• Like PLDs, programmable at the user's site
• Like MPGAs, implements thousands of gates of logic in a single device
  – Employs logic and interconnect structures capable of implementing multi-level logic
  – Scalable in proportion with the logic, removing many of the size limitations of PLD-derived two-level architectures
• FPGAs offer the benefits of both MPGAs and PLDs!

FPGA – A Detailed Look  



• Based on the principle of functional completeness
• FPGA: functionally complete elements (logic blocks) placed in an interconnect framework
• Interconnection framework
  – Comprises wire segments and switches
  – Provides a means to interconnect logic blocks
• Circuits are partitioned to logic block size, mapped and routed

Basic FPGA Architecture

FPGA Architecture 

(With Multiplexer As Functionally Complete Cell) Basic building block
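Why a multiplexer qualifies as a functionally complete cell can be checked directly: with constants 0 and 1 available as data inputs, a 2:1 multiplexer realizes NOT, AND, and OR, and those three suffice for any Boolean function. The function names below are chosen for this sketch only.

```python
def mux2(sel, a, b):
    """2:1 multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a

def not_gate(x):
    return mux2(x, 1, 0)   # x = 0 selects constant 1, x = 1 selects constant 0

def and_gate(x, y):
    return mux2(x, 0, y)   # x = 0 gives 0, x = 1 passes y

def or_gate(x, y):
    return mux2(x, y, 1)   # x = 0 passes y, x = 1 gives 1

print("x y | NOT x  x AND y  x OR y")
for x in (0, 1):
    for y in (0, 1):
        print(x, y, "|", not_gate(x), "     ", and_gate(x, y), "      ", or_gate(x, y))
```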

Interconnection Framework
Granularity and interconnection structure have caused a split in the industry
• FPGA: fine grained
  – Variable-length interconnect segments
  – Programmable switches
  – Timing in general is not predictable; timing is extracted after place and route
• CPLD: coarse grained (SPLD-like blocks)
  – Programmable crossbar interconnect structure
  – Interconnect structure uses continuous metal lines
  – The switch matrix may or may not be fully populated
  – Timing is predictable if fully populated
  – Architecture does not scale well

Field Programmability 



• Field programmability is achieved through switches (transistors controlled by memory elements or fuses)
• Switches control the following aspects
  – Interconnection among wire segments
  – Configuration of logic blocks
• Distributed memory elements controlling the switches and the configuration of logic blocks are together called “Configuration Memory”
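How configuration memory bits determine connectivity among wire segments can be sketched with a tiny routing model. Everything here is hypothetical: the switch names, segment names, and topology are invented for illustration and do not describe any real device.

```python
from collections import deque

# Hypothetical routing fragment: each switch joins two wire segments and is
# controlled by one configuration-memory bit (1 = ON, 0 = OFF).
SWITCHES = {
    "s0": ("blockA_out", "seg1"),
    "s1": ("seg1", "seg2"),
    "s2": ("seg2", "blockB_in"),
    "s3": ("seg1", "blockC_in"),
}

def is_connected(config, src, dst):
    """Breadth-first search over segments joined by switches that are ON."""
    adjacency = {}
    for name, (u, v) in SWITCHES.items():
        if config.get(name, 0):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

config = {"s0": 1, "s1": 1, "s2": 1, "s3": 0}
print(is_connected(config, "blockA_out", "blockB_in"))  # True
print(is_connected(config, "blockA_out", "blockC_in"))  # False
```

Rewriting the `config` dictionary plays the role of reprogramming the device: the wires stay the same, only the switch states change.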

Technology of Programmable Elements 



• Vary from vendor to vendor. All share a common property: configurable in one of two positions, ‘ON’ or ‘OFF’
• Can be classified into three categories:
  – SRAM based
  – Fuse based
  – EPROM/EEPROM/Flash based
• Desired properties:
  – Minimum area consumption
  – Low ON resistance; high OFF resistance
  – Low parasitic capacitance to the attached wire
  – Reliability in volume production

SRAM Programming Technology 

 

 

• Employs SRAM (Static RAM) cells to control transistors and/or transmission gates
• SRAM cells control the configuration of the logic blocks as well
• Volatile
  – Needs external storage
  – Needs a power-on configuration mechanism
• In-circuit re-programmable
• Shorter configuration time
• Occupies relatively larger area

Anti-fuse Programming Technology



• Though implementations differ, all anti-fuse programming elements share a common property
  – They use materials which normally reside in a high-impedance state
  – But can be fused irreversibly into a low-impedance state by applying a high voltage
• Very low ON resistance (faster implementation of circuits)
• Limited size of anti-fuse elements; interconnects occupy relatively less area
  – Offset: larger transistors are needed for programming
• One-time programmable
  – Cannot be re-programmed (design changes are not possible)
  – Retains configuration after power-off

EPROM, EEPROM or Flash Based Programming Technology



EPROM Programming Technology
• Two gates: floating and select
• Normal mode: no charge on the floating gate
  – Transistor behaves as a normal n-channel transistor
• Floating gate charged by applying a high voltage
  – Threshold of the transistor (as seen by the select gate) increases
  – Transistor turned off permanently
• Re-programmable by exposing to UV radiation
• Used as pull-down devices
• Consumes static power
• No external storage mechanism
• Re-programmable (not all!)
• Not in-system re-programmable
• Re-programming is a time-consuming task

EEPROM Programming Technology
• Two gates: floating and select
• Functionally equivalent to EPROM; construction and structure differ
• Electrically erasable: re-programmable by applying a high voltage (no UV exposure!)
• When un-programmed, the threshold (as seen by the select gate) is negative!
• Re-programmable; in general, in-system re-programmable
• Re-programming takes less time than with EPROM technology
• Multiple voltage sources may be required
• Area occupied is twice that of EPROM!

Programming Technologies

Basic architectures of FPGAs 

• An FPGA device, to allow the implementation of practically any logic circuit, requires an area trade-off between a sufficient number of flexible configurable logic cells and enough interconnect resources to allow all connections between these cells.
• The majority of circuits utilize only a small portion of the routing and logic resources, resulting in a loss in speed (signals passing through redundant routing elements) and in density of logic when compared to the same circuit implemented in dedicated logic.
• One response is the grouping of different FPGA devices with related architecture into a family. Each member of a family would be physically tailored to a certain class of application architecture, for example by replacing the switches in certain routes by hard shorts, or by hard-wiring the logic cells internally in a certain manner.
• Such a member may now implement certain circuits more efficiently, but its reduced flexibility means that some circuits may not fit onto the device at all.
• Implementation of a circuit is now a question of choosing the right device from the FPGA family.

Programming Skills vs. FPGAs

CPU Model
• Single-threading: no synchronization; for/if/switch control
• Incremental execution: one instruction at a time; results are immediate
• Common parallelization: large units of work; costly communication

FPGA Model
• Massive parallelism: visible timing relations; state machine/hardwired
• Pipelined execution: all operations active; visible dependencies
• Parallelism model: fine grain (one ALU op); cheap on-chip communication

An Example 

Modulo-4 counter: Specification



Modulo-4 counter: Logic Implementation

FPGA Implementation of Modulo-4 Counter
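Since the figures for this example are not reproduced here, the following is a minimal sketch of one way a modulo-4 counter maps onto LUTs and flip-flops. It assumes a 2-bit up counter with D flip-flops, where the next-state equations are D0 = NOT Q0 and D1 = Q1 XOR Q0; the table and function names are chosen for this example.

```python
# LUT contents indexed by (q1 << 1) | q0 for the present state.
LUT_D0 = [1, 0, 1, 0]  # next q0 = NOT q0
LUT_D1 = [0, 1, 1, 0]  # next q1 = q1 XOR q0

def next_state(q1, q0):
    """Combinational next-state logic: two LUT reads at the same address."""
    address = (q1 << 1) | q0
    return LUT_D1[address], LUT_D0[address]

def run(cycles):
    q1 = q0 = 0  # flip-flops assumed cleared at start
    counts = []
    for _ in range(cycles):
        counts.append((q1 << 1) | q0)
        q1, q0 = next_state(q1, q0)  # clock edge: flip-flops load LUT outputs
    return counts

print(run(6))  # [0, 1, 2, 3, 0, 1]
```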

Design Steps Involved in Designing With FPGAs
• Understand and define design requirements
• Design description
• Behavioural simulation (source code interpretation)
• Synthesis
• Functional or gate-level simulation
• Implementation
  – Fitting
  – Place and route
• Timing or post-layout simulation
• Programming, test and debug

Commercially Available Devices
• Architecture differs from vendor to vendor
• Characterized by
  – Structure and content of the logic block
  – Structure and content of the routing resources
• To examine these, look at some of the available devices
  – FPGA: Xilinx (XC4000)
  – CPLD: Altera (MAX 5K)

Xilinx FPGAs 

Generic Xilinx Architecture



  



• Symmetric array based; the array consists of CLBs with LUTs and D flip-flops
• An n-input LUT can implement any n-input Boolean function
• Array embedded within the periphery of I/O blocks
• Array elements interleaved with routing resources (wire segments, switch matrix and single connection points)
• Employs SRAM technology



What is an FPGA?
• FPGAs contain the building blocks necessary to design a custom integrated circuit without having to turn to an outside foundry:
  – Logic blocks
  – Interconnects
  – I/O blocks
• All of these can be programmed to do a particular function
• Programming technologies
  – Memory-based (SRAM or flash EEPROM)
  – Anti-fuse
• A designer needs to develop a special program and have that program downloaded to the FPGA.
• FPGAs could be considered more of a software development than a hardware development effort.
• Intellectual property (IP), placed inside the FPGA, can either be developed by the designer or obtained from a third party.
• Design flow approaches
  – Schematic capture: the most intuitive and visual, but the least flexible
  – Hardware description language: more portable




FPGA manufacturers
• Xilinx (http://www.xilinx.com): SRAM-based FPGAs (tens of thousands to millions upon millions of gates)
• Altera (http://www.altera.com): SRAM-based FPGAs
• Lattice Semiconductor (http://www.latticesemi.com)
• Actel (http://www.actel.com)
• QuickLogic (http://www.quicklogic.com)

Classification by Granularity 

Logic block size correlates to the granularity of a device, which relates to the effort required to complete the wiring between the blocks (routing channels)
• Fine granularity (sea-of-gates architecture)
• Medium granularity (FPGA)
• Large granularity (CPLD)

Underlying FPGA Fabric
• Large numbers of relatively simple programmable logic block “islands” embedded in a “sea” of programmable interconnect

Fine-grained architecture
• Each logic block can be used to implement only a very simple function
  – A 3-input function or a storage element
  – Suited to glue logic and state machines
  – Requires a large number of connections into and out of each block

Coarse-grained architecture
• Each logic block contains a relatively large amount of logic
• A logic block might contain four 4-input LUTs, four muxes, four D-latches, and some fast carry logic

Mux vs LUT-based logic blocks  assume that the LUT is formed from SRAM cells (but it could be formed using antifuses, E2PROM, or FLASH cells)

MUX-based logic block

Multiplexer based CLB (configurable logic block)
• Multiplexer-based CLB example from Actel 40MK
• 8-input, 1-output cell
• Implements basic logic functions (AND, OR, NOR, ...) with 2, 3, or 4 inputs

LUT based CLB 



  

• A commonly used technique is to use the inputs to select the desired SRAM cell using a cascade of transmission gates.
• If a transmission gate is enabled (active), it passes the signal seen on its input through to its output. But if the gate is disabled, its output is electrically disconnected from the wire it is driving.
• 4-input LUTs offer the optimal balance of tradeoffs.
• The recently introduced Virtex-5 family from Xilinx features 6-input LUTs.
• Altera has a fabric that combines two 4-LUTs and four 3-LUTs. In addition to allowing designers to form a 6-LUT, this also allows you to make a 5-LUT and a 3-LUT, and many other combinations.

Look-up table (LUT) based CLB



LUT based CLB
• Depending on the combination of the input words, a predefined output value is assigned
• Memory implementation:
  – Input values = address of memory
  – Predefined values = content of memory

Registered output based CLB
• The output of the LUT may be registered or not, depending on the functional description (selection is implemented via multiplexers)

Look up Tables 



• Configuration memory holds the outputs of the truth table
• Internal signals connect to the control signals of multiplexers to select the truth-table value for any given input value
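The "inputs as address, truth table as memory contents" idea can be sketched in a few lines. The class below is an illustrative model, not vendor code; the class name, the `from_function` helper, and the bit ordering (first input is the least significant address bit) are assumptions of this sketch.

```python
class LUT:
    """n-input look-up table: configuration memory holds the truth table."""

    def __init__(self, n_inputs, truth_table):
        if len(truth_table) != 2 ** n_inputs:
            raise ValueError("truth table needs 2**n entries")
        self.n_inputs = n_inputs
        self.memory = list(truth_table)  # configuration memory

    @classmethod
    def from_function(cls, n_inputs, fn):
        """Fill the configuration memory by tabulating any Boolean function."""
        table = []
        for address in range(2 ** n_inputs):
            bits = [(address >> i) & 1 for i in range(n_inputs)]
            table.append(int(bool(fn(*bits))))
        return cls(n_inputs, table)

    def read(self, *bits):
        # The inputs form the address; the stored bit is the output.
        address = sum(bit << i for i, bit in enumerate(bits))
        return self.memory[address]


majority = LUT.from_function(3, lambda a, b, c: (a + b + c) >= 2)
print(majority.memory)         # [0, 0, 0, 1, 0, 1, 1, 1]
print(majority.read(1, 0, 1))  # 1
```

Because any function of n inputs can be tabulated this way, changing the function only means rewriting the memory contents, never the wiring.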



Synthesis mode
• Any logic function of up to 4 variables in its registered or direct form.

Arithmetic mode
• The LUT is split to provide any two logic functions of the same 3 variables.
• In the arithmetic mode, the inputs A, B, C are the addends and the Carry-in, whilst the output functions are the Sum and the Carry-out.
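Arithmetic mode can be illustrated by treating the 16 bits of a 4-input LUT as two 8-entry tables that share the same three inputs. The sketch below assumes the address layout a | (b << 1) | (carry_in << 2); the names and the ripple-adder wrapper are illustrative only.

```python
def build_table(fn):
    """8-entry truth table for a function of (a, b, carry_in)."""
    return [fn(addr & 1, (addr >> 1) & 1, (addr >> 2) & 1) for addr in range(8)]

# The two halves of the split LUT: same 3 inputs, two different functions.
SUM_TABLE = build_table(lambda a, b, cin: a ^ b ^ cin)
CARRY_TABLE = build_table(lambda a, b, cin: (a & b) | (a & cin) | (b & cin))

def full_adder(a, b, cin):
    addr = a | (b << 1) | (cin << 2)
    return SUM_TABLE[addr], CARRY_TABLE[addr]

def ripple_add(x, y, width=4):
    """Chain one split LUT per bit position; returns (sum, carry_out)."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_add(5, 6))  # (11, 0)
print(ripple_add(9, 8))  # (1, 1)
```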

Multiplier mode 

This mode also implements an adder, with the addends this time being partial products and Carry-in from the previous bit position.



The partial product of A and B may be implemented with an AND gate.

CLB with registered output



Counter mode
• The LUT provides two logic functions (counter Output and Carry-out) of the same 2 variables, which are a Carry-in and the previous Output.
• The loop to use this output as an input is normally provided for within the CLC; this could also be implemented externally by connecting appropriate routes.



Multiplexer (2:1) mode
• The LUT is configured to provide a logic function of 3 variables, where one selects one of the other two inputs.
• As an example, the case where C is the select line for A and B will be considered.
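For the case where C is the select line for A and B, the LUT contents can be written out explicitly. This sketch assumes that C = 0 passes A and C = 1 passes B, and that the LUT address is a | (b << 1) | (c << 2); both choices are assumptions made for illustration.

```python
# 8-entry table; index = a + 2*b + 4*c, so c is the outermost loop.
MUX_TABLE = [(b if c else a)
             for c in (0, 1) for b in (0, 1) for a in (0, 1)]

def lut_mux(a, b, c):
    """2:1 multiplexer realized as a 3-input LUT read."""
    return MUX_TABLE[a | (b << 1) | (c << 2)]

print(MUX_TABLE)         # [0, 1, 0, 1, 0, 0, 1, 1]
print(lut_mux(1, 0, 0))  # 1 (C = 0 passes A)
print(lut_mux(1, 0, 1))  # 0 (C = 1 passes B)
```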


LUT based CLB
• Example from Actel Varicore CLC: LUT based
  – Multiplexer to decrease LUT size
  – Registered output selectable via multiplexer

LUT output based CLB
• Example from Xilinx XC3000
  – Dual-output complex CLB
  – Registers selectable
  – Large combinational function with two outputs



Xilinx logic cell (LC) 

• An LC comprises a 4-input LUT (which can also act as a 16 × 1 RAM or a 16-bit shift register), a multiplexer, and a register.
• The register can be configured (programmed) to act as a flip-flop or as a latch.
• The polarity of the clock (rising-edge-triggered or falling-edge-triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low).
[Figure: highly simplified view of a Xilinx logic cell (LC)]
• The equivalent core "building block" in an FPGA from Altera is called a logic element (LE).

[Figure: a multi-faceted LUT]
[Figure: a "slice" containing two logic cells]
• The next level up is what Xilinx call a configurable logic block (CLB) and what Altera refer to as a logic array block (LAB).
• Some Xilinx FPGAs have two slices in each CLB while others have four.
• There is fast programmable interconnect within the CLB. This interconnect is used to connect neighboring slices.
• Logic-block hierarchy: LC, then slice (with two LCs), then CLB (with four slices), complemented by an equivalent hierarchy in the interconnect.
• Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs.
• The idea is to achieve the optimum tradeoff between making it easy to connect things together and avoiding excessive interconnect-related delays.

A CLB containing four slices



  

• All of the LUTs within a CLB can be configured together to implement the following:
  – Single-port 16 × 8 bit RAM
  – Single-port 32 × 4 bit RAM
  – Single-port 64 × 2 bit RAM
  – Single-port 128 × 1 bit RAM
  – Dual-port 16 × 4 bit RAM
  – Dual-port 32 × 2 bit RAM
  – Dual-port 64 × 1 bit RAM
• Each 4-input LUT can be used as a 16-bit shift register.
• The LUTs within a single CLB can be configured together to implement a shift register containing up to 128 bits as required.
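The three uses of the same 16 configuration bits (look-up table, 16 × 1 RAM, 16-bit shift register) can be sketched as one small class. This is a behavioural illustration only; the class name, method names, and shift direction are assumptions of the sketch, not a description of the actual Xilinx circuitry.

```python
class LutCell:
    """16 configuration bits viewed three ways: LUT, 16x1 RAM, shift register."""

    def __init__(self, bits=None):
        self.bits = list(bits) if bits is not None else [0] * 16

    def lookup(self, a, b, c, d):
        # LUT or RAM read: the four inputs form the address.
        return self.bits[a | (b << 1) | (c << 2) | (d << 3)]

    def write(self, address, value):
        # 16 x 1 RAM write.
        self.bits[address] = value & 1

    def shift_in(self, value):
        # Shift-register mode: the new bit enters at position 0 and the bit
        # leaving position 15 is returned.
        out = self.bits[-1]
        self.bits = [value & 1] + self.bits[:-1]
        return out


cell = LutCell()
stream = [1, 0, 1, 1] + [0] * 16
outputs = [cell.shift_in(bit) for bit in stream]
print(outputs[16:])  # [1, 0, 1, 1]: each bit reappears 16 shifts later
```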

Fast carry chains

   

• A key feature is the special logic and interconnect required to implement fast carry chains.
• In the context of the CLBs, each logic cell (LC) contains special carry logic. This is complemented by dedicated interconnect between the two LCs in each slice, between the slices in each CLB, and between the CLBs themselves.
• This special carry logic and dedicated routing boosts the performance of logical functions such as counters and arithmetic functions such as adders.
• The availability of these fast carry chains, in conjunction with features like the shift register incarnations of LUTs and the embedded multipliers, makes FPGAs suitable for applications such as DSP.

FPGA families

Vendor   Low-cost                            High-performance
Xilinx   Spartan 3, Spartan 3E, Spartan 3L   Virtex 4 LX / SX / FX, Virtex 5 LX
Altera   Cyclone II                          Stratix II, Stratix II GX

Xilinx FPGA Families 





• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
• Low-cost family
  – Spartan/XL: derived from XC4000
  – Spartan-II: derived from Virtex
  – Spartan-IIE: derived from Virtex-E
  – Spartan-3, Spartan 3E, Spartan 3L
• High-performance families
  – Virtex (220 nm)
  – Virtex-E, Virtex-EM (180 nm)
  – Virtex-II, Virtex-II PRO (130 nm)
  – Virtex-4 (90 nm)
  – Virtex 5 (65 nm)

Xilinx XC3000 CLB

Granularity of FPGAs

Selection of an FPGA 

• An FPGA can be viewed as an evolution of PALs, where size is increased by an order of magnitude, or as a refinement of mask-programmed gate arrays, where the reprogramming time and cost are drastically reduced.
• Architectural choices include anti-fuse versus reprogrammable configuration, block-structured versus channel-structured routing, and lookup table (LUT) versus multiplexer versus sum-of-products logic.
• The technology and architecture of the routing fabric is the most important factor in determining the effectiveness of an FPGA for a particular application.

Selecting an FPGA
• Size
  – FPGA vendors measure density or size in different ways. Nonetheless, a designer will need a ballpark understanding of what type of FPGA product they require.
• I/O pins
  – A designer needs to know how many pins they must share with the circuit outside of the FPGA. For example, serialization and de-serialization of signals can use up many pins.
• Performance
  – If fast computations are essential, then higher-performance FPGAs become mandatory, at a tradeoff in cost.
• Unit price
  – A large FPGA may be able to squeeze in all of your required IP, but the resultant cost might break the project's budget. It may make more sense to only incorporate certain IP into the FPGA and use off-the-shelf components for the rest of the design.
• Power consumption
  – This is critical for applications particularly sensitive to heat dissipation, and for those that require batteries.



• FPGA routing enables (almost) arbitrary connection among logic blocks, but at the cost of tying up more area and incurring more delay than is present in a mask-programmed part.
• Likewise, the logic architectures of FPGAs are larger and slower than mask-defined gates, since their functionality must be programmable. But in comparison to the routing fabric, their delays tend to be more predictable and less of a limiting factor.

FPGAs    

• Architecture
• Gate density
• Routing resources
• Programming method
• Xilinx Logic Cell Array (LCA)
• Actel Configurable Technology (ACT)

FPGA Capacity comparisons

 

 

• Some FPGAs offer dedicated adder blocks.
• One operation that is very common in DSP applications is called a multiply-and-accumulate (MAC): this function multiplies two numbers together and adds the result into a running total stored in an accumulator.
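A minimal software sketch of the MAC operation follows, applied to a dot product as a typical DSP use. The function names are chosen for this example; a hardware MAC would perform the same multiply and add in dedicated logic once per clock cycle.

```python
def mac(accumulator, a, b):
    """One multiply-and-accumulate step: acc + a * b."""
    return accumulator + a * b

def dot_product(xs, ys):
    """Repeated MAC steps over two sequences, as in a FIR filter tap sum."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```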

[Figure: the core functions forming a MAC]

Embedded microprocessor cores
• The majority of designs make use of microprocessors in one form or another.
• High-end FPGAs contain one or more embedded microprocessors (microprocessor cores).
• A hard microprocessor core is one that is implemented as a dedicated, predefined block.
• Moving all of the tasks that used to be performed by the external microprocessor into the internal core makes the board smaller and lighter.
• Two main approaches for integrating such a core into the FPGA:
  – Locate it in a strip to the side of the main FPGA fabric
  – Embed one or more microprocessor cores directly into the main FPGA fabric

Soft microprocessor cores

  

• Configure a group of programmable logic blocks to act as a microprocessor
• Simpler (more primitive) and slower than a hard core
• You only need to implement a core if you need it, and you can instantiate as many cores as you require

[Figure: bird's-eye view of a chip with the embedded core outside of the main fabric]







Clock trees

• All of the synchronous elements inside an FPGA (for example, the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal.
• Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.
• Clock tree
  – The main clock signal branches again and again; the flip-flops can be considered to be the "leaves" on the end of the branches.
  – This structure ensures that all of the flip-flops see the clock signal as close together as possible.
• Skew
  – If the clock was distributed as a single long track driving all of the flip-flops one after another, then the flip-flop closest to the clock pin would see the clock signal much sooner than the one at the end of the chain.
• The clock tree is implemented using special tracks and is separate from the general-purpose programmable interconnect.
• A device may have multiple clock pins and multiple clock domains (clock trees).
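The difference between a single long clock track and a balanced tree can be put into numbers with an idealized model. The sketch below assumes every wire segment or branch adds the same delay and that the tree is perfectly balanced; the delay values are hypothetical, and real devices would show small residual skew.

```python
import math

def chain_skew(n_flops, segment_delay_ps):
    """Clock routed as one long track: flop i sees the edge after i segments."""
    arrivals = [i * segment_delay_ps for i in range(1, n_flops + 1)]
    return max(arrivals) - min(arrivals)

def tree_skew(n_flops, branch_delay_ps):
    """Ideal balanced binary tree: every leaf sits at the same depth."""
    depth = math.ceil(math.log2(n_flops))
    arrivals = [depth * branch_delay_ps] * n_flops
    return max(arrivals) - min(arrivals)

print(chain_skew(8, 100))  # 700 (picoseconds between first and last flop)
print(tree_skew(8, 100))   # 0
```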

Clock manager

- The daughter clocks generated by a clock manager may be used to drive internal clock trees or external output pins that provide clocking services to other devices on the host circuit board.
- Each family of FPGAs has its own type of clock manager (there may be multiple clock manager blocks in a device), and different clock managers may support only a subset of the following features:
  - Jitter removal
  - Frequency synthesis
  - Phase shifting
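Frequency synthesis and phase shifting come down to simple arithmetic on the input clock. The sketch below assumes a clock manager that multiplies and divides by integers; the function names and the 100 MHz input are illustrative, and the ratios a real device allows depend on the FPGA family.

```python
from fractions import Fraction


def synthesize_frequency(input_mhz, multiply, divide):
    """Daughter clock frequency from multiplying and dividing the input clock."""
    if multiply <= 0 or divide <= 0:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(input_mhz) * multiply / divide


def phase_shift_delay_ns(freq_mhz, phase_degrees):
    """Time offset in nanoseconds that corresponds to a given phase shift."""
    period_ns = Fraction(1000) / Fraction(freq_mhz)
    return period_ns * Fraction(phase_degrees) / 360


# Assumed example: a 100 MHz input clock, multiplied by 3 and divided by 2.
daughter_mhz = synthesize_frequency(100, 3, 2)
# A 90-degree shift of the 100 MHz clock, expressed as a delay.
quarter_period_ns = phase_shift_delay_ns(100, 90)
```

Exact fractions are used so that ratios such as 3/2 do not pick up floating-point rounding.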

Selecting an FPGA

Size
- FPGA vendors measure density or size in different ways. Nonetheless, a designer will need a ballpark understanding of what type of FPGA product they require.

I/O pins
- A designer needs to know how many pins they must share with the circuit outside of the FPGA. For example, serialization and de-serialization of signals can use up many pins.

Performance
- If fast computations are essential, then higher-performance FPGAs become mandatory, at a tradeoff in cost.

Unit price
- A large FPGA may be able to squeeze in all of your required IP, but the resultant cost might break the project's budget. It may make more sense to only incorporate certain IP into the FPGA and use off-the-shelf components for the rest of the design.

Power consumption
- This is critical for applications particularly sensitive to heat dissipation, and for those that require batteries.



Design Entry
- Involves capturing the design using a high-level description language like Verilog or VHDL.
- Alternatively, a schematic editor is used to enter the design at the basic logic level, or by making use of generic blocks which are in turn described by high-level languages.
- Another option is entry of the design using state diagrams.
- The CAD software provided by FPGA manufacturers includes libraries of standard circuits or macro-functions to quickly implement common circuits of varying complexity.
- The schematic or VHDL description is then translated into a netlist describing the circuit in terms of logic gates and sequential elements.

Logic Synthesis
- Optimizes the circuit by regrouping logic functions and/or removing redundancies, according to design constraints or rules such as minimizing area or maximizing speed.
- Once the optimized netlist is obtained, it has to be mapped onto the logic cells of the FPGA (LUT / flip-flop, PLA ...).
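Removing a redundancy must not change the function being implemented. One way to check this for a small block is to compare truth tables, which is also the form in which a function is stored in a LUT. The sketch below is illustrative only; the functions original and optimized are made-up examples, not output from a synthesis tool.

```python
from itertools import product


def truth_table(func, num_inputs):
    """Evaluate func on every input combination, in binary counting order."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def original(a, b, c):
    # Contains a redundant term: (a and b and c) is already covered by (a and b).
    return (a and b) or (a and b and c)


def optimized(a, b, c):
    # The same function after the redundant term has been removed.
    return a and b


def equivalent(f, g, num_inputs):
    """Two functions are interchangeable if their truth tables match."""
    return truth_table(f, num_inputs) == truth_table(g, num_inputs)


# The eight bits below are what a 3-input LUT would hold for this function.
lut_contents = truth_table(optimized, 3)
```

Exhaustive comparison is only practical for a handful of inputs, which is the case for a single LUT.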

Floorplanning

- The circuit to be designed is now divided into partitions, each of which is adjusted to be implemented in a particular area on an FPGA device. A partition usually corresponds to a large section of the circuit which has a particular functionality, e.g. a multiplier, filter bank, etc.
- The total number of FPGA devices required is also determined.

FPGA Design Flow



Place and Route
- A logic partition is now mapped onto an FPGA device by means of the placement tool, which assigns a physical place in the array of CLCs to each function (LUT / flip-flop, PLA ...).
- Typical placement algorithms aim to minimize the total length of the interconnections in the final design, with the objective of maximizing the speed of the device.
- Routing algorithms configure the routing elements to provide the required connections between logic elements.
- The primary aim of any routing algorithm is to ensure that 100% of the required routes can be realized. Other goals include finding the shortest possible paths between elements.
- Because of restricted interconnection resources, this step is the most restrictive.

Layout Verification
- This step involves extracting the physical layout of the design and simulating it using commercial simulators to obtain timing data and to check design rules (DRC).
- If the delays associated with the interconnections within the prototype fulfill the delay constraints imposed by the design specifications, then the device may be programmed; otherwise the placement and routing steps have to be repeated until a satisfactory configuration is found.
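The placement goal of minimizing total interconnection length can be illustrated with a very small example. This is a sketch under stated assumptions: wire length is taken as Manhattan distance between two-pin nets, the blocks A to D and their nets are made up, and the greedy pairwise swap is a simple stand-in for the algorithms that production placement tools use.

```python
import itertools


def manhattan(p, q):
    """Distance between two grid locations along horizontal and vertical tracks."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_wirelength(placement, nets):
    """Sum of Manhattan distances over all two-pin nets."""
    return sum(manhattan(placement[a], placement[b]) for a, b in nets)


def improve_by_swapping(placement, nets):
    """Greedy pairwise swaps: keep a swap only if it shortens the total wirelength."""
    placement = dict(placement)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(sorted(placement), 2):
            before = total_wirelength(placement, nets)
            placement[a], placement[b] = placement[b], placement[a]
            if total_wirelength(placement, nets) < before:
                improved = True
            else:
                # Undo the swap because it did not help.
                placement[a], placement[b] = placement[b], placement[a]
    return placement


# Hypothetical example: four blocks on a 2x2 array, connected in a chain.
nets = [("A", "B"), ("B", "C"), ("C", "D")]
initial = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
final = improve_by_swapping(initial, nets)
```

Here the initial placement has a total wirelength of 5 and the swaps bring it down to 3, the minimum for three nets on this grid. A greedy search like this can stall in a local minimum on larger designs.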

Macro Integration

- This involves the provision of all the necessary files and data formats for integrating the macro in the design flow of the whole chip.
- Once the circuit has been verified, the design configuration is output in a format which is readable as an input to the FPGA device which is to be programmed. Programming the device can take a matter of minutes.


The Design Cycle

- ASIC Design Methodology
  - The design is verified by simulation at each stage of refinement.
    - Accurate simulators are slow.
    - Fast simulators trade away simulation accuracy.
  - ASIC designers use a battery of simulators across the speed-accuracy spectrum in an attempt to verify the design.
- An FPGA designer can replace simulation with in-circuit verification, "simulating" the circuitry in real time with a prototype.
  - The path from design to prototype is short, allowing a designer to verify operation over a wide range of conditions at high speed and high accuracy.
  - Proof-of-concept prototype designs can be built easily.
  - Designs can be verified by trial rather than reduction to first principles or by mental execution.
  - The designer can verify that the design works in the real system, not merely in a potentially erroneous simulation model of the system.

The Design Cycle

1. Entering the design in the form of schematic, netlist, logic expressions, or HDLs
2. Simulating the design for functional verification
3. Mapping the design into the FPGA architecture
4. Placing and routing the FPGA design
5. Extracting delay parameters of the routed design
6. Resimulating for timing verification
7. Generating the FPGA device configuration format
8. Configuring or programming the device
9. Testing the product for undesirable functional behavior

FPGA Configuration

- In the case of re-programmable devices, activation or deactivation of interconnects is implemented by means of transistors or tri-state buffers.
- Memory units also store the configuration of LUTs and static multiplexers in the CLC. If the type of memory used is EEPROM, the device is non-volatile, but the difficult mechanism of re-configuration imposes limitations on the application of the system.
- SRAM memory, on the other hand, loses the configuration once power is removed from the device (volatile), but it is simple and quick to configure. The use of SRAM allows for dynamic re-configuration of the device even during real-time operation.
- Small local SRAM blocks may also be used to store several configuration bits.
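How memory bits define a LUT's behavior, and why an SRAM-based device must be configured again after power-up, can be shown with a small model. This is a conceptual sketch: the class SramLut and its methods are hypothetical, and the bit ordering (first input as the most significant address bit) is an assumption rather than any vendor's format.

```python
class SramLut:
    """A k-input look-up table whose contents are held in volatile configuration bits."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.bits = None  # unconfigured until configuration bits are loaded

    def configure(self, bits):
        """Load 2**k configuration bits; calling this again reconfigures the LUT."""
        if len(bits) != 2 ** self.num_inputs:
            raise ValueError(f"expected {2 ** self.num_inputs} bits")
        self.bits = list(bits)

    def power_off(self):
        """SRAM is volatile, so the configuration is lost when power is removed."""
        self.bits = None

    def evaluate(self, *inputs):
        """Use the inputs as an address into the stored bits (first input is the MSB)."""
        if self.bits is None:
            raise RuntimeError("LUT is not configured")
        if len(inputs) != self.num_inputs:
            raise ValueError(f"expected {self.num_inputs} inputs")
        address = 0
        for value in inputs:
            address = (address << 1) | (1 if value else 0)
        return self.bits[address]


lut = SramLut(2)
lut.configure([0, 0, 0, 1])   # behaves as a 2-input AND
and_result = lut.evaluate(1, 1)
lut.configure([0, 1, 1, 0])   # reconfigured in place to behave as XOR
xor_result = lut.evaluate(1, 0)
```

The same hardware implements AND or XOR depending only on the stored bits, which is what makes re-configuration during operation possible.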

FPGA Capacity comparisons

SRAM FPGA -- EEPROM FPGA

- Despite this, however, most FPGAs still use SRAM for reasons of simplicity (when you need to reprogram, it is easier to re-encode a small ROM chip than to reprogram a large FPGA chip), so count on having to use a separate boot ROM for the FPGA.
- Use of an FPGA is broadly divided into two main stages:
  - Configuration mode: the mode the FPGA is in when you first power it up. Configuration mode is, as you may have guessed, where you configure the FPGA.

Product – FPGA vs ASIC

Comparison:
- FPGA benefits vs ASICs:
  - Design time: 9 month design cycle vs 2-3 years
  - Cost: No $3-5 M upfront (NRE) design cost. No $100-500K mask-set cost
  - Volume: High initial ASIC cost recovered only in very high volume products
- Due to Moore’s law, many ASIC market requirements are now met by FPGAs
  - E.g. Virtex II Pro has 4 processors, 10 Mb memory, IO

Resulting Market Shift:
- Dramatic decline in number of ASIC design starts:
  - 11,000 in ’97
  - 1,500 in ’02
- FPGAs as a % of Logic market:
  - Increase from 10 to 22% in past 3-4 years
- FPGAs (or programmable logic) are the fastest growing segment of the semiconductor industry!

FPGA/ASIC Crossover Changes

[Figure: Cost versus Production Volume. Curves for 150nm / 200mm and 90nm / 300mm ASICs and for 150nm / 200mm and 90nm / 300mm FPGAs, with regions marked "FPGA Cost Advantage" and "ASIC Cost Advantage".]
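The crossover between FPGA and ASIC cost can be estimated with a simple linear model: total cost is the upfront (NRE) cost plus unit cost times volume. In the sketch below the ASIC upfront figure uses the low end of the ranges quoted above ($3 M design plus $100K mask set); the unit prices of $10 and $50 are purely hypothetical placeholders, and the function names are not from any tool.

```python
def total_cost(nre, unit_cost, volume):
    """Upfront cost plus per-unit cost over the whole production run."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume at which ASIC and FPGA total costs are equal."""
    if fpga_unit <= asic_unit:
        raise ValueError("no crossover: the FPGA is not more expensive per unit")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


ASIC_NRE = 3_000_000 + 100_000   # low end of the ranges quoted in the comparison
ASIC_UNIT = 10.0                 # hypothetical unit price, for illustration only
FPGA_NRE = 0                     # no upfront design or mask-set cost
FPGA_UNIT = 50.0                 # hypothetical unit price, for illustration only

crossover = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
```

Below the crossover volume the FPGA is cheaper overall; above it the ASIC's lower unit price recovers its upfront cost. Raising the ASIC upfront cost moves the crossover to a higher volume.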
