FPGA'99: Advanced processes unleash architectural ideas
http://www.eet.com/story/OEG19990223S0021

MONTEREY, Calif. -- It may not be obvious that the FPGA industry is in upheaval. But attendees at the FPGA'99 conference here had a front-row seat as architectural change, driven on the tide of process advances, began a sweep through the industry. The key underpinnings of this generation's FPGA architectures have already been undermined, and new structures are beginning to appear in their place. Yet the conference papers indicate that the change has only just begun.

The driving force behind all this activity is, not surprisingly, process improvement. A generation ago, FPGA designers were delighted to get a fast CMOS process with three usable metal layers. Today they are being offered far higher speeds, up to six layers of high-conductivity metal, and the ability to route signals over delicate logic structures. These advances have overturned, or at least reopened for examination, many of the assumed truths of FPGA architecture.

The most prominent truth to be questioned is the supremacy of the four-input lookup table (LUT) as the basic element of combinatorial logic. Research by Jonathan Rose and associates at the University of Toronto years ago established that the ideal combination of utilization and performance could be achieved by an FPGA built from a uniform array of four-input LUTs. But, as presenters from Actel Corp. pointed out in their opening paper, Rose's research assumed an interconnect architecture without hierarchy, and also assumed that it was impossible to route over logic. With current processes, neither assumption is necessary any longer.

Technically, assaults on the doctrine of homogeneous architecture began some time ago, with the inclusion of SRAM blocks in FPGAs. But only in this generation has the dominance of the four-input LUT been seriously challenged.
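To make the term concrete, here is a minimal software sketch of what a four-input LUT does. A k-input LUT is simply a 2^k-bit truth table addressed by its inputs, so any of the 2^16 = 65,536 possible four-input functions fits in 16 configuration bits. The class name `Lut` and the bit ordering are illustrative assumptions, not any vendor's configuration format.

```python
class Lut:
    """Software model of a k-input lookup table (LUT).

    The configuration is a 2**k-bit truth table stored in `mask`.
    Bit i of `mask` is the output when the inputs, read as a binary
    number with inputs[0] as the least significant bit, equal i.
    (Bit ordering is an illustrative choice.)
    """

    def __init__(self, k: int, mask: int) -> None:
        if not 0 <= mask < 1 << (1 << k):
            raise ValueError("mask must fit in 2**k bits")
        self.k = k
        self.mask = mask

    @classmethod
    def from_function(cls, k: int, fn) -> "Lut":
        """'Program' the LUT by evaluating fn on every input combination."""
        mask = 0
        for index in range(1 << k):
            bits = [(index >> j) & 1 for j in range(k)]
            if fn(*bits):
                mask |= 1 << index
        return cls(k, mask)

    def __call__(self, *inputs: int) -> int:
        """Evaluate: the inputs form an address into the truth table."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        index = sum((bit & 1) << j for j, bit in enumerate(inputs))
        return (self.mask >> index) & 1


# Two very different four-input functions fit the same 16-bit LUT.
xor4 = Lut.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = Lut.from_function(4, lambda a, b, c, d: a & b & c & d)
print(hex(xor4.mask), hex(and4.mask))  # 0x6996 0x8000
print(xor4(1, 0, 1, 1), and4(1, 1, 1, 1))  # 1 1
```

The point of the model is that a LUT's cost is fixed by its input count, not by the function it holds: a parity function and an AND gate occupy the same 16 bits.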
The first attack came from Altera Corp., which incorporated even more flexible memory blocks into its Apex architecture. In a presentation Monday (Feb. 22), Altera pointed out that the Embedded System Blocks (ESBs) in the new architecture serve logic as well as memory functions. Like the Embedded Array Blocks in the Flex architecture, the ESBs can be configured as ROM and used as wide-input LUTs. In addition, the ESBs can be configured as AND arrays, very similar to the AND arrays that form the basis of Altera's Max CPLD family. The wide-LUT configuration permits single-level implementation of an arbitrary 1-bit function of up to 11 inputs, while the AND-array configuration can implement up to 16 output functions from 32 inputs. Of course, as in any product-term structure, not every possible combination of functions can be implemented. Thus the ESBs give Altera a way to accommodate either very dense clusters of combinatorial logic or very wide fan-in functions much more efficiently than four-input LUTs could.

An entirely different approach to the same end was described by Vantis Corp. In that vendor's latest FPGA family, very fast local interconnect is used, in effect, to cascade three-input LUTs into four-, five- and six-input LUTs within a local island of logic. This permits Vantis' design tools to work with what is effectively a heterogeneous array of LUTs of varying widths, while the underlying silicon retains the simplicity of an array of homogeneous three-input LUT islands.

Actel has carried this idea in a slightly different direction in its new reprogrammable FPGA architecture, which is now sampling.
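One standard way to build a wider LUT out of narrower ones is Shannon expansion: split the function on one input, build each half (the cofactor with that input fixed at 0 or 1) from smaller tables, and let a 2:1 multiplexer driven by that input pick between them. The article does not say that Vantis' local interconnect works exactly this way. This is a sketch under that assumption, showing only that cascaded three-input tables can realize any four-, five- or six-input function. In this scheme a k-input function costs 2^(k-3) three-input LUTs plus 2^(k-3) - 1 multiplexers. The same truth-table view explains the ESB figure: a single-level 11-input function is a 2^11 = 2,048-entry table. The names `lut3`, `mux2` and `build_wide` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Bits = Sequence[int]
BoolFn = Callable[[Bits], int]


def lut3(table: Sequence[int]) -> BoolFn:
    """A three-input LUT: table[x0 + 2*x1 + 4*x2] is the output."""
    if len(table) != 8:
        raise ValueError("a three-input LUT holds exactly 8 bits")
    return lambda x: table[x[0] | (x[1] << 1) | (x[2] << 2)]


def mux2(sel: int, if0: int, if1: int) -> int:
    """2:1 multiplexer, standing in (by assumption) for the fast local link."""
    return if1 if sel else if0


def build_wide(k: int, fn: BoolFn) -> Tuple[BoolFn, int]:
    """Realize an arbitrary k-input function (k >= 3) from three-input LUTs.

    Shannon expansion on the last input x[k-1]: each cofactor (x[k-1]
    fixed to 0 or 1) has k-1 inputs and is built recursively until three
    inputs remain. Returns (composed function, number of 3-LUTs used).
    """
    if k < 3:
        raise ValueError("this sketch assumes k >= 3")
    if k == 3:
        table = [fn([(i >> j) & 1 for j in range(3)]) for i in range(8)]
        return lut3(table), 1
    low, n_low = build_wide(k - 1, lambda x: fn(list(x) + [0]))
    high, n_high = build_wide(k - 1, lambda x: fn(list(x) + [1]))

    def wide(x: Bits) -> int:
        # The extra input selects which (k-1)-input half drives the output.
        return mux2(x[k - 1], low(x[: k - 1]), high(x[: k - 1]))

    return wide, n_low + n_high


# Six-input parity, built from eight three-input LUTs and checked exhaustively.
parity6, luts_used = build_wide(6, lambda x: sum(x) % 2)
assert all(parity6(list(x)) == sum(x) % 2 for x in product((0, 1), repeat=6))
print(luts_used)  # 8
```

The doubling in LUT count with each added input is one reason real devices stop at modest widths and rely on other structures, such as the ESB's ROM and product-term modes, for very wide fan-in.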
Repeating Rose's original research without the assumptions of non-hierarchical routing and no routing over logic, the Actel developers concluded that in the latest processes a basic logic element combining three-input and two-input LUTs would be used more efficiently, particularly by data-path compilation tools. Hence their new architecture, based on islands of logic suspended in a three-level routing hierarchy, employs basic blocks of one flip-flop, two three-input LUTs and one two-input LUT, stitched together and linked to nearest neighbors by very fast (0.25-ns) local interconnect.

These papers have shown that fast local interconnect has changed the ground rules for logic topology. But global interconnect topologies came under as much scrutiny as logic in this year's presentations. As the amount of metal available to designers grows, the question is no longer how to get by with a minimum of links between logic elements, but how to use the additional links. The answer, it appears, will be borrowed from the world of supercomputing.

In a poster session, several papers considered three-dimensional and even four-dimensional topologies for linking the growing islands of logic in an FPGA. Herman Schmit of Carnegie Mellon University demonstrated that a partially populated four-dimensional interconnect scheme, essentially a hypercube, behaved much better under intensive routing demands than existing two-dimensional arrays. And a team from National Chiao Tung University in Taiwan explored the micro-architecture of three-dimensional switching blocks, the all-important connection points between routing elements.

Meanwhile, Rose and his associates at Toronto have not been sleeping. In another poster paper, Rose and Vaughn Betz demonstrated that a mixture of pass-transistor-controlled segments and buffered interconnect lines could outperform an interconnect scheme composed exclusively of either pass transistors or buffers.
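The intuition behind a result like Rose and Betz's can be illustrated with a first-order Elmore delay model. This is an assumption made for illustration, not the model or data from their paper. Each routing segment is treated as a pass-transistor switch of resistance R driving a wire of capacitance C, and each buffer adds a fixed delay t_buf, with hypothetical unit values R*C = 1 and t_buf = 4. Under this model a chain of pass transistors slows down quadratically with length, while a buffer on every segment pays t_buf every time. Rebuffering every few segments can beat both extremes. The function names are hypothetical.

```python
def group_delay(g: int, rc: float, t_buf: float) -> float:
    """Elmore delay of one buffer driving g pass-transistor-linked segments.

    The capacitance of segment i sits behind i switch resistances, so the
    unbuffered run contributes R*C*(1 + 2 + ... + g) = R*C*g*(g+1)/2.
    Buffer output resistance is ignored (a simplifying assumption).
    """
    return t_buf + rc * g * (g + 1) / 2


def path_delay(n: int, k: int, rc: float, t_buf: float) -> float:
    """Delay of an n-segment route with a buffer at the start of every k segments.

    k == 1 means fully buffered; k == n means one driver followed purely
    by pass transistors.
    """
    full, rest = divmod(n, k)
    delay = full * group_delay(k, rc, t_buf)
    if rest:
        delay += group_delay(rest, rc, t_buf)
    return delay


def best_spacing(n: int, rc: float, t_buf: float) -> int:
    """Buffer spacing k in 1..n that minimizes path delay under this model."""
    return min(range(1, n + 1), key=lambda k: path_delay(n, k, rc, t_buf))


# An 8-segment route with hypothetical R*C = 1 and t_buf = 4 time units.
for k in range(1, 9):
    print(k, path_delay(8, k, rc=1.0, t_buf=4.0))
# Both extremes (k=1 and k=8) come out at 40.0; the mixture at k=3 gives 27.0.
print("best spacing:", best_spacing(8, rc=1.0, t_buf=4.0))  # best spacing: 3
```

Real routing delay also depends on switch and buffer sizing, wire length and fan-out, so the numbers here only show the shape of the trade-off, not the magnitude of any measured improvement.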
Heterogeneity, then, may be coming not only to logic elements but to interconnect programming elements as well.

The bottom line for the architectural papers at this year's conference appears to be simple: everything is up for reconsideration. Heterogeneous architectures are in. Enabled by the relative unimportance of logic real estate on the modern die, logic islands of growing sophistication are in. And complex, hierarchical interconnect schemes are on the way.

The next question, mostly skirted by the architecture papers, is the one that stopped heterogeneous architectures in their tracks several years ago: can tools be developed that exploit the new heterogeneous structures? Or will the increasingly clear advantages of more complex FPGA hardware be lost on tool suites still struggling to exploit a field of sparsely connected four-input LUTs? That question remains on the table.