Computer Architecture and VLSI Systems (CARV) Laboratory,
Institute of Computer Science (ICS),
FORTH,
Heraklion, Crete, Greece
© copyright 2006-2012 by FORTH, IEEE, ACM, and Springer
For more information about our Formic board and its use for prototyping large multicore systems, please visit: http://formic-board.com
ABSTRACT: We present the hardware design and implementation of a local memory system for individual processors inside future chip multiprocessors (CMP). Our memory system supports both implicit communication via caches, and explicit communication via directly accessible local ("scratchpad") memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks that lie near each processor, so that portions of them operate as 2nd level (local) cache, while the rest operate as scratchpad. We also strive to merge the communication subsystems required by the cache and scratchpad into one integrated Network Interface (NI) and Cache Controller (CC), in order to economize on circuits. The processor interacts with the NI at user-level through virtualized command areas in scratchpad; the NI uses a similar access mechanism to provide efficient support for two hardware synchronization primitives: counters, and queues. We describe the NI design, the hardware cost, and the latencies of our FPGA-based prototype implementation that integrates four MicroBlaze processors, each with 64 KBytes of local SRAM, a crossbar NoC, and a DRAM controller. One-way, end-to-end, user-level communication completes within about 20 clock cycles for short transfer sizes.
The prototype includes multiple Xilinx XUPV5 processor boards, each containing 4 MicroBlaze cores, interconnected via a Xilinx ML325 switch board that contains 3 parallel crossbars, using 3 RocketIO (2.5 Gbps) links per board.
ABSTRACT: Parallel computing systems are becoming widespread and grow in sophistication. Besides simulation, rapid system prototyping becomes important in designing and evaluating their architecture. We present an efficient FPGA-based platform that we developed and use for research and experimentation on high speed interprocessor communication, network interfaces and interconnects. Our platform supports advanced communication capabilities such as Remote DMA, Remote Queues, zero-copy data delivery and flexible notification mechanisms, as well as link bundling for increased performance. We report on the platform architecture, its design cost, complexity and performance (latency and throughput). We also report our experiences from implementing benchmarking kernels and a user-level benchmark application, and show how software can take advantage of the provided features, but also expose the weaknesses of the system.
The prototype includes eight x86 nodes, each with a 10 Gbps PCI-X RDMA-capable NIC (DiniGroup Virtex-II Pro boards), interconnected via four Xilinx ML325 switch boards (variable-size buffered crossbars), using four RocketIO (2.5 Gbps) links per node.
Angelos Bilas, Alex Ramirez, and Georgi Gaydadjiev helped us shape our ideas; we deeply thank them. We also thank, for their participation and assistance: M. Ligerakis, M. Marazakis, M. Papamichael, E. Vlahos, G. Mihelogiannakis, and A. Ioannou.
We also deeply thank the Xilinx University Program for donating to us a number of FPGA chips, boards, and licences for the Xilinx EDA tools.
Last updated: Apr. 2012, by M. Katevenis.