On-Chip Network Designs for Many-Core Computational Platforms

Anh T. Tran
PhD Dissertation
VLSI Computation Laboratory
Department of Electrical and Computer Engineering
University of California, Davis
Technical Report ECE-VCL-2012-3, VLSI Computation Laboratory, University of California, Davis, 2012.

Abstract:

Processor designers have been placing more processing elements (PEs) on a single chip to make efficient use of technology scaling and to improve system performance through increased parallelism. Networks-on-chip (NoCs) have been shown to be a promising, scalable alternative to point-to-point interconnects or global buses for connecting large numbers of PEs. This dissertation investigates the design of on-chip interconnection networks for many-core computational platforms in three application domains: high-performance networks for applications with high communication bandwidths; low-cost networks for application-specific, low-bandwidth dynamic traffic; and reconfigurable networks for platforms targeting digital signal processing (DSP) applications, which have deterministic inter-task communication characteristics.

An on-chip router architecture named RoShaQ is proposed for platforms executing general-purpose applications with dynamic, high-bandwidth communication. RoShaQ maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports, and hence achieves high network performance. Experimental results show that, given the same buffer capacity and averaged over a broad range of traffic patterns, RoShaQ achieves 17.2% lower latency, 18.2% higher saturation throughput and 8.3% lower energy dissipation per bit than state-of-the-art virtual-channel routers.
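The buffer-sharing idea can be illustrated with a small C++ sketch. It is not RoShaQ's actual microarchitecture: the class `SharedQueuePool`, its methods, and the queue count and depth used below are hypothetical, chosen only to show how a pool of queues shared by all input ports lets a heavily loaded port use more buffer space than a fixed per-port partition would allow. Allocation here is simple first-free, and a queue returns to the pool only when explicitly released after draining; a real router would tie allocation and release to packet head and tail flits and arbitrate among competing ports.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical illustration of buffer sharing (not RoShaQ's real design):
// a pool of flit queues that any input port may claim while it is free.
class SharedQueuePool {
public:
    SharedQueuePool(std::size_t numQueues, std::size_t depth)
        : queues_(numQueues), owner_(numQueues, kFree), depth_(depth) {}

    // Give input port `port` the first free queue, or nullopt if all are in use.
    std::optional<std::size_t> allocate(int port) {
        for (std::size_t q = 0; q < queues_.size(); ++q) {
            if (owner_[q] == kFree) {
                owner_[q] = port;
                return q;
            }
        }
        return std::nullopt;
    }

    // Append one flit to queue q; returns false if the queue is already full.
    bool push(std::size_t q, int flit) {
        assert(q < queues_.size());
        if (queues_[q].size() >= depth_) return false;
        queues_[q].push_back(flit);
        return true;
    }

    // Remove and return the oldest flit in queue q (FIFO order), if any.
    std::optional<int> pop(std::size_t q) {
        assert(q < queues_.size());
        if (queues_[q].empty()) return std::nullopt;
        int flit = queues_[q].front();
        queues_[q].pop_front();
        return flit;
    }

    // Return queue q to the pool; refused while it still holds flits.
    bool release(std::size_t q) {
        assert(q < queues_.size());
        if (!queues_[q].empty()) return false;
        owner_[q] = kFree;
        return true;
    }

    // Number of queues currently held by input port `port`.
    std::size_t queuesOwnedBy(int port) const {
        std::size_t n = 0;
        for (int owner : owner_) {
            if (owner == port) ++n;
        }
        return n;
    }

private:
    static constexpr int kFree = -1;  // owner_ value for an unclaimed queue
    std::vector<std::deque<int>> queues_;
    std::vector<int> owner_;          // owning input port per queue, or kFree
    std::size_t depth_;               // capacity of each queue in flits
};
```

For example, with four shared queues of depth 8, bursty traffic arriving only on input port 0 can claim all four queues, whereas a private per-port split of the same total capacity would give that port only its own share; another port then waits until a drained queue is released.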

For mapping applications with low inter-task communication bandwidths, five low-cost bufferless routers are proposed. All of them guarantee in-order packet delivery, so expensive reordering buffers are not required. The proposed bufferless routers have lower cost and higher performance per unit cost than all of the buffered wormhole routers compared: the smallest proposed bufferless router has 32.4% less area, 24.5% higher throughput, 29.5% lower latency, 10.0% lower power and 26.5% lower energy per bit than the smallest buffered router.

A globally asynchronous locally synchronous (GALS)-compatible reconfigurable circuit-switched on-chip network is proposed for many-core platforms targeting streaming DSP and embedded applications, which exhibit deterministic inter-task communication traffic. Inter-processor communication uses a simple yet effective source-synchronous technique that can sustain the ideal throughput of one word per cycle with latency approaching the wire delay. This network was used in a GALS many-core chip fabricated in 65 nm CMOS. To evaluate the efficiency of this platform, a complete IEEE 802.11a baseband receiver was implemented on it. The receiver achieves a real-time throughput of 54 Mbps and consumes 174.8 mW, of which only 12.2 mW (7.0%) is dissipated by its interconnects.
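As a quick arithmetic check of the receiver power figures reported for the 802.11a receiver, the interconnect share of total power and the power left for the remaining components work out as follows (the symbols are introduced here only for this check):

```latex
\[
\frac{P_{\text{interconnect}}}{P_{\text{total}}}
  = \frac{12.2\ \text{mW}}{174.8\ \text{mW}} \approx 0.0698 \approx 7.0\%,
\qquad
P_{\text{total}} - P_{\text{interconnect}} = 174.8 - 12.2 = 162.6\ \text{mW}.
\]
```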

A highly parameterizable NoC simulator named NoCTweak is also proposed for early exploration of the performance and energy efficiency of on-chip networks. The simulator is written in SystemC, a C++ class library that allows fast modeling of concurrent hardware modules with cycle-level accuracy. Area, timing and power figures for router components are taken from post-layout data based on a 65 nm CMOS standard-cell library. NoCTweak was used in many of the experiments reported in this dissertation.
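Cycle-level modeling of concurrent hardware rests on two-phase evaluate/update semantics: every module reads the values registered at the previous clock edge, and all new values are committed together, which is also how SystemC signals behave. The plain C++ sketch below uses hypothetical names `Register` and `Pipeline`; it is not NoCTweak code and does not use the SystemC API. It models a chain of pipeline registers this way, so a value injected at one end emerges after exactly N cycles regardless of the order in which stages are evaluated.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration (plain C++, not SystemC or NoCTweak code):
// a register with separate "current" and "next" values for two-phase updates.
struct Register {
    int current = 0;  // value visible to all readers during this cycle
    int next = 0;     // value written during this cycle's evaluate phase
    void write(int v) { next = v; }
    void update() { current = next; }
};

// An N-stage pipeline of registers advanced one clock cycle per tick().
template <std::size_t N>
class Pipeline {
public:
    void tick(int input) {
        // Evaluate: each stage reads its predecessor's *current* value,
        // so the order of these writes does not affect the result.
        stages_[0].write(input);
        for (std::size_t i = 1; i < N; ++i) {
            stages_[i].write(stages_[i - 1].current);
        }
        // Update: all registers commit together, like a shared clock edge.
        for (auto& r : stages_) r.update();
    }

    int stage(std::size_t i) const {
        assert(i < N);
        return stages_[i].current;
    }

    int output() const { return stages_[N - 1].current; }

private:
    std::array<Register, N> stages_{};
};
```

Had each stage been overwritten in place during evaluation, a value could ripple through several stages within one cycle depending on loop order; keeping `next` separate from `current` removes that dependence.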


Reference

Anh T. Tran, "On-Chip Network Designs for Many-Core Computational Platforms," Ph.D. Dissertation, Technical Report ECE-VCL-2012-3, VLSI Computation Laboratory, ECE Department, University of California, Davis, 2012.

BibTeX entry

@phdthesis{atran:vcl:phdthesis,
   author      = {Anh T. Tran},
   title       = {On-Chip Network Designs for Many-Core Computational Platforms},
   school      = {University of California},
   year        = 2012,
   address     = {Davis, CA, USA},
   month       = Aug,
   note        = {\url{http://www.ece.ucdavis.edu/vcl/pubs/theses/2012-3/}}
   }
