



#### On-Die Interconnects for Next-Generation CMPs

Partha Kundu, Corporate Technology Group (MTL), Intel Corporation

OCIN Workshop, Stanford University, December 6, 2006

# **Multi-Core Transition Accelerating**



#### What will we do with this Compute Power?



#### Emerging 'Killer' Applications: The RMS Suite

Source: "Cool Codes for Hot Chips," keynote by Justin Rattner, CTO, Intel, Aug. 2006



### **Tera-Scale Prototype**



Source: "Cool Codes for Hot Chips," keynote by Justin Rattner, CTO, Intel, Aug. 2006



## **Overview of Talk**

- Establish the importance of on-die interconnects
- Walk through a case study of a router design
- Evaluate the design against its goals
- Conclusions



#### **iRMS Data Size estimates**



\* Data collected over complete application runs on a hardware cache emulator





- No data replication: all data goes over the on-die interconnect (high on-die B/W)
- Possible data replication: primarily dirty blocks go over the on-die interconnect (low on-die B/W, high off-die B/W)

Sharing exists in some of the RMS kernels.

Manage off-die bandwidth via a better on-die network.



# **Need for Scalability**










## **Case Study of a Router**



# **5-Port Switch (Overview)**



- Maximize network throughput

# **Double-pumped Crossbar**



Source: Vangal et al., "A Six-Port 57GB/s Double-Pumped Nonblocking Router Core," Symposium on VLSI Circuits, June 2005



#### **Buffer Management**










All traffic classes are allowed to use any VC





Statically assigned buffers (SAMQ) with simple virtual cut-through (VCT) flow control
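The slide pairs statically assigned (SAMQ) buffers with simple virtual cut-through (VCT) flow control. The sketch below is a minimal illustration, not the design from the talk. It assumes one fixed buffer partition per traffic class and credits counted in flits, and neither detail is stated on the slide. Under VCT, a packet is forwarded only when the downstream partition for its class can hold the entire packet. The class names (`SamqInputPort`, `UpstreamSender`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    traffic_class: int
    length_flits: int


class SamqInputPort:
    """Hypothetical input port. Assumption: the buffer is statically split
    into one fixed-size partition (queue) per traffic class."""

    def __init__(self, num_classes: int, flits_per_class: int) -> None:
        self.capacity = flits_per_class
        self.queues = [deque() for _ in range(num_classes)]
        self.used = [0] * num_classes

    def free_flits(self, tc: int) -> int:
        return self.capacity - self.used[tc]

    def accept(self, pkt: Packet) -> None:
        # The sender must already have checked that the whole packet fits.
        if pkt.length_flits > self.free_flits(pkt.traffic_class):
            raise RuntimeError("buffer overflow: VCT credit check violated")
        self.queues[pkt.traffic_class].append(pkt)
        self.used[pkt.traffic_class] += pkt.length_flits

    def release(self, tc: int) -> Packet:
        # Packet leaves this router; its flits free up the class partition.
        pkt = self.queues[tc].popleft()
        self.used[tc] -= pkt.length_flits
        return pkt


class UpstreamSender:
    """Keeps a per-class credit count (in flits, an assumption) that mirrors
    the free space in each downstream partition."""

    def __init__(self, downstream: SamqInputPort) -> None:
        self.downstream = downstream
        self.credits = [downstream.capacity] * len(downstream.queues)

    def try_send(self, pkt: Packet) -> bool:
        tc = pkt.traffic_class
        # Virtual cut-through: forward only if the entire packet fits in its
        # own class's partition; other classes' space is never borrowed.
        if self.credits[tc] < pkt.length_flits:
            return False
        self.credits[tc] -= pkt.length_flits
        self.downstream.accept(pkt)
        return True

    def return_credits(self, tc: int, flits: int) -> None:
        self.credits[tc] += flits


port = SamqInputPort(num_classes=2, flits_per_class=4)
sender = UpstreamSender(port)
print(sender.try_send(Packet(0, 3)))  # True
print(sender.try_send(Packet(0, 2)))  # False: only 1 flit left in class 0
print(sender.try_send(Packet(1, 4)))  # True: class 1 has its own partition
```

Fixed partitions stop a burst in one traffic class from taking buffer space that another class needs. The trade-off is lower buffer utilization than the alternative on the same slide, where any traffic class may use any VC.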






#### **Switch Allocator**



**Proprietary Switch Allocator** achieves high matching efficiency

- Need to generate 4 requests per cycle
- Adapts to load conditions using heuristic

Achieves high throughput at manageable latency
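The allocator on this slide is proprietary, and the talk does not describe its heuristic. For comparison, here is a common baseline that such allocators improve on: a separable, input-first allocator built from round-robin arbiters. This is a standard technique, not the design from the talk. The sketch assumes a 5-port router with a hypothetical two VCs per input. One way to read "matching efficiency" is the number of grants issued relative to the largest matching the requests allow.

```python
from typing import Dict, List, Optional


class RoundRobinArbiter:
    """Grants the first requester at or after the priority pointer."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.ptr = 0

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        for k in range(self.n):
            idx = (self.ptr + k) % self.n
            if requests[idx]:
                # The winner drops to lowest priority for the next round.
                self.ptr = (idx + 1) % self.n
                return idx
        return None


class SeparableInputFirstAllocator:
    """Baseline separable allocator (not the proprietary one from the talk)."""

    def __init__(self, num_ports: int, vcs_per_port: int) -> None:
        self.num_ports = num_ports
        self.vc_arbs = [RoundRobinArbiter(vcs_per_port) for _ in range(num_ports)]
        self.out_arbs = [RoundRobinArbiter(num_ports) for _ in range(num_ports)]

    def allocate(self, requests: List[List[Optional[int]]]) -> Dict[int, int]:
        """requests[i][v] is the output port wanted by VC v at input i
        (None if idle). Returns a matching {input_port: output_port}."""
        # Stage 1: each input picks one requesting VC, hence one output.
        chosen: Dict[int, int] = {}
        for i, vc_reqs in enumerate(requests):
            v = self.vc_arbs[i].arbitrate([r is not None for r in vc_reqs])
            if v is not None:
                chosen[i] = vc_reqs[v]
        # Stage 2: each output grants one of the inputs that picked it.
        grants: Dict[int, int] = {}
        for o in range(self.num_ports):
            winner = self.out_arbs[o].arbitrate(
                [chosen.get(i) == o for i in range(self.num_ports)]
            )
            if winner is not None:
                grants[winner] = o
        return grants


alloc = SeparableInputFirstAllocator(num_ports=5, vcs_per_port=2)
reqs = [[2, 3], [2, None], [None, None], [4, None], [None, None]]
print(alloc.allocate(reqs))  # {0: 2, 3: 4}: inputs 0 and 1 collide on output 2
```

In this example, input 1 loses output 2 while input 0's request for output 3 goes unused, so the allocator issues 2 grants where 3 were possible. Recovering lost matches like these is what a smarter, load-adaptive allocator aims to do. This sketch also advances the stage-1 pointers even when an input loses in stage 2. iSLIP-style designs update pointers only on a final grant.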



# **Pipeline Design**





Figure: No-Load Pipeline (Non-Speculative), with stage labels Buffer Read, Request, Set, and Crossbar Traversal.

- Choose pipeline frequency to maximize switching rate
- Optimize for load conditions



## **Power Challenges for ODI**



- Increased soft-error rates and process variability impact design
  - Design to detect and/or correct errors (latency and bandwidth impact)
  - Fault-tolerant routing
- Clocking power is high (16%)
  - With wide links, the cost of GALS approaches may be higher
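Counting check bits makes the bandwidth cost of error handling concrete. The sketch below compares two schemes. The first is a Hamming-style single-error-correcting (SEC) code with one extra overall parity bit, giving SEC-DED. The second is detection-only parity with one bit per 8-bit lane. The flit widths and lane size are illustrative assumptions, not figures from the talk. The sketch counts only the extra bits per flit and does not model encode or decode latency.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (Hamming SEC bound)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    # One extra overall-parity bit upgrades SEC to SEC-DED.
    return sec_check_bits(data_bits) + 1


def lane_parity_bits(data_bits: int, lane_bits: int = 8) -> int:
    # Detection only: one parity bit per lane, rounding up.
    return -(-data_bits // lane_bits)


# Illustrative flit widths (assumption; not figures from the talk).
for width in (64, 128, 256):
    ecc = secded_check_bits(width)
    par = lane_parity_bits(width)
    print(f"{width}-bit flit: SEC-DED +{ecc} bits ({100 * ecc / width:.1f}%), "
          f"byte parity +{par} bits ({100 * par / width:.1f}%)")
```

For a 64-bit flit this yields the familiar (72, 64) SEC-DED code. As links widen, the relative overhead of correction falls while per-byte parity stays at 12.5%. Correction, however, adds decode latency to the pipeline.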



#### Conclusions

- Scalable, high-performance on-die interconnects will be required in future CMPs
- We do achieve high network throughput; many of the techniques are borrowed from previous research
- But the significant challenge is to fit within power and area budgets



#### Co-Leads: Jay Jayasimha, Yatin Hoskote

Aniruddha Vaidya, Sriram Vangal, Arvind P. Singh, Chris Hughes, Y-K Chen, Ioannis Schoinas, Akhilesh Kumar, Sailesh Kottapalli, Jeffrey Chamberlain, Li-Shiuan Peh, Amit Kumar, Niraj Jha

