Reliable Network On Chip
RELIABLE NETWORK ON CHIP DESIGN
ANMOL SHAHANI 829905938
Why Fault Tolerance?
Offers many advantages:
◦ Avoids costly packet retransmissions
◦ Avoids catastrophic data loss
◦ Can increase chip yield
◦ Allows higher speed operation
In NoC specifically:
◦ Ensures success of the interconnect
◦ Grows in importance as technology scales
Fault Classes
Transient faults (or soft errors): appear and disappear at random
◦ Caused by alpha particles, cosmic-ray-induced neutrons, etc.
Intermittent faults: appear only under certain conditions
◦ Occur repeatedly at the same location
◦ Tend to occur in bursts
◦ Replacement of the faulty component removes the fault
Permanent faults (or hard errors): always present, but may be masked
◦ Static (occurring at manufacture time): process variability (PV), manufacturing imperfections
◦ Dynamic (occurring at run time): electromigration (EM), negative bias temperature instability (NBTI), oxide breakdown, stress-induced voiding (SIV), hot carrier injection (HCI), etc.
Making NoCs Reliable: Current Methods
• Timing-error (T-error) tolerant NoC design
• Error control
◦ Error detection and correction codes
◦ Hop-by-hop (HBH) retransmission mechanism
• Reliable task mapping
• Fault tolerant rerouting
Timing error tolerant NoC design
Error correction and detection
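To make the idea of an error detection code concrete, the following is a minimal Python sketch comparing a single parity bit with an 8-bit CRC. It is not the ee-par or ee-crc scheme from the analysis that follows; the function names (`parity_bit`, `crc8`, `flip_bits`), the CRC polynomial (x^8 + x^2 + x + 1), and the 32-bit payload are all assumptions chosen for illustration.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the number of set bits in data is odd."""
    ones = sum(bin(b).count("1") for b in data)
    return ones & 1


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with polynomial x^8 + x^2 + x + 1 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip_bits(data: bytes, positions) -> bytes:
    """Return a copy of data with the given bit positions inverted (fault model)."""
    out = bytearray(data)
    for p in positions:
        out[p // 8] ^= 1 << (p % 8)
    return bytes(out)


if __name__ == "__main__":
    payload = b"\x12\x34\x56\x78"           # hypothetical 32-bit flit payload
    single = flip_bits(payload, [5])        # one bit upset on the link
    double = flip_bits(payload, [5, 17])    # two bits upset on the link

    for name, rx in (("single-bit", single), ("double-bit", double)):
        par_ok = parity_bit(rx) != parity_bit(payload)
        crc_ok = crc8(rx) != crc8(payload)
        print(f"{name} error: parity detects={par_ok}, CRC-8 detects={crc_ok}")
```

In this sketch the parity bit misses the double-bit error, since an even number of flips leaves the parity unchanged, while the CRC catches both cases at the cost of eight check bits instead of one.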
Power consumption Analysis
Power consumption of the schemes
Power consumption Observations
The ee-par scheme has higher power consumption than the ee-crc and hybrid schemes.
The flit-based scheme consumes more power because, as the number of flits per packet increases, the number of useful bits decreases.
Packet buffer requirements affect power consumption; hence, as the number of hops increases, the power overhead of the ss-flit scheme increases.
HBH Retransmission Scheme
Advantages
• Avoids deadlock
• Eliminates the need to provide an escape channel to the destination node
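The basic hop-by-hop retransmission mechanism can be sketched as below. This is an illustrative model only, not the scheme evaluated in the slides: it assumes a single parity bit as the detection code, a fault that flips one payload bit, and a hypothetical `fault_plan` argument that scripts which transmission attempts are corrupted on each hop. It shows the detect, NACK, and resend loop; it does not model the deadlock behaviour listed above.

```python
from typing import List, Tuple


def checksum(flit: int) -> int:
    """Even parity of the flit payload (assumed detection code for this sketch)."""
    return bin(flit).count("1") & 1


def send_over_link(flit: int, code: int, corrupt: bool) -> Tuple[int, int]:
    """Model one link traversal; a corrupted traversal flips the lowest payload bit."""
    return (flit ^ 1 if corrupt else flit), code


def hop_by_hop_send(
    flit: int, fault_plan: List[List[bool]], max_retries: int = 3
) -> Tuple[int, List[int]]:
    """Forward a flit across len(fault_plan) links with per-hop retransmission.

    fault_plan[h][a] is True if attempt a on hop h is corrupted.
    Returns the delivered flit and the number of attempts used on each hop.
    """
    attempts_per_hop = []
    for hop_faults in fault_plan:
        code = checksum(flit)  # the sender keeps the flit buffered until it is ACKed
        for attempt in range(max_retries + 1):
            corrupt = attempt < len(hop_faults) and hop_faults[attempt]
            received, rx_code = send_over_link(flit, code, corrupt)
            if checksum(received) == rx_code:
                # Receiver check passed: ACK, and the next router takes over.
                flit = received
                attempts_per_hop.append(attempt + 1)
                break
            # Receiver check failed: NACK, so the sender resends from its buffer.
        else:
            raise RuntimeError("link still failing after max_retries")
    return flit, attempts_per_hop


if __name__ == "__main__":
    # Three hops; the first attempt on the second hop is corrupted.
    delivered, attempts = hop_by_hop_send(0b1011, [[False], [True, False], [False]])
    print(delivered == 0b1011, attempts)
```

Because each error is caught and repaired on the link where it occurred, only the affected hop repeats its transmission, rather than the whole source-to-destination path.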
Reliability Aware Task Mapping
Fault tolerant route generation
Switch design to support multipath routing with in-order packet delivery
Resilience against NBTI
Fig. adaptive router architecture
ROBUST: SELF HEALING ROUTER
Universal Logic Block (ULB)
Crossbar protection using multiple ULB blocks
Advantages
It has higher silicon protection factor and a higher reliability improvement factor.
Future challenges
◦ All the schemes presented to improve the reliability of the NoC architecture carry a power overhead. The added power dissipation can reduce the mean time to failure (MTTF).
◦ All techniques should be thermal-aware in order to prevent this effect.
◦ Instead of evenly wearing out all cores in MPSoCs, a method should be designed to self-heal failed cores.
◦ Most error-resilient schemes today focus primarily on making routers and links fault tolerant. There should also be some focus on making memories more reliable.
Conclusion
The ideas presented in this paper make the NoC architecture resilient to permanent and intermittent errors. Several techniques are employed to improve reliability: a T-error tolerant mechanism, a self-healing router architecture, reliability-driven task mapping, a deadlock recovery mechanism, and error detection and correction schemes. Several of these rely on redundant hardware components, which is acceptable in terms of area: because of "dark silicon", it is impossible to turn on every component on the die anyway. However, most techniques increase the power consumption of the NoC architecture, which is their main drawback. Designing systems to be resilient to errors is crucial to exploiting the advantages of networks-on-chip.