Transient faults are a major problem for large-scale HPC systems, and the mitigation of adverse fault effects needs to be highly efficient as we approach exascale. We developed a fault injection tool (CAROL-FI) to identify the potential sources of adverse fault effects. With a deeper understanding of such effects, we provide useful insights for designing efficient mitigation techniques, such as selective hardening of critical portions of the code. We performed a fault injection campaign, injecting more than 67,000 faults into an Intel Xeon Phi executing six representative HPC programs. We show that selective hardening can be successfully applied to DGEMM and Hotspot, while LavaMD and NW may require hardening of the complete code.
CAROL-FI: An efficient fault-injection tool for vulnerability evaluation of modern HPC parallel accelerators / Oliveira, D.; Frattin, V.; Navaux, P.; Koren, I.; Rech, P. - (2017), pp. 295-298. (Paper presented at the 14th ACM International Conference on Computing Frontiers, CF 2017, held at the University of Siena, Italy, in 2017) [10.1145/3075564.3075598].
Title: | CAROL-FI: An efficient fault-injection tool for vulnerability evaluation of modern HPC parallel accelerators | |
Authors: | Oliveira, D.; Frattin, V.; Navaux, P.; Koren, I.; Rech, P. | |
Unitn authors: | ||
Title of the volume containing the paper: | ACM International Conference on Computing Frontiers 2017, CF 2017 | |
Place of publication: | USA | |
Publisher: | Association for Computing Machinery, Inc | |
Year of publication: | 2017 | |
Scopus identifier: | 2-s2.0-85027044968 | |
ISBN: | 9781450344876 | |
Handle: | http://hdl.handle.net/11572/346639 | |
Appears in types: | 04.1 Paper in Proceedings |