Code-Dependent and Architecture-Dependent reliability behaviors


The growing need for computing capability and higher efficiency has pushed industry to bring novel, increasingly complex architectures to market. The variety of codes that need to be executed, combined with the complexity of these architectures, introduces challenges in evaluating the reliability of computing systems and applications. This paper compares the reliability behaviors of six different architectures (an Intel co-processor, three NVIDIA GPUs, an AMD APU, and an embedded ARM) executing eight different codes. To support our evaluation, we present and discuss experimental beam data covering a total of more than 352,000 years of natural exposure, together with a fault-injection analysis based on a total of more than 120,000 injections. We first quantify both the Silent Data Corruption (SDC) and the Detected Unrecoverable Error (DUE) rates. We then qualify the observed errors by considering the difference between the corrupted and expected values as well as the portion of the output that has been corrupted. From these analyses, we identify the reliability characteristics that are related to the underlying hardware and those that are related to the intrinsic behaviors of the executed code. Finally, we discuss the implications of the device- and code-dependent reliability behaviors for approximate computing, and analyze the benefits, in terms of reduced error rate, of relaxing output correctness.

Code-Dependent and Architecture-Dependent reliability behaviors / Fratin, V.; Oliveira, D.; Lunardi, C.; Santos, F.; Rodrigues, G.; Rech, P. - (2018), pp. 13-26. (Paper presented at the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, held in Luxembourg in 2018) [10.1109/DSN.2018.00015].


Fratin, V.; Oliveira, D.; Lunardi, C.; Santos, F.; Rodrigues, G.; Rech, P.

2018-01-01



Date of publication: 2018
Proceedings title: Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018
Place of publication: United States
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN: 978-1-5386-5596-2
Scopus identifier: 2-s2.0-85051058986
WOS identifier: WOS:000485508200002
All authors: Fratin, V.; Oliveira, D.; Lunardi, C.; Santos, F.; Rodrigues, G.; Rech, P.



Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403754

Warning: the data shown have not been validated by the university.
