A Re-Identification Risk-Based Anonymization Framework for Data Analytics Platforms / Silva, H.; Basso, T.; Moraes, R.; Elia, D.; Fiore, S. - (2018), pp. 101-106. (Paper presented at the 14th European Dependable Computing Conference, EDCC 2018, held in Romania in 2018) [10.1109/EDCC.2018.00026].
A Re-Identification Risk-Based Anonymization Framework for Data Analytics Platforms
Fiore S.
2018-01-01
Abstract
Preserving individual privacy is a major concern in Big Data, since handling huge volumes of data may lead to the disclosure of sensitive or personally identifiable information. Even anonymized data remains at risk of re-identification through privacy attacks. This paper presents a re-identification risk-based anonymization framework for big data analytics platforms. The framework is based on anonymization policies and applies anonymization techniques and models in two stages: during the ETL (extract, transform, load) process, and before the statistical results of data analytics are exported. The second stage evaluates the re-identification risk of the data and, if necessary, increases the anonymity level to reduce that risk. Although the framework is generic, the implementation reported in this work was integrated into Ophidia as a case study. Privacy attacks were performed to evaluate how well the anonymized data resists re-identification. The results are promising, showing a low probability of re-identification in two different scenarios.
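The second stage can be illustrated with a minimal Python sketch. That stage evaluates re-identification risk and raises the anonymity level until the risk is acceptable. The sketch makes several assumptions that do not come from the paper. It uses the prosecutor-model risk measure, in which a record's risk is one over the size of its equivalence class on the quasi-identifiers. It uses a hypothetical two-attribute generalization hierarchy of age bands and masked ZIP codes. The threshold, helper names (`generalize`, `max_reidentification_risk`, `anonymize_until_safe`) and toy data are also hypothetical. This is not the framework's actual implementation in Ophidia.

```python
from collections import Counter

# Hypothetical record layout (not from the paper): (age, zip_code, diagnosis).
# age and zip_code are the quasi-identifiers; diagnosis is the sensitive value.
Record = tuple[int, str, str]


def generalize(record: Record, level: int) -> tuple[str, str]:
    """Return the record's quasi-identifiers generalized to `level`.

    Level 0 keeps exact values. Each higher level widens the age band by
    5 years and masks one more trailing digit of the ZIP code.
    """
    age, zip_code, _ = record
    if level == 0:
        age_value = str(age)
    else:
        width = 5 * level
        low = (age // width) * width
        age_value = f"{low}-{low + width - 1}"
    masked = min(level, len(zip_code))
    zip_value = zip_code[: len(zip_code) - masked] + "*" * masked
    return age_value, zip_value


def max_reidentification_risk(records: list[Record], level: int) -> float:
    """Prosecutor-model risk: 1 / size of the smallest equivalence class."""
    classes = Counter(generalize(r, level) for r in records)
    return 1.0 / min(classes.values())


def anonymize_until_safe(
    records: list[Record], threshold: float, max_level: int = 5
) -> tuple[int, float]:
    """Raise the generalization level until the maximum risk <= threshold.

    Returns the lowest qualifying level and its risk; raises ValueError if
    no level up to max_level meets the threshold.
    """
    for level in range(max_level + 1):
        risk = max_reidentification_risk(records, level)
        if risk <= threshold:
            return level, risk
    raise ValueError("risk threshold not reachable within max_level")


# Toy data set and a hypothetical threshold, for illustration only.
data: list[Record] = [
    (34, "00185", "flu"),
    (36, "00186", "asthma"),
    (38, "00187", "flu"),
    (41, "00185", "diabetes"),
    (43, "00184", "flu"),
    (44, "00189", "asthma"),
]
level, risk = anonymize_until_safe(data, threshold=0.34)
print(level, round(risk, 3))  # prints: 2 0.333
```

The level is raised one step at a time, so the sketch returns the least-generalized version of the data that meets the threshold. This matches the idea of increasing the anonymity level only when the measured risk requires it. A practical policy would typically also bound information loss, which this sketch ignores.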