Discovering Order Dependencies through Order Compatibility / Consonni, Cristian; Sottovia, Paolo; Montresor, Alberto; Velegrakis, Yannis. - (2019), pp. 409-420. (Paper presented at the EDBT conference, held in Lisbon, March 26-29, 2019) [10.5441/002/edbt.2019.36].
Discovering Order Dependencies through Order Compatibility
Consonni, Cristian; Sottovia, Paolo; Montresor, Alberto; Velegrakis, Yannis
2019-01-01
Abstract
A relevant task in the exploration and understanding of large datasets is the discovery of hidden relationships in the data. Functional dependencies, in particular, have received considerable attention. However, other kinds of relationships are also significant, both for understanding the data and for query optimization. Order dependencies belong to this category. An order dependency states that if a table is ordered on one list of attributes, then it is also ordered on another list of attributes. The discovery of order dependencies has only recently been studied. In this paper, we propose a novel approach for discovering order dependencies in a given dataset. Our approach leverages the observation that discovering order dependencies can be guided by the discovery of a more specific form of dependencies called order compatibility dependencies. We show that our algorithm outperforms existing approaches on real datasets. Furthermore, our algorithm can be parallelized, leading to further improvements when it is executed on multiple threads. We present several experiments that illustrate the effectiveness and efficiency of our proposal and discuss our findings. © 2019 Copyright held by the owner/author(s).
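To make the definition of an order dependency in the abstract concrete, the following is a minimal sketch of a brute-force validity check. It is not the discovery algorithm proposed in the paper. It assumes the usual pairwise formalization, in which the dependency holds when, for every pair of rows, lexicographic order on the first attribute list implies lexicographic order on the second. The function name `holds_order_dependency` and the sample table are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def holds_order_dependency(rows, lhs, rhs):
    """Check whether ordering `rows` on the attribute list `lhs`
    also orders them on the attribute list `rhs`.

    Assumed formalization: for every ordered pair of rows (s, t),
    s[lhs] <= t[lhs] must imply s[rhs] <= t[rhs], where tuples are
    compared lexicographically. Quadratic in the number of rows.
    """
    def project(row, attrs):
        # Python compares tuples lexicographically, matching list order.
        return tuple(row[a] for a in attrs)

    for s, t in permutations(rows, 2):
        if project(s, lhs) <= project(t, lhs) and not project(s, rhs) <= project(t, rhs):
            return False
    return True


# Hypothetical sample table, not taken from the paper.
table = [
    {"month": 1, "quarter": 1, "sales": 30},
    {"month": 4, "quarter": 2, "sales": 10},
    {"month": 5, "quarter": 2, "sales": 20},
    {"month": 9, "quarter": 3, "sales": 5},
]

print(holds_order_dependency(table, ["month"], ["quarter"]))  # True
print(holds_order_dependency(table, ["quarter"], ["month"]))  # False
print(holds_order_dependency(table, ["month"], ["sales"]))    # False
```

In this sample, sorting by `month` also sorts by `quarter`, but not the other way around: two rows share quarter 2 with different months, so an ordering on `quarter` alone does not determine their order on `month`. Checking every candidate pair of attribute lists this way is expensive, which is why discovery algorithms need to prune the search space.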