This paper presents ETSCH1 , a novel paradigm processing large graphs. E T S C H departs from the vertex- based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph; second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP and SPARK, and as a stand-alone service based on AKKA, a toolkit for building distributed message-driven appli- cations. When considering problems like single-source shortest path and PageRank, our experiments show that solutions based on ETSCH/HADOOP and ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP and SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL and GPS.
ETSCH: Partition-centric Graph Processing
Montresor, Alberto;Centellegher, Simone
2016-01-01
Abstract
This paper presents ETSCH1 , a novel paradigm processing large graphs. E T S C H departs from the vertex- based approach of BSP frameworks like PREGEL in two ways: first, the units of computation are not the vertices, but rather a collection of subgraphs, obtained by partitioning the input graph; second, the subgraphs are obtained through an edge-partitioning algorithm, in which edges, rather than vertices, are subdivided into disjoint subsets Global computations over the graph are then easily expressed using classical centralized algorithms executed on each of the partitions, with the only additional burden of specifying simple reconciliation procedures when vertices are replicated in multiple computing nodes. The ETSCH paradigm has been implemented both on top of existing frameworks like HADOOP and SPARK, and as a stand-alone service based on AKKA, a toolkit for building distributed message-driven appli- cations. When considering problems like single-source shortest path and PageRank, our experiments show that solutions based on ETSCH/HADOOP and ETSCH/SPARK already outperform the standard solutions to the same problems in HADOOP and SPARK, respectively. But it is our AKKA implementation that really shines: the execution time on graphs with millions of edges falls down from from thousands of seconds (ETSCH/HADOOP) to tens of seconds (ETSCH/SPARK) to seconds (ETSCH/AKKA), while easily scaling to graphs with billions of edges. ETSCH/AKKA is also faster than other partition-centric frameworks like BLOGEL and GPS.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione