Review of "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs"
Large scale graph structured computation is useful for cases from targeted advertising to natural language processing, thus several graph-parallel abstractions including Pregel and GraphLab has been developed. However, these systems ignored the fact that natural graphs commonly found in real-world have highly skewed power-law degree distributions (a small fraction of the vertices are connected to most part of the graph). This paper proposes PowerGraph which exploits the internal structure of graph programs to address the challenges of computing on natural graphs. By leveraging PowerGraph abstraction, it introduces a new approach to distributed graph placement and representation that exploits the structure of power-lay graphs. It explicitly factors computation over edges instead of vertices. As a consequence it has substantially greater parallelism, reduces network communication and storage cost and provides highly effective approach to distributed graph placement.
A graph-parallel abstraction consists of sparse graph G = {V, E} and a vertex-program Q which is executed in parallel on each vertex v and can interact through shared state or messages with neighboring instances. To address the challenges of computation on power-law graphs, PowerGraph eliminates the degree dependence of the vertex program by directly exploiting the GAS decomposition to factor vertex program over edges. By lifting the Gather and Scatter phases into the abstraction, PowerGraph is able to retain the natural "think-like-a-vertex" philosophy while distributing the computation of a single vertex program over the entire cluster.
Will this paper be influential in 10 years? Yes, I think so. I like the fact that it's designed on a real-world data pattern – i.e. considered the power-law distribution and really performed well.