Arachne: An Open-Source Framework for Interactive Massive-Scale Graph Analytics

Abstract

Massive-scale analytics is an emerging field that integrates the power of high-performance computing and mathematical modeling to extract key insights and information from data sets that can be as large as petabytes and beyond. Productivity in massive-scale analytics entails quick interpretation of results through easy-to-use systems, while also adhering to design principles that combine high-performance computing and user-friendly simplicity. However, data scientists often encounter challenges, especially with graph analytics, which requires the analysis of complex data from various domains, such as the natural and social sciences. To address this issue, we introduce Arachne, a system that enhances accessibility and usability in massive-scale graph analytics. Arachne offers novel algorithms and implementations of graph kernels for efficient data analysis, including connected components, breadth-first search, triangle counting, and k-truss. The algorithms are integrated into a backend server written in Chapel and can be accessed through a Python application programming interface (API). Arachne delivers high performance in the shared-memory versions of its algorithms, and we have assessed its capabilities with the Friendster social network, which comprises 1,806,067,135 edges and 65,608,366 vertices. Arachne’s backend server is compatible with Linux supercomputers, is easy to set up, and can be used through either Python scripts or Jupyter notebooks, which makes it a desirable tool for data scientists who have access to high-performance Linux compute clusters. In this poster, we present an overview of the algorithms we have implemented in Arachne and, where applicable, the algorithmic novelties introduced for each of them. We report results as execution times and discuss the much-needed improvements in communication overheads for our implementations. Further, we discuss improvements to our graph data structure to store extra information such as node labels, edge relationships, and node and edge properties. Arachne is built as an extension to Arkouda and allows graphs to be generated from Arkouda dataframes. The open-source code for Arachne can be found at https://github.com/Bears-R-Us/arkouda-njit.
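
For readers unfamiliar with the kernels named above, the following is a minimal single-machine sketch in plain Python of what three of them compute: breadth-first search, connected components, and triangle counting. It is not Arachne's API or its Chapel implementation. The function names (`build_adjacency`, `bfs_levels`, `connected_components`, `count_triangles`) and the toy edge list are hypothetical and chosen only for illustration, and the sketch assumes an undirected graph with integer vertex ids.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_levels(adj, source):
    """Return {vertex: hop distance from source} for every reachable vertex."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in levels:
                levels[w] = levels[u] + 1
                queue.append(w)
    return levels


def connected_components(adj):
    """Label each vertex with the smallest vertex id in its component."""
    labels = {}
    for v in sorted(adj):
        if v not in labels:
            # Every vertex reachable from v belongs to v's component.
            for w in bfs_levels(adj, v):
                labels[w] = v
    return labels


def count_triangles(adj):
    """Count each triangle exactly once by enforcing the order u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


# Hypothetical toy graph: one triangle (0, 1, 2) with a tail to 3, plus a separate edge (4, 5).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (4, 5)]
adj = build_adjacency(edges)
levels = bfs_levels(adj, 0)
components = connected_components(adj)
triangles = count_triangles(adj)
```

At the scale described in the abstract, the same computations would run inside the backend server rather than in the Python client; this sketch only fixes the meaning of each kernel's output.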

Publication
In International Parallel & Distributed Processing Symposium