Project Team Members:
Northwestern University
Parallel MEP (PMEP)
Description:
Detecting communities is of great importance in sociology, biology and computer science disciplines where systems are often represented as graphs.
Community detection for large graphs is extremely challenging due to a lack of a priori information of the graph structure and a high degree of data dependency. The scalability and quality of a parallel algorithm is significantly impacted by the data partitioning scheme it employs.
We have devised a distributed-memory based parallel algorithm called PMEP which parallelizes MEP, a community detection algorithm based on the idea of maximizing equilibrium and purity of communities. MEP has been demonstrated to produce high quality of results for medium to large graphs. To parallelize MEP, we use a data partitioning strategy that duplicates a portion of computational workload in exchange for a lower communication cost required in the later stage when combing local results into a global one. This strategy is motivated by the fact that graph problems are highly data dependent and it is unlikely for a data partitioner to produce subgraphs that can be processed independently on multiple processes without incurring a high cost of synchronization and communication. The experimental results on both synthetic and real dataset show that PMEP successfully achieves scalability while maintaining high qualities of clustering results.
PMEP is a publicly available code that implements distributed-memory based parallel community detection algorithm using MPI.
Publications:
- Diana Palsetia, William Hendrix, Sunwoo Lee, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Parallel Community Detection Algorithm Using a Data Partitioning Strategy with Pairwise Subdomain Duplication. In the 31st International Supercomputing Conference, June 2016. (pdf)