Sponsor:

Advanced Scientific Computing Research (ASCR), U.S. Department of Energy Office of Science


Project Team Members:

Northwestern University

North Carolina State University

Lawrence Berkeley National Laboratory






Northwestern University - EECS Dept.




Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems


Introduction

Recent DOE workshops on exascale computing have articulated emerging trends in data, hardware, and energy that call for next-generation algorithms and software libraries for data analysis and mining. The workshops recognized that the widening gap between the opportunities these trends create and current data analytics capabilities will soon become a major bottleneck on the path to exascale. The data trend is that scientific data are growing not only in size but also in complexity, which increases the demand for more sophisticated analyses. The execution of many data analysis algorithms is dominated by a small number of kernels. Our strategy is therefore to provide a generic, highly optimized set of core (kernel) analytics functions from which a broad range of high-performance analytical pipelines can be assembled. Our vision is to develop a comprehensive library of such exascale data analysis and mining kernels. In the long term, this approach will bring the development of analytics algorithms to the next level; the impact may be akin to that of the ScaLAPACK linear algebra library in scientific computing. The hardware trend is that the architectures of emerging HPC systems are becoming inherently heterogeneous. To meet it, our specific goal is to design data analysis kernel algorithms that are accelerated on hybrid multi-node, multi-core HPC architectures composed of a mix of GPUs, FPGAs, and SSDs, and to develop scalable implementations of them. Finally, the energy trend is that performance-energy trade-offs are becoming an essential part of the equation. This drives the proposed advances in our performance-energy trade-off analysis framework, which would allow our data analysis kernel algorithms and software to be parameterized so that users can choose the right power-performance optimizations.
The principal outcome of this proposal is a library of functions and software to accelerate data analytics, mining, and knowledge discovery for large-scale scientific applications, thereby increasing the productivity of both scientists and systems. The developed software will be released as open source for the benefit of the community at large. Moreover, students and postdoctoral associates trained on this project will be well prepared to work at DOE labs.

To achieve our overarching goals, the specific objectives of the proposal are as follows:

  1. Design and develop data mining kernels and algorithms for acceleration on hybrid architectures which include many-core systems, GPUs, and other accelerators.
  2. Design and develop approximate, scalable algorithms for data mining and analysis kernels, enabling faster exploration, more efficient resource usage, a reduced memory footprint, and more power-efficient computation.
  3. Design and develop scalable, out-of-core algorithms and software for analytics that exploit SSDs to enable exploration of massive amounts of data as well as large-scale in-situ analytics and mining on nodes.
  4. Design and develop index-based data analysis and mining kernels and algorithms for performance and power optimizations including (a) selective data mining kernels with FastBit and (b) index-based perturbation analysis kernels for noisy and uncertain data.
  5. Design and develop alternative and parameterized kernels and algorithms that facilitate trade-offs in performance, resource usage, and energy efficiency. For this purpose, we will build upon our Energy-Resource-Efficiency (ERE) Framework.
  6. Demonstrate the results of our project by enabling analytics at scale for selected applications (some of them described in the next section) on large-scale HPC systems.
  7. Provide the results, algorithms, and software libraries in the public domain.
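
As a rough illustration of what a parameterized, approximate kernel (objectives 2 and 5) might look like, the sketch below estimates a histogram while visiting only a fraction of the input. It is not taken from the project's library: the function name `histogram_kernel` and the parameters `bin_width` and `sample_stride` are hypothetical, chosen only to show how a single knob can trade accuracy for work.

```python
from collections import Counter
from typing import Dict, Sequence


def histogram_kernel(values: Sequence[float], bin_width: float,
                     sample_stride: int = 1) -> Dict[int, float]:
    """Estimate a histogram by visiting every `sample_stride`-th value.

    Hypothetical kernel for illustration only. With sample_stride=1 the
    result is the exact histogram; larger strides touch fewer elements
    (less computation and data movement) and scale the counts back up,
    so the result becomes an estimate.
    """
    if bin_width <= 0 or sample_stride < 1:
        raise ValueError("bin_width must be > 0 and sample_stride >= 1")

    # Systematic sampling: keep one value out of every `sample_stride`.
    sampled = values[::sample_stride]

    # Bin index is the floor of value / bin_width.
    counts = Counter(int(v // bin_width) for v in sampled)

    # Scale sampled counts so they estimate counts over the full input.
    scale = len(values) / len(sampled) if len(sampled) else 0.0
    return {b: c * scale for b, c in sorted(counts.items())}


if __name__ == "__main__":
    data = list(range(100))
    exact = histogram_kernel(data, bin_width=10)
    approx = histogram_kernel(data, bin_width=10, sample_stride=4)
    print("exact :", exact)
    print("approx:", approx)
```

Under these assumptions, a caller would pick `sample_stride` according to how much accuracy can be given up for a smaller compute and energy budget; a real kernel would expose similar knobs for its target accelerator and storage tier.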

Publications

This work is supported by DOE Office of Science, Advanced Scientific Computing Research (ASCR), under award number DE-SC0005340, program manager Lucy Nowell.

Last Updated: Thu, 19 Feb 2015