Energy-efficient Data Processing Using Accelerators
Total Pages: 127
Release: 2015
ISBN-10: OCLC:931736762
Book excerpt: Energy efficiency of computing systems has become crucial with the end of Dennard scaling, in which voltage scaling has stalled, so power density now increases as transistor size shrinks. One approach to improving energy efficiency is to use accelerators specialized for a particular set of computing problems. Unlike traditional general-purpose processors, accelerators avoid the overhead of fetching and scheduling instructions. This dissertation investigates two architectural techniques for energy-efficient data processing with reconfigurable accelerators: customizing L1 data caches for computing systems that integrate reconfigurable accelerators, and a near-memory processing architecture built from reconfigurable accelerators.

Data transfers between accelerators and memory are often a bottleneck for both performance and energy efficiency. This dissertation demonstrates the potential of a configurable L1 data cache to exploit the diversity of cache requirements across hybrid applications that use accelerators. One configurable feature is the cache topology: it can be reconfigured as a set of private L1 caches, or as a single L1 cache shared by a processor and an accelerator. The dissertation also proposes a technique that provides a configurable tradeoff between the number of ports and the capacity of the L1 cache.

To further reduce the overhead of transferring data between compute engines and memory, this dissertation proposes NDA (Near-DRAM Acceleration), an architecture that stacks reconfigurable accelerators atop off-chip commodity DRAM devices. To make the architecture practical in the near term, NDA uses commodity 2D DRAM devices and provides, in a practical way, high-bandwidth connections between accelerators and DRAM for near-memory processing. The dissertation explores three NDA microarchitectures for stacking accelerators atop DRAM and analyzes the impact of supporting each microarchitecture on DRAM area, timing, and energy.
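The configurable-cache idea described above can be illustrated with a toy model. All names, sizes, and the halve-capacity-per-port-doubling rule below are illustrative assumptions for this sketch, not details taken from the dissertation:

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    PRIVATE = "private"  # separate L1 caches for processor and accelerator
    SHARED = "shared"    # a single L1 cache shared by both


@dataclass
class L1Config:
    topology: Topology
    total_kib: int  # total SRAM budget for L1 data storage (assumed)
    ports: int      # read/write ports exposed to compute engines

    def effective_capacity_kib(self) -> int:
        """Toy port/capacity tradeoff: assume extra ports are built by
        replicating arrays, so usable capacity halves each time the
        port count doubles (an illustrative assumption)."""
        return self.total_kib // self.ports


# A hybrid application might pick a shared, dual-ported configuration,
# trading half the capacity for the extra port:
cfg = L1Config(Topology.SHARED, total_kib=32, ports=2)
print(cfg.effective_capacity_kib())  # 16
```

The point of the model is only the shape of the tradeoff: a single SRAM budget can be spent either on capacity or on concurrent access ports, and the right split differs across the hybrid applications the dissertation studies.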
The first microarchitecture connects accelerators and DRAM through the global I/O lines shared among all DRAM banks. In the second microarchitecture, the global I/O lines are doubled to increase the internal bandwidth between accelerators and DRAM. The third microarchitecture connects accelerators and DRAM through the global datalines private to each DRAM bank, substantially increasing internal DRAM bandwidth. The dissertation also identifies various software and hardware challenges in implementing the NDA architecture and provides cost-effective solutions.
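The relative internal bandwidth of the three microarchitectures can be sketched with a back-of-the-envelope model. The line widths and bank count below are assumptions for illustration only; this excerpt gives no concrete numbers:

```python
def internal_bandwidth_bits(microarch: str,
                            io_width_bits: int = 64,
                            num_banks: int = 8) -> int:
    """Peak bits transferable per DRAM access between accelerators and
    DRAM, under assumed line widths (illustrative, not measured)."""
    if microarch == "shared_global_io":
        # One set of global I/O lines shared by all banks: only one
        # bank can feed the accelerators at a time.
        return io_width_bits
    if microarch == "doubled_global_io":
        # Global I/O lines doubled, doubling internal bandwidth.
        return 2 * io_width_bits
    if microarch == "per_bank_datalines":
        # Each bank's private global datalines reach the accelerators,
        # so all banks can transfer concurrently.
        return num_banks * io_width_bits
    raise ValueError(f"unknown microarchitecture: {microarch}")
```

Under these assumptions the per-bank-dataline design scales with the number of banks, which is why it offers the largest bandwidth gain, while the first two designs are bounded by the width of the shared I/O path.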