FPGAs versus GPUs for Datacenters
Moderator: Babak Falsafi (EPFL)
Panelists: Bill Dally (Stanford/NVIDIA), Desh Singh (Altera)
The slowdown in Dennard scaling is giving rise to the use of accelerators in datacenters. The most promising accelerator platforms are GPUs and FPGAs. GPUs rely on the data-parallel execution model, offer higher-level programming abstractions and a rich set of libraries, and are best suited for dense floating-point arithmetic. FPGAs enable arbitrary forms of parallelism and generalized acceleration, including fixed-point arithmetic, but fall short in programmability, computational density, and well-defined memory abstractions. The debate between GPUs and FPGAs turns, on the one hand, on how broadly the data-parallel execution model and dense floating-point arithmetic apply to emerging datacenter workloads and, on the other, on the programmability, computational density/efficiency, and basic utility of FPGAs.
Most of the debate will center on the above issues. I also have a few additional questions:
- What should we do about data services and data management operators that are neither data parallel nor a good fit for the spatial computing model, nor for current server CPUs, which are primarily designed for desktop workloads?
- What are the best ways to interface GPUs/FPGAs as accelerators with the rest of the system, both from a programming abstraction perspective and at the hardware/system level (e.g., cache-coherent shared memory, message passing)?
Bill Dally’s position statement
For processing tasks in the data center, one should use the right tool for the job. If the job involves intensive integer or floating-point arithmetic, high bandwidth to bulk memory, and close interaction with a CPU, the right tool is a GPU. If the job involves non-arithmetic logic (e.g., bioinformatics, coding), moderate memory bandwidth, and loose CPU interaction, the right tool is an ASIC. If the right tool is an ASIC but you don't have the volume to justify it, use an FPGA, but realize that a LUT costs 20-100x the power and area of the equivalent function in an ASIC. GPUs are the tool of choice for arithmetic-intensive applications because they achieve industry-leading perf/W on numeric benchmarks and are easy to program. Upcoming GPUs with HBM have memory bandwidth that cannot be approached by FPGAs with discrete memory. With NVLink, GPUs can transparently share memory with CPUs, further simplifying the programming task.
Desh Singh’s position statement
I believe that the most efficient way to implement an algorithm is to design a custom hardware circuit that is dedicated to implementing only that algorithm. While this is impractical given the generality and rapid evolution of workloads, the FPGA offers us the next best alternative. Using FPGA reconfigurable hardware, we can implement any circuit that we like by taking advantage of millions of programmable resources arranged in a blank slate. In the past, FPGA design has posed a high barrier to entry due to hardware-centric tool flows. However, recent advances in compiler technology are now opening the doors to software programmers using FPGAs with high-level languages (C, C++, OpenCL, Java, etc.). The convergence of tools and new specializations in FPGA architecture will make the FPGA a formidable application accelerator in the data center.