Hardware-Software Co-Design: Not Just a Cliché

Adrian Sampson, James Bornholt, and Luis Ceze
University of Washington, US

Abstract
The age of the air-tight hardware abstraction is over. As the computing ecosystem moves beyond the predictable yearly advances of Moore's Law, appeals to familiarity and backwards compatibility will become less convincing: fundamental shifts in abstraction and design will look more enticing. It is time to embrace hardware-software co-design in earnest, to cooperate between programming languages and architecture to upend legacy constraints on computing. We describe our work on approximate computing, a new avenue spanning the system stack from applications and languages to microarchitectures. We reflect on the challenges and successes of approximation research and, with these lessons in mind, distill opportunities for future hardware-software co-design efforts.

ACM Subject Classification C.5 Computer System Implementation
Keywords and phrases approximation, co-design, architecture, verification
Digital Object Identifier /LIPIcs.SNAPL

1 Introduction

Generations of computer scientists and practitioners have worked under the assumption that computers will keep improving themselves: just wait a few years and Moore's Law will solve your scaling problems. This reliable march of electrical-engineering progress has sparked revolutions in the ways humans use computers and interact with the world and each other. But growth in computing power has protected outdated abstractions and encouraged layering even more abstractions, whatever the cost. The free lunch seems to be over: single-thread performance has stagnated, Dennard scaling has broken down, and Moore's Law threatens to do the same. The shift to multi-core designs worked as a stopgap in the final years of frequency advancements, but physical limits have dashed hopes of long-term exponential gains through parallelism.
Hardware-software co-design presents significant performance and efficiency opportunities that are unavailable without crossing the abstraction gap. For example, embedded systems depended on co-design from their inception. The embedded domain uses software for flexibility while specialized hardware delivers performance. General-purpose computing features many hardware extensions employed to better serve software: virtual memory, ISA extensions for security, and so on. While these mechanisms have been successful, they are ad hoc responses to trends in software design. Hybrid hardware-software research cannot just be a cliché: now more than ever, true cooperation is crucial to improving the performance and efficiency of future computing systems.

Over the past five years, our research group has been exploring approximate computing, a classic example of a hardware-software co-design problem. Approximate computing trades accuracy for performance or efficiency, exploiting the fact that many applications are robust to some level of imprecision. Our experience has shown that neither software nor hardware alone can unlock the full potential of approximation; optimal solutions require co-design between programming models and hardware accelerators or ISA extensions. We discuss our experience with approximate computing and highlight lessons for future co-design efforts.

Adrian Sampson, James Bornholt, and Luis Ceze; licensed under Creative Commons License CC-BY. 1st Summit on Advances in Programming Languages (SNAPL'15). Eds.: Thomas Ball, Rastislav Bodík, Shriram Krishnamurthi, Benjamin S. Lerner, and Greg Morrisett; pp. Leibniz International Proceedings in Informatics. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

We believe we can go much further by truly designing hardware and software in unison. We survey the opportunities for hardware-software co-design in four broad categories.
The first is effective hardware acceleration, where the best algorithm for general-purpose hardware and the best custom hardware for generic algorithms both fall short. For example, Anton [12], a special-purpose machine for molecular dynamics, was built by co-designing a simulation algorithm with the hardware. The second category is reducing redundancy and reassigning responsibilities. For example, if all programs were guaranteed to be free of memory bugs, could we drop support for memory protection? Or if programs were free of data races by construction, could we rethink cache coherence? The third category is hardware support for domain-specific languages, whose booming popularity democratizes programming by providing higher-level semantics and tooling, but which today must still compile to low-level general-purpose hardware. The final category is research beyond the CPU: unexplored opportunities abound for hardware-software cooperation in networks, memory, storage, and less glamorous parts of computer hardware such as power supplies.

2 Approximate Computing

For the last five years, our group has explored one instance of a hardware-software co-design opportunity: approximate computing. The idea is that current abstractions in computer systems fail to incorporate an important dimension of the application design space: not every application needs the same degree of accuracy all the time. Applications that tolerate reduced accuracy span a wide range of domains including big-data analytics, web search, machine learning, cyber-physical systems, speech and pattern recognition, augmented reality, and many more. These kinds of programs can tolerate unreliable and inaccurate computation, and approximate computing research shows how to exploit this tolerance for gains in performance and efficiency [1, 7, 10, 11, 25, 26, 28, 30, 35].
Approximate computing is a classic cross-cutting concern: its full potential is not reachable through software or hardware alone, but only through changing the abstractions and contracts between hardware and software. Advances in approximation require co-design between architectures that expose accuracy-efficiency trade-offs and the programming systems that make those trade-offs useful for programmers. We have explored projects across the entire system stack, from programming languages and tools down through the hardware, that enable computer systems to trade off accuracy of computation, communication, and storage for gains in efficiency and performance. Our research direction spans languages [38], runtime systems, programmer tools including debuggers [36], compilers [41], synthesis tools for both software and hardware, microarchitecture [13], hardware techniques [43, 29], data stores [40], and communication services.

Safety and Quality-of-Results

A core requirement in writing programs with approximate components is safety: approximate components must not compromise safe execution (e.g., no uncontrolled jumps or critical data corruption) and must interact with precise components only in well-defined ways allowed by the programmer. Our work met this need with language support in the form of type qualifiers for approximate data and type-based static information-flow tracking [38]. Other work from MIT consists of a proof system for deriving safety guarantees in the face of unreliable components [5]. These crucial safety guarantees allow systems to prove at compile time that approximation cannot introduce catastrophic failures into otherwise-good programs.

Beyond safety, another key requirement is a way to specify and ensure acceptable quality-of-results (QoR). Languages must enable programmers to declare the magnitude of acceptable approximation in multiple forms and granularities.
For example, QoR can be set for a specific value (X should be at most Y% from its value in a fully precise execution), or one could attach QoR to a set of values (at most N values in a set can be in error). One can provide a QoR specification only for the final output of a program or also for intermediate values. QoR specifications can then guide the compiler and runtime to choose and control the optimal approximate execution engine from a variety of software and hardware approximation mechanisms. While quality constraints are more general and therefore more difficult to enforce statically than safety requirements, initial tactics have seen success by limiting the kinds of approximation they can work with [41, 6] or by relying on dynamic checks [36, 17].

Approximation Techniques

The purpose of allowing approximation is to trade accuracy for energy savings. At the highest level, there are three categories of approximation techniques: algorithmic, compiler/runtime, and hardware. Algorithmic approximation can be achieved by the programmer providing multiple implementations of a given computation and the compiler/runtime choosing among them based on QoR needs. A compiler can generate code for approximate execution by eliding some computations [28] or reducing value precision whenever allowed by the approximation specification. Approximation research has explored several approximate execution techniques with hardware support, among them: compiler-controlled voltage overscaling [13]; using learning-based techniques such as neural networks or synthesis to approximate kernels of imperative programs in a coarse-grained way [14]; adopting analog components for computation [43]; designing efficient storage systems that can lose bits [40]; and extending architecture mechanisms with imprecise modes [27].
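The two QoR forms and the algorithmic selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not an interface from any of the cited systems: the names `ValueQoR`, `SetQoR`, `choose_implementation`, and the subsampled-mean kernels are hypothetical, and quality is checked empirically against a precise run on calibration inputs rather than proven statically.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def relative_error(approx: float, precise: float) -> float:
    """Deviation of approx from precise, relative to the precise value."""
    if precise == 0.0:
        return abs(approx)  # fall back to absolute error when precise is zero
    return abs(approx - precise) / abs(precise)


@dataclass(frozen=True)
class ValueQoR:
    """Per-value spec: X is at most max_rel_error away from its precise value."""
    max_rel_error: float

    def holds(self, approx: float, precise: float) -> bool:
        return relative_error(approx, precise) <= self.max_rel_error


@dataclass(frozen=True)
class SetQoR:
    """Set-level spec: at most max_bad values may violate the per-value spec."""
    per_value: ValueQoR
    max_bad: int

    def holds(self, approx: Sequence[float], precise: Sequence[float]) -> bool:
        bad = sum(not self.per_value.holds(a, p) for a, p in zip(approx, precise))
        return bad <= self.max_bad


Kernel = Callable[[Sequence[float]], float]


def mean_precise(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def mean_sampled(xs: Sequence[float], stride: int) -> float:
    """Approximate mean that reads only every stride-th element."""
    ys = xs[::stride]
    return sum(ys) / len(ys)


def choose_implementation(
    candidates: List[Tuple[str, Kernel]],
    precise: Kernel,
    spec: ValueQoR,
    calibration: Sequence[Sequence[float]],
) -> str:
    """Return the first (cheapest-first) candidate meeting spec on all inputs."""
    for name, kernel in candidates:
        if all(spec.holds(kernel(xs), precise(xs)) for xs in calibration):
            return name
    return "precise"  # no approximate candidate is acceptable


# Hypothetical candidates, ordered from cheapest to most expensive.
CANDIDATES: List[Tuple[str, Kernel]] = [
    ("stride-10", lambda xs: mean_sampled(xs, 10)),
    ("stride-2", lambda xs: mean_sampled(xs, 2)),
]
```

With the calibration input `list(range(1, 101))`, a 10% per-value bound admits the cheapest kernel, a 5% bound forces the stride-2 kernel, and a 0.1% bound falls back to precise execution. A check of this kind says nothing about inputs outside the calibration set, which is one reason the text treats quality as harder to guarantee than safety.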
Approximation at the algorithm level alone, together with compiler/runtime support, applies to off-the-shelf hardware (and we intend to further explore that space too). Our experience, however, has shown that the greatest energy benefit comes from hardware-supported approximation with language/architecture co-design [29].

Tools

A final key component for making approximate programming practical is software-development tools. We need tools to help programmers identify approximation opportunities, understand the effect of approximation at the application level, specify QoR requirements, and test and debug applications with approximate components. Our first steps in this direction are a debugger and a post-deployment monitoring framework for approximate programs [36].

2.1 Next Steps in Approximation

Controlling Quality

The community has allocated more attention to assuring safety of approximate programs than to controlling quality. Decoupling safety from quality has been crucial to enabling progress on the safety half of the equation [38, 5], but more nuanced quality properties have proven more challenging. We have initial ways to prove and reason about limited probabilistic quality properties [6, 4, 41], but we still lack techniques that can cope with arbitrary approximation strategies and still produce useful guarantees.

We also need ways to measure quality at run time. If approximate programs could measure how accurate they are without too much overhead, they could offer better guarantees to programmers while simultaneously exploiting more aggressive optimizations [17, 36]. But there is not yet a general way to derive a cheap, dynamic quality check for an arbitrary program and arbitrary quality criterion. Even limited solutions to the dynamic-check problem will amplify the benefits of approximation.

Defining Quality

Any application of approximate computing rests on a quality metric.
Even evaluations for papers on approximation need to measure their effectiveness with some accuracy criterion. Unlike traditional criteria (energy or performance, for example), the right metric for quality is not obvious. It varies per program, per deployment, and even per user. The community does not have a satisfactory way to decide on the right metric for a given scenario: we are so far stuck with guesses.

A next step in approximation research should help build confidence that we are using the right quality metrics. We should adopt techniques from software engineering, human-computer interaction, and application domains like graphics to help gather evidence for good quality metrics. Ultimately, programmers need a sound methodology for designing and evaluating quality metrics for new scenarios.

The Right Accelerator

Hardware approximation research has fallen into two categories: extensions to traditional architectures [13, 27] and new, discrete accelerators [47, 14]. The former category has yielded simpler programming models, but the fine-grained nature of the model means that efficiency gains have been limited. Coarser-grained, accelerator-oriented approaches have yielded the best results to date. There are still opportunities for co-designing accelerators with programming models that capture the best of both approaches.

The next generation of approximate hardware research should co-design an accelerator with a software interface and compiler workflow that together attack the programmability challenges in approximation: safety and quality. By decoupling approximation from traditional processors, new accelerators could unlock new levels of efficiency while finally making approximate computing palatable to hardware vendors.

2.2 Lessons from Approximation

Our group's experience with approximate computing as a cross-cutting concern has had both successes and failures.
The path through this research has yielded lessons both for approximation research specifically and for hardware-software co-design generally.

The von Neumann Curse

When doing approximation at the instruction granularity in a typical von Neumann machine, the data path can be approximate but the control circuitry likely cannot. Given that control accounts for about 50% of hardware resources, gains are fundamentally limited to 2×, which is hardly enough to justify the trouble. We therefore hold more hope for coarse-grained approximation techniques than for fine-grained ones.

That is only one example. Many other promising avenues of our work have fallen afoul of the conflation of program control flow and data flow. For example, if we want to approximate the contents of a register, we need to know whether it represents pixel data amenable to approximation or a pointer, which may be disastrous to approximate.

Separation problems are not unique to approximation. Secure systems, for example, could profit from a guarantee that code pointers are never manipulated as raw data. Future explorations of hardware-software co-design would benefit from architectural support for separating control flow from data flow.

The High Cost of Hardware Design

In our work on approximation with neural networks, we achieve the best energy efficiency when we use a custom hardware neural processing unit, or NPU [14, 43]. Our first evaluations of the work used cycle-based architectural simulation, which predicted an average 3× energy savings. Later, we implemented the NPU on an FPGA [29]. On this hardware, we measured an average energy savings of 1.7×. The difference is due partially to the FPGA's overhead and clock speed and partially to the disconnect between simulation and reality. Determining the exact balance would require a costly ASIC design process. Even building the FPGA implementation took almost two years.
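The 2× ceiling in the von Neumann discussion above can be read as an Amdahl-style bound. The sketch below rests on a modeling assumption added here for illustration and not stated in the text: cost splits into a control fraction that approximation cannot reduce and a data-path fraction that improves by some factor. The function name `max_gain` and its parameters are hypothetical.

```python
def max_gain(control_fraction: float, datapath_gain: float = float("inf")) -> float:
    """Overall gain when only the non-control share of the cost improves.

    control_fraction: share of the cost that must stay precise (0 < f <= 1).
    datapath_gain:    improvement factor on the remaining data-path share;
                      infinity models a data path that becomes free.
    """
    if not 0.0 < control_fraction <= 1.0:
        raise ValueError("control_fraction must be in (0, 1]")
    if datapath_gain < 1.0:
        raise ValueError("datapath_gain must be at least 1")
    data_fraction = 1.0 - control_fraction
    # Amdahl's law: the untouched share bounds the achievable overall gain.
    return 1.0 / (control_fraction + data_fraction / datapath_gain)
```

Under this model, `max_gain(0.5)` evaluates to 2.0 even when the data path costs nothing, and a more realistic 4× data-path improvement gives `max_gain(0.5, 4.0)`, or about 1.6×. Coarse-grained accelerators sidestep the bound by shrinking the control share itself rather than only the data-path share.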
Hardware-software co-design involves an implied imbalance: software is much faster to iterate on than hardware. Future explorations of hardware-software co-design opportunities would benefit from more evolution of hardware description languages and simulation tools. We should not have to implement hardware three times: first in simulation, second in an HDL for an efficient FPGA implementation, and again for a high-performance ASIC.

Trust the Compiler

Hybrid hardware-software research constantly needs to divide work between the programmer, the compiler, and the hardware. In our experience, a hybrid design should delegate as much as possible to the compiler. For example, the Truffle CPU [13] has dual-voltage SRAM arrays that require every load to match the precision level of its corresponding store. This invariant would be expensive to enforce with per-register and per-cache-line metadata, and it would be unreasonable for programmers to specify manually. Our final design leaves all the responsibility to the compiler, where enforcement is trivial.

Relegating intelligence to the compiler comes at a cost in safety: programming to the Truffle ISA directly is dangerous and error-prone. Fortunately, modern programmers rarely write directly to the CPU's hardware interface. Researchers should treat the ISA as a serialization channel between the compiler and architecture, not as a human interface. Eschewing direct human-hardware interaction can pay off in fundamental efficiency gains.

3 Opportunities for Co-Design

3.1 Programming Hardware

The vast majority of programming languages research is on languages that target a very traditional notion of programming. Programs must eventually be emitted as a sequence of instructions, meant to be interpreted by processors that will load them from memory and execute them in order. The programming languages and architecture communities should not remain satisfied with this traditional division of labor.
The instruction set architecture abstraction imposes limits on the control that programmers can have over how hardware works, and compatibility concerns limit the creativity of architecture designs. Hybrid hardware-software research projects should design new hardware abstractions that violate the constraints of traditional ISAs.

Some recent work revived interest in languages for designing hardware and FPGA configurations [3, 31] or applied language techniques to special-purpose hardware like networking equipment [15] and embedded processor arrays [33]. But we can do more. A new story for programmable hardware will require radically new architecture designs, hardware interfaces that expose the right costs and capabilities, programming models that can abstract these interfaces' complexity, and careful consideration of the application domains.

Rethinking Architectures from the Ground Up

The central challenge in designing programmable hardware is finding the right architectural trade-off between reconfigurability and efficiency. Co-design is only possible if we expose more than what current CPUs do, but providing too much flexibility can limit efficiency. Current field-programmable gate arrays (FPGAs) are a prime example that misses the mark today: FPGAs strive for bit-level reconfigurability, and they pay for it in both performance and untenable tool complexity. New architectures will need to carefully choose the granularity of reconfiguration to balance these opposing constraints.

Exposing Communication Costs

The von Neumann abstraction is computation-centric: the fundamental unit of work is computation. But the costs in modern computers, especially in terms of energy, are increasingly consumed by communication more than computation. New programmable hardware will need to expose abstractions that reflect this inversion in the cost model.
Unlike earlier attempts to design processors with exposed communication costs [16], these new abstractions need not be directly comprehensible for humans: we do not need to expect programmers to write this assembly language directly. Instead, we should design abstractions with their software toolchains in mind from the beginning.

Managing Complexity

New programmable hardware will need to work in concert with programming languages and compiler workflows that can manage their inherent complexity. Crucially, the toolchain will need to balance convenient automation with programmer control. It can be extremely powerful to unleash programmers on the problem of extracting efficiency from hardware, but history has also shown that overly complex programming models do not go mainstream. New toolchains should combi