In the previous era of single-core CPU dominance, most computations were executed against a predefined, closed hardware-software interface defined by Instruction Set Architectures (ISAs) such as x86, PowerPC, and ARM. The ISA served well as the interface between hardware and programming languages. However, current trends clearly show that many computation patterns (especially power-hungry, compute-bound, and performance-demanding ones) are shifting from predefined ISA/CPU architectures to specialized processing cores, with the goal of reducing energy consumption and increasing performance. At first, some computations were offloaded to GPUs; more recently, reconfigurable fabric such as FPGAs has been added to the mix. This trend, previously confined to embedded computing, is nowadays present in all domains: general desktop computing, HPC, and smart, high-throughput embedded computing (smartphones). It requires a paradigm shift from performance-, energy-, and memory-oblivious approaches to novel programming approaches that take these non-functional parameters into account. All these developments have disrupted the traditional hardware-software interface, and the need has emerged for new abstractions and optimizations able to cope with heterogeneous architectures.
High-Performance Computing (HPC) as we know it today is experiencing unprecedented changes at all levels, from technology to use cases. The escalating quest for performance and power efficiency increasingly requires deep, application-driven customization of the underlying computing architecture. At the heart of the problem, the hurdle to fully exploiting today's computing technologies ultimately lies in the gap between the applications' demands and the capabilities of the underlying computing architecture: the closer the computing system matches the structure of the application, and vice versa, the more efficiently the available computing capability is exploited.
A significant challenge arising from heterogeneity is the increased complexity of design-space exploration and of partitioning applications for execution across the different processing elements present in the computing platform, such as:
- General-purpose CPU (Intel x86, ARM, RISC-V)
- GPU and GPU-like cores
- Custom hardware accelerators designed for FPGA
- On-chip accelerators implemented as ASICs
The main challenge of system integration is to adapt applications to maximally exploit all available processing elements, which includes:
- a thorough analysis of the application’s design and characteristics,
- identification of key compute-intensive kernels and algorithms, and
- efficient mapping and optimization of identified kernels to different types of processing cores.
The ultimate goal of system integration is to achieve the best trade-off among the desired metrics (such as performance, throughput, power, energy, cost, quality of service, timing constraints, etc.), depending on the requirements of the target environment.
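The kernel-identification step described above can be illustrated with a minimal profiling sketch in Python. The `matvec` workload and the application driver here are hypothetical, chosen only to show how a compute-intensive kernel surfaces at the top of a profile and thus becomes a candidate for mapping to a GPU or FPGA accelerator:

```python
import cProfile
import io
import pstats

# Hypothetical compute-bound kernel: dense matrix-vector product.
# In a real application, a kernel like this would be identified from
# profiling data as a candidate for offloading to an accelerator.
def matvec(matrix, vector):
    return [sum(row[i] * vector[i] for i in range(len(vector)))
            for row in matrix]

def run_application():
    n = 200
    matrix = [[(i + j) % 7 for j in range(n)] for i in range(n)]
    vector = [1.0] * n
    result = None
    for _ in range(50):          # repeated kernel invocations
        result = matvec(matrix, vector)
    return result

# Profile the whole application and rank functions by cumulative time;
# the functions dominating the profile are the offloading candidates.
profiler = cProfile.Profile()
profiler.enable()
run_application()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)             # report the top 5 hotspots
print(stream.getvalue())
```

In practice this analysis would be performed with platform-specific tools (hardware performance counters, vendor profilers), but the principle is the same: measure first, then map the dominant kernels to the processing element that executes them most efficiently.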