Enabling Full-Fledged Parallelism on Intermittently Powered Computing

Akhunov, Khakim

doi:10.15168/11572_412810

Energy-harvesting batteryless devices exploit power from various sources, such as radio waves, sunlight, and vibration. However, the sporadic availability of ambient energy causes frequent power failures, forcing the systems to operate intermittently. The computation interruptions violate forward progress and memory consistency. State-of-the-art solutions have proposed multiple mature approaches for intermittent computing to provide both application termination guarantees and consistent and idempotent results. Some solutions propose so-called just-in-time (JIT) checkpoints, where dedicated hardware is used to constantly monitor available energy and warn the system when the energy level in the energy buffer reaches critical points. These points indicate potential power failures before which the system must back up its architectural state. Other solutions propose placing checkpoints in the program code at compile time based on the energy consumption of code execution between checkpoints. A power failure can occur at any time during execution, but the computation recovers from the recent checkpoint. Instead of explicitly placing checkpoints, another set of solutions assumes the software developers split the application into failure-atomic tasks directly manipulating non-volatile memory. The common condition in task-based intermittent programming is to keep the energy consumption of each task within the capacity of the energy buffer. While efficient, the proposed solutions target off-the-shelf single-core ultra-low-power microcontrollers (MCUs) with limited flexibility and performance capability. These MCUs are energy-efficient and ideal for performing low-cost tasks. On the other hand, contemporary compute- and data-intensive, parallelizable applications demand the execution of high-cost tasks on edge devices. The reason is that sending large amounts of raw sensor data wirelessly to offload the intensive tasks to the cloud is too energy-inefficient, especially for energy-harvesting devices. Four critical limitations prevent the use of advanced multicore devices and emerging technologies for the efficient execution of modern applications on ultra-low-power batteryless edges. First, in existing systems, programmers need to exploit underlying parallelism manually by interacting directly with low-power accelerators, which is cumbersome. Programmable general-purpose multicore platforms provide the highest degree of flexibility, but the intermittent computing community has overlooked them so far. Existing intermittent computing runtimes do not support parallelism or provide language constructs to express parallelizable code blocks. Second, the availability of energy and the strength of incoming power affect an intermittent system's charging and discharging cyclical nature. When incoming power is strong enough, the device charges rapidly and spends more time on computation. Similarly, low input power forces the system to spend more time collecting energy than computing. To respond to ambient power dynamics and increase throughput, existing works have proposed workload, accuracy, voltage, frequency, and computational unit scaling techniques. However, the solutions work on a fixed hardware configuration, and target systems are limited by the performance of a single-core processor without employing available degrees of application parallelism. Third, existing low-power multicore platforms are not designed for intermittent computing. Their internal non-volatile flash memories are not suitable for intermittent computing because they have high energy requirements, low speed, and limited write endurance. The only way to exploit current low-power multicore platforms for intermittent computing is to introduce an external non-volatile memory, such as FRAM. However, this architectural configuration is very inefficient as compared to embedded FRAM due to its significant energy overhead, making backup and recovery operations energy-expensive. Finally, using emerging memories, e.g., MRAM, as an external non-volatile memory allows for in-memory processing (PIM) of data-intensive computations, eliminating unnecessary data movement and enabling data-level parallelism. While inherently idempotent, such in-memory computation is hard to integrate into traditional MCU-based intermittent systems. Successful integration lacks the effective maintenance of data flow and computation in a power failure-resilient manner. In this thesis, we tackle the limitations. In Chapter 3, we introduce AdaMICA, an intermittent computing runtime that supports parallel intermittent multicore computing and provides the highest degree of flexibility of programmable general-purpose multiple cores. AdaMICA adaptively switches to the best multicore configuration considering the dynamic input power. Therefore, it allows an intermittent system to benefit from workload parallelization, thereby increasing systems throughput and decreasing end-to-end delay while considering the energy availability. Chapter 4 presents PEARL, a power- and energy-aware multicore intermittent computing that enables, for the first time, the efficient adaptation of the common off-the-shelf low-power multicore microcontroller platforms to the intermittent computing paradigm. PEARL features a novel backup policy that significantly reduces the number of accesses to non-volatile memory on multicore platforms. PEARL benefits from multicore power-aware adaptation to adjust the underlying hardware architecture and exploits energy awareness to transition an intermittent system to ultra-low-power mode, retaining memory content. In Chapter 6, we address emerging non-volatile memory, CRAM (Computational RAM), presenting PiMCo and LUTIC, novel programmable CRAM-based in-memory coprocessors that facilitate the power-failure resilient execution of parallelizable computational loads. The coprocessors are pluggable into and controlled by a general-purpose MCU via a standard communication protocol. In Chapter 7, we propose Viadotto, a novel adaptive intermittent computing system that bridges the gap between existing MCU-based intermittent systems and the emerging compute-in-memory paradigm. Viadotto introduces a high-level programming model supported by its compiler, software library, and power failure-resilient memory controller, hiding detailed low-level logic operations and data flow management in CRAM from programmers. Viadotto exploits adaptation by controlling data-level parallelism with respect to the ambient power level. In essence, this thesis addresses several pivotal challenges to enabling full-fledged parallelism on ultra-low-power batteryless devices. Hence, we have made a significant step towards the efficient deployment of modern complex applications on energy-harvesting systems.

Enabling Full-Fledged Parallelism on Intermittently Powered Computing / Akhunov, Khakim. - (2024 Jun 24), pp. 1-142. [10.15168/11572_412810]