Google's New LiteRT Accelerator Supercharges AI Workloads on Snapdragon-powered Android Devices
Briefly

"While GPU hardware is widely available on modern Android devices, relying on them exclusively for AI tasks can introduce performance bottlenecks, according to Google software engineers Lu Wang, Wiyi Wanf and Andrew Wang. For example, they note that "running a compute-intensive, text-to-image generation model on-device, while simultaneously processing the live camera feed with an ML-based segmentation" can overwhelm even high-end mobile GPUs. The result may be a jittery user experience and dropped frames."
"However, many mobile devices now include a neural processing units (NPUs) which are custom-designed AI accelerators that can significantly speed up AI workloads compared to a GPU, while consuming less power. QNN was developed by Google in close collaboration with Qualcomm as a replacement for the previous TFLite QNN delegate. It provides developers with a unified and simplified workflow by integrating a wide range of SoC compilers and runtimes and exposing them through a streamlined API."
Google introduced a new LiteRT accelerator, Qualcomm AI Engine Direct (QNN), to improve on-device AI on Android devices powered by Snapdragon 8 SoCs. QNN offers speedups of up to 100x over CPU and 10x over GPU. GPUs can become a bottleneck when compute-heavy tasks run simultaneously, such as text-to-image generation alongside live camera segmentation, causing jitter and dropped frames. Many devices include NPUs that accelerate AI workloads while using less power. QNN replaces the previous TFLite QNN delegate, unifies SoC compilers and runtimes behind a streamlined API, supports 90 LiteRT operations for full delegation, and includes kernels optimized for LLMs. In Google's benchmarks, 64 of 72 models achieved full NPU delegation.
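For developers, targeting the NPU is framed as just another accelerator choice behind that streamlined API. The Kotlin sketch below illustrates the idea using LiteRT's CompiledModel interface; the identifiers shown (CompiledModel, CompiledModel.Options, Accelerator.NPU) and the model filename are assumptions drawn from Google's public LiteRT documentation, not from the article itself.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Minimal sketch, assuming LiteRT's CompiledModel Kotlin API.
// "segmentation.tflite" is a hypothetical model bundled in the app's assets.
fun runSegmentationOnNpu(context: Context, pixels: FloatArray): FloatArray {
    // Request NPU execution the same way CPU or GPU execution would be
    // requested: by passing the accelerator in the compilation options.
    val model = CompiledModel.create(
        context.assets,
        "segmentation.tflite",
        CompiledModel.Options(Accelerator.NPU),
    )

    // Allocate input/output tensor buffers sized for the model.
    val inputs = model.createInputBuffers()
    val outputs = model.createOutputBuffers()

    // Copy the image data in, run inference, and read the mask back out.
    inputs[0].writeFloat(pixels)
    model.run(inputs, outputs)
    val mask = outputs[0].readFloat()

    // Release native buffers and the compiled model.
    inputs.forEach { it.close() }
    outputs.forEach { it.close() }
    model.close()

    return mask
}
```

If the API works as sketched, switching between CPU, GPU, and NPU execution is a one-argument change, which is presumably what the unified-workflow claim amounts to in practice.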
Read at InfoQ