Apparate significantly reduces latency across a range of workloads and outperforms existing early-exit models while still meeting accuracy and tail-latency constraints.
Our evaluation shows that Apparate reduces 25th percentile and median latencies by 40.5-91.5% for computer vision tasks.
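To make the metric concrete, here is a minimal sketch of how a percentile latency reduction of this kind can be computed from two sets of per-request latencies. The function names `percentile` and `latency_reduction` and all latency values are hypothetical and chosen only for illustration; they are not taken from the Apparate evaluation or its code.

```python
import math


def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples, with linear interpolation."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def latency_reduction(baseline, treated, q):
    """Percent reduction of the q-th percentile latency relative to the baseline."""
    base = percentile(baseline, q)
    return 100.0 * (base - percentile(treated, q)) / base


# Made-up per-request latencies in milliseconds (illustrative only).
baseline_ms = [20.0, 22.0, 24.0, 26.0, 28.0]    # model served without early exits
with_exits_ms = [8.0, 10.0, 12.0, 25.0, 28.0]   # same requests with early exits

for q in (25, 50, 100):
    print(f"p{q} reduction: {latency_reduction(baseline_ms, with_exits_ms, q):.1f}%")
```

In this toy data the 25th percentile and median drop sharply while the maximum latency is unchanged, which is the shape of result described here: large gains for requests that exit early, with the tail held within its bound.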
Whereas existing early-exit models can reduce accuracy by up to 23.9%, Apparate consistently meets both accuracy and tail-latency targets across diverse configurations.
Apparate also adapts to different model architectures and configurations, maintaining performance under varying constraints.