Apparate: Early-Exit Models for ML Latency and Throughput Optimization - Comparisons | HackerNoon
Briefly

Comparing Apparate with existing early-exit models such as BranchyNet and DeeBERT shows that the prior approaches suffer significant accuracy drops: up to 23.9% for CV workloads and 17.8% for NLP workloads. The comparison followed each model's recommended architecture and tuned its exit thresholds optimally. Even with this tuning, the baselines failed to consistently meet the accuracy constraint, highlighting how effectively Apparate balances accuracy and efficiency.
Taken together, the results clearly indicate that Apparate stands out at preserving accuracy. While existing early-exit strategies struggle to maintain acceptable accuracy under load, Apparate not only satisfies the 1% accuracy constraint but also achieves lower tail latencies, 0.9% to 9.4% below those of the compared systems. This showcases Apparate's adaptive tuning mechanisms and operational efficiency.
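The mechanism being tuned in these comparisons is the per-exit confidence threshold: after each intermediate exit head, inference stops early if the prediction is confident enough. A minimal sketch in plain Python follows; the stage/head structure, function names, and toy values are illustrative assumptions, not Apparate's actual implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_infer(stages, exit_heads, thresholds, x):
    """Run model stages in order; after each stage, score the intermediate
    features with that stage's exit head. If the top softmax probability
    clears the exit's threshold, stop early instead of running the
    remaining (more expensive) stages."""
    h = x
    for i, (stage, head, thr) in enumerate(zip(stages, exit_heads, thresholds)):
        h = stage(h)
        probs = softmax(head(h))
        conf = max(probs)
        if conf >= thr:
            return probs.index(conf), i  # (predicted class, exit taken)
    # Fall through: the final exit's prediction is used unconditionally.
    return probs.index(conf), len(stages) - 1

# Toy two-stage "model" (hypothetical): elementwise transforms as stages,
# a slice of the features standing in for each exit head's logits.
stages = [lambda v: [x * 2 for x in v], lambda v: [x + 1 for x in v]]
heads = [lambda v: v[:3], lambda v: v[:3]]
thresholds = [0.9, 0.0]  # final exit always fires

pred, exit_idx = early_exit_infer(stages, heads, thresholds, [0.1, 0.2, 5.0])
```

Raising a threshold trades latency for accuracy (fewer, more-confident early exits); the article's point is that Apparate adapts these thresholds at runtime to hold accuracy loss within the stated 1% bound.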