
"Cactus, a Y Combinator-backed startup, enables local AI inference to mobile phones, wearables, and other low-power devices through cross-platform, energy-efficient kernels and a native runtime. It delivers sub-50ms time-to-first-token for on-device inference, eliminates network latency, and defaults to complete privacy. Version v1 of the SDK, now in beta, improves performance on lower-end hardware and adds optional cloud fallback to ensure greater reliability."
"Native Swift support is still minimal and less mature as support for other languages, but iOS developers can use the Kotlin Multiplatform bindings within their Swift apps. On iOS and Android devices, Cactus takes a more general approach to on-device AI inference than the platform-native solutions offered by Apple and Google, Apple Foundation frameworks and Google AI Edge, which are platform-specific and expose only a limited, vendor-controlled set of capabilities. Cactus supports a wide variety of models, including Qwen, Gemma, Llama, DeepSeek, Phi, Mistral and many others."
The SDK offers native bindings for React Native, Flutter, and Kotlin Multiplatform, and supports many model families at multiple quantization levels, from FP32 down to 2-bit. Built-in model versioning and over-the-air updates handle downloads, caching, and seamless model switching. For complex or large-context tasks the SDK can fall back to cloud-based models, and the v1 inference engine was overhauled, moving from the GGUF format to a proprietary one.
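The optional cloud fallback amounts to a routing decision in front of two backends. Below is a minimal sketch of that pattern, assuming hypothetical local and cloud generation functions, an assumed on-device context budget, and an arbitrary latency bound; none of these names or numbers come from the Cactus SDK:

```kotlin
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.withTimeout

// Hypothetical hybrid router: prefer the on-device model, fall back to a
// cloud endpoint for oversized prompts or slow on-device generation.
class HybridLlm(
    private val local: suspend (String) -> String,  // on-device inference
    private val cloud: suspend (String) -> String,  // remote fallback
    private val localContextTokens: Int = 4096      // assumed device budget
) {
    suspend fun generate(prompt: String): String {
        // Crude token estimate (~4 chars/token); oversized prompts go to cloud.
        if (prompt.length / 4 > localContextTokens) return cloud(prompt)
        return try {
            withTimeout(5_000) { local(prompt) }    // bound on-device latency
        } catch (e: TimeoutCancellationException) {
            cloud(prompt)                           // optional cloud fallback
        }
    }
}
```

Routing on prompt size and bounding local latency with a timeout keeps the device path as the default while reserving the network for the large-context or slow cases the article describes.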