On-Device AI Inference for Mobile & Wearables
Run LLMs, speech models, and vision transformers locally on phones and wearables with hybrid NPU acceleration. Low latency, minimal RAM, and intelligent cloud fallback.
# Install Cactus
brew install cactus-compute/cactus/cactus

# Run a model
cactus run LiquidAI/LFM2-1.2B
✓ Model loaded (245MB RAM)
✓ NPU acceleration enabled
✓ 168 tokens/sec decode
Quick Start
Get up and running with Cactus in minutes
Download a Model
Download a pre-quantized model (INT4 recommended for mobile):
Models are cached in ./weights by default. Supported models include Gemma, Qwen, LFM2, Whisper, and Parakeet.

Key Features
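As a rough guide to why INT4 is recommended on mobile, weight memory scales with bits per parameter. The sketch below is generic back-of-the-envelope arithmetic, not part of the Cactus SDK; actual resident RAM can be lower (memory-mapped weights) or higher (activations, KV cache).

```python
def weight_memory_mb(n_params: float, bits_per_weight: int) -> float:
    """Approximate model weight footprint in MB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e6

# A 1.2B-parameter model (e.g. LFM2-1.2B) at different quantization levels:
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_mb(1.2e9, bits):.0f} MB")
# FP16: ~2400 MB
# INT8: ~1200 MB
# INT4: ~600 MB
```

Going from FP16 to INT4 cuts weight memory by 75%, which is where the 70-90% reduction figures for quantized deployment come from.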
Everything you need for production AI on mobile
NPU Acceleration
5-11x faster inference on Apple, Qualcomm, and MediaTek neural processors
Multi-Modal
Text, vision, and audio models with unified API. Vision-language and speech-to-text support
Quantization
INT4/INT8 quantization with minimal quality loss. 70-90% memory reduction
RAG & Embeddings
Built-in vector database and embedding models for retrieval-augmented generation
Tool Calling
Function calling and tool execution for agentic workflows
Cloud Fallback
Automatic cloud handoff based on confidence thresholds for complex queries
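The cloud-fallback idea can be sketched as a simple routing pattern: answer on-device when the local model is confident, otherwise hand the query off. This is a generic illustration with hypothetical names (`run_local`, `run_cloud`, `confidence`, the 0.7 threshold); Cactus's actual API and routing logic may differ.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Completion:
    text: str
    confidence: float  # e.g. mean token probability from the local model

def complete_with_fallback(
    prompt: str,
    run_local: Callable[[str], Completion],
    run_cloud: Callable[[str], str],
    threshold: float = 0.7,
) -> str:
    """Serve from the on-device model; hand off to the cloud when the
    local model's confidence falls below the threshold."""
    local = run_local(prompt)
    if local.confidence >= threshold:
        return local.text
    return run_cloud(prompt)

# Stubbed example: a confident local answer stays on-device,
# a low-confidence one is routed to the cloud.
local_ok = lambda p: Completion("local answer", confidence=0.92)
local_bad = lambda p: Completion("??", confidence=0.31)
cloud = lambda p: "cloud answer"

print(complete_with_fallback("easy query", local_ok, cloud))   # local answer
print(complete_with_fallback("hard query", local_bad, cloud))  # cloud answer
```

The threshold trades privacy and latency (favoring on-device) against answer quality on hard queries (favoring the cloud).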
Platform SDKs
Native integrations for every platform
Python SDK
FFI bindings for macOS and Linux. Auto-installed with source ./setup

Swift SDK
Native iOS, macOS, tvOS, and watchOS support with XCFramework
Kotlin SDK
Android and iOS via Kotlin Multiplatform. JNI bridge included
Flutter SDK
Cross-platform Dart bindings for iOS and Android
Rust SDK
Safe Rust bindings for systems programming
React Native
JavaScript bridge for React Native apps
Explore the Docs
Dive deeper into Cactus capabilities
Architecture
Understand the three-layer design: Engine, Graph, and Kernels
Supported Models
Browse the full catalog of LLMs, STT, and vision models
Chat Completion
Build conversational AI with streaming and tool calling
Speech-to-Text
Real-time transcription with Whisper, Moonshine, and Parakeet
CLI Reference
Master the CLI for model management and testing
Performance Tuning
Optimize for latency, RAM, and battery life
Ready to Get Started?
Install Cactus and run your first model in under 2 minutes
View Quickstart Guide