
On-Device AI Inference for Mobile & Wearables

Run LLMs, speech models, and vision transformers locally on phones with hybrid NPU acceleration. Low latency, minimal RAM, and intelligent cloud fallback.

# Install Cactus
brew install cactus-compute/cactus/cactus

# Run a model
cactus run LiquidAI/LFM2-1.2B
✓ Model loaded (245MB RAM)
✓ NPU acceleration enabled
✓ 168 tokens/sec decode

Quick Start

Get up and running with Cactus in minutes

1

Install Cactus

Install via Homebrew on macOS or Linux:
brew install cactus-compute/cactus/cactus
Or clone and build from source:
git clone https://github.com/cactus-compute/cactus
cd cactus && source ./setup
2

Download a Model

Download a pre-quantized model (INT4 recommended for mobile):
cactus download LiquidAI/LFM2-1.2B --precision INT4
Models are cached in ./weights by default. Supported models include Gemma, Qwen, LFM2, Whisper, and Parakeet.
3

Run Your First Inference

Start the interactive playground:
cactus run LiquidAI/LFM2-1.2B
Or use the C API:
#include <cactus.h>
#include <stdio.h>

// Load the downloaded weights; the remaining arguments are left at defaults
cactus_model_t model = cactus_init("weights/lfm2-1.2b", NULL, false);

// Chat messages as a JSON array (raw string literal)
const char* messages = R"([
  {"role": "user", "content": "What is 2+2?"}
])";

// Run a completion into the response buffer; the trailing NULLs accept default options
char response[4096];
cactus_complete(model, messages, response, sizeof(response), NULL, NULL, NULL, NULL);

printf("%s\n", response);
cactus_destroy(model);

Key Features

Everything you need for production AI on mobile

NPU Acceleration

5-11x faster inference on Apple, Qualcomm, and MediaTek neural processors

Multi-Modal

Text, vision, and audio models with unified API. Vision-language and speech-to-text support

Quantization

INT4/INT8 quantization with minimal quality loss. 70-90% memory reduction

RAG & Embeddings

Built-in vector database and embedding models for retrieval-augmented generation

Tool Calling

Function calling and tool execution for agentic workflows

Cloud Fallback

Automatic cloud handoff based on confidence thresholds for complex queries

Platform SDKs

Native integrations for every platform

Python SDK

FFI bindings for macOS and Linux. Auto-installed with source ./setup

Swift SDK

Native iOS, macOS, tvOS, and watchOS support with XCFramework

Kotlin SDK

Android and iOS via Kotlin Multiplatform. JNI bridge included

Flutter SDK

Cross-platform Dart bindings for iOS and Android

Rust SDK

Safe Rust bindings for systems programming

React Native

JavaScript bridge for React Native apps

Explore the Docs

Dive deeper into Cactus capabilities

Architecture

Understand the three-layer design: Engine, Graph, and Kernels

Supported Models

Browse the full catalog of LLMs, STT, and vision models

Chat Completion

Build conversational AI with streaming and tool calling

Speech-to-Text

Real-time transcription with Whisper, Moonshine, and Parakeet

CLI Reference

Master the CLI for model management and testing

Performance Tuning

Optimize for latency, RAM, and battery life

Ready to Get Started?

Install Cactus and run your first model in under 2 minutes

View Quickstart Guide