# Cactus

## Docs

- [Compatibility](https://mintlify.wiki/cactus-compute/cactus/advanced/compatibility.md): Weight versioning, breaking changes, and platform requirements for Cactus runtime and models
- [Custom Models](https://mintlify.wiki/cactus-compute/cactus/advanced/custom-models.md): Convert and deploy custom models to Cactus, including LoRA adapters, weight quantization, and model testing
- [Fine-Tuning](https://mintlify.wiki/cactus-compute/cactus/advanced/fine-tuning.md): Train LoRA adapters with Unsloth and deploy them to iOS and Android devices using Cactus
- [NPU Acceleration](https://mintlify.wiki/cactus-compute/cactus/advanced/npu-acceleration.md): Hardware acceleration with Apple Neural Engine, Qualcomm Hexagon, and MediaTek APU for faster mobile inference
- [Performance Tuning](https://mintlify.wiki/cactus-compute/cactus/advanced/performance-tuning.md): Optimize Cactus inference with KV cache configuration, chunked prefill, TPS throttling, and memory management
- [Completion](https://mintlify.wiki/cactus-compute/cactus/api/completion.md): Text generation and chat completion API
- [Embeddings](https://mintlify.wiki/cactus-compute/cactus/api/embeddings.md): Text, image, and audio embedding APIs
- [Model & Engine](https://mintlify.wiki/cactus-compute/cactus/api/engine.md): Core Model class and engine configuration
- [C FFI API](https://mintlify.wiki/cactus-compute/cactus/api/ffi.md): Complete C foreign function interface reference
- [Computation Graph](https://mintlify.wiki/cactus-compute/cactus/api/graph.md): CactusGraph computational graph builder API
- [Transcription](https://mintlify.wiki/cactus-compute/cactus/api/transcription.md): Speech-to-text transcription API
- [Voice Activity Detection](https://mintlify.wiki/cactus-compute/cactus/api/vad.md): Detect speech segments in audio streams
- [Vector Index](https://mintlify.wiki/cactus-compute/cactus/api/vector-index.md): High-performance vector similarity search
- [cactus auth](https://mintlify.wiki/cactus-compute/cactus/cli/auth.md): Manage Cactus Cloud API key for automatic cloud fallback
- [cactus build](https://mintlify.wiki/cactus-compute/cactus/cli/build.md): Build native libraries for mobile and desktop platforms
- [cactus clean](https://mintlify.wiki/cactus-compute/cactus/cli/clean.md): Remove all build artifacts and cached files
- [cactus convert](https://mintlify.wiki/cactus-compute/cactus/cli/convert.md): Convert models with LoRA adapter support
- [cactus download](https://mintlify.wiki/cactus-compute/cactus/cli/download.md): Download and cache models locally
- [CLI Overview](https://mintlify.wiki/cactus-compute/cactus/cli/overview.md): Command-line interface for running and managing Cactus models
- [cactus run](https://mintlify.wiki/cactus-compute/cactus/cli/run.md): Run models in an interactive playground
- [cactus test](https://mintlify.wiki/cactus-compute/cactus/cli/test.md): Run unit tests, benchmarks, and device testing
- [cactus transcribe](https://mintlify.wiki/cactus-compute/cactus/cli/transcribe.md): Real-time speech-to-text transcription
- [Architecture](https://mintlify.wiki/cactus-compute/cactus/concepts/architecture.md): Understand Cactus's three-layer architecture: Engine, Graph, and Kernels. Learn how hybrid NPU/CPU execution delivers low-latency inference on mobile devices.
- [Supported Models](https://mintlify.wiki/cactus-compute/cactus/concepts/models.md): Complete list of models supported by Cactus, including language models, vision models, and transcription models. Includes RAM usage, features, and performance benchmarks.
- [Quantization](https://mintlify.wiki/cactus-compute/cactus/concepts/quantization.md): Learn about INT4, INT8, and FP16 quantization in Cactus. Understand memory/performance tradeoffs, group quantization, and how to choose the right precision for your models.
- [Chat Completion](https://mintlify.wiki/cactus-compute/cactus/guides/chat-completion.md): Build conversational AI with streaming, tool calling, and multi-turn conversations
- [Embeddings](https://mintlify.wiki/cactus-compute/cactus/guides/embeddings.md): Generate text, image, and audio embeddings for semantic search and RAG
- [Retrieval-Augmented Generation (RAG)](https://mintlify.wiki/cactus-compute/cactus/guides/rag.md): Build RAG applications with built-in vector database and semantic search
- [Streaming Responses](https://mintlify.wiki/cactus-compute/cactus/guides/streaming.md): Stream tokens in real-time for better user experience
- [Tool Calling](https://mintlify.wiki/cactus-compute/cactus/guides/tool-calling.md): Enable function calling and tool execution for agentic workflows
- [Speech-to-Text Transcription](https://mintlify.wiki/cactus-compute/cactus/guides/transcription.md): Real-time audio transcription with Whisper, Moonshine, and Parakeet models
- [Vision-Language Models](https://mintlify.wiki/cactus-compute/cactus/guides/vision.md): Use multimodal models to understand images alongside text
- [Installation](https://mintlify.wiki/cactus-compute/cactus/installation.md): Complete installation guide for Cactus on macOS, Linux, iOS, and Android. Includes package managers, source builds, and SDK-specific setup.
- [Introduction to Cactus](https://mintlify.wiki/cactus-compute/cactus/introduction.md): A hybrid low-latency energy-efficient AI engine for mobile devices & wearables with NPU acceleration, quantization, and multi-modal support.
- [Quickstart](https://mintlify.wiki/cactus-compute/cactus/quickstart.md): Get started with Cactus in under 5 minutes. Run your first on-device AI inference with a simple "Hello World" example.
- [Flutter SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/flutter.md): Flutter bindings for Cactus on-device AI inference. Run LLMs, vision models, and speech models on iOS, macOS, and Android with dart:ffi.
- [Kotlin SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/kotlin.md): Kotlin API for running AI models on-device on Android and iOS via Kotlin Multiplatform. Supports completion, transcription, embeddings, RAG, and tool calling.
- [Python SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/python.md): Python bindings for Cactus on-device AI inference engine. Supports chat completion, vision, transcription, embeddings, RAG, tool calling, and streaming.
- [React Native SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/react-native.md): React Native bindings for Cactus on-device AI inference. Run AI models natively on iOS and Android with JavaScript/TypeScript.
- [Rust SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/rust.md): Rust FFI bindings to the Cactus C API for on-device AI inference. Auto-generated via bindgen with CMake build integration.
- [Swift SDK](https://mintlify.wiki/cactus-compute/cactus/sdks/swift.md): Swift API for running AI models on-device on iOS, macOS, tvOS, watchOS, and Android. Supports transcription, embeddings, RAG, and tool calling.