Overview

The cactus run command opens an interactive playground for any supported model. If the model isn’t already downloaded, Cactus will automatically fetch it from HuggingFace.

Syntax

cactus run <model> [flags]

Arguments

  • <model> - Model name (e.g., qwen-2.5-1.5b, llama-3.2-1b, phi-4)

Flags

--precision

Set the quantization precision level:
cactus run <model> --precision INT4|INT8|FP16
Default: INT4

Options:
  • INT4 - 4-bit quantization (smallest size, fastest)
  • INT8 - 8-bit quantization (balanced)
  • FP16 - 16-bit floating point (highest quality)

--token

Provide a HuggingFace API token for gated models:
cactus run <model> --token <your-hf-token>
Required for gated models, such as Llama, that need access approval.

--reconvert

Force reconversion of the model from source weights:
cactus run <model> --reconvert
Useful when the model format has been updated or a previous conversion failed.

Examples

# Run Qwen with default settings (INT4)
cactus run qwen-2.5-1.5b
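
The flags documented above can be combined with any supported model. A few illustrative invocations (model names are taken from the list above; `<your-hf-token>` is a placeholder for your own token):

# Run at full FP16 precision for highest quality
cactus run qwen-2.5-1.5b --precision FP16

# Run a gated model with a HuggingFace token
cactus run llama-3.2-1b --token <your-hf-token>

# Force reconversion if a previous conversion failed
cactus run phi-4 --reconvert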

Interactive Playground

Once the model loads, you’ll enter an interactive chat interface:
┌─────────────────────────────────────────────┐
│ Cactus Playground - qwen-2.5-1.5b          │
│ Precision: INT4                             │
└─────────────────────────────────────────────┘

You: Hello! What can you help me with?

Assistant: I'm an AI assistant running locally on
your device. I can help with coding, writing,
analysis, and general questions...

Model Auto-Download

If the model isn’t cached locally, cactus run will:
  1. Download the model from HuggingFace
  2. Convert it to Cactus format with the specified precision
  3. Cache it in ./weights for future use
  4. Launch the interactive playground
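
The cache-then-launch flow in steps 1-4 can be sketched roughly as below. This is an illustrative approximation, not the real Cactus internals: the function names (`ensure_model`, `download`, `convert`) and the exact cache naming scheme are assumptions; only the `./weights` directory and the download/convert/cache order come from the steps above.

```python
from pathlib import Path

WEIGHTS_DIR = Path("./weights")  # cache location documented above

def ensure_model(name: str, precision: str = "INT4") -> Path:
    """Return a cached model path, downloading and converting if missing."""
    cached = WEIGHTS_DIR / f"{name}-{precision.lower()}"  # naming scheme is assumed
    if cached.exists():
        return cached  # already cached: skip download and conversion
    WEIGHTS_DIR.mkdir(exist_ok=True)
    raw = download(name)             # 1. download from HuggingFace
    convert(raw, cached, precision)  # 2. convert to Cactus format
    return cached                    # 3. cached in ./weights for future use

def download(name: str) -> bytes:
    # Placeholder: real code would stream the weights from the HuggingFace Hub.
    return b""

def convert(raw: bytes, dest: Path, precision: str) -> None:
    # Placeholder: real code would quantize to the requested precision.
    dest.write_bytes(raw)

print(ensure_model("qwen-2.5-1.5b").name)  # → qwen-2.5-1.5b-int4
```

On a second run with the same model and precision, the cache hit returns immediately and step 4 (launching the playground) proceeds without any network access.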

See Also

Download Command

Pre-download models without running them

Model Library

Browse all supported models