Overview

The cactus run command opens an interactive playground for any supported model. If the model isn’t already downloaded, Cactus will automatically fetch it from HuggingFace.

Syntax

cactus run <model> [flags]

Arguments

  • <model> - Model name (e.g., qwen-2.5-1.5b, llama-3.2-1b, phi-4)

Flags

--precision

Set the quantization precision level:
cactus run <model> --precision INT4|INT8|FP16
Default: INT4

Options:
  • INT4 - 4-bit quantization (smallest size, fastest)
  • INT8 - 8-bit quantization (balanced)
  • FP16 - 16-bit floating point (highest quality)

--token

Provide a HuggingFace API token for gated models:
cactus run <model> --token <your-hf-token>
Required for gated models, such as Llama, that need access approval.

--reconvert

Force reconversion of the model from source weights:
cactus run <model> --reconvert
Useful when the model format has been updated or a previous conversion failed.

Examples

# Run Qwen with default settings (INT4)
cactus run qwen-2.5-1.5b
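
The flags documented above can be combined with any supported model. A few illustrative invocations (model names are taken from the list above; `<your-hf-token>` is a placeholder for your own token):

# Run at full FP16 precision for highest quality
cactus run qwen-2.5-1.5b --precision FP16

# Run a gated model with a HuggingFace token
cactus run llama-3.2-1b --token <your-hf-token>

# Force reconversion if a previous conversion failed
cactus run phi-4 --reconvert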

Interactive Playground

Once the model loads, you’ll enter an interactive chat interface:
┌─────────────────────────────────────────────┐
│ Cactus Playground - qwen-2.5-1.5b          │
│ Precision: INT4                             │
└─────────────────────────────────────────────┘

You: Hello! What can you help me with?

Assistant: I'm an AI assistant running locally on
your device. I can help with coding, writing,
analysis, and general questions...

Model Auto-Download

If the model isn’t cached locally, cactus run will:
  1. Download the model from HuggingFace
  2. Convert it to Cactus format with the specified precision
  3. Cache it in ./weights for future use
  4. Launch the interactive playground
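
The cache-then-launch flow in steps 1-4 can be sketched roughly as below. This is an illustrative approximation, not the real Cactus internals: the function names (`ensure_model`, `download`, `convert`) and the exact cache naming scheme are assumptions; only the `./weights` directory and the download/convert/cache order come from the steps above.

```python
from pathlib import Path

WEIGHTS_DIR = Path("./weights")  # cache location documented above

def ensure_model(name: str, precision: str = "INT4") -> Path:
    """Return a cached model path, downloading and converting if missing."""
    cached = WEIGHTS_DIR / f"{name}-{precision.lower()}"  # naming scheme is assumed
    if cached.exists():
        return cached  # already cached: skip download and conversion
    WEIGHTS_DIR.mkdir(exist_ok=True)
    raw = download(name)             # 1. download from HuggingFace
    convert(raw, cached, precision)  # 2. convert to Cactus format
    return cached                    # 3. cached in ./weights for future use

def download(name: str) -> bytes:
    # Placeholder: real code would stream the weights from the HuggingFace Hub.
    return b""

def convert(raw: bytes, dest: Path, precision: str) -> None:
    # Placeholder: real code would quantize to the requested precision.
    dest.write_bytes(raw)

print(ensure_model("qwen-2.5-1.5b").name)  # → qwen-2.5-1.5b-int4
```

On a second run with the same model and precision, the cache hit returns immediately and step 4 (launching the playground) proceeds without any network access.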

See Also

Download Command

Pre-download models without running them

Model Library

Browse all supported models