Overview
The cactus convert command transforms models from HuggingFace format to Cactus format, applying quantization in the process. It also supports merging LoRA adapters into the base model before conversion.
Syntax
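The general shape of the command, inferred from the arguments and flags documented below (check `cactus convert --help` for the authoritative syntax):

```shell
cactus convert <model> [output_dir] [flags]
```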
Arguments
- `<model>` - Model name or HuggingFace repository
- `[output_dir]` - Optional output directory (default: `./weights/<model-name>`)
Flags
--precision
Set the quantization precision level (default: `INT4`).
Options:
- `INT4` - 4-bit quantization (smallest size)
- `INT8` - 8-bit quantization (balanced)
- `FP16` - 16-bit floating point (highest quality)
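For example, to request 8-bit quantization (`<model>` is a placeholder; flag placement is assumed from the syntax above):

```shell
cactus convert <model> --precision INT8
```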
--lora
Merge a LoRA adapter into the base model. Accepts:
- Local LoRA adapter directories
- HuggingFace LoRA repositories
- Multiple LoRA adapters (specify flag multiple times)
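For instance, merging a local adapter and a hosted adapter in one conversion (adapter names and paths below are placeholders):

```shell
cactus convert <model> --lora ./local-adapter --lora username/hosted-adapter
```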
--token
Provide a HuggingFace API token for downloading source models.

Examples
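A few illustrative invocations, sketched from the arguments and flags above (model names, paths, and the token value are placeholders, not verified defaults):

```shell
# Basic conversion with default settings
cactus convert org/model-name

# Convert into a custom directory with 8-bit quantization
cactus convert org/model-name ./converted --precision INT8

# Merge a local LoRA adapter during conversion
cactus convert org/model-name --lora ./my-adapter

# Provide an API token when the source model requires authentication
cactus convert org/model-name --token hf_xxxxxxxx
```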
Conversion Process
The conversion pipeline includes:
- Download - Fetch source model from HuggingFace (if needed)
- LoRA Merge - Apply LoRA adapters to base weights (if specified)
- Quantization - Convert to target precision level
- Optimization - Apply Cactus-specific optimizations
- Export - Write converted model to output directory
LoRA Adapter Format
Supported LoRA formats:

Local Directory
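Pointing `--lora` at an adapter directory on disk (path is a placeholder):

```shell
cactus convert <model> --lora ./path/to/adapter
```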
HuggingFace Repository
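Referencing an adapter hosted on HuggingFace (repository name is a placeholder):

```shell
cactus convert <model> --lora username/lora-adapter-repo
```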
Output Format
Converted models include:

Use Cases
Fine-tuned Models
Convert your custom fine-tuned models.

LoRA Experimentation
Test different LoRA combinations.

Precision Optimization
Create multiple precision variants.

See Also
Download Command
Download models without custom conversion
Run Command
Run converted models interactively