
Overview

The cactus test command runs the Cactus test suite, including unit tests, performance benchmarks, and on-device testing for iOS and Android.

Syntax

cactus test [flags]

Flags

--model

Specify the LLM model to test:
cactus test --model <model-name>
Default: LFM2-VL-450M

--transcribe_model

Specify the speech-to-text model to test:
cactus test --transcribe_model <model-name>
Default: moonshine-base

--benchmark

Run benchmarks with larger, more comprehensive models:
cactus test --benchmark
Uses production-scale models instead of test fixtures.

--precision

Regenerate model weights at a specific precision:
cactus test --precision INT4|INT8|FP16
Forces conversion of test models at the specified quantization level.
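To sanity-check all three quantization levels in one pass, a small wrapper loop (not part of the CLI) works; this sketch assumes cactus is on your PATH and keeps going if a run fails:

```shell
# Sketch: exercise every supported precision level in one pass.
# The `|| echo` keeps the loop going if a conversion or run fails
# (or the cactus CLI is not installed).
for p in INT4 INT8 FP16; do
  cactus test --precision "$p" --llm || echo "precision $p run did not complete"
done
```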

--reconvert

Force reconversion of test models from source:
cactus test --reconvert
Useful when the model format has been updated.

--no-rebuild

Skip rebuilding the library before testing:
cactus test --no-rebuild
Uses existing build artifacts; faster when iterating on tests.

--llm / --stt / --performance

Run specific test suites:
cactus test --llm            # Only LLM tests
cactus test --stt            # Only speech-to-text tests
cactus test --performance    # Only performance benchmarks
By default, all suites run.

--ios

Run tests on a connected iPhone or iPad:
cactus test --ios
Requirements:
  • Physical iOS device connected via USB
  • Xcode with device provisioning
  • Device in developer mode
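Before deploying, it can help to confirm Xcode actually sees the device. A pre-flight sketch (`xcrun xctrace list devices` ships with Xcode; the guard keeps it harmless on machines without it):

```shell
# Pre-flight: list the devices Xcode can see before `cactus test --ios`.
if command -v xcrun >/dev/null 2>&1; then
  xcrun xctrace list devices   # shows connected iPhones/iPads and simulators
  checked=yes
else
  echo "xcrun not found: install Xcode and its command-line tools"
  checked=no
fi
```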

--android

Run tests on a connected Android device:
cactus test --android
Requirements:
  • Physical Android device or emulator
  • ADB debugging enabled
  • Device authorized for USB debugging
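A quick way to verify the last two requirements is to ask ADB directly; a device listed as `unauthorized` still needs the USB-debugging prompt accepted on screen. A guarded sketch:

```shell
# Pre-flight: confirm ADB sees an authorized device before `cactus test --android`.
if command -v adb >/dev/null 2>&1; then
  adb devices   # a healthy device shows state "device", not "unauthorized"
  checked=yes
else
  echo "adb not found: install Android platform-tools"
  checked=no
fi
```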

Examples

# Run all tests with default models
cactus test
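A few more combinations of the flags documented on this page; the `run` helper here is a hypothetical wrapper (not part of the CLI) that skips cleanly when cactus is not installed:

```shell
# Hypothetical `run` wrapper: execute only if the cactus CLI is on PATH.
run() { command -v cactus >/dev/null 2>&1 && "$@" || echo "skipped: $*"; }

run cactus test --benchmark                              # production-scale models
run cactus test --stt --transcribe_model moonshine-base  # STT suite only
run cactus test --llm --no-rebuild                       # fast iteration
```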

Test Suites

LLM Tests (--llm)

Tests language model functionality:
  • Model loading and initialization
  • Text generation with various prompts
  • Tokenization accuracy
  • Context window handling
  • Stop sequence detection
  • Temperature and sampling
  • Batch processing
┌─────────────────────────────────────────────┐
│ Running LLM Tests                           │
│ Model: LFM2-VL-450M                         │
└─────────────────────────────────────────────┘

✓ test_model_loading          (0.3s)
✓ test_simple_generation      (1.2s)
✓ test_context_window         (2.1s)
✓ test_stop_sequences         (0.8s)
✓ test_temperature_sampling   (1.5s)
✓ test_batch_processing       (3.2s)

6 passed, 0 failed

STT Tests (--stt)

Tests speech-to-text functionality:
  • Model loading and initialization
  • Audio file transcription
  • Real-time streaming transcription
  • Multiple audio formats
  • Accuracy on test dataset
  • Performance metrics
┌─────────────────────────────────────────────┐
│ Running STT Tests                           │
│ Model: moonshine-base                       │
└─────────────────────────────────────────────┘

✓ test_model_loading          (0.2s)
✓ test_file_transcription     (1.8s)
✓ test_streaming_audio        (2.5s)
✓ test_audio_formats          (3.1s)
✓ test_accuracy_dataset       (12.4s)
✓ test_performance_metrics    (5.3s)

6 passed, 0 failed

Performance Tests (--performance)

Benchmarks system performance:
  • Token generation speed (tokens/sec)
  • Time to first token (TTFT)
  • Memory usage and leaks
  • Model load time
  • Concurrent request handling
  • Device-specific optimizations
┌─────────────────────────────────────────────┐
│ Running Performance Benchmarks              │
│ Model: LFM2-VL-450M (INT4)                  │
└─────────────────────────────────────────────┘

Token generation:     45.2 tokens/sec
Time to first token:  0.3s
Model load time:      1.2s
Memory usage:         320MB
Peak memory:          380MB

✓ All benchmarks passed

Device Testing

iOS Device (--ios)

Deploys and runs tests on a connected iPhone/iPad:
cactus test --ios --model qwen-2.5-1.5b
┌─────────────────────────────────────────────┐
│ Testing on iOS Device                       │
│ Device: iPhone 15 Pro (iOS 18.0)            │
└─────────────────────────────────────────────┘

Building for iOS...
Deploying to device...
Running tests...

✓ test_model_loading          (0.5s)
✓ test_generation_speed       (2.1s)
  → 38.4 tokens/sec on A17 Pro
✓ test_memory_usage           (1.2s)
  → Peak: 420MB

3 passed, 0 failed

Android Device (--android)

Deploys and runs tests on a connected Android device:
cactus test --android --model llama-3.2-1b
┌─────────────────────────────────────────────┐
│ Testing on Android Device                   │
│ Device: Pixel 8 (Android 14)                │
└─────────────────────────────────────────────┘

Building for Android...
Installing APK...
Running tests...

✓ test_model_loading          (0.7s)
✓ test_generation_speed       (2.5s)
  → 32.1 tokens/sec on Tensor G3
✓ test_memory_usage           (1.4s)
  → Peak: 480MB

3 passed, 0 failed

Benchmark Mode

With --benchmark, tests use larger production models:
cactus test --benchmark
Suite   Default Model     Benchmark Model
LLM     LFM2-VL-450M      Qwen-2.5-3B
STT     moonshine-base    parakeet-1.1b
Benchmark mode provides more realistic performance metrics but takes longer to run.

Continuous Integration

For CI/CD pipelines:
# Fast test run
cactus test --llm --no-rebuild

# Full test suite
cactus test --benchmark

# Platform-specific
cactus test --android --model qwen-2.5-1.5b
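Wired into a hypothetical GitHub Actions workflow, the fast and full runs might look like this; the checkout and install steps are placeholders, and only the `cactus test` invocations come from this page:

```yaml
# Hypothetical CI workflow; adapt the install step to your setup.
name: cactus-tests
on: [push]
jobs:
  fast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install the cactus CLI here (method depends on your setup)
      - run: cactus test --llm --no-rebuild
  full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install the cactus CLI here (method depends on your setup)
      - run: cactus test --benchmark
```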

See Also

Build Command

Build libraries before testing

Run Command

Test models interactively