Overview

Cactus provides embedding generation for multiple modalities:
  • Text embeddings - For semantic search, classification, RAG
  • Image embeddings - For image similarity and multimodal search
  • Audio embeddings - For audio similarity and speaker identification
All embeddings are returned as float32 vectors and can be normalized for cosine similarity.

Text Embeddings

Basic Usage

from cactus import cactus_init, cactus_embed, cactus_destroy

model = cactus_init("weights/lfm2-350m", None, False)

# Generate embedding
embedding = cactus_embed(model, "Hello, world!", normalize=True)
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

cactus_destroy(model)
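Because normalize=True returns unit-length vectors, the dot product of two embeddings is their cosine similarity. A minimal sketch comparing two sentences (NumPy and the example sentences are illustrative, not part of the API):
import numpy as np
from cactus import cactus_init, cactus_embed, cactus_destroy

model = cactus_init("weights/lfm2-350m", None, False)

emb_a = np.array(cactus_embed(model, "The cat sat on the mat", normalize=True))
emb_b = np.array(cactus_embed(model, "A kitten rests on a rug", normalize=True))

# Unit-length vectors: dot product equals cosine similarity
print(f"Cosine similarity: {np.dot(emb_a, emb_b):.3f}")

cactus_destroy(model)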

C API

#include <cactus.h>

cactus_model_t model = cactus_init("weights/qwen3-embedding-0.6b", NULL, false);

float embeddings[768];
size_t embedding_dim;

int result = cactus_embed(
    model,
    "Sample text",
    embeddings,
    sizeof(embeddings),
    &embedding_dim,
    true  // normalize
);

if (result == 0) {
    printf("Dimension: %zu\n", embedding_dim);
    for (size_t i = 0; i < 5; i++) {
        printf("%.4f ", embeddings[i]);
    }
}

cactus_destroy(model);

Supported Embedding Models

| Model | Dimension | Use Case |
|---|---|---|
| LFM2-350M | 1024 | General text embeddings |
| LFM2-700M | 1024 | Higher quality text embeddings |
| Qwen3-0.6B | 768 | Fast text embeddings |
| Qwen3-Embedding-0.6B | 768 | Optimized for embeddings |
| nomic-embed-text-v2-moe | 768 | High quality with MoE |
| LFM2-VL-450M | 1024 | Text + image embeddings |
| LFM2.5-VL-1.6B | 1024 | High quality multimodal |

Normalization
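
Passing normalize=True scales each embedding to unit length (L2 norm of 1), so cosine similarity reduces to a plain dot product. A sketch of the equivalent manual step, assuming normalize=False returns the raw vector and a model handle is available as in Basic Usage:
import numpy as np

raw = np.array(cactus_embed(model, "Hello, world!", normalize=False), dtype=np.float32)

# Divide by the L2 norm to obtain a unit-length vector
unit = raw / np.linalg.norm(raw)

print(np.linalg.norm(unit))  # ~1.0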

Image Embeddings

Generate embeddings from images using vision-language models:
from cactus import cactus_init, cactus_image_embed, cactus_destroy

model = cactus_init("weights/lfm2-vl-450m", None, False)

# Generate image embedding
embedding = cactus_image_embed(model, "image.jpg")
print(f"Image embedding: {len(embedding)} dimensions")

Image embeddings can be compared with each other for similarity search:
import numpy as np

# Generate embeddings for multiple images
images = ["cat1.jpg", "cat2.jpg", "dog.jpg"]
embeddings = [cactus_image_embed(model, img) for img in images]

# Find the image most similar to a query image
query_embedding = cactus_image_embed(model, "query.jpg")
similarities = [np.dot(query_embedding, emb) for emb in embeddings]

most_similar_idx = np.argmax(similarities)
print(f"Most similar: {images[most_similar_idx]}")
print(f"Similarity: {similarities[most_similar_idx]:.3f}")

cactus_destroy(model)

Audio Embeddings

Generate embeddings from audio using transcription models:
from cactus import cactus_init, cactus_audio_embed, cactus_destroy

model = cactus_init("weights/parakeet-ctc-1.1b", None, False)

# Generate audio embedding
embedding = cactus_audio_embed(model, "speech.wav")
print(f"Audio embedding: {len(embedding)} dimensions")

cactus_destroy(model)
Audio embeddings are useful for speaker verification, audio classification, and finding similar audio segments.
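For example, a rough speaker-verification check can compare the embeddings of two recordings against a threshold. A sketch with illustrative file names and an illustrative threshold; cosine similarity is computed explicitly in case the audio embeddings are not unit length:
import numpy as np

model = cactus_init("weights/parakeet-ctc-1.1b", None, False)

emb_a = np.array(cactus_audio_embed(model, "speaker_a_1.wav"), dtype=np.float32)
emb_b = np.array(cactus_audio_embed(model, "speaker_a_2.wav"), dtype=np.float32)

# Cosine similarity between the two recordings
cosine = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))

# Illustrative threshold; tune on your own data
print("Same speaker" if cosine > 0.8 else "Different speakers")

cactus_destroy(model)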

Batch Processing

Process multiple texts efficiently:
texts = [
    "First document",
    "Second document",
    "Third document"
]

embeddings = []
for text in texts:
    emb = cactus_embed(model, text, normalize=True)
    embeddings.append(emb)

# Convert to numpy array
import numpy as np
embeddings_matrix = np.array(embeddings)
print(f"Shape: {embeddings_matrix.shape}")

Semantic Search

Implement semantic search with text embeddings:
import numpy as np

# Document corpus
documents = [
    "Cactus is an AI inference engine for mobile devices",
    "Python is a programming language",
    "The weather is sunny today",
    "Machine learning models run on smartphones"
]

# Generate document embeddings
doc_embeddings = [cactus_embed(model, doc, True) for doc in documents]

# Search query
query = "on-device AI inference"
query_embedding = cactus_embed(model, query, True)

# Compute similarities
similarities = [np.dot(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# Get top results
top_indices = np.argsort(similarities)[::-1][:3]
for idx in top_indices:
    print(f"Score: {similarities[idx]:.3f} - {documents[idx]}")

Multimodal Embeddings

Combine text and image embeddings in the same space:
model = cactus_init("weights/lfm2-vl-1.6b", None, False)

# Text and image share the same embedding space
text_emb = cactus_embed(model, "A photo of a cat", True)
image_emb = cactus_image_embed(model, "cat.jpg")

# Cross-modal similarity
similarity = np.dot(text_emb, image_emb)
print(f"Text-image similarity: {similarity:.3f}")

Pooling Strategies

Pooling determines how per-token hidden states are combined into a single embedding vector. The strategy is selected based on the model:
# Mean pooling (default)
embedding = cactus_embed(model, "Text", normalize=True)

# CLS token pooling (model-dependent)
# Automatically selected based on model architecture

Performance Tips

Reuse the same model instance for multiple embeddings to avoid reloading weights.
# Good: Reuse model
model = cactus_init("weights/lfm2-350m", None, False)
for text in large_dataset:
    emb = cactus_embed(model, text, True)
    # process embedding
cactus_destroy(model)

# Bad: Reload model each time
for text in large_dataset:
    model = cactus_init("weights/lfm2-350m", None, False)
    emb = cactus_embed(model, text, True)
    cactus_destroy(model)

Error Handling

# cactus_get_last_error is assumed to be importable from the same module
from cactus import cactus_embed, cactus_get_last_error

try:
    embedding = cactus_embed(model, "Text", True)
except RuntimeError as e:
    print(f"Embedding failed: {e}")
    error = cactus_get_last_error()
    if error:
        print(f"Details: {error}")

Next Steps

  • RAG Guide - Build retrieval-augmented generation with embeddings
  • Vector Index - Store and query embeddings efficiently
  • Vision Models - Use multimodal vision-language models
  • API Reference - Complete embeddings API documentation