FastEmbed

Sparse embedding service supporting BM25, SPLADE, and other sparse vector methods for hybrid search. Combine with dense embeddings for improved retrieval accuracy.

Understanding Sparse Embeddings

Sparse embeddings represent text as high-dimensional vectors where most values are zero and only a small number of dimensions are activated. Each active dimension corresponds to a specific term or learned feature, making sparse vectors inherently interpretable. Unlike dense embeddings that spread meaning across all dimensions, sparse vectors excel at capturing exact keyword matches and domain-specific terminology that dense models may overlook. This is particularly valuable in technical, legal, or medical domains where precise term matching is critical.

The real power of sparse embeddings emerges when they are combined with dense vectors in a hybrid search architecture. Dense embeddings from Jina handle semantic understanding, capturing paraphrases and conceptual similarity, while sparse embeddings from FastEmbed ensure that important keywords are not missed. IntelligenceBox uses Qdrant named vectors to store both representations alongside each document chunk, and at query time the scores from both searches are fused to produce a final ranking that benefits from both approaches.
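To make the score-fusion step concrete, one common technique is reciprocal rank fusion (RRF). The sketch below is illustrative, not part of the service API; the `rrfFuse` helper and the `k = 60` constant are conventional choices, and the document IDs are made up:

```typescript
// Reciprocal rank fusion: each result list contributes 1 / (k + rank)
// to a document's fused score, so documents ranked highly by either
// the dense or the sparse search rise to the top of the merged list.
type Ranked = { id: string; score: number };

function rrfFuse(lists: Ranked[][], k = 60): Ranked[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, rank) => {
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...fused.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// "b" appears in both result lists, so it outranks single-list hits
// even though neither search ranked it first by raw score.
const dense = [{ id: 'a', score: 0.9 }, { id: 'b', score: 0.8 }];
const sparse = [{ id: 'b', score: 12.1 }, { id: 'c', score: 7.3 }];
const merged = rrfFuse([dense, sparse]);
```

Note that RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.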

Service Info

Port: 8083
Internal URL: http://fastembed:8083
Endpoint: /embeddings/sparse
Output: Sparse vectors

Use Cases

  • Hybrid search combining sparse and dense vectors
  • Keyword-aware retrieval with BM25
  • Learned sparse representations with SPLADE
  • Improved recall for domain-specific terminology

Request Format

POST /embeddings/sparse
{
  "texts": [
    "First document text",
    "Second document text"
  ],
  "collection_name": "my-collection",
  "is_query": false,              // true for query encoding
  "generate_all_sparse": true,    // generate all sparse methods
  "method": "bm25",               // or "splade", "minicol"
  "sparse_method": "bm25"         // specific sparse method
}
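For callers not using the TypeScript client below, the endpoint can be hit with plain `fetch`. This is a minimal sketch: the field names mirror the request format above, while the helper names and error handling are illustrative:

```typescript
// Build the body for POST /embeddings/sparse. Field names follow the
// documented request format; values here are example defaults.
function buildSparseRequest(texts: string[], isQuery: boolean) {
  return {
    texts,
    collection_name: 'my-collection',
    is_query: isQuery,
    method: 'bm25',
    sparse_method: 'bm25',
  };
}

async function embedSparse(texts: string[], isQuery = false) {
  const res = await fetch('http://fastembed:8083/embeddings/sparse', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSparseRequest(texts, isQuery)),
  });
  if (!res.ok) throw new Error(`fastembed returned ${res.status}`);
  return res.json();
}
```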

Response Format

{
  "success": true,
  "sparse_embeddings": [
    {
      "indices": [42, 156, 892, 1024, ...],
      "values": [0.5, 0.3, 0.8, 0.2, ...]
    }
  ],
  // Alternative format with multiple methods
  "embeddings": {
    "bm25": {
      "vectors": [
        { "indices": [...], "values": [...] }
      ]
    },
    "splade": {
      "vectors": [
        { "indices": [...], "values": [...] }
      ]
    }
  },
  "count": 2,
  "method": "bm25"
}
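Because the service may answer with either the flat `sparse_embeddings` array or the per-method `embeddings` map, a client can normalize both shapes before use. A sketch, with illustrative type names:

```typescript
type SparseVector = { indices: number[]; values: number[] };

type SparseResponse = {
  success: boolean;
  sparse_embeddings?: SparseVector[];
  embeddings?: Record<string, { vectors: SparseVector[] }>;
};

// Return the vectors for the requested method, regardless of which of
// the two documented response shapes the service used.
function extractVectors(res: SparseResponse, method = 'bm25'): SparseVector[] {
  if (res.sparse_embeddings) return res.sparse_embeddings;
  const byMethod = res.embeddings?.[method];
  if (byMethod) return byMethod.vectors;
  throw new Error(`no sparse vectors for method "${method}" in response`);
}
```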

TypeScript Client

Using FastEmbedClient
import { FastEmbedClient } from '@/services/gpu/fastEmbedClient';

// Default timeout: 5 minutes per batch
const client = new FastEmbedClient('http://fastembed:8083');

// Generate sparse embeddings for documents
const result = await client.generateSparse({
  texts: ['Document 1 content', 'Document 2 content'],
  collectionName: 'my-collection',
  isQuery: false,
  method: 'bm25',
});

// Generate sparse embedding for a query
const queryResult = await client.generateSparse({
  texts: ['search query'],
  collectionName: 'my-collection',
  isQuery: true,  // Important for query encoding
  method: 'bm25',
});

// Access sparse vectors
for (const embedding of result.sparse_embeddings) {
  console.log('Indices:', embedding.indices);
  console.log('Values:', embedding.values);
}

Sparse Methods

| Method  | Description                    | Best For                  |
|---------|--------------------------------|---------------------------|
| bm25    | Classic BM25 term weighting    | General keyword search    |
| splade  | Learned sparse representations | Semantic + keyword hybrid |
| minicol | Compact learned sparse         | Memory-efficient search   |
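To make the bm25 row concrete: BM25 weights a term by its inverse document frequency, scaled by a saturating term-frequency factor. A toy sketch of the standard formula, where `k1 = 1.2` and `b = 0.75` are conventional defaults rather than service parameters:

```typescript
// Classic BM25 term weight: idf(t) times a tf-saturation factor.
// `tf` is the term's count in the document, `dl` the document length,
// `avgdl` the average document length, `n` the number of documents
// containing the term, and `N` the corpus size.
function bm25Weight(
  tf: number, dl: number, avgdl: number, n: number, N: number,
  k1 = 1.2, b = 0.75,
): number {
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  return (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (dl / avgdl)));
}

// A rare term (in 2 of 100 docs) outweighs a common one (in 90 of 100)
// at the same term frequency and document length.
const rare = bm25Weight(3, 100, 120, 2, 100);
const common = bm25Weight(3, 100, 120, 90, 100);
```

These weights are what end up as the `values` of a BM25 sparse vector, with each term hashed or mapped to a position in `indices`.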

Hybrid Search Pattern

Combine sparse and dense vectors for best results:

// 1. Generate dense embeddings for the same documents (e.g., with Jina)
const denseEmbeddings = await jinaClient.embedTexts(documents);

// 2. Generate sparse embeddings
const sparseEmbeddings = await fastEmbedClient.generateSparse({
  texts: documents,
  collectionName: 'my-collection',
  method: 'bm25',
});

// 3. Store both in Qdrant with named vectors
// 4. Search using both vectors with score fusion
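Steps 3 and 4 can be sketched against Qdrant's named-vector and Query APIs. This assumes the collection was created with a dense vector named `dense` and a sparse vector named `bm25`; those names, the collection name, and the limits are illustrative:

```typescript
type SparseVector = { indices: number[]; values: number[] };

// Shape each chunk as a Qdrant point carrying both named vectors.
// The names `dense` and `bm25` must match the collection's config.
function toPoint(id: number, text: string, dense: number[], sparse: SparseVector) {
  return { id, vector: { dense, bm25: sparse }, payload: { text } };
}

// Body for Qdrant's Query API: prefetch candidates from each named
// vector, then fuse the two rankings server-side with reciprocal
// rank fusion.
function hybridQueryBody(queryDense: number[], querySparse: SparseVector) {
  return {
    prefetch: [
      { query: queryDense, using: 'dense', limit: 20 },
      { query: querySparse, using: 'bm25', limit: 20 },
    ],
    query: { fusion: 'rrf' as const },
    limit: 10,
  };
}

// Usage with the Qdrant JS client would look roughly like:
//   await qdrant.upsert('my-collection', { points: chunks.map(...) });
//   const hits = await qdrant.query('my-collection', hybridQueryBody(d, s));
```

Keeping both representations on the same point means one upsert per chunk and no separate keyword index to keep in sync.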