FastEmbed

Sparse embedding service supporting BM25, SPLADE, and other sparse vector methods for hybrid search. Combine with dense embeddings for improved retrieval accuracy.

Understanding Sparse Embeddings

Sparse embeddings represent text as high-dimensional vectors where most values are zero and only a small number of dimensions are activated. Each active dimension corresponds to a specific term or learned feature, making sparse vectors inherently interpretable. Unlike dense embeddings that spread meaning across all dimensions, sparse vectors excel at capturing exact keyword matches and domain-specific terminology that dense models may overlook. This is particularly valuable in technical, legal, or medical domains where precise term matching is critical.

The real power of sparse embeddings emerges when they are combined with dense vectors in a hybrid search architecture. Dense embeddings from Jina handle semantic understanding, capturing paraphrases and conceptual similarity, while sparse embeddings from FastEmbed ensure that important keywords are not missed. IntelligenceBox uses Qdrant named vectors to store both representations alongside each document chunk, and at query time the scores from both searches are fused to produce a final ranking that benefits from both approaches.
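To make the score-fusion step concrete, one common technique is reciprocal rank fusion (RRF). The sketch below is illustrative, not part of the service API; the `rrfFuse` helper and the `k = 60` constant are conventional choices, and the document IDs are made up:

```typescript
// Reciprocal rank fusion: each result list contributes 1 / (k + rank)
// to a document's fused score, so documents ranked highly by either
// the dense or the sparse search rise to the top of the merged list.
type Ranked = { id: string; score: number };

function rrfFuse(lists: Ranked[][], k = 60): Ranked[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, rank) => {
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...fused.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// "b" appears in both result lists, so it outranks single-list hits
// even though neither search ranked it first by raw score.
const dense = [{ id: 'a', score: 0.9 }, { id: 'b', score: 0.8 }];
const sparse = [{ id: 'b', score: 12.1 }, { id: 'c', score: 7.3 }];
const merged = rrfFuse([dense, sparse]);
```

Note that RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.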

Service Info

Port: 8083
Internal URL: http://fastembed:8083
Endpoint: /embeddings/sparse
Output: Sparse vectors

Use Cases

  • Hybrid search combining sparse and dense vectors
  • Keyword-aware retrieval with BM25
  • Learned sparse representations with SPLADE
  • Improved recall for domain-specific terminology

Request Format

POST /embeddings/sparse
{
  "texts": [
    "First document text",
    "Second document text"
  ],
  "collection_name": "my-collection",
  "is_query": false,              // true for query encoding
  "generate_all_sparse": true,    // generate all sparse methods
  "method": "bm25",               // or "splade", "minicol"
  "sparse_method": "bm25"         // specific sparse method
}
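For callers not using the TypeScript client below, the endpoint can be hit with plain `fetch`. This is a minimal sketch: the field names mirror the request format above, while the helper names and error handling are illustrative:

```typescript
// Build the body for POST /embeddings/sparse. Field names follow the
// documented request format; values here are example defaults.
function buildSparseRequest(texts: string[], isQuery: boolean) {
  return {
    texts,
    collection_name: 'my-collection',
    is_query: isQuery,
    method: 'bm25',
    sparse_method: 'bm25',
  };
}

async function embedSparse(texts: string[], isQuery = false) {
  const res = await fetch('http://fastembed:8083/embeddings/sparse', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSparseRequest(texts, isQuery)),
  });
  if (!res.ok) throw new Error(`fastembed returned ${res.status}`);
  return res.json();
}
```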

Response Format

{
  "success": true,
  "sparse_embeddings": [
    {
      "indices": [42, 156, 892, 1024, ...],
      "values": [0.5, 0.3, 0.8, 0.2, ...]
    }
  ],
  // Alternative format with multiple methods
  "embeddings": {
    "bm25": {
      "vectors": [
        { "indices": [...], "values": [...] }
      ]
    },
    "splade": {
      "vectors": [
        { "indices": [...], "values": [...] }
      ]
    }
  },
  "count": 2,
  "method": "bm25"
}
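Because the service may answer with either the flat `sparse_embeddings` array or the per-method `embeddings` map, a client can normalize both shapes before use. A sketch, with illustrative type names:

```typescript
type SparseVector = { indices: number[]; values: number[] };

type SparseResponse = {
  success: boolean;
  sparse_embeddings?: SparseVector[];
  embeddings?: Record<string, { vectors: SparseVector[] }>;
};

// Return the vectors for the requested method, regardless of which of
// the two documented response shapes the service used.
function extractVectors(res: SparseResponse, method = 'bm25'): SparseVector[] {
  if (res.sparse_embeddings) return res.sparse_embeddings;
  const byMethod = res.embeddings?.[method];
  if (byMethod) return byMethod.vectors;
  throw new Error(`no sparse vectors for method "${method}" in response`);
}
```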

TypeScript Client

Using FastEmbedClient
import { FastEmbedClient } from '@/services/gpu/fastEmbedClient';

// Default timeout: 5 minutes per batch
const client = new FastEmbedClient('http://fastembed:8083');

// Generate sparse embeddings for documents
const result = await client.generateSparse({
  texts: ['Document 1 content', 'Document 2 content'],
  collectionName: 'my-collection',
  isQuery: false,
  method: 'bm25',
});

// Generate sparse embedding for a query
const queryResult = await client.generateSparse({
  texts: ['search query'],
  collectionName: 'my-collection',
  isQuery: true,  // Important for query encoding
  method: 'bm25',
});

// Access sparse vectors
for (const embedding of result.sparse_embeddings) {
  console.log('Indices:', embedding.indices);
  console.log('Values:', embedding.values);
}

Sparse Methods

| Method  | Description                    | Best For                  |
|---------|--------------------------------|---------------------------|
| bm25    | Classic BM25 term weighting    | General keyword search    |
| splade  | Learned sparse representations | Semantic + keyword hybrid |
| minicol | Compact learned sparse         | Memory-efficient search   |
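To make the bm25 row concrete: BM25 weights a term by its inverse document frequency, scaled by a saturating term-frequency factor. A toy sketch of the standard formula, where `k1 = 1.2` and `b = 0.75` are conventional defaults rather than service parameters:

```typescript
// Classic BM25 term weight: idf(t) times a tf-saturation factor.
// `tf` is the term's count in the document, `dl` the document length,
// `avgdl` the average document length, `n` the number of documents
// containing the term, and `N` the corpus size.
function bm25Weight(
  tf: number, dl: number, avgdl: number, n: number, N: number,
  k1 = 1.2, b = 0.75,
): number {
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  return (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (dl / avgdl)));
}

// A rare term (in 2 of 100 docs) outweighs a common one (in 90 of 100)
// at the same term frequency and document length.
const rare = bm25Weight(3, 100, 120, 2, 100);
const common = bm25Weight(3, 100, 120, 90, 100);
```

These weights are what end up as the `values` of a BM25 sparse vector, with each term hashed or mapped to a position in `indices`.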

Hybrid Search Pattern

Combine sparse and dense vectors for best results:

// 1. Generate dense embeddings for the same documents (e.g., with Jina)
const denseEmbeddings = await jinaClient.embedTexts(documents);

// 2. Generate sparse embeddings
const sparseEmbeddings = await fastEmbedClient.generateSparse({
  texts: documents,
  collectionName: 'my-collection',
  method: 'bm25',
});

// 3. Store both in Qdrant with named vectors
// 4. Search using both vectors with score fusion
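Steps 3 and 4 can be sketched against Qdrant's named-vector and Query APIs. This assumes the collection was created with a dense vector named `dense` and a sparse vector named `bm25`; those names, the collection name, and the limits are illustrative:

```typescript
type SparseVector = { indices: number[]; values: number[] };

// Shape each chunk as a Qdrant point carrying both named vectors.
// The names `dense` and `bm25` must match the collection's config.
function toPoint(id: number, text: string, dense: number[], sparse: SparseVector) {
  return { id, vector: { dense, bm25: sparse }, payload: { text } };
}

// Body for Qdrant's Query API: prefetch candidates from each named
// vector, then fuse the two rankings server-side with reciprocal
// rank fusion.
function hybridQueryBody(queryDense: number[], querySparse: SparseVector) {
  return {
    prefetch: [
      { query: queryDense, using: 'dense', limit: 20 },
      { query: querySparse, using: 'bm25', limit: 20 },
    ],
    query: { fusion: 'rrf' as const },
    limit: 10,
  };
}

// Usage with the Qdrant JS client would look roughly like:
//   await qdrant.upsert('my-collection', { points: chunks.map(...) });
//   const hits = await qdrant.query('my-collection', hybridQueryBody(d, s));
```

Keeping both representations on the same point means one upsert per chunk and no separate keyword index to keep in sync.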