FastEmbed
Sparse embedding service supporting BM25, SPLADE, and other sparse vector methods for hybrid search. Combine with dense embeddings for improved retrieval accuracy.
Understanding Sparse Embeddings
Sparse embeddings represent text as high-dimensional vectors where most values are zero and only a small number of dimensions are activated. Each active dimension corresponds to a specific term or learned feature, making sparse vectors inherently interpretable. Unlike dense embeddings that spread meaning across all dimensions, sparse vectors excel at capturing exact keyword matches and domain-specific terminology that dense models may overlook. This is particularly valuable in technical, legal, or medical domains where precise term matching is critical.
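Because only the non-zero dimensions are stored, a sparse vector is just a pair of parallel `indices`/`values` arrays, and relevance scoring is a dot product over the indices two vectors share. A minimal sketch (the `SparseVector` type is illustrative, not part of the service API):

```typescript
// A sparse vector stores only its non-zero dimensions as parallel arrays.
interface SparseVector {
  indices: number[];
  values: number[];
}

// Score two sparse vectors: sum the products of values at shared indices.
function sparseDot(a: SparseVector, b: SparseVector): number {
  const bByIndex = new Map<number, number>();
  b.indices.forEach((idx, i) => bByIndex.set(idx, b.values[i]));
  let score = 0;
  a.indices.forEach((idx, i) => {
    const bValue = bByIndex.get(idx);
    if (bValue !== undefined) score += a.values[i] * bValue;
  });
  return score;
}
```

Vectors with no overlapping indices score zero, which is why sparse retrieval is so effective at enforcing exact term matches.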
The real power of sparse embeddings emerges when they are combined with dense vectors in a hybrid search architecture. Dense embeddings from Jina handle semantic understanding, capturing paraphrases and conceptual similarity, while sparse embeddings from FastEmbed ensure that important keywords are not missed. IntelligenceBox uses Qdrant named vectors to store both representations alongside each document chunk, and at query time the scores from both searches are fused to produce a final ranking that benefits from both approaches.
Service Info
Port: 8083
Internal URL: http://fastembed:8083
Endpoint: /embeddings/sparse
Output: Sparse vectors
Use Cases
- Hybrid search combining sparse and dense vectors
- Keyword-aware retrieval with BM25
- Learned sparse representations with SPLADE
- Improved recall for domain-specific terminology
Request Format
{
"texts": [
"First document text",
"Second document text"
],
"collection_name": "my-collection",
"is_query": false, // true for query encoding
"generate_all_sparse": true, // generate all sparse methods
"method": "bm25", // or "splade", "minicol"
"sparse_method": "bm25" // specific sparse method
}

Response Format
{
"success": true,
"sparse_embeddings": [
{
"indices": [42, 156, 892, 1024, ...],
"values": [0.5, 0.3, 0.8, 0.2, ...]
}
],
// Alternative format with multiple methods
"embeddings": {
"bm25": {
"vectors": [
{ "indices": [...], "values": [...] }
]
},
"splade": {
"vectors": [
{ "indices": [...], "values": [...] }
]
}
},
"count": 2,
"method": "bm25"
}

TypeScript Client
import { FastEmbedClient } from '@/services/gpu/fastEmbedClient';
// Default timeout: 5 minutes per batch
const client = new FastEmbedClient('http://fastembed:8083');
// Generate sparse embeddings for documents
const result = await client.generateSparse({
texts: ['Document 1 content', 'Document 2 content'],
collectionName: 'my-collection',
isQuery: false,
method: 'bm25',
});
// Generate sparse embedding for a query
const queryResult = await client.generateSparse({
texts: ['search query'],
collectionName: 'my-collection',
isQuery: true, // Important for query encoding
method: 'bm25',
});
// Access sparse vectors
for (const embedding of result.sparse_embeddings) {
console.log('Indices:', embedding.indices);
console.log('Values:', embedding.values);
}

Sparse Methods
| Method | Description | Best For |
|---|---|---|
| bm25 | Classic BM25 term weighting | General keyword search |
| splade | Learned sparse representations | Semantic + keyword hybrid |
| minicol | Compact learned sparse | Memory-efficient search |
Hybrid Search Pattern
Combine sparse and dense vectors for best results:
// 1. Generate dense embeddings for the same documents (e.g., with Jina)
const denseEmbeddings = await jinaClient.embedTexts(documents);
// 2. Generate sparse embeddings
const sparseEmbeddings = await fastEmbedClient.generateSparse({
texts: documents,
collectionName: 'my-collection',
method: 'bm25',
});
// 3. Store both in Qdrant with named vectors
// 4. Search using both vectors with score fusion
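Step 4 can fuse scores in several ways; reciprocal rank fusion (RRF) is a common choice because it combines rankings without needing to normalize dense and sparse scores against each other. A client-side sketch (Qdrant can also fuse server-side; this illustrates the idea):

```typescript
// Reciprocal rank fusion: each ranking contributes 1 / (k + rank + 1)
// per document id, and ids are sorted by their summed contributions.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Usage: pass the ranked id lists from the dense and sparse searches.
// const finalRanking = rrfFuse([denseResultIds, sparseResultIds]);
```

Documents ranked highly by both searches accumulate the largest fused scores, which is exactly the behavior hybrid search is after.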