
FastEmbed (Sparse Vectors)

FastEmbed is a sparse embedding service that supports three methods: BM25, SPLADE, and MiniCOIL. Sparse vectors complement dense embeddings in hybrid search.

Port: 8083

What it Does

FastEmbed generates sparse vector representations of text:

  • BM25 - Traditional keyword-based scoring (multilingual)
  • SPLADE - Semantic sparse representations
  • MiniCOIL - Contextualized sparse embeddings

Sparse vectors are useful for:

  • Hybrid search (combine with dense embeddings)
  • Keyword matching with semantic awareness
  • Efficient retrieval with inverted indices
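To illustrate why sparse vectors pair well with inverted indices, here is a minimal sketch that indexes a few sparse vectors and scores a query by dot product. It assumes vectors in the `indices`/`values` shape returned by the service; the function names `build_inverted_index` and `search` and the sample numbers are hypothetical, and a real vector database would handle this internally.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each non-zero dimension to the (doc_id, weight) pairs that use it."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for dim, weight in zip(vec["indices"], vec["values"]):
            index[dim].append((doc_id, weight))
    return index

def search(index, query):
    """Score documents by sparse dot product, touching only the query's dimensions."""
    scores = defaultdict(float)
    for dim, q_weight in zip(query["indices"], query["values"]):
        for doc_id, d_weight in index.get(dim, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative vectors (made-up numbers, not real service output)
docs = [
    {"indices": [1, 5, 23], "values": [0.8, 0.5, 0.3]},
    {"indices": [2, 8, 15], "values": [0.7, 0.6, 0.4]},
]
query = {"indices": [5, 15], "values": [1.0, 0.5]}

index = build_inverted_index(docs)
ranked = search(index, query)
print(ranked)
```

Only documents that share at least one non-zero dimension with the query are visited, which is what keeps lookup cheap when vectors are mostly zeros.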

API Endpoints

Generate Sparse Embeddings

Endpoint: POST /embed

curl -X POST "http://fastembed-server:8083/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is machine learning?", "Deep learning tutorial"],
    "method": "bm25"
  }'

Request:

| Parameter | Type | Description |
| --- | --- | --- |
| texts | array | List of texts to embed |
| method | string | bm25, splade, or minicoil |
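A small client-side helper can catch a misspelled method before the request is sent. The sketch below builds the request body from the two documented parameters; the helper name `build_embed_payload` and the empty-list check are assumptions made for illustration, not documented server behavior.

```python
ALLOWED_METHODS = {"bm25", "splade", "minicoil"}

def build_embed_payload(texts, method):
    """Return a JSON-serializable body for POST /embed, validated client-side."""
    if isinstance(texts, str):
        texts = [texts]  # accept a single string for convenience
    texts = list(texts)
    if not texts:
        # Assumption: sending an empty list is never useful
        raise ValueError("texts must contain at least one string")
    if method not in ALLOWED_METHODS:
        raise ValueError(
            f"method must be one of {sorted(ALLOWED_METHODS)}, got {method!r}"
        )
    return {"texts": texts, "method": method}

payload = build_embed_payload(["What is machine learning?"], method="splade")
print(payload)
```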

Response:

{
  "embeddings": [
    {"indices": [1, 5, 23, ...], "values": [0.8, 0.5, 0.3, ...]},
    {"indices": [2, 8, 15, ...], "values": [0.7, 0.6, 0.4, ...]}
  ],
  "method": "bm25"
}
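Each embedding in the response is a pair of parallel lists, so it is often convenient to turn it into an index-to-weight mapping. The following sketch does that for a response shaped like the one above; the helper name `to_index_weight_map` and the sample payload values are hypothetical.

```python
def to_index_weight_map(embedding):
    """Convert {"indices": [...], "values": [...]} into {index: value}."""
    indices, values = embedding["indices"], embedding["values"]
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return dict(zip(indices, values))

# Sample payload in the documented shape (made-up numbers)
sample_response = {
    "embeddings": [
        {"indices": [1, 5, 23], "values": [0.8, 0.5, 0.3]},
        {"indices": [2, 8, 15], "values": [0.7, 0.6, 0.4]},
    ],
    "method": "bm25",
}

maps = [to_index_weight_map(e) for e in sample_response["embeddings"]]
print(maps[0])
```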

Health Check

curl http://fastembed-server:8083/health
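The same check can be done from Python, for example before sending a batch of texts. This is a minimal sketch using only the standard library; it assumes that an HTTP 200 from `/health` means the service is ready, since the response body is not documented here. The function name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request

def is_healthy(base_url="http://fastembed-server:8083", timeout=2.0):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection errors, DNS failures, timeouts, and non-2xx responses
        return False
```

Calling `is_healthy()` in a startup script lets a client wait or fail fast instead of discovering the problem on the first `/embed` request.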

Usage Example

import requests

# Generate BM25 sparse vectors
response = requests.post(
    "http://fastembed-server:8083/embed",
    json={
        "texts": ["machine learning algorithms", "neural network training"],
        "method": "bm25",
    },
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors

# Each embedding holds parallel "indices" and "values" lists
result = response.json()
for i, emb in enumerate(result["embeddings"]):
    print(f"Text {i}: {len(emb['indices'])} non-zero dimensions")

Methods Comparison

| Method | Speed | Semantic | Best For |
| --- | --- | --- | --- |
| BM25 | Fast | Low | Keyword search |
| SPLADE | Medium | High | Hybrid search |
| MiniCOIL | Slow | High | Precision retrieval |

Hybrid Search Pattern

Combine sparse and dense vectors so that a query can match on both exact terms and meaning:

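import requests

query = "What is machine learning?"

# 1. Get dense embeddings (from Jina or ColPali)
# get_dense_embedding is a placeholder for your dense embedding client
dense = get_dense_embedding(query)

# 2. Get sparse embeddings
response = requests.post(
    "http://fastembed-server:8083/embed",
    json={"texts": [query], "method": "splade"},
    timeout=30,
)
response.raise_for_status()
sparse = response.json()

# 3. Search with both (in your vector DB)
# vector_db.hybrid_search is a placeholder; use your database's hybrid query API
results = vector_db.hybrid_search(
    dense_vector=dense,
    sparse_vector=sparse["embeddings"][0],
)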
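If your vector database does not fuse results for you, one common option is to run the dense and sparse searches separately and merge the ranked lists with reciprocal rank fusion (RRF). The sketch below is one possible approach, not something this service provides; the function name, the document IDs, and the constant `k=60` (a conventional default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs; each hit adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a dense search and a sparse search
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF uses only ranks, so it avoids having to normalize dense similarity scores and sparse scores onto a common scale.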