Jina Reranker
Fast document reranking service using Jina Reranker M0. Improves search results by scoring document relevance to queries.
Port: 8081
Model: jinaai/jina-reranker-m0-GGUF (Q4_K_M quantized)
What it Does
Reranking improves search quality by rescoring initial retrieval results:
- Query-document scoring - Relevance scores for query-document pairs
- Fast inference - Quantized model for speed
- Batch processing - Handle multiple documents efficiently
Use cases:
- Improve RAG retrieval accuracy
- Re-order search results by relevance
- Filter low-relevance documents
API Endpoints
Rerank Documents
Endpoint: POST /rerank
```bash
curl -X POST "http://jina-reranker:8081/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence...",
      "Python is a programming language...",
      "Neural networks are computational models..."
    ],
    "top_k": 5,
    "return_documents": true
  }'
```
Request:
| Parameter | Type | Description |
|---|---|---|
| query | string | Search query |
| documents | array | Documents to rerank |
| top_k | number | Return top K results (default: all) |
| return_documents | boolean | Include document text in response |
Response:
```json
{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": "Machine learning is a subset of artificial intelligence..."
    },
    {
      "index": 2,
      "relevance_score": 0.78,
      "document": "Neural networks are computational models..."
    }
  ],
  "query": "What is machine learning?",
  "total_documents": 3,
  "returned_documents": 2
}
```
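For typed client code, the response shape can be modeled from the example above. A minimal sketch (field names taken from the sample response; document is only included when return_documents is true):

```python
from typing import List, TypedDict

class RerankResult(TypedDict, total=False):
    index: int               # position of the document in the request
    relevance_score: float   # higher means more relevant to the query
    document: str            # included only when return_documents is true

class RerankResponse(TypedDict):
    results: List[RerankResult]
    query: str
    total_documents: int
    returned_documents: int
```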
Health Check
```bash
curl http://jina-reranker:8081/health
```
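A minimal readiness check in Python, assuming the endpoint simply returns HTTP 200 once the model is loaded (the exact response body is not specified here):

```python
import time

import requests

def wait_until_ready(base_url: str = "http://jina-reranker:8081", timeout: float = 60.0) -> bool:
    """Poll /health until the service answers with 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=2).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # service not up yet
        time.sleep(1)
    return False
```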
Usage Example
```python
import requests

# Initial search results (from vector search)
search_results = [
    "Python is a programming language used in data science.",
    "Machine learning algorithms learn from data patterns.",
    "TensorFlow is a deep learning framework by Google.",
    "JavaScript is used for web development.",
    "Neural networks mimic biological brain structures."
]

# Rerank by relevance to query
response = requests.post(
    "http://jina-reranker:8081/rerank",
    json={
        "query": "How do machine learning models work?",
        "documents": search_results,
        "top_k": 3,
        "return_documents": True
    }
)

for result in response.json()["results"]:
    print(f"Score: {result['relevance_score']:.2f} - {result['document'][:50]}...")
```
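The index field in each result refers back to the document's position in the request (as in the response example above), which is useful when return_documents is false or when you need to recover metadata attached to the originals. A small sketch, reusing the search_results list and query from the example:

```python
# Request scores only, then map each result back to the original string
scores_only = requests.post(
    "http://jina-reranker:8081/rerank",
    json={
        "query": "How do machine learning models work?",
        "documents": search_results,
        "top_k": 3,
        "return_documents": False
    }
).json()

for result in scores_only["results"]:
    original = search_results[result["index"]]  # look up by index
    print(f"{result['relevance_score']:.2f}  {original}")
```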
RAG Pipeline Integration
Use reranking to improve RAG quality:
```python
# 1. Vector search (retrieve candidates)
candidates = vector_db.search(query, top_k=20)

# 2. Rerank for precision
reranked = requests.post(
    "http://jina-reranker:8081/rerank",
    json={
        "query": query,
        "documents": [c["text"] for c in candidates],
        "top_k": 5,
        "return_documents": True  # needed so results carry document text for step 3
    }
).json()

# 3. Use top reranked results as context
context = [r["document"] for r in reranked["results"]]
```
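To also filter out low-relevance candidates (one of the use cases above), drop results below a score threshold before building the context. A sketch, with 0.5 as an illustrative cut-off rather than a recommended default:

```python
# 4. Optionally drop weak matches before building the prompt context
MIN_SCORE = 0.5  # illustrative threshold; tune against your own data
context = [
    r["document"]
    for r in reranked["results"]
    if r["relevance_score"] >= MIN_SCORE
]
```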
Performance
- Throughput: ~100-200 document pairs/second
- Latency: ~50-100 ms for 10 documents
- Max batch: 100 documents per request (see the batching sketch below)
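Since a single request is capped at 100 documents, larger candidate sets have to be split client-side. A minimal sketch of that batching, assuming scores from separate requests are comparable and can simply be merged by sorting:

```python
import requests

def rerank_large(query: str, documents: list[str], top_k: int = 5, batch_size: int = 100) -> list[dict]:
    """Rerank more than 100 documents by batching requests and merging the scores."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        resp = requests.post(
            "http://jina-reranker:8081/rerank",
            json={"query": query, "documents": batch, "return_documents": False},
        ).json()
        for r in resp["results"]:
            # re-map batch-local indices to positions in the full document list
            scored.append({"index": start + r["index"], "relevance_score": r["relevance_score"]})
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return scored[:top_k]
```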
Configuration
| Variable | Default | Description |
|---|---|---|
| MODEL_ID | jinaai/jina-reranker-m0-GGUF | Model repository |
| MODEL_FILE | jina-reranker-m0-Q4_K_M.gguf | GGUF file |
| CTX_SIZE | 8192 | Context window (tokens) |
| N_GPU_LAYERS | 999 | GPU layers to offload |