
Jina Reranker

Fast document reranking service using Jina Reranker M0. Improves search results by scoring document relevance to queries.

Port: 8081
Model: jinaai/jina-reranker-m0-GGUF (Q4_K_M quantized)

What it Does

Reranking improves search quality by rescoring initial retrieval results:

  • Query-document scoring - Relevance scores for query-document pairs
  • Fast inference - Quantized model for speed
  • Batch processing - Handle multiple documents efficiently

Use cases:

  • Improve RAG retrieval accuracy
  • Re-order search results by relevance
  • Filter low-relevance documents
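
The last use case, filtering low-relevance documents, can be sketched as a small helper over the "results" list that POST /rerank returns (see API Endpoints). The helper name `filter_by_score` and the 0.5 cutoff are illustrative assumptions, not part of the service; a sensible threshold depends on your data.

```python
def filter_by_score(results, min_score=0.5):
    """Keep only results whose relevance_score is at least min_score.

    `results` is the "results" list from a /rerank response.
    The 0.5 default is an arbitrary placeholder; tune it on your own data.
    """
    return [r for r in results if r["relevance_score"] >= min_score]


# Sample data in the same shape as the documented response
sample = [
    {"index": 0, "relevance_score": 0.95},
    {"index": 2, "relevance_score": 0.78},
    {"index": 1, "relevance_score": 0.12},
]
kept = filter_by_score(sample, min_score=0.5)
print([r["index"] for r in kept])
```

Filtering on the client side keeps the request unchanged, so the same response can be reused with different cutoffs while you tune.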

API Endpoints

Rerank Documents

Endpoint: POST /rerank

curl -X POST "http://jina-reranker:8081/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence...",
      "Python is a programming language...",
      "Neural networks are computational models..."
    ],
    "top_k": 2,
    "return_documents": true
  }'

Request:

Parameter         Type     Description
query             string   Search query
documents         array    Documents to rerank
top_k             number   Return top K results (default: all)
return_documents  boolean  Include document text in response

Response:

{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": "Machine learning is a subset of artificial intelligence..."
    },
    {
      "index": 2,
      "relevance_score": 0.78,
      "document": "Neural networks are computational models..."
    }
  ],
  "query": "What is machine learning?",
  "total_documents": 3,
  "returned_documents": 2
}
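
Each result's "index" points back into the "documents" array sent in the request, so the original items (and any metadata kept alongside them) can be recovered from the response. A minimal sketch, assuming "index" is a zero-based position into the request array, as the sample response suggests; `attach_sources` and the `id` values are hypothetical names used only for illustration.

```python
def attach_sources(results, sources):
    """Pair each reranked result with the original item at result["index"]."""
    return [
        {"score": r["relevance_score"], "source": sources[r["index"]]}
        for r in results
    ]


# Original items, in the same order as the "documents" array of the request
sources = [
    {"id": "doc-a", "text": "Machine learning is a subset of artificial intelligence..."},
    {"id": "doc-b", "text": "Python is a programming language..."},
    {"id": "doc-c", "text": "Neural networks are computational models..."},
]
results = [
    {"index": 0, "relevance_score": 0.95},
    {"index": 2, "relevance_score": 0.78},
]
ranked = attach_sources(results, sources)
print([item["source"]["id"] for item in ranked])
```

Mapping by index avoids matching on document text, which would be ambiguous if two documents share the same content.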

Health Check

Endpoint: GET /health

curl http://jina-reranker:8081/health
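
A client can poll the health endpoint before sending rerank requests, for example while the model is still loading. The sketch below assumes that /health answers with HTTP 200 once the service is ready; the response body is not documented here, so it is ignored. The function names, attempt count, and delay are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to url answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx status
        return False


def wait_until_healthy(url, attempts=30, delay=1.0, probe=http_ok):
    """Poll url until probe(url) succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_healthy("http://jina-reranker:8081/health")`. The `probe` argument exists so the polling logic can be exercised without a running service.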

Usage Example

import requests

# Initial search results (from vector search)
search_results = [
    "Python is a programming language used in data science.",
    "Machine learning algorithms learn from data patterns.",
    "TensorFlow is a deep learning framework by Google.",
    "JavaScript is used for web development.",
    "Neural networks mimic biological brain structures."
]

# Rerank by relevance to query
response = requests.post(
    "http://jina-reranker:8081/rerank",
    json={
        "query": "How do machine learning models work?",
        "documents": search_results,
        "top_k": 3,
        "return_documents": True
    },
    timeout=30,  # avoid hanging forever if the service is unreachable
)
response.raise_for_status()  # fail loudly on HTTP errors

for result in response.json()["results"]:
    print(f"Score: {result['relevance_score']:.2f} - {result['document'][:50]}...")

RAG Pipeline Integration

Use reranking to improve RAG quality:

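import requests

# `vector_db` and `query` are placeholders for your own vector store client
# and the user's query string.

# 1. Vector search (retrieve candidates)
candidates = vector_db.search(query, top_k=20)

# 2. Rerank for precision
response = requests.post(
    "http://jina-reranker:8081/rerank",
    json={
        "query": query,
        "documents": [c["text"] for c in candidates],
        "top_k": 5,
        "return_documents": True,  # request document text so step 3 can read it
    },
    timeout=30,
)
response.raise_for_status()
reranked = response.json()

# 3. Use top reranked results as context
context = [r["document"] for r in reranked["results"]]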

Performance

  • Throughput: ~100-200 query-document pairs/second
  • Latency: ~50-100 ms for 10 documents
  • Max batch: 100 documents per request
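
Candidate sets larger than the maximum batch can be split across several requests and merged afterwards. The sketch below is one possible approach, not the service's own mechanism: `rerank_in_batches` and `rerank_fn` are hypothetical names, and merging by score assumes each score depends only on its own query-document pair, which is typical of pointwise rerankers but is not confirmed here.

```python
def rerank_in_batches(query, documents, rerank_fn, batch_size=100):
    """Rerank more documents than fit in one request.

    rerank_fn(query, docs) must return a list of
    {"index": int, "relevance_score": float} dicts, where "index" is
    relative to the docs it was given (the shape of the /rerank "results").
    """
    merged = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for r in rerank_fn(query, batch):
            # Shift the batch-local index back to a position in `documents`
            merged.append({**r, "index": r["index"] + start})
    # Highest score first, across all batches
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged


# Stand-in scorer for demonstration only: longer documents score higher
def fake_rerank(query, docs):
    return [{"index": i, "relevance_score": len(d) / 10} for i, d in enumerate(docs)]


docs = ["a", "bbbb", "cc", "ddd", "eeeee"]
ranked = rerank_in_batches("q", docs, fake_rerank, batch_size=2)
print([r["index"] for r in ranked])
```

In practice `rerank_fn` would wrap a POST to /rerank and return the "results" list from the JSON response; passing it in as a parameter keeps the batching logic independent of the HTTP client.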

Configuration

Variable      Default                        Description
MODEL_ID      jinaai/jina-reranker-m0-GGUF   Model repo
MODEL_FILE    jina-reranker-m0-Q4_K_M.gguf   GGUF file
CTX_SIZE      8192                           Context window
N_GPU_LAYERS  999                            GPU layers to offload