GLiNER
Zero-shot named entity recognition (NER) that can extract any entity type without training. Simply provide the entity labels you want to extract.
What Is Named Entity Recognition?
Named Entity Recognition (NER) is the task of identifying and classifying key elements in text into predefined categories such as people, organizations, locations, dates, and monetary values. Traditional NER systems require labeled training data for each entity type, which makes them expensive to adapt to new domains. GLiNER takes a zero-shot approach: instead of training on labeled examples, you simply describe the entity types you want at inference time, and the model extracts matching spans from the input text. This makes it ideal for rapidly prototyping extraction pipelines or working with domain-specific entity types that off-the-shelf models do not cover.
Within the IntelligenceBox pipeline, GLiNER is used during document ingestion to enrich chunks with structured entity metadata. Extracted entities can be stored alongside embeddings in the vector database, enabling filtered retrieval such as finding all documents that mention a specific company or person. This metadata layer can improve the precision of RAG queries because the system narrows the candidate set by entity before semantic ranking occurs.
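As a rough illustration of the enrichment step described above, the sketch below collapses extracted entities into per-label metadata and uses it to pre-filter chunks. The `Entity` fields mirror the `/predict` response; the `Chunk` and `EntityMetadata` shapes and both function names are hypothetical and are not part of IntelligenceBox. A real vector database would apply its own metadata filter syntax instead.

```typescript
// One extracted entity, mirroring the fields in the /predict response.
interface Entity {
  label: string;
  text: string;
  score: number;
  start: number;
  end: number;
}

// Hypothetical metadata layout: label -> unique, lowercased entity strings.
type EntityMetadata = Record<string, string[]>;

// Hypothetical chunk record; a real vector store has its own schema.
interface Chunk {
  id: string;
  text: string;
  entities: EntityMetadata;
}

// Collapse entities into per-label lists of normalized, deduplicated strings.
function toEntityMetadata(entities: Entity[], minScore = 0.5): EntityMetadata {
  const metadata: EntityMetadata = {};
  for (const entity of entities) {
    if (entity.score < minScore) continue; // drop low-confidence spans
    const value = entity.text.trim().toLowerCase();
    const values = metadata[entity.label] ?? [];
    if (values.indexOf(value) === -1) values.push(value);
    metadata[entity.label] = values;
  }
  return metadata;
}

// Pre-filter: keep only chunks that mention `value` under `label`.
function filterByEntity(chunks: Chunk[], label: string, value: string): Chunk[] {
  const needle = value.trim().toLowerCase();
  return chunks.filter(
    (chunk) => (chunk.entities[label] ?? []).indexOf(needle) !== -1
  );
}
```

Lowercasing is only one possible normalization. It lets "Apple Inc." and "apple inc." match, but it does not merge aliases such as "Apple" and "Apple Inc.".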
Service Info
- Port: 8093
- Internal URL: http://gliner:8093
- Endpoint: /predict
- Type: Zero-shot NER
Use Cases
- Extract custom entities from documents without training
- Identify people, organizations, locations, and dates
- Extract domain-specific entities (products, technical terms)
- Build knowledge graphs from unstructured text
- Enrich document metadata for search
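For the knowledge-graph use case, one simple starting point is to link entities that appear in the same text. The sketch below is an assumption about how that could look, not a description of any built-in feature: `CoOccurrenceEdge`, `nodeKey`, and `coOccurrenceEdges` are hypothetical names, and co-occurrence is a much weaker signal than a real relation extractor.

```typescript
// One extracted entity, mirroring the fields in the /predict response.
interface Entity {
  label: string;
  text: string;
  score: number;
  start: number;
  end: number;
}

// Hypothetical edge type: two entities found in the same text.
interface CoOccurrenceEdge {
  source: string; // "label:text" of the first entity
  target: string; // "label:text" of the second entity
}

// Node identity is the label plus the surface text.
function nodeKey(entity: Entity): string {
  return `${entity.label}:${entity.text}`;
}

// Emit one undirected edge per unordered pair of distinct entities.
function coOccurrenceEdges(entities: Entity[]): CoOccurrenceEdge[] {
  // Deduplicate nodes while preserving first-seen order.
  const keys: string[] = [];
  for (const entity of entities) {
    const key = nodeKey(entity);
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  const edges: CoOccurrenceEdge[] = [];
  keys.forEach((source, i) => {
    for (const target of keys.slice(i + 1)) {
      edges.push({ source, target });
    }
  });
  return edges;
}
```

Running this per chunk rather than per document keeps the number of pairs small, since it grows quadratically with the number of distinct entities.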
Request Format
{
  "text": "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976.",
  "labels": ["company", "person", "location", "date"],
  "threshold": 0.5,     // Confidence threshold (0-1)
  "max_chars": 10000,   // Max characters to process
  "max_tokens": 2048,   // Max tokens per chunk
  "chunk_overlap": 50   // Token overlap between chunks
}
Response Format
{
  "success": true,
  "model_name": "gliner-base",
  "labels": ["company", "person", "location", "date"],
  "threshold": 0.5,
  "count": 4,
  "entities": [
    {
      "label": "company",
      "text": "Apple Inc.",
      "score": 0.95,
      "start": 0,
      "end": 10
    },
    {
      "label": "person",
      "text": "Steve Jobs",
      "score": 0.92,
      "start": 26,
      "end": 36
    },
    {
      "label": "location",
      "text": "Cupertino, California",
      "score": 0.89,
      "start": 40,
      "end": 61
    },
    {
      "label": "date",
      "text": "1976",
      "score": 0.87,
      "start": 65,
      "end": 69
    }
  ]
}
TypeScript Client
import { GlinerClient } from '@/services/gpu/glinerClient';

const client = new GlinerClient('http://gliner:8093');

// Extract entities with custom labels
const result = await client.predict(
  "The contract was signed by John Smith at Microsoft headquarters on January 15, 2024.",
  {
    labels: ["person", "company", "location", "date"],
    threshold: 0.5,
  }
);

// Process extracted entities
for (const entity of result.entities) {
  console.log(`${entity.label}: "${entity.text}" (score: ${entity.score})`);
}

// Filter by confidence
const highConfidence = result.entities.filter(e => e.score > 0.8);
cURL Example
curl -X POST http://gliner:8093/predict \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Apple Inc. was founded by Steve Jobs in Cupertino.",
    "labels": ["company", "person", "location"],
    "threshold": 0.5
  }'
Common Entity Labels
General
- person: People names
- organization: Companies, institutions
- location: Places, addresses
- date: Dates, time periods
- money: Currency amounts
Domain-Specific
- product: Product names
- technology: Tech terms
- medical_condition: Health terms
- legal_term: Legal concepts
- custom_label: Anything you need!
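The labels above go into the `labels` array of the request body. As a minimal sketch for callers that do not use `GlinerClient`, the code below normalizes a label list, builds the request body, and posts it to `/predict`. The helper names (`normalizeLabels`, `buildPredictRequest`, `predict`), the `FetchLike` type, and the validation rules are assumptions made for illustration; only the field names, URL, and endpoint come from this page.

```typescript
// Request body fields as listed under "Request Format".
interface PredictRequest {
  text: string;
  labels: string[];
  threshold?: number;
  max_chars?: number;
  max_tokens?: number;
  chunk_overlap?: number;
}

// Minimal fetch-like signature so the sketch is not tied to one runtime.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Illustrative label groups taken from the lists above.
const GENERAL_LABELS = ["person", "organization", "location", "date", "money"];
const DOMAIN_LABELS = ["product", "technology", "medical_condition", "legal_term"];

// Trim and lowercase labels, dropping empties and duplicates.
function normalizeLabels(labels: string[]): string[] {
  const seen: string[] = [];
  for (const raw of labels) {
    const label = raw.trim().toLowerCase();
    if (label !== "" && seen.indexOf(label) === -1) seen.push(label);
  }
  return seen;
}

// Build a request body, rejecting obviously invalid input early.
function buildPredictRequest(
  text: string,
  labels: string[],
  threshold = 0.5
): PredictRequest {
  const normalized = normalizeLabels(labels);
  if (normalized.length === 0) throw new Error("at least one label is required");
  if (threshold < 0 || threshold > 1) throw new Error("threshold must be in [0, 1]");
  return { text, labels: normalized, threshold };
}

// POST the request as JSON, mirroring the cURL example.
async function predict(
  fetchImpl: FetchLike,
  request: PredictRequest,
  baseUrl = "http://gliner:8093"
): Promise<unknown> {
  const response = await fetchImpl(`${baseUrl}/predict`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`GLiNER request failed with status ${response.status}`);
  }
  return response.json();
}
```

In a runtime that provides a global `fetch`, it can be passed as `fetchImpl`. Injecting it also makes the call easy to stub in tests.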
Tips
- Label naming: Use descriptive, lowercase labels. The model understands natural language.
- Threshold tuning: Start with 0.5, increase for precision, decrease for recall.
- Long documents: Use max_chars and chunk_overlap for large texts.
- Performance: Fewer labels = faster inference. Group related labels when possible.
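One way to act on the threshold tip is to request entities once with a low threshold and then compare cutoffs on the client, using the `score` field of each entity. The sketch below assumes that filtering returned scores with `>=` approximates what the service would return at a higher `threshold`; that equivalence is not confirmed by this page, and `thresholdSweep` is a hypothetical helper.

```typescript
// One extracted entity, mirroring the fields in the /predict response.
interface Entity {
  label: string;
  text: string;
  score: number;
  start: number;
  end: number;
}

// Result row for one candidate threshold.
interface SweepRow {
  threshold: number;
  kept: number; // entities with score >= threshold
}

// Count how many entities survive at each candidate threshold.
function thresholdSweep(entities: Entity[], thresholds: number[]): SweepRow[] {
  return thresholds.map((threshold) => ({
    threshold,
    kept: entities.filter((entity) => entity.score >= threshold).length,
  }));
}
```

With the four scores from the response example (0.95, 0.92, 0.89, 0.87), thresholds of 0.5 and 0.9 keep 4 and 2 entities respectively. Inspecting which spans drop out between two cutoffs shows whether a higher threshold removes noise or real entities.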
