GPU Services

GPU-accelerated microservices for document processing, embeddings, entity extraction, and AI operations.

Architecture

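GPU services run as Docker containers on the IntelligenceBox internal network. Each service exposes HTTP endpoints that the main box-server calls for specialized AI operations. Services are addressed by hostname, not by port alone, so two of them can listen on the same port: Docling (`docling`) and Jina Embeddings (`jina`) both use 8080.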

Network Architecture

┌──────────────────────────────────────────────────────────┐
│                      Docker Network                      │
│                                                          │
│  ┌─────────────┐    HTTP    ┌────────────────────────┐   │
│  │ box-server  │ ─────────► │ GPU Service Containers │   │
│  │   (API)     │            │  colpali:8001          │   │
│  └─────────────┘            │  docling:8080          │   │
│                             │  fastembed:8083        │   │
│                             │  gliner:8093           │   │
│                             │  jina:8080             │   │
│                             │  jina-reranker:8081    │   │
│                             │  table-extractor:8098  │   │
│                             └────────────────────────┘   │
└──────────────────────────────────────────────────────────┘

Common Patterns

HTTP Communication

All GPU services use HTTP POST for operations and return JSON responses:

Client Pattern

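// Generic pattern: POST a JSON payload to a service endpoint and parse the JSON reply.
// serviceName, port, and endpoint correspond to the hostnames and ports listed below.
async function callService<T>(serviceName: string, port: number, endpoint: string, payload: unknown): Promise<T> {
  const response = await fetch(`http://${serviceName}:${port}/${endpoint}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  const result = await response.json();
  return result as T;
}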

Error Handling

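All clients apply a request timeout and check the HTTP status before parsing the response. A non-2xx status becomes an error that carries the status code and the service's response text: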

// Check the status before calling response.json() so service errors surface with context
if (!response.ok) {
  const errorText = await response.text();
  throw new Error(`Service error (${response.status}): ${errorText}`);
}
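The status check above does not show the timeout half. Below is a minimal sketch that combines both. It assumes Node.js 18+, where `fetch`, `Response`, and `AbortSignal.timeout` are global. `postJson`, `ServiceError`, and the 30-second default are hypothetical, and the real `*Client.ts` files may structure this differently.

```typescript
// Hypothetical helper: POST JSON to a GPU service with a timeout and a status check.
// Assumes Node.js 18+, where fetch, Response, and AbortSignal.timeout are global.
class ServiceError extends Error {
  readonly status: number;

  constructor(status: number, body: string) {
    super(`Service error (${status}): ${body}`);
    this.name = 'ServiceError';
    this.status = status;
  }
}

async function postJson<T>(
  url: string,
  payload: unknown,
  timeoutMs = 30_000, // assumed default; slow jobs such as PDF parsing may need more
  fetchImpl: typeof fetch = fetch, // injectable so the helper can be tested without a container
): Promise<T> {
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    // The signal aborts the request once timeoutMs elapses; fetch then rejects
    // with the abort reason (a DOMException named "TimeoutError").
    signal: AbortSignal.timeout(timeoutMs),
  });

  // Read the body as text on failure: error pages are not always JSON.
  if (!response.ok) {
    const errorText = await response.text();
    throw new ServiceError(response.status, errorText);
  }

  return (await response.json()) as T;
}
```

Injecting `fetchImpl` lets the helper be exercised without a running container. Callers can also tell a timeout (a `TimeoutError`) apart from a `ServiceError` when deciding whether a retry makes sense.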

Available Services

ColPali (port 8001)

Multi-vector document vision embeddings for visual document retrieval.
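ColPali is a late-interaction model. A page image yields one vector per patch and a query yields one vector per token. Relevance is scored with MaxSim: for each query vector, take the best dot product against any page vector, then sum over query vectors. The sketch below shows that scoring. The `MultiVector` type and the function names are hypothetical, and this section does not say whether box-server scores pages itself or leaves it to the vector store.

```typescript
// One vector per query token or per page-image patch.
type MultiVector = number[][];

function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query vector, take its best match among the page vectors, then sum.
function maxSim(query: MultiVector, page: MultiVector): number {
  if (page.length === 0) return 0; // nothing to match against
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const p of page) best = Math.max(best, dotProduct(q, p));
    total += best;
  }
  return total;
}

// Rank pages for a query by MaxSim score, highest first (hypothetical helper).
function rankPages(query: MultiVector, pages: Map<string, MultiVector>): string[] {
  return Array.from(pages.entries())
    .map(([id, vectors]) => ({ id, score: maxSim(query, vectors) }))
    .sort((a, b) => b.score - a.score)
    .map((r) => r.id);
}
```

Because every query token picks its own best patch, a page can score well when different parts of the query match different regions of the page. A single pooled vector would blur those matches together.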

Docling (port 8080)

PDF processing with text, table, and figure extraction.

FastEmbed (port 8083)

Sparse embeddings (BM25, SPLADE) for hybrid search.
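Sparse embeddings store only non-zero dimensions (vocabulary ids and their weights), so scoring two of them is a dot product over the indices they share. For the hybrid half, the sketch below merges dense and sparse result lists with Reciprocal Rank Fusion. RRF is one common choice, not necessarily what box-server uses. The `{ indices, values }` shape mirrors FastEmbed's `SparseEmbedding` in Python; the service's JSON shape is assumed to match.

```typescript
// Sparse vector: parallel arrays of vocabulary ids and weights for non-zero dimensions only.
// Mirrors FastEmbed's SparseEmbedding (indices, values); the service's JSON is assumed to match.
interface SparseVector {
  indices: number[];
  values: number[];
}

// Dot product over shared indices; dimensions present in only one vector contribute nothing.
function sparseDot(a: SparseVector, b: SparseVector): number {
  const weights = new Map<number, number>();
  a.indices.forEach((idx, i) => weights.set(idx, a.values[i]));
  let score = 0;
  b.indices.forEach((idx, i) => {
    const w = weights.get(idx);
    if (w !== undefined) score += w * b.values[i];
  });
  return score;
}

// Reciprocal Rank Fusion: each list adds 1 / (k + rank) per document, with rank starting at 1.
// k = 60 is the value proposed with RRF; treat it as a tunable default.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}
```

RRF uses only ranks, not raw scores. That sidesteps the fact that sparse and dense similarity scores live on different scales.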

GLiNER (port 8093)

Zero-shot named entity recognition.
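Zero-shot means the entity types are plain strings supplied with each request, not a fixed label set built into the model. The sketch below assumes the service returns spans shaped like the output of the GLiNER Python library's `predict_entities()` (`text`, `label`, `score`, `start`, `end`). `NerRequest` and `groupByLabel` are hypothetical, and the 0.5 cutoff is an assumed default.

```typescript
// Span shape assumed to match the GLiNER Python library's predict_entities() output.
// start/end: character offsets of the span in the input text.
interface Entity {
  text: string;
  label: string;
  score: number;
  start: number;
  end: number;
}

// Hypothetical request body: labels are free-form strings chosen per call (zero-shot).
interface NerRequest {
  text: string;
  labels: string[];
  threshold?: number;
}

// Drop low-confidence spans and collect distinct surface forms per label.
function groupByLabel(entities: Entity[], minScore = 0.5): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  for (const e of entities) {
    if (e.score < minScore) continue;
    const bucket = grouped[e.label] ?? (grouped[e.label] = []);
    if (!bucket.includes(e.text)) bucket.push(e.text);
  }
  return grouped;
}
```

For example, a request with `labels: ['person', 'organization', 'contract date']` asks for three entity types without retraining or reconfiguring the model.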

Jina Embeddings (port 8080)

Multimodal embeddings for text and images.
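A multimodal embedding model maps text and images into one vector space, so a text query can be compared directly with image embeddings. The sketch below does that comparison with cosine similarity. It assumes both vectors come from the same Jina model; `rankByCosine` and the candidate shape are hypothetical.

```typescript
// Cosine similarity between two dense embeddings of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything rather than dividing by zero.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank image embeddings against a text query embedding from the same model (hypothetical helper).
function rankByCosine(query: number[], candidates: { id: string; vector: number[] }[]): string[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .map((c) => c.id);
}
```

Unlike ColPali's multi-vector output, this is one vector per item. A single comparison per candidate is cheap, but fine-grained layout detail is lost.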

Jina Reranker (port 8081)

Document reranking for improved search relevance.
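A cross-encoder reranker such as Jina's scores each (query, document) pair jointly. That is slower than vector search but more precise, so it is typically applied to a shortlist from first-stage retrieval. The sketch below reorders such a shortlist by reranker score. It assumes each result carries the candidate's position in the request and a relevance score, as in Jina's hosted rerank API (`index`, `relevance_score`). The internal service's response shape may differ, and `applyRerank` is hypothetical.

```typescript
// Assumed response item: the candidate's position in the request and its relevance score,
// mirroring Jina's hosted rerank API; the internal service's shape may differ.
interface RerankResult {
  index: number;
  relevance_score: number;
}

// Reorder first-stage candidates by reranker score and keep the top N.
function applyRerank<T>(documents: T[], results: RerankResult[], topN = documents.length): T[] {
  return [...results] // copy first: sort mutates in place
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .slice(0, topN)
    .map((r) => documents[r.index]);
}
```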

Table Extraction (port 8098)

Extract structured tables from PDF documents.

Service Locations

Service	Internal URL	Client
ColPali	`http://colpali:8001`	colpaliClient.ts
Docling	`http://docling:8080`	doclingClient.ts
FastEmbed	`http://fastembed:8083`	fastEmbedClient.ts
GLiNER	`http://gliner:8093`	glinerClient.ts
Jina Embeddings	`http://jina:8080`	jinaClient.ts
Jina Reranker	`http://jina-reranker:8081`	jinaRerankerClient.ts
Table Extraction	`http://table-extractor:8098`	tableExtractionClient.ts
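These hostnames resolve through Docker's embedded DNS, so they work from containers attached to the same network; by default they do not resolve from the host machine. The sketch below is a typed registry built from the table above, so clients avoid hard-coding URLs. The `GPU_SERVICES` keys and the `serviceUrl` helper are hypothetical, and any endpoint paths passed to it are placeholders.

```typescript
// Base URLs from the Service Locations table; the short keys are illustrative.
const GPU_SERVICES = {
  colpali: 'http://colpali:8001',
  docling: 'http://docling:8080',
  fastembed: 'http://fastembed:8083',
  gliner: 'http://gliner:8093',
  jina: 'http://jina:8080',
  jinaReranker: 'http://jina-reranker:8081',
  tableExtraction: 'http://table-extractor:8098',
} as const;

type GpuService = keyof typeof GPU_SERVICES;

// Resolve an endpoint path against a service's base URL.
// The trailing slash on the base makes "predict" and "/predict" resolve the same way.
function serviceUrl(service: GpuService, path: string): string {
  return new URL(path, `${GPU_SERVICES[service]}/`).toString();
}
```

Because `GpuService` is derived from the object's keys, a typo in a service name fails at compile time rather than as a DNS error at runtime.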