Skip to content

OpenAI-compatible chunk embedder

Overview

The OpenAICompatibleChunkEmbedder embeds document chunks using any OpenAI-compatible /v1/embeddings endpoint. It works with OpenAI directly as well as self-hosted servers such as Ollama, vLLM, and LiteLLM, making it suitable for both cloud and air-gapped deployments.

Configuration

Configuration happens through Pydantic fields passed at construction:

embedder = OpenAICompatibleChunkEmbedder(
    base_url="http://localhost:11434/v1",
    api_key="ollama",           # Any non-empty string for servers without auth
    model="qwen3-embedding:8b", # Default
    timeout=60.0,               # Optional, default 60 seconds
)

Usage and Practical Examples

Basic Embedding

from database_builder_libs.models.chunk import Chunk
from database_builder_libs.utility.embed_chunk.openai_compatible import OpenAICompatibleChunkEmbedder

embedder = OpenAICompatibleChunkEmbedder(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

chunks = [
    Chunk(document_id="doc-1", chunk_index=0, text="Hello world", vector=[], metadata={}),
    Chunk(document_id="doc-1", chunk_index=1, text="Second chunk", vector=[], metadata={}),
]

embedded = embedder.embed(chunks)
# Each chunk now has its vector field populated
print(embedded[0].vector)  # [0.023, -0.117, ...]

Using with OpenAI

embedder = OpenAICompatibleChunkEmbedder(
    base_url="https://api.openai.com/v1",
    api_key="sk-...",
    model="text-embedding-3-small",
)

Using with vLLM

embedder = OpenAICompatibleChunkEmbedder(
    base_url="http://my-vllm-server:8000/v1",
    api_key="token",
    model="BAAI/bge-m3",
    timeout=120.0,  # Increase for large batches on slower hardware
)

Empty Input

# Empty input is handled gracefully without calling the API
result = embedder.embed([])
assert result == []

Behavior and Edge Cases

Response Ordering

The embedder sorts the API response by the index field before reconstructing chunks. This guards against servers that return embeddings out of order:

# Safe regardless of server response ordering
embedded = embedder.embed(chunks)
assert embedded[0].chunk_index == chunks[0].chunk_index

Batch Size Mismatch

If the server returns a different number of vectors than chunks submitted, a RuntimeError is raised before any chunks are reconstructed:

# Raises RuntimeError: Embedding batch size mismatch: got 2 vectors for 3 chunks.
embedded = embedder.embed(three_chunks)

Timeout Configuration

# Disable timeout entirely for very large batches
embedder = OpenAICompatibleChunkEmbedder(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
    timeout=None,
)

Docstring

OpenAICompatibleChunkEmbedder


              flowchart TD
              database_builder_libs.utility.embed_chunk.openai_compatible.OpenAICompatibleChunkEmbedder[OpenAICompatibleChunkEmbedder]
              database_builder_libs.models.abstract_chunk_embedder.AbstractChunkEmbedder[AbstractChunkEmbedder]

                              database_builder_libs.models.abstract_chunk_embedder.AbstractChunkEmbedder --> database_builder_libs.utility.embed_chunk.openai_compatible.OpenAICompatibleChunkEmbedder
                


              click database_builder_libs.utility.embed_chunk.openai_compatible.OpenAICompatibleChunkEmbedder href "" "database_builder_libs.utility.embed_chunk.openai_compatible.OpenAICompatibleChunkEmbedder"
              click database_builder_libs.models.abstract_chunk_embedder.AbstractChunkEmbedder href "" "database_builder_libs.models.abstract_chunk_embedder.AbstractChunkEmbedder"
            

Chunk embedder backed by any OpenAI-compatible /v1/embeddings endpoint.

Works with OpenAI and any OpenAI-compatible server (e.g. Ollama, vLLM, LiteLLM) to embed chunks in a single batched request.

Attributes

base_url: Base URL of the embeddings server, e.g. "http://localhost:11434/v1". api_key: API key passed to the underlying openai.OpenAI client. Use any non-empty string for servers that do not enforce authentication. model: Model identifier forwarded to the /v1/embeddings endpoint. timeout: Per-request timeout in seconds. None disables the timeout.