Qdrant store

Overview

QdrantDatastore is a concrete AbstractVectorStore implementation backed by the Qdrant vector database. It stores document Chunk objects, each identified by a deterministic hash‑derived point ID, and provides fast cosine‑similarity search.

Design notes

Configuration Example

config = {
    "url": "http://localhost:6333",
    "collection": "knowledge_base",
    "vector_size": 768  # Must match embedding model output
}
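Before passing a config like the one above to connect(), it can help to fail fast on obvious mistakes. The helper below is an illustrative sketch, not part of the library; the required keys are assumed from the example above:

```python
REQUIRED_KEYS = {"url", "collection", "vector_size"}

def validate_config(config: dict) -> dict:
    """Raise early if required keys are missing or vector_size is invalid."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"Missing config keys: {sorted(missing)}")
    if not isinstance(config["vector_size"], int) or config["vector_size"] <= 0:
        raise ValueError("vector_size must be a positive integer")
    return config

config = validate_config({
    "url": "http://localhost:6333",
    "collection": "knowledge_base",
    "vector_size": 768,  # must match the embedding model's output dimension
})
```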

Docstring

qdrant_store

Classes:

QdrantDatastore

QdrantDatastore()

flowchart TD
    database_builder_libs.models.abstract_vector_store.AbstractVectorStore[AbstractVectorStore] --> database_builder_libs.stores.qdrant.qdrant_store.QdrantDatastore[QdrantDatastore]

Qdrant implementation of the semantic vector store.

Stores Chunk embeddings and enables similarity-based retrieval.

Conceptual model

Document → multiple Chunks → embedding vectors → nearest neighbour search

Identity

Each chunk is uniquely identified by: (document_id, chunk_index)

This pair is deterministically mapped to a stable Qdrant point id using hashing. Re-indexing the same document overwrites existing vectors instead of duplicating them.
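One common way to derive such a stable point id is a UUIDv5 hash over the pair; the actual hash scheme used by QdrantDatastore may differ, but any deterministic mapping gives the same overwrite-on-reindex behaviour:

```python
import uuid

# Hypothetical namespace; the real implementation may use a different
# namespace or hash function. Determinism is the property that matters.
_NAMESPACE = uuid.NAMESPACE_URL

def point_id(document_id: str, chunk_index: int) -> str:
    """Deterministically derive a stable point id from the chunk identity."""
    return str(uuid.uuid5(_NAMESPACE, f"{document_id}:{chunk_index}"))

# Re-indexing the same chunk yields the same id, so an upsert overwrites
# the existing vector instead of creating a duplicate.
assert point_id("doc-1", 0) == point_id("doc-1", 0)
assert point_id("doc-1", 0) != point_id("doc-1", 1)
```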

Stored payload

Each vector stores the following payload fields:

  • document_id
  • chunk_index
  • text
  • any additional metadata keys

Retrieval never returns embeddings — only semantic matches.

Consistency guarantees

  • Idempotent writes (upsert)
  • Stable ranking for unchanged index
  • No duplicate chunks returned
  • Full document deletion removes all vectors (GDPR requirement)
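These guarantees follow from keying every write by the deterministic point id. A toy in-memory model of the upsert-and-delete semantics (a stand-in dict, not the real Qdrant client):

```python
# Toy stand-in for the collection: (document_id, chunk_index) -> payload.
store: dict[tuple[str, int], dict] = {}

def upsert(document_id: str, chunk_index: int, payload: dict) -> None:
    # Same identity -> same key -> overwrite, never duplicate.
    store[(document_id, chunk_index)] = payload

def delete_document(document_id: str) -> int:
    """Remove every chunk of a document; returns the number deleted."""
    keys = [k for k in store if k[0] == document_id]
    for k in keys:
        del store[k]
    return len(keys)

upsert("doc-1", 0, {"text": "v1"})
upsert("doc-1", 0, {"text": "v2"})   # idempotent re-index: overwrites v1
assert len(store) == 1 and store[("doc-1", 0)]["text"] == "v2"
assert delete_document("doc-1") == 1  # full removal, as the GDPR bullet requires
assert not store
```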

Embedding requirements

All stored vectors must:

  • Match the configured dimensionality
  • Be generated by the same embedding model
  • Use cosine similarity

Methods:

Source code in src/database_builder_libs/stores/qdrant/qdrant_store.py
def __init__(self) -> None:
    super().__init__()
    self.client: QdrantClient | None = None
    self.collection: str | None = None
    self.vector_size: int | None = None

connect

connect(config: dict | None = None) -> None

Initialize the vector index and verify accessibility.

This method should:

  • Create the index if missing
  • Validate embedding dimensionality
  • Validate distance metric compatibility

Raises

ConnectionError
    Backend unreachable.
RuntimeError
    Index exists but is incompatible.
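Callers can distinguish the two documented failure modes at connect time. A usage sketch (the exception contract is from the docstring above; the wrapper and the stub store are illustrative):

```python
def connect_or_report(store, config: dict) -> bool:
    """Attempt connection, mapping documented failure modes to messages."""
    try:
        store.connect(config)
        return True
    except ConnectionError:
        print("Backend unreachable; check the url in config")
        return False
    except RuntimeError:
        print("Index exists but is incompatible (dimension/metric mismatch)")
        return False

class _UnreachableStore:
    # Stub standing in for QdrantDatastore in this sketch.
    def connect(self, config):
        raise ConnectionError("backend unreachable")

assert connect_or_report(_UnreachableStore(), {"url": "http://localhost:6333"}) is False
```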

Source code in src/database_builder_libs/models/abstract_vector_store.py
def connect(self, config: dict | None = None) -> None:
    """
    Initialize the vector index and verify accessibility.

    This method should:
    - Create index if missing
    - Validate embedding dimensionality
    - Validate distance metric compatibility

    Raises
    ------
    ConnectionError
        Backend unreachable.
    RuntimeError
        Index exists but is incompatible.
    """
    if self._connected:
        return

    self._connecting = True
    try:
        self._connect_impl(config)
        self._connected = True
    finally:
        self._connecting = False

delete_document

delete_document(document_id: DocumentId) -> int

Permanently remove all vectors for a document.

Guarantees

After completion, no chunk from this document will appear in similarity_search() results.

Returns

int
    Number of deleted chunks.

Source code in src/database_builder_libs/stores/qdrant/qdrant_store.py
def delete_document(self, document_id: DocumentId) -> int:
    """
    Permanently remove all vectors for a document.

    Guarantees
    ----------
    After completion, no chunk from this document will appear
    in similarity_search() results.

    Returns
    -------
    int
        Number of deleted chunks.
    """

    self._ensure_connected()

    chunks = self.get_document_chunks(document_id)
    if not chunks:
        return 0

    filt = Filter(
        must=[FieldCondition(key=DOC_ID, match=MatchValue(value=document_id))]
    )

    result = self._client().delete(
        collection_name=self._collection(),
        points_selector=filt,
        wait=True,
    )

    if result.status != "completed":
        raise RuntimeError(f"Qdrant delete failed: {result.status}")

    return len(chunks)

get_document_chunks

get_document_chunks(document_id: DocumentId) -> List[Chunk]

Retrieve all chunks belonging to a document.

Returns chunks ordered by chunk_index to reconstruct document order.

Source code in src/database_builder_libs/stores/qdrant/qdrant_store.py
def get_document_chunks(self, document_id: DocumentId) -> List[Chunk]:
    """
    Retrieve all chunks belonging to a document.

    Returns chunks ordered by chunk_index to reconstruct document order.
    """
    self._ensure_connected()

    filt = Filter(
        must=[FieldCondition(key=DOC_ID, match=MatchValue(value=document_id))]
    )

    chunks: List[Chunk] = []
    offset = None

    while True:
        records, offset = self._client().scroll(
            collection_name=self._collection(),
            scroll_filter=filt,
            with_payload=True,
            with_vectors=False,
            limit=512,
            offset=offset,
        )

        for r in records:
            payload = r.payload or {}

            doc_id = payload.get(DOC_ID)
            idx = payload.get(CHUNK_INDEX)
            if doc_id is None or idx is None:
                continue

            chunks.append(
                Chunk(
                    document_id=doc_id,
                    chunk_index=idx,
                    text=payload.get(TEXT, ""),
                    vector=(),
                    metadata={
                        k: v
                        for k, v in payload.items()
                        if k not in (DOC_ID, CHUNK_INDEX, TEXT)
                    },
                )
            )

        if offset is None:
            break

    return sorted(chunks, key=lambda c: c.chunk_index)
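Because the chunks come back ordered by chunk_index, the document's text can be reassembled by concatenation. A sketch, using a minimal stand-in for the library's Chunk model:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    # Minimal stand-in for the library's Chunk model, for illustration only.
    document_id: str
    chunk_index: int
    text: str
    vector: tuple = ()
    metadata: dict = field(default_factory=dict)

def reassemble(chunks: list[Chunk], sep: str = " ") -> str:
    """Join chunk texts in chunk_index order to approximate document order."""
    ordered = sorted(chunks, key=lambda c: c.chunk_index)
    return sep.join(c.text for c in ordered)

chunks = [
    Chunk("doc-1", 1, "over the lazy dog."),
    Chunk("doc-1", 0, "The quick brown fox jumps"),
]
assert reassemble(chunks) == "The quick brown fox jumps over the lazy dog."
```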

similarity_search

similarity_search(vector: Sequence[float], limit: int = 10) -> List[Chunk]

Perform semantic nearest-neighbour search.

Returns

List[Chunk]
    Ordered by cosine similarity descending.

Notes
  • Returned chunks DO NOT include stored embeddings
  • Metadata and text are preserved
  • Results are deterministic for identical index state
Source code in src/database_builder_libs/stores/qdrant/qdrant_store.py
def similarity_search(
    self,
    vector: Sequence[float],
    limit: int = 10,
) -> List[Chunk]:
    """
    Perform semantic nearest-neighbour search.

    Returns
    -------
    List[Chunk]
        Ordered by cosine similarity descending.

    Notes
    -----
    - Returned chunks DO NOT include stored embeddings
    - Metadata and text are preserved
    - Results are deterministic for identical index state
    """
    self._ensure_connected()
    expected_dim = self._vector_size()
    if len(vector) != expected_dim:
        raise ValueError(
            f"Query vector has wrong dimension: expected {expected_dim}, got {len(vector)}"
        )
    response = self._client().query_points(
        collection_name=self._collection(),
        query=list(vector),
        limit=limit,
        with_payload=True,
        with_vectors=False,
    )

    results: List[Chunk] = []

    for point in response.points:
        payload = point.payload or {}
        doc_id = payload.get(DOC_ID)
        idx = payload.get(CHUNK_INDEX)
        if doc_id is None or idx is None:
            continue

        results.append(
            Chunk(
                document_id=doc_id,
                chunk_index=idx,
                text=payload.get(TEXT, ""),
                vector=(),  # never return query vector
                metadata={
                    k: v
                    for k, v in payload.items()
                    if k not in (DOC_ID, CHUNK_INDEX, TEXT)
                },
            )
        )

    return results
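The "ordered by cosine similarity descending" contract can be illustrated with a pure-Python nearest-neighbour scan. Qdrant answers this with an ANN index rather than a linear scan, but the ranking rule is the same:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query: list[float], points: dict[str, list[float]], limit: int = 10):
    """Return point ids ordered by cosine similarity to the query, descending."""
    scored = sorted(points, key=lambda pid: cosine(query, points[pid]), reverse=True)
    return scored[:limit]

points = {
    "a": [1.0, 0.0],
    "b": [0.7, 0.7],
    "c": [0.0, 1.0],
}
# "a" is nearest in direction to the query, then "b", then "c".
assert rank([1.0, 0.1], points) == ["a", "b", "c"]
```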