Abstract store

Overview

The abstract store is the interface that all database adapters share. It provides a unified API for persisting and retrieving Node objects across different backend implementations (SQL, NoSQL, graph databases, etc.).

Design Notes

Interaction Pattern

The AbstractStore follows these interaction patterns:

  1. Storage Pattern:

    • Check if node exists (by id)
    • Update if exists, insert if new
    • Maintain idempotency
  2. Retrieval Pattern:

    • String filter: Return direct matches preserving multiplicity
    • None filter: Return canonical, deduplicated node set
    • Stable ordering for identical queries
  3. Deletion Pattern:

    • Single node removal with safety checks
    • Must fail on ambiguous matches
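The three patterns above can be sketched with a plain dictionary. This is only an illustration under stated assumptions: `SketchNode` and `DictStore` are hypothetical stand-ins, not library classes, and the string filter is assumed to be a substring match on the id, which the interface itself does not prescribe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for Node, reduced to id and payload."""
    id: str
    payload_data: Mapping[str, object] = field(default_factory=dict)


class DictStore:
    """Hypothetical in-memory store; not part of the library."""

    def __init__(self) -> None:
        self._rows: Dict[str, SketchNode] = {}

    def store_node(self, node: SketchNode) -> None:
        # Keyed by id: an existing entry is overwritten, a new one inserted.
        self._rows[node.id] = node

    def get_nodes(self, filter: Optional[str]) -> List[SketchNode]:
        # Sorting by id gives a stable order for identical queries.
        rows = sorted(self._rows.values(), key=lambda n: n.id)
        if filter is None:
            return rows
        # Assumption: a string filter is a substring match on the id.
        return [n for n in rows if filter in n.id]

    def remove_node(self, filter: str) -> SketchNode:
        matches = self.get_nodes(filter)
        if not matches:
            raise KeyError(filter)
        if len(matches) > 1:
            raise ValueError(f"ambiguous filter: {filter!r}")
        return self._rows.pop(matches[0].id)


store = DictStore()
store.store_node(SketchNode("user:42", {"name": "Ada"}))
store.store_node(SketchNode("user:42", {"name": "Ada L."}))  # update, not a duplicate
store.store_node(SketchNode("user:7", {"name": "Bob"}))
```

Because the dictionary is keyed by id, storing the same node twice cannot create a second entry, and `remove_node` refuses to act unless exactly one node matches.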

Docstring abstract node

node

Classes:

  • Node

    Canonical storage-agnostic representation of a knowledge entity.

Attributes:

  • EntityType

    Logical category of a Node.

  • KeyAttribute

    Name of the primary human-readable identifier inside a Node payload.

  • NodeId

    Globally stable identifier of a knowledge entity.

  • Payload

    Structured attributes describing a Node.

  • Relation

    Edge description between two Nodes.

EntityType module-attribute

EntityType = NewType('EntityType', str)

Logical category of a Node.

Represents an ontology class, NOT a storage label.

Constraints

  • Stable for a given NodeId
  • Low cardinality
  • Used for indexing and filtering

Examples

"person" "document" "organization" "concept"

KeyAttribute module-attribute

KeyAttribute = NewType('KeyAttribute', str)

Name of the primary human-readable identifier inside a Node payload.

This is a presentation hint, not identity.

Examples

"name" "title" "email" "filename"

NodeId module-attribute

NodeId = NewType('NodeId', str)

Globally stable identifier of a knowledge entity.

Properties

  • Uniquely identifies the same real-world entity across all systems
  • Must be deterministic across ingestion runs
  • Must not encode storage-specific information (database IDs, row numbers)
  • Safe for use as foreign key in relations

Examples

"user:42" "doi:10.1000/182" "sharepoint:file:abc123"
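Since NodeId is a NewType over str, it is a plain string at runtime and the type only helps static checkers. One way to meet the determinism requirement is to derive the id purely from source-side identifiers. The sketch below assumes a hypothetical helper `make_node_id`; the normalization rules are illustrative, not something the library defines.

```python
from typing import NewType

NodeId = NewType("NodeId", str)


def make_node_id(namespace: str, source_key: str) -> NodeId:
    """Hypothetical helper: build an id from source-side identifiers only."""
    # Normalizing case and whitespace keeps the id identical across runs.
    return NodeId(f"{namespace.strip().lower()}:{source_key.strip()}")


first_run = make_node_id("User", " 42")
second_run = make_node_id("user", "42 ")
```

Both calls produce the same id, and no database row number or other storage detail enters the result.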

Payload module-attribute

Payload = Mapping[str, object]

Structured attributes describing a Node.

Requirements

  • JSON-serializable
  • Order-independent
  • Deterministic for identical source data
  • Values must be immutable or safely copyable

This data may be merged across updates.
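The first two requirements can be checked with the standard `json` module. A minimal sketch, assuming a hypothetical helper `canonical_json` that is not part of the library: serializing with sorted keys makes the result independent of key order, and `json.dumps` raises `TypeError` for values that are not JSON-serializable.

```python
import json
from typing import Mapping

Payload = Mapping[str, object]


def canonical_json(payload: Payload) -> str:
    """Hypothetical helper: serialize with sorted keys and fixed separators."""
    # Raises TypeError if any value is not JSON-serializable.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


a = {"name": "Ada", "tags": ["math", "computing"]}
b = {"tags": ["math", "computing"], "name": "Ada"}
```

Comparing `canonical_json(a)` with `canonical_json(b)` then tells whether two payloads carry the same data, whatever order their keys were inserted in.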

Relation module-attribute

Relation = Mapping[str, object]

Edge description between two Nodes.

Minimum required keys

"type" : str
    Relationship type

"target" : NodeId
    Identifier of the related node

Optional keys

Any additional metadata describing the relation.

Constraints

  • Must not encode storage-specific fields
  • Must be JSON-serializable
  • Duplicate relations should be treated as identical
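A store could treat duplicate relations as identical along the following lines. This is a sketch under assumptions: `validate_relation` and `dedupe_relations` are hypothetical helpers, and "org:acme" is an invented example id.

```python
import json
from typing import List, Mapping, Sequence

Relation = Mapping[str, object]


def validate_relation(relation: Relation) -> None:
    """Hypothetical check for the two required keys."""
    for key in ("type", "target"):
        if not isinstance(relation.get(key), str):
            raise ValueError(f"relation needs a string {key!r}: {relation!r}")


def dedupe_relations(relations: Sequence[Relation]) -> List[Relation]:
    """Drop repeated relations, keeping the first occurrence of each."""
    seen = set()
    unique: List[Relation] = []
    for relation in relations:
        validate_relation(relation)
        # Canonical JSON lets two mappings with the same content compare
        # equal regardless of key order.
        key = json.dumps(relation, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(relation)
    return unique


relations = [
    {"type": "authored", "target": "doi:10.1000/182"},
    {"target": "doi:10.1000/182", "type": "authored"},  # same edge, other key order
    {"type": "member_of", "target": "org:acme"},
]
```

Using the serialized form as the comparison key also enforces the JSON-serializable constraint as a side effect.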

Node dataclass

Node(id: NodeId, payload_data: Payload = dict(), relations: Sequence[Relation] = tuple(), entity_type: EntityType = EntityType('node'), key_attribute: KeyAttribute = KeyAttribute('id'))

Canonical storage-agnostic representation of a knowledge entity.

A Node is the normalized form of structured information extracted from external sources. All database adapters MUST translate their internal records into this structure before persistence or retrieval.

Identity

The node identity is defined exclusively by id.

Nodes with identical id MUST represent the same real-world entity. Stores must overwrite existing nodes instead of creating duplicates.

Fields

id : NodeId

Globally stable identifier of the entity. Must remain constant across synchronization runs and storage backends.

payload_data : Mapping[str, object]

Structured attributes describing the entity (properties).

Requirements:

  • JSON-serializable
  • Deterministic for identical source state
  • Order-independent
  • Safe to merge across updates

relations : Sequence[Relation]

Outgoing relationships from this node to other nodes.

Each relation mapping should minimally contain: { "type": <relationship type>, "target": <NodeId> }

Constraints:

  • Must not contain cyclic self-references unless meaningful
  • Order does not carry semantic meaning
  • Duplicate relations should be ignored by stores

entity_type : EntityType

Logical category of the entity (e.g., "person", "document", "concept"). Used for indexing, filtering and schema interpretation. Must remain stable for a given node id.

key_attribute : KeyAttribute

Name of the primary human-readable identifier inside payload_data (e.g., "email", "title", "name").
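To make the fields concrete, the sketch below builds two nodes with a stand-in dataclass that mirrors the documented signature; the real class may differ in details such as immutability or validation. `display_label` is a hypothetical helper that shows how key_attribute can serve as a presentation hint while id remains the identity.

```python
from dataclasses import dataclass, field
from typing import Mapping, NewType, Sequence

NodeId = NewType("NodeId", str)
EntityType = NewType("EntityType", str)
KeyAttribute = NewType("KeyAttribute", str)
Payload = Mapping[str, object]
Relation = Mapping[str, object]


@dataclass
class Node:
    """Stand-in mirroring the documented signature; the real class may differ."""
    id: NodeId
    payload_data: Payload = field(default_factory=dict)
    relations: Sequence[Relation] = field(default_factory=tuple)
    entity_type: EntityType = EntityType("node")
    key_attribute: KeyAttribute = KeyAttribute("id")


def display_label(node: Node) -> str:
    """Hypothetical helper: look up the attribute named by key_attribute."""
    # Fall back to the id when the payload lacks the hinted attribute.
    return str(node.payload_data.get(node.key_attribute, node.id))


ada = Node(
    id=NodeId("user:42"),
    payload_data={"name": "Ada Lovelace", "email": "ada@example.org"},
    relations=[{"type": "authored", "target": NodeId("doi:10.1000/182")}],
    entity_type=EntityType("person"),
    key_attribute=KeyAttribute("name"),
)
bare = Node(id=NodeId("user:7"))  # every other field takes its default
```

Only `id` is required. A node built with defaults has an empty payload, no relations, entity type "node", and falls back to its id as label.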

Docstring abstract store

abstract_store

Classes:

  • AbstractStore

    Abstract persistence layer for storing and retrieving Nodes.

AbstractStore

AbstractStore()

Abstract persistence layer for storing and retrieving Nodes.

A Store represents a backend capable of persisting structured nodes and retrieving them via identifier lookup, textual filtering, or vector similarity.

Typical implementations:

  • SQL/NoSQL database
  • Vector database (e.g., embeddings search)
  • Graph database
  • In-memory index

Consistency requirements

Implementations must ensure:

  • Stable node identity across reads
  • Deterministic retrieval for identical queries
  • Idempotent storage: storing the same Node twice must not create duplicates

Methods:

  • connect

    Establish connection to the backend.

  • get_nodes

    Retrieve nodes from the store.

  • remove_node

    Remove a single node identified by filter.

  • store_node

    Persist a Node into the store.

Source code in src/database_builder_libs/models/abstract_store.py
def __init__(self) -> None:
    self._connected: bool = False
    self._connecting: bool = False

connect

connect(config: dict | None = None) -> None

Establish connection to the backend.

This method is idempotent. Calling it multiple times must be safe.

Parameters

config : dict | None
    Backend-specific configuration object.

Raises

ConnectionError
    Backend unreachable.

RuntimeError
    Backend misconfigured.

Source code in src/database_builder_libs/models/abstract_store.py
def connect(self, config: dict | None = None) -> None:
    """
    Establish connection to the backend.

    This method is idempotent. Calling it multiple times must be safe.

    Parameters
    ----------
    config : dict | None
        Backend-specific configuration object.

    Raises
    ------
    ConnectionError
        Backend unreachable.
    RuntimeError
        Backend misconfigured.
    """
    if self._connected:
        return

    self._connecting = True
    try:
        self._connect_impl(config)
        self._connected = True
    finally:
        self._connecting = False
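The idempotency of `connect` can be exercised with a small subclass. In this sketch `StoreBase` reproduces the logic from the listing above so the example runs on its own. `CountingStore` and the config key "url" are hypothetical, and the signature of `_connect_impl` is inferred from the call in the listing rather than documented.

```python
from typing import Optional


class StoreBase:
    """Stand-in reproducing the connect() logic from the listing."""

    def __init__(self) -> None:
        self._connected: bool = False
        self._connecting: bool = False

    def connect(self, config: Optional[dict] = None) -> None:
        if self._connected:
            return
        self._connecting = True
        try:
            self._connect_impl(config)
            self._connected = True
        finally:
            self._connecting = False

    def _connect_impl(self, config: Optional[dict]) -> None:
        raise NotImplementedError


class CountingStore(StoreBase):
    """Hypothetical backend that fails on the first attempt only."""

    def __init__(self) -> None:
        super().__init__()
        self.attempts = 0

    def _connect_impl(self, config: Optional[dict]) -> None:
        self.attempts += 1
        if self.attempts == 1:
            raise ConnectionError("backend unreachable")


store = CountingStore()
try:
    store.connect({"url": "memory://"})
except ConnectionError:
    pass  # _connected stays False, so a retry is possible
store.connect({"url": "memory://"})  # second attempt succeeds
store.connect({"url": "memory://"})  # no-op: already connected
```

The `finally` clause resets `_connecting` even when the backend raises, and the early return means the third call never reaches `_connect_impl`.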

get_nodes abstractmethod

get_nodes(filter: str | None) -> List[Node]

Retrieve nodes from the store.

Retrieval Modes

filter is interpreted as:

  • str → Selection query. Returns nodes that directly match stored records. Multiple results representing different stored entities MUST be preserved (no merging/deduplication).

  • None → Reconstruction query. Returns the canonical set of nodes represented by the backend. Implementations MUST merge overlapping representations and return a normalized, duplicate-free set of Nodes.

Returns

List[Node] Deterministically ordered list of nodes.

Guarantees
  • Stable ordering for identical queries if backend unchanged
  • filter=None returns a duplicate-free canonical node set
  • filter=str preserves multiplicity of stored entities
Raises

RuntimeError If called before connect().

Source code in src/database_builder_libs/models/abstract_store.py
@abstractmethod
def get_nodes(self, filter: str | None) -> List[Node]:
    """
    Retrieve nodes from the store.

    Retrieval Modes
    ---------------
    filter is interpreted as:

    - str  → Selection query
        Returns nodes that directly match stored records.
        Multiple results representing different stored entities
        MUST be preserved (no merging/deduplication).

    - None → Reconstruction query
        Returns the canonical set of nodes represented by the backend.
        Implementations MUST merge overlapping representations and
        return a normalized, duplicate-free set of Nodes.

    Returns
    -------
    List[Node]
        Deterministically ordered list of nodes.

    Guarantees
    ----------
    - Stable ordering for identical queries if backend unchanged
    - filter=None returns a duplicate-free canonical node set
    - filter=str preserves multiplicity of stored entities

    Raises
    ------
    RuntimeError
        If called before connect().
    """
    raise NotImplementedError
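The difference between the two retrieval modes shows up most clearly in a backend that spreads one node over several stored records. The sketch below is an illustration under assumptions: the row layout, the substring-match filter, and the merge-by-update rule are all invented for the example, and rows are plain tuples rather than Node objects to keep it short.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical backend rows: (node id, partial payload). One node may be
# spread over several rows, for example one per ingested source.
Row = Tuple[str, Dict[str, object]]

ROWS: List[Row] = [
    ("user:42", {"name": "Ada"}),
    ("user:42", {"email": "ada@example.org"}),
    ("user:7", {"name": "Bob"}),
]


def get_nodes(filter: Optional[str]) -> List[Row]:
    if filter is not None:
        # Selection: every matching stored record is returned as-is.
        # Assumption: the filter is a substring match on the id.
        hits = [row for row in ROWS if filter in row[0]]
        return sorted(hits, key=lambda row: (row[0], sorted(row[1])))
    # Reconstruction: merge all rows that share an id into one entry.
    merged: Dict[str, Dict[str, object]] = {}
    for node_id, payload in ROWS:
        merged.setdefault(node_id, {}).update(payload)
    return sorted(merged.items())
```

A selection query for "user:42" yields both stored records, while the reconstruction query yields one merged entry per id. Both branches sort their output so that identical queries return the same order.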

remove_node abstractmethod

remove_node(filter: str) -> Node

Remove a single node identified by filter.

Parameters

filter : str Unique identifier of the node to remove.

Returns

Node The removed node.

Behaviour
  • Must remove exactly one node
  • Must fail if multiple or zero matches
Raises

KeyError
    If no node matches the filter.

ValueError
    If multiple nodes match the filter.

Source code in src/database_builder_libs/models/abstract_store.py
@abstractmethod
def remove_node(self, filter: str) -> Node:
    """
    Remove a single node identified by filter.

    Parameters
    ----------
    filter : str
        Unique identifier of the node to remove.

    Returns
    -------
    Node
        The removed node.

    Behaviour
    ---------
    - Must remove exactly one node
    - Must fail if multiple or zero matches

    Raises
    ------
    KeyError
        If no node matches the filter.
    ValueError
        If multiple nodes match the filter.
    """
    raise NotImplementedError

store_node abstractmethod

store_node(node: Node) -> None

Persist a Node into the store.

Behaviour
  • If the node already exists (same unique identifier), it must be updated.
  • Operation must be idempotent.
Parameters

node : Node The node to persist.

Raises

RuntimeError
    If called before connect().

ValueError
    If the node is invalid for this backend.

Source code in src/database_builder_libs/models/abstract_store.py
@abstractmethod
def store_node(self, node: Node) -> None:
    """
    Persist a Node into the store.

    Behaviour
    ---------
    - If the node already exists (same unique identifier), it must be updated.
    - Operation must be idempotent.

    Parameters
    ----------
    node : Node
        The node to persist.

    Raises
    ------
    RuntimeError
        If called before connect().
    ValueError
        If the node is invalid for this backend.
    """
    raise NotImplementedError
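For a SQL backend, the update-or-insert behaviour maps naturally onto an upsert keyed on the node id. A minimal sketch using the standard `sqlite3` module, assuming a hypothetical two-column table and a free function in place of the method; a real adapter would also persist relations, entity_type and key_attribute.

```python
import json
import sqlite3
from typing import Mapping

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")


def store_node(node_id: str, payload: Mapping[str, object]) -> None:
    """Hypothetical SQLite-backed upsert keyed on the node id."""
    # INSERT OR REPLACE overwrites the row with the same primary key,
    # so repeating the call never adds a second row.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO nodes (id, payload) VALUES (?, ?)",
            (node_id, json.dumps(payload, sort_keys=True)),
        )


store_node("user:42", {"name": "Ada"})
store_node("user:42", {"name": "Ada"})     # same node again: no duplicate
store_node("user:42", {"name": "Ada L."})  # changed payload: row is updated
rows = conn.execute("SELECT id, payload FROM nodes").fetchall()
```

After the three calls the table holds a single row for "user:42" with the latest payload, which is the idempotent behaviour the docstring asks for.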