Abstract source
Overview
The AbstractSource class defines the contract that all concrete data-source adapters must implement. It encapsulates the lifecycle of a synchronizable external system and separates connection handling, artefact discovery, and content retrieval. Because every adapter implements the same interface, back-ends (e.g., Zotero, SharePoint, REST APIs) can be used interchangeably while the rest of the pipeline remains agnostic to the underlying source.
Design notes
Interaction pattern
AbstractSource follows a three-phase interaction pattern:
- Connection phase: establish a connection to the external system using backend-specific configuration.
- Discovery phase: query for artefacts modified since the last synchronization timestamp.
- Retrieval phase: fetch normalized content for the discovered artefacts.
This design enables efficient incremental synchronization while maintaining consistency through stable identifiers and deterministic content serialization.
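The three phases map onto a simple incremental sync loop. The sketch below is a minimal illustration, not the library's code. It redefines Content and AbstractSource locally, taking the method names from the Methods list and assuming the signatures. `sync_once` and `InMemorySource` are hypothetical names. It also assumes that "modified since" means strictly newer than the stored watermark.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Content:
    # Local stand-in mirroring the documented attributes.
    date: datetime
    id_: str
    content: dict


class AbstractSource(ABC):
    # Local stand-in; method names follow the Methods list, signatures are assumed.
    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def get_list_artefacts(self, last_synced: datetime | None) -> list[tuple[str, datetime]]: ...

    @abstractmethod
    def get_content(self, artefacts: list[tuple[str, datetime]]) -> list[Content]: ...


def sync_once(source: AbstractSource, last_synced: datetime | None) -> tuple[list[Content], datetime | None]:
    """Run one incremental pass; return the content and the new watermark."""
    source.connect()                                  # 1. connection phase
    changed = source.get_list_artefacts(last_synced)  # 2. discovery phase
    if not changed:
        return [], last_synced                        # nothing new: keep the old watermark
    contents = source.get_content(changed)            # 3. retrieval phase
    # Advance the watermark to the newest timestamp reported in this pass.
    return contents, max(ts for _, ts in changed)


class InMemorySource(AbstractSource):
    """Toy source backed by a dict of {artefact_id: (modified_at, payload)}."""

    def __init__(self, items: dict[str, tuple[datetime, dict]]) -> None:
        self._items = items

    def connect(self) -> None:
        pass  # nothing to connect to

    def get_list_artefacts(self, last_synced: datetime | None) -> list[tuple[str, datetime]]:
        # Assumption: "modified since" means strictly newer than last_synced.
        rows = [(aid, ts) for aid, (ts, _) in self._items.items()
                if last_synced is None or ts > last_synced]
        return sorted(rows, key=lambda row: row[1])

    def get_content(self, artefacts: list[tuple[str, datetime]]) -> list[Content]:
        return [Content(date=ts, id_=aid, content=self._items[aid][1]) for aid, ts in artefacts]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
source = InMemorySource({"a": (t0, {"x": 1}), "b": (t1, {"x": 2})})
first, mark = sync_once(source, None)   # full sync: both artefacts
second, mark = sync_once(source, mark)  # incremental: nothing newer than t1
```

The watermark is persisted only after retrieval succeeds. A failed pass can therefore be retried from the previous timestamp without skipping artefacts.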
Docstring
abstract_source
Classes:
- AbstractSource: Abstract interface describing a synchronizable external data source.
- Content: Representation of a single artefact retrieved from a source.
AbstractSource
Abstract interface describing a synchronizable external data source.
A Source implementation is responsible for:
1. Establishing a connection to a remote system
2. Discovering which artefacts changed since a timestamp
3. Retrieving normalized content for those artefacts
The interface is designed for incremental synchronization workflows.
Lifecycle
connect() MUST be called before any other method.
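One way an implementation might enforce this rule is to keep a "connected" flag and check it at the top of every other method, raising RuntimeError as the method documentation below describes. `GuardedSource`, `_connected` and `_require_connection` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from datetime import datetime


class GuardedSource:
    """Hypothetical adapter showing a lifecycle guard (not the library's code)."""

    def __init__(self) -> None:
        self._connected = False

    def connect(self) -> None:
        # A real adapter would authenticate / open a session here.
        self._connected = True

    def _require_connection(self) -> None:
        # Called first by every method other than connect().
        if not self._connected:
            raise RuntimeError("connect() must be called before this method")

    def get_list_artefacts(self, last_synced: datetime | None) -> list[tuple[str, datetime]]:
        self._require_connection()
        return []  # placeholder for a backend query

    def get_content(self, artefacts: list[tuple[str, datetime]]) -> list:
        self._require_connection()
        return []  # placeholder for a backend fetch
```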
Consistency guarantees
Implementations must ensure:
- Stable artefact identifiers across runs
- Monotonic modification timestamps per artefact
- Deterministic content serialization
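In Python, deterministic serialization usually comes down to `json.dumps` with `sort_keys=True` and fixed separators. The sketch below assumes JSON is the serialization target. `serialize`, `fingerprint` and `check_monotonic` are hypothetical helpers, not part of the module.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def serialize(payload: dict) -> str:
    # sort_keys orders the keys of every dict (including nested ones), so equal
    # dicts give identical strings regardless of insertion order. List order is
    # kept as-is, so lists must already be emitted in a stable order.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def fingerprint(payload: dict) -> str:
    # A content hash is one way to tell whether an artefact really changed.
    return hashlib.sha256(serialize(payload).encode("utf-8")).hexdigest()


def check_monotonic(previous: dict[str, datetime],
                    current: list[tuple[str, datetime]]) -> list[str]:
    """Return ids whose timestamp moved backwards compared with the last run."""
    return [aid for aid, ts in current if aid in previous and ts < previous[aid]]
```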
Typical implementations: SharePoint, Zotero, REST APIs, file repositories, databases.
Methods:
- connect: Establish connection to the external source.
- get_content: Retrieve normalized content for provided artefacts.
- get_list_artefacts: Return identifiers of artefacts modified since last_synced.
connect
Establish connection to the external source.
Idempotent: safe to call multiple times.
Raises
ConnectionError
PermissionError
ValueError
Source code in src/database_builder_libs/models/abstract_source.py (lines 85-101)
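A minimal sketch of an idempotent connect() follows. The config keys (`base_url`, `token`), the `connect_attempts` counter, and the cause attached to each exception are all assumptions. The documentation lists the exception types but not when each one is raised.

```python
from __future__ import annotations


class ExampleSource:
    """Hypothetical adapter sketching an idempotent connect()."""

    def __init__(self, config: dict) -> None:
        self._config = config
        self._session: dict | None = None  # stands in for a real client/session
        self.connect_attempts = 0          # counts real connection attempts

    def connect(self) -> None:
        # Idempotent: reuse the existing session instead of reconnecting.
        if self._session is not None:
            return
        base_url = self._config.get("base_url")
        if not base_url:
            # Assumed meaning: ValueError = unusable configuration.
            raise ValueError("config['base_url'] is required")
        token = self._config.get("token")
        if token is None:
            # Assumed meaning: PermissionError = missing or rejected credentials.
            raise PermissionError("no credentials supplied")
        self.connect_attempts += 1
        # A real adapter would open the connection here and translate
        # network failures into ConnectionError.
        self._session = {"base_url": base_url, "token": token}
```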
get_content (abstractmethod)
Retrieve normalized content for provided artefacts.
Parameters
artefacts : list[tuple[str, datetime]]
    Artefacts returned by get_list_artefacts().
Returns
list[Content]
    Content objects corresponding to the requested artefacts.
Guarantees
- One Content object per artefact_id.
- The returned content.date must match the provided timestamp, unless the artefact was modified in the source during retrieval.
Notes
Implementations should batch requests where possible.
Raises
RuntimeError
    If called before connect().
KeyError
    If an artefact no longer exists.
Source code in src/database_builder_libs/models/abstract_source.py (lines 146-178)
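The following sketch of get_content() batches requests, returns one Content per artefact, and raises KeyError for artefacts that have disappeared. A dict stands in for the remote backend. `fetch_content`, `batched` and `batch_size` are hypothetical names, and Content is redefined locally as a dataclass, which is an assumption about its shape.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Content:
    date: datetime
    id_: str
    content: dict


def batched(items: list, size: int) -> Iterator[list]:
    # Split the request into fixed-size chunks (itertools.batched is 3.12+ only).
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_content(store: dict[str, tuple[datetime, dict]],
                  artefacts: list[tuple[str, datetime]],
                  batch_size: int = 50) -> list[Content]:
    """Hypothetical get_content() body; a real adapter sends one request per batch."""
    results: list[Content] = []
    for batch in batched(artefacts, batch_size):
        for artefact_id, _requested_ts in batch:
            if artefact_id not in store:
                raise KeyError(artefact_id)  # deleted since discovery
            current_ts, payload = store[artefact_id]
            # Normally current_ts equals the requested timestamp; it is newer
            # if the artefact changed between discovery and retrieval.
            results.append(Content(date=current_ts, id_=artefact_id, content=payload))
    return results
```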
get_list_artefacts (abstractmethod)
Return identifiers of artefacts modified since last_synced.
Parameters
last_synced : datetime | None
    UTC timestamp of the last successful synchronization. If None, the implementation must return ALL available artefacts.
Returns
list[tuple[str, datetime]]
    A list of (artefact_id, last_modified_timestamp) tuples.
Requirements
- Returned timestamps must be timezone-aware.
- Each artefact_id must appear at most once.
- The list should be sorted by timestamp in ascending order where possible.
Raises
RuntimeError
    If called before connect().
Source code in src/database_builder_libs/models/abstract_source.py (lines 115-144)
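One way to meet the requirements above is to normalize the raw backend rows before returning them. In this sketch, `list_artefacts` is a hypothetical helper. Naive timestamps, including a naive last_synced, are assumed to be UTC, and "modified since" is taken to mean strictly newer.

```python
from __future__ import annotations

from datetime import datetime, timezone


def list_artefacts(rows: list[tuple[str, datetime]],
                   last_synced: datetime | None) -> list[tuple[str, datetime]]:
    """Filter and normalize raw (artefact_id, modified_at) rows from a backend."""
    if last_synced is not None and last_synced.tzinfo is None:
        last_synced = last_synced.replace(tzinfo=timezone.utc)  # assumption: naive = UTC
    latest: dict[str, datetime] = {}
    for artefact_id, ts in rows:
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive = UTC
        # Keep only the newest timestamp per id, so each id appears once.
        if artefact_id not in latest or ts > latest[artefact_id]:
            latest[artefact_id] = ts
    # last_synced=None means a full sync: return everything.
    selected = [(aid, ts) for aid, ts in latest.items()
                if last_synced is None or ts > last_synced]
    # Ascending by timestamp; the id breaks ties so the order is deterministic.
    return sorted(selected, key=lambda row: (row[1], row[0]))
```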
Content
Representation of a single artefact retrieved from a source.
An artefact corresponds to a uniquely identifiable entity in the external system (e.g., SharePoint document, Zotero item, database record).
Attributes
date : datetime
    Last modification timestamp of the artefact in the source system. Must be timezone-aware (UTC recommended).
id_ : str
    Stable unique identifier of the artefact in the source. This identifier MUST remain constant across synchronizations.
content : dict
    Normalized payload retrieved from the source. The structure is implementation-specific, but it must be JSON-serializable and deterministic: identical source state must produce an identical dict.
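The page does not show how Content is defined. If it is a plain dataclass (an assumption), the documented constraints can be checked in `__post_init__`. This stand-in checks that the timestamp is timezone-aware and that the payload is JSON-serializable. Determinism cannot be checked from a single instance.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Content:
    """Hypothetical stand-in for the documented Content model."""

    date: datetime
    id_: str
    content: dict

    def __post_init__(self) -> None:
        # "Must be timezone-aware": naive datetimes have no tzinfo/utcoffset.
        if self.date.tzinfo is None or self.date.utcoffset() is None:
            raise ValueError("date must be timezone-aware")
        # "Must be JSON-serializable": json.dumps raises TypeError otherwise.
        json.dumps(self.content)
```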