Zotero source¶
Overview¶
The Zotero source allows you to retrieve documents and metadata from a Zotero database using its API. It implements the AbstractSource interface to provide incremental synchronization of Zotero library items.
Design Notes¶
Interaction Patterns¶
The ZoteroSource follows these interaction patterns:
- Connection Pattern:
  - Initialize with library credentials
  - Create pyzotero client instance
  - Optional collection filtering
- Synchronization Pattern:
  - Query items modified since last sync
  - Convert Zotero timestamps to UTC
  - Return stable item keys
- Content Retrieval Pattern:
  - Fetch full item metadata
  - Normalize to Content objects
  - Preserve Zotero data structure
- Attachment Download Pattern:
  - Check for local file availability first
  - Fall back to API download if needed
  - Save as `{item_id}.pdf`
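The "Convert Zotero timestamps to UTC" step of the synchronization pattern can be sketched as follows. This assumes Zotero reports `dateModified` as an ISO 8601 string such as `2024-03-01T09:15:00Z`. The helper name `zotero_timestamp_to_utc` is hypothetical and not part of the library.

```python
from datetime import datetime, timezone


def zotero_timestamp_to_utc(value: str) -> datetime:
    """Parse a Zotero ISO 8601 timestamp into a timezone-aware UTC datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit "+00:00" offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(zotero_timestamp_to_utc("2024-03-01T09:15:00Z"))  # 2024-03-01 09:15:00+00:00
```

Timestamps that carry another offset are shifted to UTC, so comparisons against `last_synced` stay consistent.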
Implementation Details¶
- Timestamp Handling: All timestamps are converted to UTC for consistency
- Deletion Limitation: The Zotero API does not report deleted items during sync
- Attachment Priority: Prefers local Zotero storage over API downloads for performance
- Error Handling: Gracefully handles missing attachments and continues processing
Docstring¶
zotero_source¶
Classes:
- ZoteroSource – Zotero implementation of AbstractSource.
ZoteroSource¶
Bases: database_builder_libs.models.abstract_source.AbstractSource
Zotero implementation of AbstractSource.
Provides incremental synchronization of a Zotero library and exposes items
as canonical Content objects.
Mapping¶
Zotero item → Content:
- item.key → Content.id_
- item.data → Content.content
- item.dateModified → Content.date
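The mapping above can be sketched as a small conversion function. This makes two assumptions. First, `Content` here is a hypothetical stand-in dataclass with the three fields named in the mapping, not the library's actual model. Second, items follow the pyzotero layout, where `dateModified` sits inside `item["data"]`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class Content:
    """Hypothetical stand-in for the library's Content model."""
    id_: str
    content: dict[str, Any]
    date: datetime


def item_to_content(item: dict[str, Any]) -> Content:
    """Map a pyzotero-style item dict onto a Content object."""
    data = item["data"]
    # Normalize "...Z" so fromisoformat() works on older Python versions too.
    modified = datetime.fromisoformat(data["dateModified"].replace("Z", "+00:00"))
    return Content(
        id_=item["key"],                         # item.key -> Content.id_
        content=data,                            # item.data -> Content.content
        date=modified.astimezone(timezone.utc),  # item.dateModified -> Content.date
    )
```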
Synchronization semantics¶
- get_list_artefacts() performs incremental sync using the Zotero `since` parameter
- Returned timestamps are UTC
- Identifiers are stable across runs
- Deleted items are NOT reported (Zotero API limitation)
Attachment handling¶
download_zotero_item() retrieves the first attachment:
- Prefers local Zotero storage when available
- Falls back to API download
Lifecycle¶
connect() must be called before using the source.
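The connect-before-use lifecycle, combined with the idempotent `connect()` described below, can be sketched with a lazily built client. `LazySource` and its `client_factory` argument are hypothetical. In the real source the factory would presumably construct a pyzotero client, for example `zotero.Zotero(library_id, library_type, api_key)`. Here it is injected so the sketch runs without network access. Raising `RuntimeError` on use before `connect()` is also an assumption; the documented behaviour only lists the errors `connect()` itself may raise.

```python
from typing import Any, Callable, Optional


class LazySource:
    """Sketch of the connect-before-use lifecycle (hypothetical class)."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client: Optional[Any] = None

    def connect(self) -> None:
        # Idempotent: the client is only created on the first call.
        if self._client is None:
            self._client = self._client_factory()

    def _require_client(self) -> Any:
        # Guard used by every method that talks to the library.
        if self._client is None:
            raise RuntimeError("connect() must be called before using the source")
        return self._client

    def items(self) -> Any:
        return self._require_client().items()
```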
Methods:
- connect – Establish connection to the external source.
- download_zotero_item – Download the first attachment of specified zotero item to specified path
- get_all_documents_metadata – Retrieve the metadata of all documents within collection
- get_content – Fetch normalized content for Zotero items.
- get_list_artefacts – Return Zotero items modified after last_synced.
connect¶
Establish connection to the external source.
Idempotent: safe to call multiple times.
Raises¶
- ConnectionError
- PermissionError
- ValueError
Source code in src/database_builder_libs/models/abstract_source.py
download_zotero_item¶
Download the first attachment of the specified Zotero item to the specified path.
This function wraps the pyzotero `dump` API call so that attachments of Zotero items can be downloaded through either the local or the cloud API. At the time of writing, the default `dump` call only supports cloud downloads.
Parameters:
- `item_id` – The key of the item to get the attachment/PDF from (the `key` attribute of the above-mentioned Zotero dict).
- `download_path` – The folder to download the attachment to; the resulting file path will be `<download_path>/<item_id>.pdf`.
Source code in src/database_builder_libs/sources/zotero_source.py
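The local-first download with API fallback can be sketched as below. Everything here is assumed for illustration: the function name, the `attachment` dict shape (with `key` and `filename` entries), the injected `api_download` callable, and the local storage layout `<storage_dir>/<attachment key>/<filename>`. With pyzotero, `api_download` could, for example, be `lambda key, target: zot.dump(key, target.name, str(target.parent))`.

```python
import shutil
from pathlib import Path
from typing import Any, Callable, Optional


def download_first_attachment(
    item_id: str,
    attachment: Optional[dict[str, Any]],
    download_path: Path,
    storage_dir: Path,
    api_download: Callable[[str, Path], None],
) -> Optional[Path]:
    """Save an item's first attachment as <download_path>/<item_id>.pdf."""
    if attachment is None:
        # Missing attachment: return None instead of raising,
        # so a batch run can continue with the next item.
        return None
    download_path.mkdir(parents=True, exist_ok=True)
    target = download_path / f"{item_id}.pdf"
    # Assumed local layout: <storage_dir>/<attachment key>/<filename>.
    local_file = storage_dir / attachment["key"] / attachment["filename"]
    if local_file.is_file():
        shutil.copyfile(local_file, target)  # preferred: local Zotero storage
    else:
        api_download(attachment["key"], target)  # fallback: API download
    return target
```

Injecting the API call keeps the local/remote decision testable without credentials.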
get_all_documents_metadata¶
Retrieve the metadata of all documents within a collection.
This function calls the Zotero collection items API:
'https://api.zotero.org/users/
Parameters:
- `collection_id` – The collection to retrieve document metadata from (visible in the web URL when using the Zotero web portal).
Yields:
- List[dict[str, Any]] – A list containing one document-metadata dict per document in the collection. The dict format closely resembles the pyzotero output format: https://pyzotero.readthedocs.io/en/latest/#zotero.Zotero.collection_items_top
Source code in src/database_builder_libs/sources/zotero_source.py
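Retrieving *all* documents of a collection means paging through results. The Zotero Web API pages with `start`/`limit` parameters and caps `limit` at 100. The sketch below walks those offsets until a short page comes back. It is an illustration only: `iter_collection_items` and the injected `fetch_page` callable are hypothetical. With pyzotero, `fetch_page` might be `lambda start, limit: zot.collection_items_top(collection_id, start=start, limit=limit)`, and pyzotero's `zot.everything(...)` helper performs similar paging for you.

```python
from typing import Any, Callable, Iterator


def iter_collection_items(
    fetch_page: Callable[[int, int], list[dict[str, Any]]],
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield every item dict by advancing the start offset one page at a time."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        yield from page
        # A page shorter than page_size (including an empty one) is the last page.
        if len(page) < page_size:
            break
        start += page_size
```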
get_content¶
Fetch normalized content for Zotero items.
Each artefact is retrieved individually and converted to Content.
Guarantees¶
- One Content object per artefact
- Content.date reflects the modification timestamp observed during listing.
- Content.content may represent a newer revision if the item changed during retrieval.
- Content.content contains the raw Zotero `data` field
This method does not download attachments.
Source code in src/database_builder_libs/sources/zotero_source.py
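The guarantees above, one request per artefact and `Content.date` taken from the listing rather than re-read, can be sketched as follows. `Content` is a hypothetical stand-in dataclass. `fetch_item` is an injected callable; with pyzotero it could be `zot.item`. The function mirrors the documented behaviour but is not the library's implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable


@dataclass
class Content:
    """Hypothetical stand-in for the library's Content model."""
    id_: str
    content: dict[str, Any]
    date: datetime


def get_content(
    artefacts: list[tuple[str, datetime]],
    fetch_item: Callable[[str], dict[str, Any]],
) -> list[Content]:
    """Fetch each listed artefact individually and wrap it in a Content object."""
    contents = []
    for key, listed_modified in artefacts:
        item = fetch_item(key)  # one request per artefact
        contents.append(
            Content(
                id_=key,
                content=item["data"],  # raw Zotero data field; no attachments
                date=listed_modified,  # timestamp observed during listing
            )
        )
    return contents
```

Because `date` comes from the listing while `content` comes from a later fetch, an item edited in between yields newer content with the older timestamp. The next incremental sync then picks it up again.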
get_list_artefacts¶
Return Zotero items modified after last_synced.
Parameters¶
last_synced : datetime | None
    UTC timestamp of the last successful sync. If None, all items are returned.
Returns¶
list[(item_key, modified_time)]
Sync guarantees¶
- item_key is stable across runs
- timestamps are timezone-aware UTC
- includes newly created and modified items
- DOES NOT include deleted items (Zotero limitation)
Notes¶
Zotero's `since` filter uses server modification time, not file change time.
Source code in src/database_builder_libs/sources/zotero_source.py
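The return contract can be illustrated with a client-side equivalent. The real method delegates filtering to Zotero's server-side `since` parameter. This sketch instead filters already-fetched item dicts by `dateModified`, purely to show the `(item_key, modified_time)` shape and the `None` case. `list_artefacts` is a hypothetical name, and `last_synced` must be timezone-aware. Deleted items cannot appear in the result because they never appear in the input.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def list_artefacts(
    items: list[dict[str, Any]],
    last_synced: Optional[datetime],
) -> list[tuple[str, datetime]]:
    """Return (item_key, modified_time) pairs for items modified after last_synced."""
    result = []
    for item in items:
        raw = item["data"]["dateModified"]
        # Timezone-aware UTC, as the sync guarantees require.
        modified = datetime.fromisoformat(raw.replace("Z", "+00:00")).astimezone(timezone.utc)
        if last_synced is None or modified > last_synced:
            result.append((item["key"], modified))
    return result
```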