Commit 4887eae

Merge pull request #432 from softwarepub/refactor/423-implement-public-api
Implement public API
2 parents 6bc1648 + d46394e commit 4887eae

14 files changed

+866
-250
lines changed

REUSE.toml

Lines changed: 6 additions & 0 deletions
@@ -17,3 +17,9 @@ path = ["REUSE.toml"]
1717
precedence = "aggregate"
1818
SPDX-FileCopyrightText = "German Aerospace Center (DLR), Helmholtz-Zentrum Dresden-Rossendorf, Forschungszentrum Jülich"
1919
SPDX-License-Identifier = "CC0-1.0"
20+
21+
[[annotations]]
22+
path = ["src/**/*.py", "test/**/*.py"]
23+
precedence = "aggregate"
24+
SPDX-FileCopyrightText = "German Aerospace Center (DLR), Helmholtz-Zentrum Dresden-Rossendorf, Forschungszentrum Jülich"
25+
SPDX-License-Identifier = "Apache-2.0"

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ def read_version_from_pyproject():
102102
'sphinx_togglebutton',
103103
'sphinxcontrib.datatemplates',
104104
# Custom extensions, see `_ext` directory.
105-
'plugin_markup',
105+
# 'plugin_markup',
106106
]
107107

108108
language = 'en'

docs/source/dev/data_model.md

Lines changed: 272 additions & 13 deletions
@@ -1,27 +1,286 @@
11
<!--
2-
SPDX-FileCopyrightText: 2022 German Aerospace Center (DLR)
2+
SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR)
33
44
SPDX-License-Identifier: CC-BY-SA-4.0
55
-->
66

77
<!--
8-
SPDX-FileContributor: Michael Meinel
8+
SPDX-FileContributor: Stephan Druskat <[email protected]>
99
-->
1010

11-
# HERMES Data Model
11+
# Data model
1212

13-
*hermes* uses an internal data model to store the output of the different stages.
14-
All the data is collected in a directory called `.hermes` located in the root of the project directory.
13+
`hermes`' internal data model acts like a contract between `hermes` and plugins.
14+
It is based on [**JSON-LD (JSON Linked Data)**](https://json-ld.org/), and
15+
the public API simplifies interaction with the data model through Python code.
1516

16-
You should not need to interact with this data directly.
17-
Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way.
17+
Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in
18+
subdirectories of the `.hermes/` directory that is created in the root of the project directory.
1819

20+
The cache is purely for internal purposes; do not interact with its data directly.
1921

20-
## Harvest Data
22+
Depending on whether you develop a plugin for `hermes`, or you develop `hermes` itself, you need to know either [_some_](#json-ld-for-plugin-developers),
23+
or _quite a few_ things about JSON-LD.
2124

22-
The data of the havesters is cached in the sub-directory `.hermes/harvest`.
23-
Each harvester has a separate cache file to allow parallel harvesting.
24-
The cache file is encoded in JSON and stored in `.hermes/harvest/HARVESTER_NAME.json`
25-
where `HARVESTER_NAME` corresponds to the entry point name.
25+
The following sections provide documentation of the data model.
26+
They aim to help you get started with `hermes` plugin and core development,
27+
even if you have no previous experience with JSON-LD.
2628

27-
{class}`hermes.model.context.HermesHarvestContext` encapsulates these harvester caches.
29+
## The data model for plugin developers
30+
31+
If you develop a plugin for `hermes`, you will only need to work with a single Python class and the public API
32+
it provides: {class}`hermes.model.SoftwareMetadata`.
33+
34+
To work with this class, it is necessary that you know _some_ things about JSON-LD.
35+
36+
### JSON-LD for plugin developers
37+
38+
```{attention}
39+
Work in progress.
40+
```
41+
42+
43+
### Working with the `hermes` data model in plugins
44+
45+
> **Goal**
46+
> Understand how plugins access the `hermes` data model and interact with it.
47+
48+
`hermes` aims to hide as much of the data model as possible behind a public API
49+
so that plugin developers do not have to deal with some of the more complex features of JSON-LD.
50+
51+
#### Model instances in different types of plugin
52+
53+
You can extend `hermes` with plugins for three different commands: `harvest`, `curate`, `deposit`.
54+
55+
The commands differ in how they work with instances of the data model.
56+
57+
- `harvest` plugins _create_ a single new model instance and return it.
58+
- `curate` plugins are passed a single existing model instance (the output of `process`),
59+
and return a single model instance.
60+
- `deposit` plugins are passed a single existing model instance (the output of `curate`),
61+
and return a single model instance.
62+
63+
#### How plugins work with the API
64+
65+
```{important}
66+
Plugins access the data model _exclusively_ through the API class {class}`hermes.model.SoftwareMetadata`.
67+
```
68+
69+
The following sections show how this class works.
70+
71+
##### Creating a data model instance
72+
73+
Model instances are primarily created in `harvest` plugins, but may also be created in other plugins to map
74+
existing data into the model.
75+
76+
To create a new model instance, initialize {class}`hermes.model.SoftwareMetadata`:
77+
78+
```{code-block} python
79+
:caption: Initializing a default data model instance
80+
from hermes.model import SoftwareMetadata
81+
82+
data = SoftwareMetadata()
83+
```
84+
85+
`SoftwareMetadata` objects initialized without arguments provide the default _context_
86+
(see [_JSON-LD for plugin developers_](#json-ld-for-plugin-developers)).
87+
This means that now, you can use terms from the schemas included in the default context to describe software metadata.
88+
89+
Terms from [_CodeMeta_](https://codemeta.github.io/terms/) can be used without a prefix:
90+
91+
```{code-block} python
92+
:caption: Using terms from the default schema
93+
data["readme"] = ...
94+
```
95+
96+
Terms from [_Schema.org_](https://schema.org/) can be used with the prefix `schema`:
97+
98+
```{code-block} python
99+
:caption: Using terms from a non-default schema
100+
data["schema:copyrightNotice"] = ...
101+
```
102+
103+
You can also use other linked data vocabularies. To do this, you need to identify them with a prefix and register them
104+
with the data model by passing it `extra_vocabs` as a `dict` mapping prefixes to URLs where the vocabularies are
105+
provided as JSON-LD:
106+
107+
```{code-block} python
108+
:caption: Injecting additional schemas
109+
from hermes.model import SoftwareMetadata
110+
111+
# Contents served at https://bar.net/schema.jsonld:
112+
# {
113+
# "@context":
114+
# {
115+
# "name": "https://schema.org/name"
116+
# }
117+
# }
118+
119+
data = SoftwareMetadata(extra_vocabs={"foo": "https://bar.net/schema.jsonld"})
120+
121+
data["foo:name"] = ...
122+
```
123+
124+
##### Adding data
125+
126+
Once you have an instance of {class}`hermes.model.SoftwareMetadata`, you can add data to it,
127+
i.e., metadata that describes software:
128+
129+
```{code-block} python
130+
:caption: Setting data values
131+
data["name"] = "My Research Software" # A simple "Text"-type value
132+
# → Simplified model representation : { "name": [ "My Research Software" ] }
133+
# Cf. "Accessing data" below
134+
data["author"] = {"name": "Shakespeare"} # An object value that uses terms available in the defined context
135+
# → Simplified model representation : { "name": [ "My Research Software" ], "author": [ { "name": "Shakespeare" } ] }
136+
# Cf. "Accessing data" below
137+
```
138+
139+
##### Accessing data
140+
141+
You need to be able to access data in the data model instance to add, edit or remove data.
142+
Data can be accessed by using term strings, similar to how values in Python `dict`s are accessed by keys.
143+
144+
```{important}
145+
When you access data from a data model instance,
146+
it will always be returned in a **list**-like object!
147+
```
148+
149+
The reason for providing data in list-like objects is that JSON-LD treats all property values as arrays.
150+
Even if you add "single value" data to a `hermes` data model instance via the API, the underlying JSON-LD model
151+
will treat it as an array, i.e., a list-like object:
152+
153+
```{code-block} python
154+
:caption: Internal data values are arrays
155+
data["name"] = "My Research Software" # → [ "My Research Software" ]
156+
data["author"] = {"name": "Shakespeare"} # → [ { "name": [ "Shakespeare" ] } ]
157+
```
158+
159+
Therefore, you access data in the same way you would access data from a Python `list`:
160+
161+
1. You access single values using indices, e.g., `data["name"][0]`.
162+
2. You can use a list-like API to interact with data objects, e.g.,
163+
`data["name"].append("Hamilton")`, `data["name"].extend(["Hamilton", "Knuth"])`, `for name in data["name"]: ...`, etc.
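The list-like access pattern described above can be tried out without `hermes` installed. The following is a minimal sketch in plain Python: `ListValuedDict` is a hypothetical stand-in written for this illustration, not the `hermes` implementation and not part of its API. It only mimics the one behaviour discussed here, namely that a single assigned value is stored and returned as a list.

```python
class ListValuedDict(dict):
    """Hypothetical stand-in: stores every assigned value inside a list."""

    def __setitem__(self, key, value):
        # Wrap single values in a list; values that already are lists stay as they are.
        if not isinstance(value, list):
            value = [value]
        super().__setitem__(key, value)


data = ListValuedDict()
data["name"] = "My Research Software"  # stored as a one-element list

# Single values are read by index, further values are added list-style.
first_name = data["name"][0]
data["name"].append("Hamilton")
all_names = list(data["name"])
```

The real API class may differ in how it wraps nested objects and prefixed terms; the sketch only shows why indexing with `[0]` is needed even after assigning a single value.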
164+
165+
##### Interacting with data
166+
167+
The following longer example shows different ways that you can interact with `SoftwareMetadata` objects and the data API.
168+
169+
```{code-block} python
170+
:caption: Building the data model
171+
from hermes.model import SoftwareMetadata
172+
173+
# Create the model object with the default context
174+
data = SoftwareMetadata()
175+
176+
# Let's create author metadata for our software!
177+
# Below each line of code, the value of `data["author"]` is given.
178+
179+
data["author"] = {"name": "Shakespeare"}
180+
# → [{'name': ['Shakespeare']}]
181+
182+
data["author"].append({"name": "Hamilton"})
183+
# [{'name': ['Shakespeare']}, {'name': ['Hamilton']}]
184+
185+
data["author"][0]["email"] = "[email protected]"
186+
# [{'name': ['Shakespeare'], 'email': ['[email protected]']}, {'name': ['Hamilton']}]
187+
188+
data["author"][1]["email"].append("[email protected]")
189+
# [{'name': ['Shakespeare'], 'email': ['[email protected]']}, {'name': ['Hamilton'], 'email': ['[email protected]']}]
190+
191+
data["author"][1]["email"].extend(["[email protected]", "[email protected]"])
192+
# [
193+
# {'name': ['Shakespeare'], 'email': ['[email protected]']},
194+
# {'name': ['Hamilton'], 'email': ['[email protected]', '[email protected]', '[email protected]']}
195+
# ]
196+
```
197+
198+
The example continues to show how to iterate through data.
199+
200+
```{code-block} python
201+
:caption: for-loop, containment check
202+
for i, author in enumerate(data["author"], start=1):
203+
if author["name"][0] in ["Shakespeare", "Hamilton"]:
204+
print(f"Author {i} has expected name.")
205+
else:
206+
raise ValueError("Unexpected author name found!", author["name"][0])
207+
208+
# Mock output:
209+
# $> Author 1 has expected name.
210+
# $> Author 2 has expected name.
211+
```
212+
213+
```{code-block} python
214+
:caption: Value check
215+
for email in data["author"][0]["email"]:
216+
if email.endswith(".edu"):
217+
print("Shakespeare has an email address at an educational institution.")
218+
else:
219+
print("Cannot confirm affiliation with educational institution for Shakespeare.")
220+
221+
# Mock output
222+
# $> Cannot confirm affiliation with educational institution for Shakespeare.
223+
```
224+
225+
```{code-block} python
226+
:caption: Value check and list comprehension
227+
if all(["hamilton" in email for email in data["author"][1]["email"]]):
228+
print("Author has only emails with their name in it.")
229+
230+
# Mock output
231+
# $> Author has only emails with their name in it.
232+
```
233+
234+
The example continues to show how to assert data values.
235+
236+
As mentioned in the [introduction to the data model](#data-model),
237+
`hermes` uses a JSON-LD-like internal data model.
238+
The API class {class}`hermes.model.SoftwareMetadata` hides many
239+
of the more complex aspects of JSON-LD and makes it easy to work
240+
with the data model.
241+
242+
So the API class hides the internal model objects.
243+
Therefore, the values it returns work as you would expect from plain
244+
Python data:
245+
246+
```{code-block} python
247+
:caption: Naive containment assertion
248+
:emphasize-lines: 5,13
249+
try:
250+
assert (
251+
{'name': ['Shakespeare'], 'email': ['[email protected]']}
252+
in
253+
data["author"]
254+
)
255+
print("The author was found!")
256+
except AssertionError:
257+
print("The author could not be found.")
258+
raise
259+
260+
# Mock output
261+
# $> The author was found!
262+
#
263+
#
264+
# Internal Model from data["author"]:
265+
# {'@list': [
266+
# {
267+
# 'http://schema.org/name': [{'@value': 'Shakespeare'}],
268+
# 'http://schema.org/email': [{'@value': '[email protected]'}]
269+
# },
270+
# {
271+
# 'http://schema.org/name': [{'@value': 'Hamilton'}],
272+
# 'http://schema.org/email': [
273+
# {'@list': [
274+
# {'@value': '[email protected]'}, {'@value': '[email protected]'}, {'@value': '[email protected]'}
275+
# ]}
276+
# ]
277+
# }]
278+
# }
280+
```
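To make the relation between the internal model and the plain Python view more concrete, here is a minimal sketch of how such an expanded structure could be unwrapped. The function `to_plain` is hypothetical and written only for this illustration; it is not how `hermes` converts data internally, and the email addresses are placeholders. It assumes that `@value` wraps a literal, that `@list` wraps an ordered list, and that property IRIs can be shortened to their last path segment.

```python
def to_plain(node):
    """Recursively unwrap an expanded JSON-LD-like structure into plain Python values."""
    if isinstance(node, dict):
        if "@value" in node:
            return node["@value"]
        if "@list" in node:
            return [to_plain(item) for item in node["@list"]]
        # Shorten property IRIs such as "http://schema.org/name" to "name".
        return {key.rsplit("/", 1)[-1]: to_plain(value) for key, value in node.items()}
    if isinstance(node, list):
        plain = []
        for item in node:
            if isinstance(item, dict) and "@list" in item:
                # Merge a nested @list into the surrounding list of values.
                plain.extend(to_plain(item))
            else:
                plain.append(to_plain(item))
        return plain
    return node


# Placeholder data shaped like the internal model shown above.
internal = {"@list": [
    {
        "http://schema.org/name": [{"@value": "Shakespeare"}],
        "http://schema.org/email": [{"@value": "shakespeare@example.org"}],
    },
    {
        "http://schema.org/name": [{"@value": "Hamilton"}],
        "http://schema.org/email": [{"@list": [
            {"@value": "hamilton@example.org"},
            {"@value": "hamilton@example.com"},
        ]}],
    },
]}

authors = to_plain(internal)
found = {"name": ["Shakespeare"], "email": ["shakespeare@example.org"]} in authors
```

With the unwrapped view, the containment check compares ordinary `dict` and `list` values, which is the behaviour the naive assertion relies on.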
281+
282+
---
283+
284+
## See Also
285+
286+
- API reference: {class}`hermes.model.SoftwareMetadata`
