Status of this Document
This is a draft API specification. Expect changes to this document.
This Version: 0.0.1
Latest Stable Version: None
Previous Version: None
Editors
- David Eichmann, School of Library & Information Science, University of Iowa
- Christine Fernsebner Eslao, Harvard Library
- Nancy J. Fallgren, National Library of Medicine
- Steven Folsom, Cornell University Library
- Kevin M. Ford, Library of Congress
- Jim Hahn, University of Pennsylvania Libraries
- Kirk Hess, OCLC R&D
- Anna Lionetti, Casalini Libri
- Tiziana Possemato, Casalini Libri
- Erik Radio, University of Colorado Boulder
- E. Lynette Rayle, Cornell University Library
- Gregory F. Reeve, Brigham Young University
- Vitus Tang, Stanford University Libraries
- Simeon Warner, Cornell University Library
Copyright © 2021-2023 Editors and contributors. Published by the LD4 under the CC-BY license, see disclaimer.
1. Introduction
The Entity Metadata Management API is intended to establish a pattern that supports sharing changes to entities and their metadata curated by entity metadata providers with the community of entity metadata consumers (e.g. libraries, museums, galleries, archives). Use of a consistent pattern allows for the creation of software tools for producing and consuming changes in entity metadata.
This specification is based on the Activity Streams 2.0 specification. It defines a usage pattern and minor extensions specific to entity metadata management.
1.1. Objectives and Scope
The objective of this specification is to provide a machine-to-machine API that conveys the information needed for an entity metadata consumer to understand all the changes to entity metadata across the lifecycle of an entity. The intended audiences are Entity Metadata Providers who curate and publish entity metadata within an area of interest, Entity Metadata Consumers who use the entity metadata, and developers who create applications and tools that help consumers connect to entity metadata from providers.
The discovery of changes to entity metadata requires a consistent pattern for entity metadata providers to publish lists of links to entities that have metadata changes and the types of changes that have occurred. Changes include newly available entities with metadata, removed entities, as well as changes to entities and their metadata.
This process can be optimized if metadata providers publish changes in chronological order, including descriptions of how their entity descriptions have changed, enabling consuming systems to retrieve only the resources that have been modified since they were last retrieved.
This specification does not include a mechanism for pushing change notifications to remote systems. Only periodic polling for the set of changes that must be processed is supported. A push mechanism may be added in a future version.
Work that is out of scope of this API includes the recommendation or creation of any descriptive metadata formats, and the recommendation or creation of metadata search APIs or protocols. The diverse domains represented across the entity metadata already have successful standards fulfilling these use cases. Also out of scope is optimization of the transmission mechanisms providing access points for consumers to query.
1.2. Use Cases
The following three use cases motivate this specification. They are drawn from workflows needed by libraries, museums, galleries, and archives.
1.2.1. Entity Change Activities List
Entity metadata consumers want to learn of any modifications or deletions for entities on their interest list, as well as new entities. This allows for a comparison between the consumer’s list and the provider’s entity change activity list of modified and deleted entities. For any that overlap, the consumer will take additional actions if needed.
To address this use case, the provider creates and makes available a list of activities with the URIs for any new, modified, or deleted entities. While the provider may have internal needs for tracking more than these three moments in an entity’s lifecycle (e.g. if the provider workflow requires a review activity), this specification focuses on public changes to the dataset that may require action from a consumer. The consumer will need to take additional actions to identify specific changes to entities of interest.
1.2.2. Local Cache of Labels
Entity metadata consumers persist references to entity metadata by saving the URI as part of their local datastore. URIs may not be understandable to application users. In order to be able to display a human readable label, a label may be retrieved from the provider’s datastore by dereferencing the URI. For performance reasons, the label is generally cached in the local datastore to avoid having to fetch the label every time the entity reference is displayed to an end user. If the label changes in the provider’s datastore, the consumer would like to update the local cache of the label.
To address this use case, the provider creates and makes available a list of URIs and their new labels. The consumer can compare the list of URIs with those stored in the local application and update the cached labels.
In some cases, additional metadata is also cached as part of the external reference, but this is less common. Verification of the additional metadata may require the consumer to take additional actions.
1.2.3. Local Cache of Full Dataset
A consumer may decide to make a full cache of a dataset of entity metadata. This is commonly done for increased control over uptime, throughput, and indexing for search. The cache needs timely updates to stay in sync with the source dataset.
To address this use case, the provider creates and makes available a dated list of all new, modified, and deleted entities along with specifics about how the entities have changed. The consumer can process the stream of change documents published since its last incremental update. Specific details about each change can be used to update the local cache.
1.3. Terminology
1.3.1. Roles
- Entity Metadata Provider: An organization that collects, curates, and provides access to metadata about entities within an area of interest.
- Entity Metadata Consumer: Any institution that references or caches entity metadata from a provider.
1.3.2. Terms about Entities
- Entity: An entity is any resource (a thing or a concept) identified with a URI that we may want to reference or make use of in a data set. Entities include, but are not limited to, what are referred to as authorities, controlled vocabulary terms, or real world objects (RWOs) in library, archives, and museum domains.
- Entity Set: A set of entities that are grouped together by an Entity Metadata Provider. Entities can be grouped based on various criteria (e.g. subject headings, names, thesaurus, controlled vocabulary).
1.3.3. Terms from Activity Streams
This specification is based on the Activity Streams 2.0 specification and uses the following key terms from Activity Streams:
- Activity: Activity objects are used to describe an individual change to the metadata of an Entity Set. These often affect just one Entity, but in some cases more than one Entity may be affected by related changes that are reflected in multiple sequenced Activities.
- Collection: The entry point for all the information about changes to the metadata of an Entity Set is modeled as a Collection, using the OrderedCollection type to indicate that the activities in the collection are in time order.
- OrderedCollectionPage: The complete Collection of changes is expressed as a set of OrderedCollectionPage objects, so that change activities are delivered in manageable chunks even for large and long-running sets of updates.
Many properties from Activity Streams are used, and are described throughout this document.
1.3.4. Terms from Other Specifications
This specification uses the following terms:
- HTTP(S): The HTTP or HTTPS URI scheme and internet protocol.
- JavaScript Object Notation (JSON): The terms array, JSON object, number, and string in this document are to be interpreted as defined by the JavaScript Object Notation (JSON) specification.
- JSON-LD: The Entity Metadata Management context is defined following the JSON-LD specification.
- RFC 2119: The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
- URI: URIs are defined following the IANA URI-Schemes specification.
2. Architecture
This specification is based on the Activity Streams 2.0 specification. Changes in entity metadata over time are communicated from providers to consumers via Entity Change Activities that describe a change to an entity. These are collected together in Change Set documents that are organized as shown in the diagram below.
Figure: Entity Metadata Management API Architecture representing changes using Activity Streams
2.1. JSON-LD Representation
The use of JSON-LD with a specific @context that extends the Activity Streams 2.0 specification allows Entity Metadata Consumers to parse the resulting documents using standard JSON tools, and also allows the data to be interpreted according to the RDF Data Model (see Relationship to RDF).
3. Organizational Structures
3.1. Entry Point
Reference: Ordered Collection in the Activity Streams specification
Each Entity Set MUST have at least one Entry Point. It MAY have multiple Entry Points to satisfy different use cases. For example, one Entry Point may provide detailed changes to support incremental updates of a full cache and a second may only provide a list of primary label changes.
The Entry Point MUST be implemented as an Ordered Collection following the definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.
FULL EXAMPLE for Entry Point:
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "My Authority - Change Documents",
"type": "OrderedCollection",
"id": "https://data.my.authority/change_documents/2021/activity-stream",
"url": "https://my.authority/2021-01-01/full_download",
"first": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/1"
},
"last": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/12"
},
"totalItems": 123
}
Reference: JSON-LD scoped context
The @context is used to refer to a JSON-LD context which, in its simplest form, maps terms to IRIs.
Entity Metadata Management activity streams MUST include the following context definition at the top-level of each API response:
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
// rest of API response
}
The Entity Metadata Management context includes information for all parts of this specification, and thus the same context definition is used in all API responses. The Entity Metadata Management context imports the Activity Streams context, so it is not necessary to specify that context explicitly in API responses.
Implementations MAY include additional extension contexts, in which case the value of @context will be a list with the Entity Metadata Management context first. Extension contexts MUST NOT override terms defined in the Entity Metadata Management context or the underlying Activity Streams context. Implementations MAY also use additional properties and values not defined in a JSON-LD @context with the understanding that any such properties will likely be unsupported and ignored by consuming implementations that use the standard JSON-LD algorithms.
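As an illustration of the rules above, the following Python sketch checks whether a response's top-level @context is acceptable: either the Entity Metadata Management context URI alone, or a list that starts with it. It assumes the 0.1 context URI used in this document's examples, and it does not attempt to detect extension contexts that override terms, which would require expanding the contexts with a JSON-LD processor.

```python
from typing import Any

# Context URI used in this document's examples (assumed version 0.1).
EMM_CONTEXT = "https://ld4.github.io/entity_metadata_management/0.1/context.json"


def has_valid_context(doc: dict[str, Any]) -> bool:
    """Return True if the top-level @context follows the rules above.

    Accepts the EMM context URI on its own, or a list whose first entry is
    the EMM context URI (any extension contexts must come after it).
    Term overriding by extension contexts is NOT checked here.
    """
    ctx = doc.get("@context")
    if isinstance(ctx, str):
        return ctx == EMM_CONTEXT
    if isinstance(ctx, list) and ctx:
        return ctx[0] == EMM_CONTEXT
    return False
```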
Reference: summary property definition
The summary is a natural language summarization of the purpose of the Entry Point.
The Entry Point SHOULD have a summary property. For an Entry Point, the summary MAY be a brief description of the Entity Set in which the described changes occurred. If there are multiple entry points to the same collection, it is RECOMMENDED that the summary include information that distinguishes each entry point from the others.
{ "summary": "My Authority - Entity Change List" }
{ "summary": "My Authority - Incremental Updates from 2022-01-01 Full Download" }
Reference: type property definition
The type property identifies the Activity Stream type for the Entry Point.
The Entry Point MUST have a type property. The value MUST be OrderedCollection.
{ "type": "OrderedCollection" }
Reference: id property definition
The id is a unique identifier of the Entry Point.
The Entry Point MUST have an id property. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Ordered Collection Entry Point MUST be available at the URI.
{ "id": "https://data.my.authority/change_documents/2021/activity-stream" }
Reference: url property definition
The Entry Point MAY have a url property providing one or more links to representations of the Entity Set. If there are multiple links, the value of the url property will be an array.
A common use of the url property is a link to the full download for the collection.
{ "url": "https://my.authority/2021-01-01/full_download" }
Reference: first property definition
A link to the first Change Set in this Entry Point for the Entity Collection.
The Entry Point MUST have a first property. The value MUST be either:
- a string that is the HTTP(S) URI of the first page of items in the Entry Point, or
- a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the first page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
{
"first": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/1"
}
}
Reference: last property definition
A link to the last Change Set in this Entry Point for the Entity Collection.
The Entry Point MUST have a last property. The value MUST be either:
- a string that is the HTTP(S) URI of the last page of items in the Entry Point, or
- a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the last page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
{
"last": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/12"
}
}
Reference: totalItems property definition
The count of all Entity Change Activities across all Change Sets in the Entry Point for the Entity Collection.
The Entry Point MAY have a totalItems property. If included, the value MUST be an integer, and it SHOULD be the cumulative count of Entity Change Activities across all Change Sets.
{
"totalItems": 123
}
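A consumer may want to check an Entry Point before walking it. The Python sketch below collects violations of the MUST-level rules listed above for type, id, first, last, and totalItems. The function names are hypothetical, and the sketch does not verify that the URIs actually resolve to JSON representations, which the specification also requires.

```python
from typing import Any
from urllib.parse import urlparse


def is_http_uri(value: Any) -> bool:
    """True if value is a string with an http or https scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def link_ok(value: Any) -> bool:
    """A first/last link may be a URI string or an object with id and type."""
    if isinstance(value, str):
        return is_http_uri(value)
    if isinstance(value, dict):
        return is_http_uri(value.get("id")) and value.get("type") == "OrderedCollectionPage"
    return False


def entry_point_problems(doc: dict[str, Any]) -> list[str]:
    """Return a list of MUST-level Entry Point violations (empty if none found)."""
    problems = []
    if doc.get("type") != "OrderedCollection":
        problems.append("type MUST be OrderedCollection")
    if not is_http_uri(doc.get("id")):
        problems.append("id MUST be an HTTP(S) URI")
    for key in ("first", "last"):
        if not link_ok(doc.get(key)):
            problems.append(f"{key} MUST be a page URI or an OrderedCollectionPage object")
    total = doc.get("totalItems")
    # totalItems is optional; bool is a subclass of int in Python, so exclude it.
    if total is not None and (not isinstance(total, int) or isinstance(total, bool)):
        problems.append("totalItems MUST be an integer")
    return problems
```

Run against the full Entry Point example above, the function returns an empty list.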
3.2. Change Set
Reference: Ordered Collection Page in the Activity Streams specification
Each time a set of changes is published, changes MUST be released in at least one Change Set. Changes MAY be published across multiple Change Sets. For example, a site may decide that each Change Set will have at most 50 changes and if that maximum is exceeded during the release time period, then a second Change Set will be created.
- The Activities within a Change Set MUST be sorted in date-time order in the orderedItems array. The Activities MAY be in ascending or descending order, but the order MUST be consistent within the Collection.
- Where there are multiple Change Sets, these sets MUST be arranged in ascending or descending date-time order, consistent with the Activity ordering within each Change Set.
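The ordering rules above, together with the earlier example of capping each Change Set at 50 changes, can be sketched in Python as follows. The sketch assumes ascending order and that every activity carries a published timestamp in the form used in this document's examples; build_change_sets is a hypothetical helper, not part of the API.

```python
from datetime import datetime
from typing import Any


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2021-02-01T15:04:22Z."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_change_sets(activities: list[dict[str, Any]],
                      max_size: int = 50) -> list[list[dict[str, Any]]]:
    """Sort activities oldest-first by published, then split them into pages.

    Using one ascending order for every page keeps the activity order within
    each Change Set consistent with the order of the Change Sets themselves.
    sorted() is stable, so activities with equal timestamps keep input order.
    """
    ordered = sorted(activities, key=lambda a: parse_ts(a["published"]))
    return [ordered[i:i + max_size] for i in range(0, len(ordered), max_size)]
```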
It is RECOMMENDED that Change Sets be published on a regular schedule. It is recognized that many factors can affect implementation, including but not limited to the volume of changes, the consistency of timing of changes, the tolerance of consumers for delays in the publication schedule, and the resources available for producing Change Sets.
Change Sets MUST be implemented as an Ordered Collection Page following the definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.
FULL EXAMPLE for Change Set:
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2",
"partOf": {
"type": "OrderedCollection",
"id": "https://data.my.authority/change_documents/2021/activity-stream"
},
"totalItems": 3,
"prev": {
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/1",
"type": "OrderedCollectionPage"
},
"next": {
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/3",
"type": "OrderedCollectionPage"
},
"orderedItems": [
{
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd1",
"published": "2021-02-01T15:04:22Z",
"object": {
"id": "https://my.authority/term/milk",
"type": "Term",
"updated": "2021-02-01T15:04:22Z"
}
},
{
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd2",
"published": "2021-02-01T17:11:03Z",
"object": {
"id": "https://my.authority/term/bovine_milk",
"type": "Term",
"updated": "2021-02-01T17:11:03Z"
}
},
{
"type": "Deprecate",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd3",
"published": "2021-02-01T17:11:03Z",
"object": {
"id": "https://my.authority/term/cow_milk",
"type": "Term",
"updated": "2021-02-01T17:11:03Z"
}
}
]
}
NOTE: See Entity Change Activity under Entity Level Structures for more information on the data to be included in the orderedItems property.
Reference: next property definition
A link to the next Change Set in this Entry Point for the Entity Collection.
The Change Set MUST have a next property if there are subsequent Change Sets in the Entry Point for this Entity Collection. The value MUST be either:
- a string that is the HTTP(S) URI of the next page of items in the Entry Point, or
- a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the next page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
{
"next": "https://data.my.authority/change_documents/2021/activity-stream/page/3"
}
Reference: prev property definition
A link to the previous Change Set in this Entry Point for the Entity Collection.
The Change Set MAY have a prev property if there are preceding Change Sets in the Entry Point for this Entity Collection. If present, the value MUST be either:
- a string that is the HTTP(S) URI of the previous page of items in the Entry Point, or
- a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the previous page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
{
"prev": "https://data.my.authority/change_documents/2021/activity-stream/page/1"
}
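The first and next links, in either their string or object form, are enough for a consumer to walk an Entry Point page by page. The Python sketch below assumes Change Sets are arranged in ascending date-time order, so it starts at first and follows next; for descending order a consumer would start at last and follow prev instead. The fetch callable is a hypothetical stand-in for an HTTP GET that returns parsed JSON.

```python
from typing import Any, Callable, Iterator, Optional

Doc = dict[str, Any]


def link_uri(link: Any) -> Optional[str]:
    """Return the page URI from a link given as a string or as an {id, type} object."""
    if isinstance(link, str):
        return link
    if isinstance(link, dict):
        return link.get("id")
    return None


def iter_change_sets(entry_point: Doc, fetch: Callable[[str], Doc]) -> Iterator[Doc]:
    """Yield each Change Set, starting at first and following next links.

    fetch maps a URI to its parsed JSON document (for example, a wrapper
    around an HTTP GET). Iteration stops when a page has no next link.
    """
    uri = link_uri(entry_point.get("first"))
    seen: set[str] = set()
    while uri and uri not in seen:  # the seen set guards against a malformed cycle
        seen.add(uri)
        page = fetch(uri)
        yield page
        uri = link_uri(page.get("next"))
```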
4. Entity Level Structures
Entity level structures describe the individual changes to entity metadata within an Entity Set.
The structures described in this section are used in the orderedItems property of the Change Set. The level of detail in the orderedItems depends on the use case being addressed. The Entity Change Activities List use case can be addressed by the Entity Change Activity. The Local Cache of Labels and Local Cache of Full Dataset use cases can be addressed more efficiently by also including an Entity Patch. Without an Entity Patch, the consumer must dereference the entity URI to obtain the updated entity description.
4.1. Entity Change Activities
Reference: Activity in the Activity Streams specification
A change to Entity Metadata MUST be described in an Entity Change Activity. An Entity Change Activity MUST be implemented as an Activity Streams Activity. The activity MUST provide information about the type of change and the entity or entities changed. It MAY provide links that help the consumer gather additional information from the source dataset.
Not all implementations will store every change for an entity over time. A Collection MAY provide feeds of only the last known metadata update for each entity. In that case, the page identifier cannot be used to determine the last Activities processed by a consumer. For this reason the Activities within the Collection MUST have a date property and SHOULD include the date using the published property. The updated property SHOULD be used on the Object to record when the change to the entity actually occurred.
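Because the page identifier cannot always be used to track progress, a consumer can instead remember the published date of the last activity it processed. A minimal Python sketch, assuming timestamps in the form used in this document's examples and a timezone-aware checkpoint; activities_since is a hypothetical helper name:

```python
from datetime import datetime
from typing import Any


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; replacing "Z" keeps this working before Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def activities_since(activities: list[dict[str, Any]],
                     last_processed: datetime) -> list[dict[str, Any]]:
    """Return activities published strictly after last_processed, oldest first.

    last_processed must be timezone-aware, since the parsed timestamps are.
    """
    fresh = [a for a in activities if parse_ts(a["published"]) > last_processed]
    return sorted(fresh, key=lambda a: parse_ts(a["published"]))
```

Using a strict comparison means activities that share the checkpoint's exact timestamp are skipped; a consumer concerned about that case could also record the ids of activities already processed at that instant.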
This level is sufficient to address the Entity Change Activities List use case.
FULL EXAMPLE for Entity Change Activity:
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "Add entity for subject Science",
"published": "2021-08-02T16:59:54Z",
"type": "Add",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11",
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
},
"object": {
"type": "Subject",
"id": "http://my_repo/entity/science",
"updated": "2021-08-02T16:59:54Z"
}
}
Properties shared across all Entity Change Activity types are described here. If a specific activity type handles a property differently, it will be described with that activity type in section Types of Change.
See @context as described in the Entry Point section.
Reference: summary property definition
For an Entity Change Activity, the summary is a brief description of the change to entity metadata that the activity represents. It is RECOMMENDED that a summary be included and that it reference the type of change (e.g. “Add entity”) and the entity being changed (e.g. “subject Science”).
There is a limited set of types of change. See the Types of Change section for a list of types and example summaries for each. Identification of the entity will vary depending on the data represented in the Entity Set.
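One way a consumer might act on this limited set of change types is a simple lookup from type to local action. The action names below are hypothetical, and the mapping is only one reasonable reading of the Types of Change section; unknown types are ignored rather than treated as errors.

```python
from typing import Any

# Hypothetical consumer-side actions; the names are illustrative only.
ACTIONS = {
    "Create": "store",               # new entity, dereferenceable at its URI
    "Add": "store",                  # existing entity newly exposed in the stream
    "Update": "refresh",             # re-fetch the entity or apply its Entity Patch
    "Delete": "drop",                # entity URI no longer dereferenceable
    "Remove": "drop",
    "Deprecate": "mark_deprecated",  # URI still resolves; look for a replacement
}


def plan_action(activity: dict[str, Any]) -> str:
    """Map an Entity Change Activity to a local action, ignoring unknown types."""
    return ACTIONS.get(activity.get("type"), "ignore")
```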
{ "summary": "Add entity for subject Science" }
Reference: type property definition
Each Entity Change Activity MUST have a type property.
The type is one of a set of predefined Entity Change Activity types. See the Types of Change section for a list of types and values for each activity type.
{ "type": "Create" }
Reference: id property definition
The unique identifier of the Entity Change Activity.
The Entity Change Activity MUST have an id property. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Entity Change Activity MUST be available at the URI.
{ "id": "https://data.my.authority/change_documents/2021/activity-stream/cd11" }
Reference: partOf property definition
The partOf property identifies the Change Set in which this activity was published.
An Entity Change Activity MAY use the partOf property to refer back to the Change Set that includes the activity. When used on an Activity, the partOf property MUST NOT be used for any other purpose. The value MUST be either a string that is the HTTP(S) URI of the Change Set, or a JSON object with at least the id and type properties, where the value of id is that HTTP(S) URI and the value of type is OrderedCollectionPage. The JSON representation of the Change Set publishing this activity MUST be available at the URI.
{
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
}
}
4.2. Entity Patch
To support the Local Cache of Labels or the Local Cache of Full Dataset use cases efficiently, it is OPTIONAL for each Entity Change Activity to include the instrument property, which provides a link to an Entity Patch. Without an Entity Patch, the consumer must dereference the entity URI to obtain the updated entity description.
FULL EXAMPLE for Entity Patch
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "rdf_patch to create entity for term cow milk",
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11/instrument/1",
"partOf": {
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11"
},
"content":
"A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/hasLabel> 'cow milk'@en.
A <https://my_repo/entity/cow_milk>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>.
A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/broaderTerm> <https://my_repo/entity/milk>.
A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/narrow_term>
<https://my_repo/entity/bovine_milk>."
}
5. Types of Change
All Entity Change Activities have a core set of properties that are described in the Entity Change Activity section. Some properties are specific to the Types of Change. This section provides examples and descriptions of the Entity Change Activity and Entity Patch for each type of change. It also describes the relationship between similar Activity types (e.g. Create vs. Add).
5.1. New Entity
Reference: add activity definition
Reference: create activity definition
A new entity SHOULD have an Entity Change Activity with a type of either “Create” or “Add”.
A new entity MUST be implemented as an Activity following the Create type definition or the Add type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.
Create vs. Add
An entity appearing in an Entry Point stream for the first time MUST use Activity type Create and/or Add.
Create SHOULD be used when the entity is new in the source dataset and available for use. A provider MUST NOT use Create to broadcast that an entity exists unless it can be dereferenced at the entity URI. A Create activity indicates that the entity is new and available for use by consumers, see also Add below.
Add SHOULD be used when the entity exists in the source dataset but was previously not available through the Entry Point and is now being made available in the stream. Situations where this might happen include, but are not limited to, a change in permissions, the end of an embargo, or an entity being made available again after a temporary removal.
A new Entry Point MAY choose to populate the stream with all existing entities. In this case, the initial population of the stream with all existing entities SHOULD use Add.
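On the provider side, the Create vs. Add rules above reduce to a small decision. The Python sketch below is one reading of those rules; the function name and flags are hypothetical. It is stricter than the text in one respect: it refuses to announce any entity that cannot yet be dereferenced, whereas the specification states that requirement only for Create.

```python
def first_appearance_type(new_in_dataset: bool, dereferenceable: bool,
                          initial_population: bool = False) -> str:
    """Pick Create or Add for an entity appearing in an Entry Point stream for the first time."""
    if not dereferenceable:
        # Stricter than the text: hold back any entity whose URI does not resolve yet.
        raise ValueError("entity URI is not dereferenceable yet; do not announce it")
    if initial_population or not new_in_dataset:
        # Existing entity newly exposed in the stream, or bulk seeding of a new Entry Point.
        return "Add"
    # New in the source dataset and available for use.
    return "Create"
```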
EXAMPLE Entity Change Activity for Create
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "New entity for term cow milk",
"published": "2021-08-02T16:59:54Z",
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11",
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
},
"object": {
"type": "Term",
"id": "http://my_repo/entity/cow_milk",
"updated": "2021-08-02T16:59:54Z"
},
"instrument":
{
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11/instrument/1"
}
}
summary
Reference: summary property definition
A summary is a brief description of a change to entity metadata. It is RECOMMENDED that a summary be included and that it reference the type of change and the entity being changed.
{ "summary": "Add entity for subject Science" }
type
Reference: type property definition
The type is one of a set of predefined Entity Change Activity types.
Each Entity Change Activity MUST have a type property. For an activity for a newly available entity, the value SHOULD be either Create or Add.
{ "type": "Create" }
or
{ "type": "Add" }
id
Reference: id property definition
The unique identifier of the Entity Change Activity.
The Entity Change Activity MUST have an id property. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Entity Change Activity MUST be available at the URI.
{ "id": "https://data.my.authority/change_documents/2021/activity-stream/cd11" }
partOf
Reference: partOf property definition
The partOf property identifies the Change Set in which this activity was published.
An Entity Change Activity MAY use the partOf property to refer back to the Change Set that includes the activity. The partOf property MUST NOT be used for any other purpose.
{
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
}
}
EXAMPLE Entity Patch for Create
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "rdf_patch to create entity for term cow milk",
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11/instrument/1",
"partOf": {
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd11"
},
"content":
"A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/hasLabel> 'cow milk'@en.
A <https://my_repo/entity/cow_milk>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>.
A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/broaderTerm> <https://my_repo/entity/milk>.
A <https://my_repo/entity/cow_milk>
<https://my.authority/vocab/narrow_term>
<https://my_repo/entity/bovine_milk>."
}
5.2. Update Entity
An updated entity SHOULD have an Entity Change Activity with a type of “Update”.
An updated entity MUST be implemented as an Activity following the Update type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.
Examples of updates in the library domain include splits and merges. See Deprecate Entity below for an illustration of how these scenarios can be reflected as a sequence of related activities, without explicitly typing them as split or merge activities.
EXAMPLE Entity Change Activity for Update
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "Update entity term milk",
"published": "2021-08-02T16:59:54Z",
"type": "Update",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd31",
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
},
"object": {
"type": "Term",
"id": "http://my_repo/entity/milk",
"updated": "2021-08-02T16:59:54Z"
},
"instrument": {
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd31/instrument/1"
}
}
EXAMPLE Entity Patch for Update
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "rdf_patch to update entity term milk",
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd31/instrument/1",
"partOf": {
"type": "Update",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd31",
},
"content":
"A <http://my_repo/entity/milk> <http://my.authority/vocab/hasLabel> 'Milk'@en.
D <http://my_repo/entity/milk> <http://my.authority/vocab/hasLabel> 'milk'@en."
}
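For the Local Cache of Labels use case, a consumer can apply just the label statements from an Entity Patch. The Python sketch below assumes one RDF Patch statement per line (the examples in this document wrap statements for display), treats http://my.authority/vocab/hasLabel as the label predicate only because the Update example uses it, ignores escape sequences inside literals, and discards language tags. A D statement only removes a cached label if it matches the value held, so the add-then-delete order in the Update example leaves the new label in place. apply_label_patch is a hypothetical helper.

```python
import re

# One RDF Patch statement per line: operation, subject IRI, predicate IRI, object, final dot.
STATEMENT = re.compile(r"^(A|D)\s+<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.$")
# A quoted literal with an optional language tag, e.g. 'Milk'@en or "Milk"@en.
LITERAL = re.compile(r"""^(['"])(.*)\1(?:@[A-Za-z0-9-]+)?$""")


def apply_label_patch(cache: dict[str, str], patch: str, label_predicate: str) -> None:
    """Update cache (entity URI -> label) in place from the label triples in a patch."""
    for line in patch.splitlines():
        m = STATEMENT.match(line.strip())
        if not m:
            continue  # blank lines or statements this sketch does not handle
        op, subject, predicate, obj = m.groups()
        lit = LITERAL.match(obj)
        if predicate != label_predicate or not lit:
            continue  # not a label triple with a literal object
        label = lit.group(2)
        if op == "A":
            cache[subject] = label
        elif cache.get(subject) == label:  # "D": only drop the label we currently hold
            del cache[subject]
```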
5.3. Delete Entity
It is RECOMMENDED that entities be marked as Deprecated in the source dataset instead of deleting the entity from the source dataset. If the entity is deprecated, follow the Entity Change Activity described in Deprecate Entity.
An entity that has been fully deleted from the source dataset, such that the entity URI is no longer dereferenceable, SHOULD have an Entity Change Activity with a type of “Delete” or “Remove”.
A deleted entity MUST be implemented as an Activity following the Delete type definition or the Remove type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.
EXAMPLE Entity Change Activity for Delete
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "Delete term cow_milk",
"published": "2021-08-02T16:59:54Z",
"type": "Delete",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd21",
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
},
"object": {
"type": "Term",
"id": "http://my_repo/entity/cow_milk",
"updated": "2021-08-02T16:59:54Z"
},
"instrument": {
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd21/instrument/1"
}
}
EXAMPLE Entity Patch for Delete
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "rdf_patch to delete entity term cow_milk",
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd21/instrument/1",
"partOf": {
"type": "Delete",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd21"
},
"content":
"D <http://my_repo/entity/cow_milk>
<http://my.authority/vocab/hasLabel> 'cow milk'@en.
D <http://my_repo/entity/cow_milk>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>.
D <http://my_repo/entity/cow_milk>
<http://my.authority/vocab/broaderTerm> <http://my_repo/entity/milk>.
D <http://my_repo/entity/cow_milk>
<http://my.authority/vocab/narrowerTerm>
<http://my_repo/entity/bovine_milk>."
}
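If the provider still holds the entity's triples at the time of deletion, it can generate patch content like the Delete patch above by emitting a D operation for each triple whose subject is the entity. The following is a minimal Python sketch under assumptions: triples are stored as (subject, predicate, object) tuples of N-Triples-style strings, and the instrument URI reuses the pattern seen in these examples (activity URI followed by "/instrument/1"), which is an illustrative convention rather than a requirement. build_delete_patch and build_patch_document are hypothetical helpers, and blank-node sub-graphs are not handled.

```python
CONTEXT = "https://ld4.github.io/entity_metadata_management/0.1/context.json"


def build_delete_patch(triples, entity_uri):
    """Return RDF patch content deleting every triple with the entity as subject."""
    subject = f"<{entity_uri}>"
    # sorted() makes the output deterministic; set iteration order is not.
    return "\n".join(
        f"D {s} {p} {o}." for s, p, o in sorted(triples) if s == subject
    )


def build_patch_document(activity_id, activity_type, summary, content):
    """Wrap patch content in an Entity Patch document pointing back to its activity."""
    return {
        "@context": CONTEXT,
        "summary": summary,
        "type": "rdf_patch",
        # Assumed URI convention, copied from the examples in this section.
        "id": f"{activity_id}/instrument/1",
        "partOf": {"type": activity_type, "id": activity_id},
        "content": content,
    }
```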
5.4. Deprecate Entity
Deprecation indicates that an existing entity in the authority has been updated to show that it should no longer be used, although its URI remains dereferenceable and returns a description reflecting the deprecation. Whenever possible, the entity description should indicate which entity should be used instead.
There are two common scenarios. In the first, the replacement entity already exists, and the deprecation updates only the deprecated entity. In the second, the replacement entity does not exist prior to the deprecation; in this case, the replacement entity is created and the status of the original entity is changed to deprecated.
An entity that has been deprecated SHOULD have an Entity Change Activity with the type Deprecate. The Deprecate activity MUST be implemented either as a single activity (when the entity replacing the deprecated entity already exists, or when the deprecated entity is not replaced), or as two activities: a Create activity for the replacement entity and a Deprecate activity for the deprecated entity. Without an Entity Patch on the Deprecate activity, the consumer must dereference the deprecated entity URI to obtain the updated entity description, including whether it was replaced by a new or existing entity.
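The choice between one and two activities can be expressed as a small function. This Python sketch is illustrative only: the helper names, the default entity type of Term, and the caller-supplied new_id function that mints activity URIs are assumptions, and instruments are omitted. Activities are returned in the order they are performed, with the Create (when needed) before the Deprecate, so that the replacement is created before the deprecation that points to it.

```python
from itertools import count


def make_activity(activity_type, activity_id, entity_uri, timestamp, summary,
                  entity_type="Term"):
    """Build one Entity Change Activity as a plain dict."""
    return {
        "type": activity_type,
        "id": activity_id,
        "summary": summary,
        "published": timestamp,
        "object": {"type": entity_type, "id": entity_uri, "updated": timestamp},
    }


def deprecation_activities(deprecated_uri, replacement_uri, replacement_is_new,
                           timestamp, new_id):
    """Return the activities for a deprecation in the order they are performed.

    replacement_uri is None when the deprecated entity is not replaced.
    """
    activities = []
    if replacement_uri is not None and replacement_is_new:
        # Second scenario: the replacement is created as part of the deprecation.
        activities.append(make_activity(
            "Create", new_id(), replacement_uri, timestamp,
            f"Create term {replacement_uri}"))
    # Both scenarios: a Deprecate activity for the deprecated entity.
    activities.append(make_activity(
        "Deprecate", new_id(), deprecated_uri, timestamp,
        f"Deprecate term {deprecated_uri}"))
    return activities


# Example with hypothetical activity URIs minted from a counter.
ids = (f"https://data.my.authority/change_documents/2021/activity-stream/cd{n}"
       for n in count(2))
example = deprecation_activities(
    "http://my_repo/entity/cow_milk", "http://my_repo/entity/bovine_milk",
    True, "2021-02-01T17:11:03Z", lambda: next(ids))
```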
EXAMPLE Entity Change Activity for Deprecate in the Scenario where a Replacement Entity Already Exists
{
"@context": "https://ld4.github.io/entity_metadata_management/0.1/context.json",
"summary": "Deprecate term cow milk",
"published": "2021-08-02T16:59:57Z",
"type": "Deprecate",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd47",
"partOf": {
"type": "OrderedCollectionPage",
"id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
},
"object": {
"type": "Term",
"id": "http://my_repo/entity/cow_milk",
"updated": "2021-08-02T16:59:57Z"
},
"instrument": {
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd47/instrument/1"
}
}
EXAMPLE Entity Change Activity for Deprecate in the Scenario where a Replacement Entity is Created
[
{
"type": "Create",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd2",
"published": "2021-02-01T17:11:03Z",
"object": {
"id": "https://my.authority/term/bovine_milk",
"type": "Term",
"updated": "2021-02-01T17:11:03Z"
},
"instrument": {
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd2/instrument/1"
}
},
{
"type": "Deprecate",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd3",
"published": "2021-02-01T17:11:03Z",
"object": {
"id": "https://my.authority/term/cow_milk",
"type": "Term",
"updated": "2021-02-01T17:11:03Z"
},
"instrument": {
"type": "rdf_patch",
"id": "https://data.my.authority/change_documents/2021/activity-stream/cd3/instrument/1"
}
}
]
6. Provider Workflows
This section describes how an Entity Metadata Provider can implement this specification to allow consumers to follow changes to a set of entities that the provider manages.
6.1 Provider Decisions
The choice of how often to create new Change Sets will depend upon how frequently entities are updated, the expected needs of consumers for timely updates, resource constraints, and other local considerations. Two common approaches are to create Change Sets at predetermined time intervals (e.g. hourly, nightly, weekly, monthly), or after a certain number of changes have occurred (e.g. 10, 20, 100, 500 changes).
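A provider combining the two approaches might publish whenever either threshold is reached. The following is a minimal Python sketch; the thresholds of 100 changes and one day are placeholder values chosen for illustration, not values recommended by this specification, and should_publish_change_set is a hypothetical name.

```python
from datetime import datetime, timedelta


def should_publish_change_set(pending_changes, last_published, now,
                              max_changes=100, max_interval=timedelta(days=1)):
    """Return True when a new Change Set should be written.

    Publishes once enough changes have accumulated, or once the interval
    since the last Change Set has elapsed and at least one change is pending.
    """
    if pending_changes == 0:
        return False  # never publish an empty Change Set
    if pending_changes >= max_changes:
        return True   # count-based trigger
    return now - last_published >= max_interval  # time-based trigger


# Example: 5 pending changes, last Change Set published 25 hours ago.
last = datetime(2021, 8, 2, 0, 0)
print(should_publish_change_set(5, last, last + timedelta(hours=25)))  # True
```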
The Local Cache of Labels and Local Cache of Full Dataset use cases require the consumer to be able to download a copy of all entities in the dataset before following changes. Coordinating snapshots with the production of Change Sets makes this easier.
Support for RDF patches to efficiently convey changes is optional.
6.2 Creating Full Downloads
When a full download of the dataset is created, the provider should:
- If already creating Change Sets, write any unrecorded entity changes to a final Change Set before taking the snapshot.
- Record the datetime when the snapshot for the download was taken.
- On the human-readable download page, include a link to the download file and indicate the datetime of creation.
- Create or update the Entry Point to include the new download in the url property.
6.3 Creating Change Sets
The provider must record information about changes in the Entity Set as they occur, then at some point write a Change Set and make accompanying changes to the Entry Point.
Recording Changes Made to the Entity Set
For each change in an Entity Set, the provider must record all information necessary to write the Activity entry in a Change Set. This includes:
- The dereferenceable URI for the entity
- The type of entity (e.g. http://vocab.getty.edu/ontology#Subject)
- The Activity type of change (e.g. Add, Update, Deprecate)
- The datetime of the change to the entity
- A summary description of the change (recommended; e.g. “Add term Science”)
- Optionally, the RDF patch describing the change in the entity RDF
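The information to record for each change maps naturally onto a small record type. This Python sketch is one possible shape, not a structure defined by this specification: the class name RecordedChange, its field names, and the set of accepted activity types (drawn from the types used in this document) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Activity types used in this document; "Add" appears in the list above,
# "Create" in the examples. Treat this set as an assumption.
KNOWN_ACTIVITY_TYPES = {"Create", "Add", "Update", "Delete", "Remove", "Deprecate"}


@dataclass(frozen=True)
class RecordedChange:
    entity_uri: str                   # dereferenceable URI for the entity
    entity_type: str                  # e.g. "http://vocab.getty.edu/ontology#Subject"
    activity_type: str                # e.g. "Add", "Update", "Deprecate"
    changed_at: datetime              # datetime of the change to the entity
    summary: Optional[str] = None     # recommended, e.g. "Add term Science"
    rdf_patch: Optional[str] = None   # optional RDF patch content

    def __post_init__(self):
        if self.activity_type not in KNOWN_ACTIVITY_TYPES:
            raise ValueError(f"unknown activity type: {self.activity_type}")
        if self.changed_at.tzinfo is None:
            # Requiring an explicit time zone avoids ambiguous timestamps.
            raise ValueError("changed_at must be timezone-aware")

    def updated(self) -> str:
        """Format the change datetime like the examples (UTC, trailing Z)."""
        return self.changed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```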
Publishing a Change Set
After some time recording changes, a provider publishes a new Change Set linked from an Entry Point. Several URIs for new and existing objects will be referenced in the algorithm below:
- entry_point_uri - the URI of the newly created or existing Entry Point
- prev_change_set_uri - the previous Change Set URI in the Entry Point’s last property
- change_set_uri - the new URI that will resolve to the new Change Set
and for each change recorded:
- entity_uri - the URI of the entity changed
- change_activity_uri - the new URI that will resolve to the new Entity Change Activity describing the change
- change_rdf_patch_uri - optionally, the new URI that will resolve to the new RDF patch describing the change
With these URIs the new Change Set can be created as follows:
- set the id property to change_set_uri
- set the partOf property to use entry_point_uri for id
- set the prev property to use prev_change_set_uri for id (omit if there is no previous Change Set)
- set the totalItems property to the number of change activities that will be in this Change Set
- for each change, from oldest to newest or newest to oldest, add an Activity to the orderedItems property array, and:
  - set the type property to the change type (e.g. Add, Update, etc.)
  - set the id property to the change_activity_uri for this change
  - set the published property to the datetime the Change Set is being published
  - set the summary property to the human-readable description of the change
  - set the object property to be a JSON object with the following properties:
    - set the id property to the entity_uri
    - set the type property to the entity type
    - set the updated property to the datetime of the change to the entity
  - if RDF patch is supported, set the instrument property to be a JSON object with the following properties:
    - set the type property to the string rdf_patch
    - set the id property to the change_rdf_patch_uri
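The Change Set steps above can be sketched in Python as a function that returns the Change Set as a dict ready for JSON serialization. This is a minimal sketch under assumptions: each recorded change is a dict whose key names (entity_uri, entity_type, activity_type, updated, summary, change_activity_uri, change_rdf_patch_uri) are hypothetical and mirror the names used above. The @context and the OrderedCollectionPage type for the Change Set are taken from the examples, and the OrderedCollection type for the Entry Point is an assumption based on Activity Streams 2.0. Items keep the order in which the changes are passed in.

```python
CONTEXT = "https://ld4.github.io/entity_metadata_management/0.1/context.json"


def build_change_set(change_set_uri, entry_point_uri, prev_change_set_uri,
                     changes, published):
    """Build a Change Set dict following the steps above.

    `changes` is a list of dicts, already in the chosen order (oldest to
    newest or newest to oldest).
    """
    items = []
    for change in changes:
        activity = {
            "type": change["activity_type"],          # e.g. "Add", "Update"
            "id": change["change_activity_uri"],
            "published": published,                   # when the Change Set is published
            "object": {
                "id": change["entity_uri"],
                "type": change["entity_type"],
                "updated": change["updated"],         # when the entity changed
            },
        }
        if change.get("summary"):                     # recommended, not required
            activity["summary"] = change["summary"]
        patch_uri = change.get("change_rdf_patch_uri")
        if patch_uri:                                 # only when RDF patch is supported
            activity["instrument"] = {"type": "rdf_patch", "id": patch_uri}
        items.append(activity)

    change_set = {
        "@context": CONTEXT,
        "type": "OrderedCollectionPage",              # as in the partOf examples
        "id": change_set_uri,
        # "OrderedCollection" for the Entry Point is an assumption (AS 2.0).
        "partOf": {"type": "OrderedCollection", "id": entry_point_uri},
        "totalItems": len(items),
        "orderedItems": items,
    }
    if prev_change_set_uri:                           # absent for the first Change Set
        change_set["prev"] = {"type": "OrderedCollectionPage", "id": prev_change_set_uri}
    return change_set
```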
Update the previous Change Set:
- add a next property that points to the new Change Set
Update the Entry Point:
- if this is the first Change Set published, add the first property to the Entry Point with the following properties:
  - set the type property to OrderedCollectionPage
  - set the id property to the change_set_uri
  - set the published property to the datetime the Change Set is being published
- add or update the last property in the Entry Point with the following properties:
  - set the type property to OrderedCollectionPage
  - set the id property to the change_set_uri
  - set the published property to the datetime the Change Set is being published
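The linking steps above can be sketched in Python as an in-place update of the previous Change Set and the Entry Point, both held as dicts. The function name link_new_change_set is hypothetical, and giving the next reference the same {type, id} shape as prev and partOf is an assumption, since the steps above only say that it points to the new Change Set.

```python
def link_new_change_set(entry_point, prev_change_set, change_set_uri, published):
    """Update the previous Change Set and the Entry Point in place."""
    if prev_change_set is not None:
        # The shape of the next reference mirrors prev/partOf; an assumption.
        prev_change_set["next"] = {"type": "OrderedCollectionPage", "id": change_set_uri}
    page_ref = {"type": "OrderedCollectionPage", "id": change_set_uri,
                "published": published}
    if "first" not in entry_point:
        # Only the very first Change Set becomes the Entry Point's first page.
        entry_point["first"] = dict(page_ref)
    # The newest Change Set always becomes the last page.
    entry_point["last"] = dict(page_ref)
```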
For each change create a separate Entity Change Activity document at the change_activity_uri with the same information used in the Change Set.
If RDF Patch is supported then for each change create a separate Entity Patch document at the change_rdf_patch_uri.
7. Consuming Entity Change Sets
7.1 Example consuming Library of Congress Activity Stream
CAUTION: This section is under construction. It may be removed from the final draft in favor of a section that presents a general example.
Library of Congress provides an activity stream for several authorities (e.g. names, genre/forms, subjects, etc.).
Characteristics:
- an entity will appear in the activity stream at most one time
- the date of the activity for an entity will be the date of the most recent change
- the first page of the stream has the most recent activities
- activities on a page are listed from newest to oldest
- the date of an activity is the time the ???
What does the date of an activity represent?
Assumptions:
- the activity MUST include a URL that dereferences to a first-order graph that:
  - MUST include all triples where the entity is the subject
  - MUST include all blank nodes, and related sub-graphs, where the blank node is the object and the entity is the subject (<_:b1>)
  - MAY include triples for entities that are external to the base datasource if the entity is not available in another activity stream
- the activity MAY include another URL that dereferences to a graph that:
  - MUST include all triples where the entity is the subject
  - MAY include additional triples for other entities that are external to the base datasource and that serve as the object of the entity’s triples (<another_entity_uri>)
NOTE: A site may choose to use the second graph if it neither processes other activity streams nor maintains its cache of each datasource in a separate triple store.
Recommendations:
- if maintaining a full cache, ingest the latest full download before processing the related activity stream
- each time the activity stream is processed, save the date of the most recent activity processed
Processing for a full cache:
- navigate to the entry point for the activity stream
- navigate to the first page of the activity stream
- starting with the first activity on the first page, continue processing until the date of the activity is older than the date recorded the last time the stream was processed:
  - if activity type == REMOVE, remove the following triples from the cache:
    - blank nodes, and related sub-graphs, where the blank node is the object of a triple with the entity as subject (<_:b1>)
    - triples where the entity is the subject
  - if activity type == ADD, dereference the entity URI and add the following triples to the cache:
    - all triples where the entity is the subject
    - all triples, and related sub-graphs, where the entity is the subject and a blank node is the object (<_:b1>)
  - if activity type == UPDATE, dereference the entity URI and replace the entity's triples in the cache:
    - perform the steps for a REMOVE
    - perform the steps for an ADD
  - move to the next activity if there is one, OR to the first activity on the next page, OR stop if there is no next page
  - stop if the date of the activity is earlier than the saved last processed date
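The per-activity cache updates above can be sketched in Python. This sketch assumes that the cache and the dereferenced graph are both sets of (subject, predicate, object) string tuples, that blank nodes are written with a _: prefix, and that dereferencing happens elsewhere; entity_subgraph and apply_activity are hypothetical helpers. "Related sub-graph" is interpreted as every triple reachable from the entity through blank-node objects. The sketch also accepts the Activity Streams names Create and Delete alongside ADD and REMOVE. It does not rename blank nodes, so a real consumer would need to keep blank-node labels from different documents from colliding.

```python
def is_blank(term):
    """Blank nodes are assumed to be written N-Triples style, e.g. "_:b1"."""
    return term.startswith("_:")


def entity_subgraph(graph, entity):
    """Triples with `entity` as subject, plus every triple reachable from it
    through blank-node objects (the "related sub-graph")."""
    by_subject = {}
    for triple in graph:
        by_subject.setdefault(triple[0], []).append(triple)
    found, pending, visited = set(), [entity], set()
    while pending:
        subject = pending.pop()
        if subject in visited:
            continue  # guards against blank-node cycles
        visited.add(subject)
        for triple in by_subject.get(subject, []):
            found.add(triple)
            if is_blank(triple[2]):
                pending.append(triple[2])
    return found


def apply_activity(cache, activity_type, entity, dereferenced_graph=None):
    """Update `cache` (a set of triples) in place for one activity."""
    kind = activity_type.upper()
    if kind in ("REMOVE", "DELETE", "UPDATE"):
        # Removing the whole sub-graph at once covers both the blank-node
        # triples and the triples with the entity as subject.
        cache -= entity_subgraph(cache, entity)
    if kind in ("ADD", "CREATE", "UPDATE"):
        if dereferenced_graph is None:
            raise ValueError(f"{activity_type} requires the dereferenced graph")
        cache |= entity_subgraph(dereferenced_graph, entity)
    return cache
```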
Pseudocode:
go to activity stream
page = activity_stream.first
activity = page.activities.first
LOOP
  STOP if activity.date < last_process_date
  if activity.type in [REMOVE, UPDATE]
    remove all triples, and sub-graph, where <subject_uri> == activity.object.id && <object_uri>.is_a? blank_node
    remove all triples where <subject_uri> == activity.object.id
  end
  if activity.type in [ADD, UPDATE]
    graph = dereference(activity.object.url.skos.nt)
    add all triples, and sub-graph, where graph.triple.subject == activity.object.id && <object_uri>.is_a? blank_node
    add all triples where graph.triple.subject == activity.object.id
  end
  if activity == page.last_activity
    page = page.next
    STOP if page is null
    activity = page.first_activity
  else
    activity = activity.next_activity
  end
end
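The traversal in the pseudocode can be sketched in Python as a generator over the paged stream. It mirrors the Library of Congress characteristics listed above, where the first page holds the newest activities. The sketch assumes that pages are already parsed into dicts, with activities in orderedItems, a next reference to the following page, and the activity date in published as in the examples earlier in this document; the property that actually carries the activity date in a given provider's stream should be checked. The fetch callable that retrieves a page by URI is a hypothetical stand-in for an HTTP client.

```python
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_activities(entry_point, fetch, last_processed):
    """Yield activities not older than `last_processed`, newest first.

    Assumes pages run newest to oldest and so do the activities on each page.
    """
    cutoff = parse_time(last_processed)
    page_ref = entry_point.get("first")
    while page_ref is not None:
        page = fetch(page_ref["id"])
        for activity in page.get("orderedItems", []):
            if parse_time(activity["published"]) < cutoff:
                return  # everything from here on was handled in an earlier run
            yield activity
        page_ref = page.get("next")  # None on the last page ends the loop
```

A consumer can pass each yielded activity to its cache-update logic and, following the recommendation above, save the published value of the first activity yielded as the cutoff for the next run. Because the comparison is strictly earlier-than, activities dated exactly at the cutoff are processed again, which is harmless when updates are idempotent.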
Acknowledgements
This specification was influenced by prior implementations for Library of Congress entity sets and Getty vocabularies.