Entity Metadata Management API 0.1

Status of this Document

This is a historical draft API specification. Please use the latest stable version instead.

This Version: 0.1.0

Latest Stable Version: 1.0.0

Previous Version: None

Editors

Copyright © 2021-2024 Editors and contributors. Published by the LD4 under the CC-BY license, see disclaimer.


1. Introduction

The Entity Metadata Management API is intended to establish a consistent pattern that supports Entity Metadata Providers sharing changes to curated entities and their metadata with the community of Entity Metadata Consumers (e.g. libraries, museums, archives). Use of a consistent pattern allows for the creation of software tools for producing and consuming changes in entity metadata.

This specification is based on the Activity Streams 2.0 specification. It defines a usage pattern and minor extensions specific to entity metadata management.

1.1. Objectives and Scope

The objective of this specification is to provide a machine to machine API that conveys the information needed for an Entity Metadata Consumer to understand all the changes to entity metadata across the lifecycle of an entity. The intended audiences are Entity Metadata Providers who curate and publish entity metadata within an area of interest, Entity Metadata Consumers who use the entity metadata, and developers who create applications and tools that help consumers connect to entity metadata from providers.

The discovery of changes to entity metadata requires a consistent pattern of publication which must include a link to the entity and indication of the change or changes made. Such changes may include adding new entities, removing existing entities, and any other edits to current entities and/or their metadata.

This process can be optimized if Entity Metadata Providers publish changes in chronological order, including descriptions of how each entity’s metadata has changed, enabling consuming systems to retrieve only the resources that have been modified since they were last retrieved.

This specification does not include a mechanism for enabling change notifications to be pushed to remote systems. Only periodic polling for the set of changes that must be processed is supported. A push mechanism may be added in a future version.

Work that is out of scope of this API includes the recommendation or creation of any descriptive metadata formats, and the recommendation or creation of metadata search APIs or protocols. The diverse domains represented across entity metadata already have standards fulfilling these use cases. Also out of scope is optimization of the transmission mechanisms providing access points for consumers to query.

1.2. Use Cases

The following three use cases motivate this specification. They are drawn from workflows needed by libraries, museums, and archives.

1.2.1. Change Tracking

Entity Metadata Consumers may want to learn about modifications or deletions for entities they use, or about the creation of new entities by the same provider.

To address this generic use case, the provider creates and makes available a list of changes with the URIs for any new, modified, or deleted entities. While the provider may have internal needs for tracking more than these three moments in an entity’s lifecycle (e.g. if the provider workflow requires a review activity), this specification focuses on public changes to the dataset that may require action from a consumer. The consumer will need to take additional actions to identify and act on changes to entities of interest, which may be automatic or manual.

1.2.2. Local Cache of Labels

Entity Metadata Consumers persist references to entity metadata by saving the URI as part of their local datastore. URIs may not be understandable to end users. In order to be able to display a human readable label, a label may be retrieved from the provider’s datastore by dereferencing the URI. For performance reasons, the label is generally cached in the local datastore to avoid having to fetch the label every time the entity reference is displayed to an end user. If the label changes in the provider’s datastore, the consumer would like to update the local cache of the label.

To address this use case, the provider creates and makes available a list of URIs and their new labels. The consumer can compare the list of URIs with those stored in the local application and update the cached labels.
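
The comparison step can be illustrated with a short, non-normative sketch. It assumes the provider's list has already been reduced to a mapping from entity URI to new label; the function name and both dictionary shapes are hypothetical and are not defined by this specification.

```python
def refresh_cached_labels(local_labels, changed_labels):
    """Update cached labels in place and return the URIs whose label changed.

    Both arguments are hypothetical dicts mapping entity URI -> label.
    """
    updated = []
    for uri, new_label in changed_labels.items():
        # Only touch entities that the consumer actually references.
        if uri in local_labels and local_labels[uri] != new_label:
            local_labels[uri] = new_label
            updated.append(uri)
    return updated


cache = {"https://my.authority/term/milk": "Milk"}
changes = {
    "https://my.authority/term/milk": "Milk (dairy)",
    "https://my.authority/term/tea": "Tea",
}
changed_uris = refresh_cached_labels(cache, changes)
```

Entities that the consumer does not reference (the tea term in this example) are left out of the cache.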

In some cases, additional metadata is also cached as part of the external reference, but this is less common. Verification of the additional metadata may require the consumer to take additional actions.

1.2.3. Local Cache of Full Entity Metadata

A consumer may decide to make a cache of a dataset of full entity metadata. This is commonly done for increased control over uptime, throughput, and indexing for search. The cache needs timely updates to stay in sync with the source dataset.

To address this use case, the provider creates and makes available a dated list of all new, modified, and deleted entities along with specifics about how the entities have changed. The consumer can process a stream of change documents that was published since their last incremental update. Specific details about each change can be used to update the local cache.

In some cases, caching of full descriptions of a subset of entities may be desired, for example limiting to only those entities referenced in local bibliographic data.

1.3. Terminology

1.3.1. Roles

  • Entity Metadata Provider: An organization that collects, curates, and provides access to metadata about entities within an area of interest.
  • Entity Metadata Consumer: An organization that references or caches entity metadata from an Entity Metadata Provider.

1.3.2. Terms about Entities

  • Entity: An entity is any resource (a thing or a concept) identified with a URI that we may want to reference or make use of in a data set. Entities include, but are not limited to, what are referred to as authorities, controlled vocabulary terms, or real world objects (RWOs) in library, archives, and museum domains.
  • Entity Set: A set of entities that are grouped together by an Entity Metadata Provider. Entities can be grouped based on various criteria (e.g. subject headings, names, thesaurus, controlled vocabulary).

1.3.3. Terms from Activity Streams

This specification is based on the Activity Streams 2.0 specification and uses the following key terms from Activity Streams:

  • Activity: Activity objects are used to describe an individual change to the metadata of an Entity Set.
  • OrderedCollection: The entry point for all the information about changes to the metadata of an Entity Set. The OrderedCollection type indicates that the activities in the collection are in time order.
  • OrderedCollectionPage: The complete collection of changes is expressed as a set of OrderedCollectionPage objects to ensure that there are manageable chunks of change activities described even for large and long-running sets of updates.

Many properties from Activity Streams are used, and are described throughout this document.

1.3.4. Terms from Other Specifications

This specification uses the following terms:

  • HTTP(S): The HTTP or HTTPS URI scheme and internet protocol.
  • JavaScript Object Notation (JSON): The terms array, JSON object, number, and string in this document are to be interpreted as defined by the JavaScript Object Notation (JSON) specification.
  • JSON-LD: The Entity Metadata Management context is defined following the JSON-LD specification.
  • RFC 2119: The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
  • URI: URIs are defined following the IANA URI-Schemes specification.

2. Architecture

This specification provides an API via which Entity Metadata Providers can publish information about changes in entity metadata, which Entity Metadata Consumers can follow. Changes in entity metadata over time are communicated from providers to consumers via Entity Change Activities. These are collected together in Change Set documents that are organized under an Entry Point as shown in the diagram below.

Entity Metadata Management API Architecture representing changes using Activity Streams

2.1. Activity Streams and Extensibility

This specification is based on the Activity Streams 2.0 specification. The following sections describe the use of Activity Streams to meet Entity Metadata Management use cases. They describe only the Activity Streams classes and properties used, and any restrictions or additional semantics in the context of this specification. Implementations MAY use other properties from Activity Streams or elsewhere for extension, and consumers SHOULD ignore any properties not defined in this specification that they don’t understand.

2.2. JSON-LD Representation

The use of JSON-LD with a specific @context allows Entity Metadata Consumers to parse the resulting documents using standard JSON tools, and also allows the data to be interpreted according to the RDF Data Model (see Relationship to RDF).

In the simplest form, a JSON-LD @context maps terms to IRIs. All Entity Metadata Management API responses MUST include the Activity Streams 2.0 context definition at the top-level of each API response:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  // rest of API response
}

It is RECOMMENDED that implementations also include the Entity Metadata Management context, in which case the value of @context will be a list. The Entity Metadata Management context includes definition of the term Deprecate and MUST thus be included if the Deprecate activity is used. Including the Entity Metadata Management context also serves to signal to consumers that this specification is being followed.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://emm-spec.org/0.1/context.json"
  ],
  // rest of API response
}

Implementations MAY include additional extension contexts. Extension contexts MUST be listed before the Activity Streams context and the Entity Metadata Management context. Implementations MAY also use additional properties and values not defined in a JSON-LD @context with the understanding that any such properties will likely be unsupported and ignored by consuming implementations that use the standard JSON-LD algorithms.
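
As a non-normative illustration, the context rules above could be checked along the following lines. The function names and the wording of the problem messages are assumptions made for this sketch; only the two context URIs come from this specification.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
EMM_CONTEXT = "https://emm-spec.org/0.1/context.json"


def context_list(response):
    """Return the @context of a parsed response as a list of entries."""
    ctx = response.get("@context", [])
    return ctx if isinstance(ctx, list) else [ctx]


def check_context(response, uses_deprecate=False):
    """Return a list of problems found in the response's @context."""
    ctx = context_list(response)
    problems = []
    if AS2_CONTEXT not in ctx:
        problems.append("missing Activity Streams 2.0 context")
    if uses_deprecate and EMM_CONTEXT not in ctx:
        problems.append("Deprecate used without the Entity Metadata Management context")
    # Extension contexts are expected before the two core contexts.
    core_seen = False
    for entry in ctx:
        if entry in (AS2_CONTEXT, EMM_CONTEXT):
            core_seen = True
        elif core_seen:
            problems.append("extension context listed after a core context")
    return problems
```

An empty list means that none of the three rules was violated; the check does not validate the contents of the contexts themselves.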

3. API Responses

3.1. Entry Point

Reference: OrderedCollection description

An Entry Point is an Activity Streams OrderedCollection resource identifying a dataset whose changes are published using the Activity Streams vocabulary with Entity Metadata Management enhancements. It provides pointers to one or more Change Sets.

The Entry Point MUST be implemented as an OrderedCollection following the definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.

Each Entity Set MUST have at least one Entry Point. It MAY have multiple Entry Points to satisfy different use cases. For example, one Entry Point may provide detailed changes to support incremental updates of a full cache and a second may only provide a list of primary label changes.

Complete example for an Entry Point

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://emm-spec.org/0.1/context.json"
  ],
  "summary": "My Authority - Change Documents",
  "type": "OrderedCollection",
  "id": "https://data.my.authority/change_documents/2021/activity-stream",
  "url": "https://my.authority/2021-01-01/full_download",
  "first": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/1"
  },
  "last": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/12"
  },
  "totalItems": 123
}

@context

Reference: JSON-LD context

The Entry Point MUST have a @context property as described in JSON-LD Representation.

summary

Reference: summary property definition

The summary is a natural language summarization of the purpose of the Entry Point.

The Entry Point SHOULD have a summary property. For an Entry Point, the summary MAY be a brief description of the Entity Set in which the described changes occurred. If there are multiple entry points to the same collection, it is RECOMMENDED that the summary include information that distinguishes each entry point from the others.

  "summary": "My Authority - Entity Change List"
  "summary": "My Authority - Incremental Updates from 2022-01-01 Full Download"

type

Reference: type property definition

The type property identifies the Activity Streams type for the Entry Point.

The Entry Point MUST have a type property. The value MUST be OrderedCollection.

  "type": "OrderedCollection"

id

Reference: id property definition

The id is a unique identifier of the Entry Point.

The Entry Point MUST have an id property. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Entry Point MUST be available at the URI.

  "id": "https://data.my.authority/change_documents/2021/activity-stream"

url

Reference: url property definition

The Entry Point MAY have a url property providing one or more links to representations of the Entity Set. If there are multiple links then the value of the url property will be an array.

A common use of the url property is a link to the full download for the collection.

  "url": "https://my.authority/2021-01-01/full_download"

first

Reference: first property definition

The Entry Point SHOULD have a first property to indicate the first Change Set in this Entry Point for the Entity Collection. If present, the value MUST be either:

  • a string that is the HTTP(S) URI of the first page of items in the Entry Point, or
  • a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the first page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
  "first": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/1"
  }  

last

Reference: last property definition

The Entry Point SHOULD have a last property to indicate the last Change Set in this Entry Point for the Entity Collection. If present, the value MUST be either:

  • a string that is the HTTP(S) URI of the last page of items in the Entry Point, or
  • a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the last page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
  "last": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/12"
  }
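
The requirements on @context, type, id, first, and last described above can be checked with a small, non-normative sketch such as the following. The function names and message texts are illustrative assumptions, and the check covers only the presence and shape of these properties.

```python
def page_uri(value):
    """Return the page URI from a first/last value, or None if malformed."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and value.get("type") == "OrderedCollectionPage":
        return value.get("id")
    return None


def check_entry_point(doc):
    """Return a list of problems found in a parsed Entry Point document."""
    problems = []
    if "@context" not in doc:
        problems.append("@context is required")
    if doc.get("type") != "OrderedCollection":
        problems.append("type must be OrderedCollection")
    doc_id = doc.get("id")
    if not (isinstance(doc_id, str) and doc_id.startswith(("http://", "https://"))):
        problems.append("id must be an HTTP(S) URI string")
    for name in ("first", "last"):
        # first and last are optional, but must be well formed when present.
        if name in doc and page_uri(doc[name]) is None:
            problems.append(name + " must be a URI string or an OrderedCollectionPage object")
    return problems


entry_point = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "OrderedCollection",
    "id": "https://data.my.authority/change_documents/2021/activity-stream",
    "first": {
        "type": "OrderedCollectionPage",
        "id": "https://data.my.authority/change_documents/2021/activity-stream/page/1",
    },
    "last": "https://data.my.authority/change_documents/2021/activity-stream/page/12",
}
entry_point_problems = check_entry_point(entry_point)
```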

totalItems

Reference: totalItems property definition

The count of all Entity Change Activities across all Change Sets in the Entry Point for the Entity Collection.

The Entry Point MAY have a totalItems property. If included, the value MUST be an integer, and it SHOULD be the cumulative count of Entity Change Activities across all Change Sets.

  "totalItems": 123

3.2. Change Set

Reference: OrderedCollectionPage description

A Change Set is an Activity Streams OrderedCollectionPage resource identifying individual Entity Change Activities, which are resources that have been created, modified, or deprecated. It may additionally help with identifying preceding or subsequent Change Sets for automated crawling.
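
Automated crawling of Change Sets might look like the following non-normative sketch. It assumes a caller-supplied fetch function (hypothetical here) that maps a URI to the parsed JSON document found there, and it starts from the Entry Point's first property and follows next links, both of which are described in the property sections of this document. The example.org URIs are placeholders.

```python
def link_uri(value):
    """Return the URI from a link given as a string or as an object with an id."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        return value.get("id")
    return None


def walk_activities(entry_point, fetch):
    """Yield activities page by page, starting at first and following next."""
    uri = link_uri(entry_point.get("first"))
    visited = set()
    while uri and uri not in visited:  # guard against link cycles
        visited.add(uri)
        page = fetch(uri)
        yield from page.get("orderedItems", [])
        uri = link_uri(page.get("next"))


# In-memory stand-in for HTTP retrieval, for illustration only.
pages = {
    "https://example.org/stream/page/1": {
        "orderedItems": [{"type": "Create"}],
        "next": "https://example.org/stream/page/2",
    },
    "https://example.org/stream/page/2": {
        "orderedItems": [{"type": "Update"}, {"type": "Deprecate"}],
    },
}
entry = {
    "first": {
        "type": "OrderedCollectionPage",
        "id": "https://example.org/stream/page/1",
    }
}
types_seen = [a["type"] for a in walk_activities(entry, pages.__getitem__)]
```

A consumer that only needs recent changes could equally start from last and follow prev; which direction is newest depends on the ordering chosen by the provider.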

Each time a set of changes is published, changes MUST be released in at least one Change Set. Changes MAY be published across multiple Change Sets. For example, a site may decide that each Change Set will have at most 50 changes and if that maximum is exceeded during the release time period, then a second Change Set will be created.

The Entity Change Activities within a Change Set MUST be sorted in date-time order in the orderedItems array. The Entity Change Activities MAY be in ascending or descending order, but the order MUST be consistent within the Collection.

Where there are multiple Change Sets, these sets MUST be arranged in ascending or descending date-time order, consistent with the Entity Change Activity ordering within each Change Set.
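
The ordering requirement can be verified with a non-normative sketch like the one below. It assumes each activity carries a published or endTime value (see Entity Change Activities) in the form used by the examples in this document; the function names and the returned labels are illustrative.

```python
from datetime import datetime


def activity_time(activity):
    """Parse the activity's published value, or endTime if published is absent."""
    stamp = activity.get("published") or activity.get("endTime")
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def ordering(activities):
    """Classify a list of activities by date-time order."""
    times = [activity_time(a) for a in activities]
    pairs = list(zip(times, times[1:]))
    ascending = all(a <= b for a, b in pairs)
    descending = all(a >= b for a, b in pairs)
    if ascending and descending:
        return "constant"  # empty, single item, or all timestamps equal
    if ascending:
        return "ascending"
    if descending:
        return "descending"
    return "unordered"


items = [
    {"type": "Create", "published": "2021-02-01T15:04:22Z"},
    {"type": "Create", "published": "2021-02-01T17:11:03Z"},
    {"type": "Deprecate", "published": "2021-02-01T17:11:03Z"},
]
result = ordering(items)
```

Running the same check over the concatenation of all pages, in page order, would also confirm that the Change Sets themselves are arranged consistently.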

It is RECOMMENDED that change sets be published on a regular schedule. It is recognized that there are many factors that can impact implementation, including but not limited to, the volume of changes, the consistency of timing of changes, the tolerance of consumers for delays in the publication schedule, and resources for producing Change Sets.

Change Sets MUST be implemented as an OrderedCollectionPage following the definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.

Complete example for a Change Set

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://emm-spec.org/0.1/context.json"
  ],
  "type": "OrderedCollectionPage",
  "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2",
  "partOf": {
    "type": "OrderedCollection",
    "id": "https://data.my.authority/change_documents/2021/activity-stream"
  },
  "totalItems": 3,
  "prev": {
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/1",
    "type": "OrderedCollectionPage"
  },
  "next": {
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/3",
    "type": "OrderedCollectionPage"
  },
  "orderedItems": [
    {
      "type": "Create",
      "published": "2021-02-01T15:04:22Z",
      "object": {
        "id": "https://my.authority/term/milk",
        "type": "http://www.w3.org/2004/02/skos/core#Concept",
        "updated": "2021-02-01T15:04:22Z"
      }
    },
    {
      "type": "Create",
      "published": "2021-02-01T17:11:03Z",
      "object": {
        "id": "https://my.authority/term/bovine_milk",
        "type": "http://www.w3.org/2004/02/skos/core#Concept",
        "updated": "2021-02-01T17:11:03Z"
      }
    },
    {
      "type": "Deprecate",
      "published": "2021-02-01T17:11:03Z",
      "object": {
        "id": "https://my.authority/term/cow_milk",
        "type": "http://www.w3.org/2004/02/skos/core#Concept",
        "updated": "2021-02-01T17:11:03Z"
      }
    }
  ]
}

@context

Reference: JSON-LD context

The Change Set MUST have a @context property as described in JSON-LD Representation.

type

Reference: type property definition

The type property identifies the Activity Streams type for the Change Set.

The Change Set MUST have a type property. The value MUST be OrderedCollectionPage.

  "type": "OrderedCollectionPage"

id

Reference: id property definition

The id is a unique identifier of the Change Set.

The Change Set MUST have an id property. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Change Set MUST be available at the URI given.

  "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"

partOf

Reference: partOf property definition

The partOf property provides a link from the Change Set to the Entry Point it is part of.

The Change Set MUST have a partOf property. The value MUST be either:

  • a string that is the HTTP(S) URI of the Entry Point, or
  • a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the Entry Point. The value of the type property MUST be the string OrderedCollection.
  "partOf": {
    "type": "OrderedCollection",
    "id": "https://data.my.authority/change_documents/2021/activity-stream"
  }

totalItems

Reference: totalItems property definition

A count of the number of items in the Change Set.

The Change Set SHOULD have a totalItems property. If present, the value MUST be a non-negative integer that corresponds with the number of items in the orderedItems array in this Change Set.

  "totalItems": 3

prev

Reference: prev property definition

A link to the previous Change Set in this Entry Point for the Entity Collection.

The Change Set MAY have a prev property if there are preceding Change Sets in the Entry Point for this Entity Collection. If present, the value MUST be either:

  • a string that is the HTTP(S) URI of the previous page of items in the Entry Point, or
  • a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the previous page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
  "prev": "https://data.my.authority/change_documents/2021/activity-stream/page/1"

next

Reference: next property definition

A link to the next Change Set in this Entry Point for the Entity Collection.

The Change Set MUST have a next property if there are subsequent Change Sets in the Entry Point for this Entity Collection. The value MUST be either:

  • a string that is the HTTP(S) URI of the next page of items in the Entry Point, or
  • a JSON object, with at least the id and type properties. The value of the id property MUST be a string that is the HTTP(S) URI of the next page of items in the Entry Point. The value of the type property MUST be the string OrderedCollectionPage.
  "next": "https://data.my.authority/change_documents/2021/activity-stream/page/3"

orderedItems

The list of Entity Change Activity entries in the Change Set.

The Change Set MUST have an orderedItems property which is an array of Entity Change Activity objects as described below.

  "orderedItems": [
    // Entity Change Activity objects inserted here
  ]
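
A non-normative sketch of checking a Change Set against the requirements above follows. The function name and message texts are assumptions for illustration; the check covers type, partOf, orderedItems, and the agreement between totalItems and the number of entries in orderedItems.

```python
def check_change_set(page):
    """Return a list of problems found in a parsed Change Set document."""
    problems = []
    if page.get("type") != "OrderedCollectionPage":
        problems.append("type must be OrderedCollectionPage")
    if "partOf" not in page:
        problems.append("partOf is required")
    items = page.get("orderedItems")
    if not isinstance(items, list):
        problems.append("orderedItems must be an array")
    elif "totalItems" in page and page["totalItems"] != len(items):
        problems.append(
            "totalItems is %d but orderedItems has %d entries"
            % (page["totalItems"], len(items))
        )
    return problems


mismatched = {
    "type": "OrderedCollectionPage",
    "partOf": "https://data.my.authority/change_documents/2021/activity-stream",
    "totalItems": 2,
    "orderedItems": [{}, {}, {}],
}
change_set_problems = check_change_set(mismatched)
```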

3.3. Entity Change Activities

Reference: Activity description

An Entity Change Activity advertises a change to a resource. The change may be its creation, a modification, or its deprecation, among others.

A change to Entity Metadata MUST be described in an Entity Change Activity. An Entity Change Activity MUST be implemented as an Activity Streams Activity. The activity MUST provide information about the type of change and the entity or entities changed. It MAY provide links that facilitate the consumer gathering additional information from the source dataset.

Not all implementations will store every change for an entity over time. A Collection MAY provide feeds of only the last known metadata update for each entity. In that case, the page identifier cannot be used to know the last Activities processed by a consumer. For this reason the Activities within the Collection MUST have either a published or endTime datetime property as described below. The updated property SHOULD be used on the entity description object to indicate when the entity change actually occurred. This level is sufficient to address the Change Tracking use case.
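
Because page identifiers cannot serve as a checkpoint in this situation, a consumer might instead record the timestamp of the last activity it processed. The following non-normative sketch assumes that approach; the function names are hypothetical.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse a datetime string such as 2021-08-02T16:59:54Z."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def unprocessed(activities, last_processed):
    """Return the activities dated strictly after last_processed."""
    cutoff = parse_time(last_processed)
    return [
        a for a in activities
        if parse_time(a.get("published") or a.get("endTime")) > cutoff
    ]


feed = [
    {"type": "Create", "published": "2021-02-01T15:04:22Z"},
    {"type": "Update", "endTime": "2021-02-01T17:11:03Z"},
]
pending = unprocessed(feed, "2021-02-01T15:04:22Z")
```

A strict comparison can miss activities that share the checkpoint's timestamp; a consumer concerned about that could compare with "greater than or equal" and skip entries it has already handled.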

Entity Change Activity objects appear in the orderedItems array within a Change Set response.

Example excerpt for an Entity Change Activity

{
  "summary": "Add entity for subject Science",
  "published": "2021-08-02T16:59:54Z",
  "type": "Add",
  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  },
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/science",
    "updated": "2021-08-02T16:59:54Z"
  }
}

Properties shared across all Entity Change Activity types are described here. Specific activity types relevant to Entity Metadata Management are described in the Types of Change section.

summary

Reference: summary property definition

For an Entity Change Activity, the summary is a brief description of the change to entity metadata that the activity represents. It is RECOMMENDED that a summary be included and that it reference the type of change (e.g. “Add entity”) and the entity being changed (e.g. “subject Science”).

  "summary": "Add entity for subject Science"

There are a limited set of types of change. See Types of Change section for a list of types and example summaries for each. Identification of the entity will vary depending on the data represented in the Entity Set.

published or endTime

Reference: published and endTime property definitions

The datetime at which the Entity Change Activity ended or was added to the Change Set.

Each Entity Change Activity MUST have either a published property or an endTime property. It is RECOMMENDED that the published property be used. In either case, the value must be a datetime as defined in the corresponding Activity Streams property definitions (e.g. published).

  "published": "2021-08-02T16:59:54Z"
  "endTime": "2021-08-02T16:59:54Z"

type

Reference: type property definition

Each Entity Change Activity MUST have a type property.

The type is one of a set of predefined Entity Change Activity types: Create, Add, Update, Deprecate, Delete or Remove. See Types of Change section for more details.

  "type": "Create"

partOf

Reference: partOf property definition

The partOf property identifies the Change Set in which this activity was published.

An Entity Change Activity MAY use the partOf property to refer back to the Change Set that includes the activity. When used on an Activity, the partOf property MUST NOT be used for any other purpose. The value MUST be a string and it MUST be an HTTP(S) URI. The JSON representation of the Change Set publishing this activity MUST be available at the URI.

  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  }

object

Reference: object property definition

The entity that is the subject of the Entity Change Activity, along with its update datetime.

An Entity Change Activity MUST include an object property. The value MUST be a JSON object with the following sub-properties:

  • A RECOMMENDED type property that is either a URI string or a plain string indicating the entity type.
  • A REQUIRED id property that is the URI of the entity involved in the Entity Change Activity.
  • A RECOMMENDED updated property that gives the datetime of the change to the entity.
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/science",
    "updated": "2021-08-02T16:59:54Z"
  }
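
The requirements shared by all Entity Change Activities can be summarized in a non-normative sketch that separates violations of REQUIRED properties (errors) from missing RECOMMENDED ones (warnings). The function name, the return shape, and the message texts are assumptions made for this illustration.

```python
ACTIVITY_TYPES = {"Create", "Add", "Update", "Deprecate", "Delete", "Remove"}


def check_activity(activity):
    """Return (errors, warnings) for a parsed Entity Change Activity."""
    errors, warnings = [], []
    if activity.get("type") not in ACTIVITY_TYPES:
        errors.append("type must be one of the predefined activity types")
    if "published" not in activity and "endTime" not in activity:
        errors.append("published or endTime is required")
    obj = activity.get("object")
    if not isinstance(obj, dict) or "id" not in obj:
        errors.append("object with an id is required")
    else:
        for name in ("type", "updated"):
            if name not in obj:
                warnings.append("object." + name + " is recommended")
    if "summary" not in activity:
        warnings.append("summary is recommended")
    return errors, warnings


activity = {
    "summary": "Add entity for subject Science",
    "published": "2021-08-02T16:59:54Z",
    "type": "Add",
    "object": {
        "type": "http://www.w3.org/2004/02/skos/core#Concept",
        "id": "http://my_repo/entity/science",
    },
}
activity_errors, activity_warnings = check_activity(activity)
```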

4. Types of Change

All Entity Change Activities have a core set of properties that are described in the Entity Change Activities section. Some properties are specific to the Types of Change. This section provides examples and descriptions of the Entity Change Activity for each type of change. It also describes the relationship between similar Activity types (e.g. Create vs. Add).

4.1. New Entity

Reference: add and create activity definitions

A new entity, either a newly created entity or an existing entity added to the Entry Point stream for the first time, SHOULD have an Entity Change Activity with a type of either Create or Add.

A new entity MUST be implemented as an Activity following the Create type definition or the Add type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.

Create vs. Add

An entity appearing in an Entry Point stream for the first time MUST use Activity type Create and/or Add.

Create SHOULD be used when the entity is new in the source dataset and available for use. A provider MUST NOT use Create to broadcast that an entity exists unless it can be dereferenced at the entity URI. A Create activity indicates that the entity is new and available for use by consumers, see also Add below.

Add SHOULD be used when the entity exists in the source dataset, but was previously not available through the Entry Point and now is being made available in the stream. Situations where this might happen include, but are not limited to, change in permissions, end of an embargo, temporary removal and now being made available.

A new Entry Point stream MAY choose to populate the stream with all existing entities from the source dataset. In this case, the initial population of the stream with all existing entities SHOULD use Add.
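
One possible reading of the Create vs. Add guidance, from the provider's side, is sketched below. This is non-normative: the function name and its two boolean inputs are hypothetical, and returning None to mean "publish nothing yet" is an assumption of the sketch rather than something this specification defines.

```python
from typing import Optional


def first_appearance_type(new_in_source: bool, dereferenceable: bool) -> Optional[str]:
    """Pick the activity type for an entity entering the stream for the first time."""
    if not new_in_source:
        # The entity already existed in the source dataset (for example after
        # an embargo ends, or during initial population of a new stream).
        return "Add"
    # Create is only appropriate once the entity URI can be dereferenced.
    return "Create" if dereferenceable else None
```

For example, `first_appearance_type(True, True)` yields Create, while an entity released from an embargo would be passed as `new_in_source=False` and yield Add.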

Example Entity Change Activity excerpt for Create

{
  "summary": "New entity for term milk",
  "published": "2021-08-02T16:59:54Z",
  "type": "Create",
  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  },
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/milk",
    "updated": "2021-08-02T16:59:54Z"
  }
}

4.2. Update Entity

An updated entity SHOULD have an Entity Change Activity with a type of Update.

An updated entity MUST be implemented as an Activity following the Update type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.

Examples of updates in the library domain include splits and merges. See Deprecate Entity below for an illustration of how to reflect these scenarios using a sequence of related activities, without explicitly typing them as split or merge activities.

Example Entity Change Activity excerpt for Update

{
  "summary": "Update entity term milk",
  "published": "2021-08-02T16:59:54Z",
  "type": "Update",
  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  },
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/milk",
    "updated": "2021-08-02T16:59:54Z"
  }
}

4.3. Deprecate Entity

Deprecation indicates that an existing entity in the authority has been updated to reflect that it should no longer be used, though the URI remains dereferenceable. Whenever possible, the entity description should indicate which entity or entities should be used instead.

There are two common scenarios. In the first, the replacement entity or entities already exist and the Deprecate activity updates the deprecated entity only. In the second scenario, the replacement entity or entities do not exist prior to the deprecation. In this case, the replacement entity or entities are created and the status of the original entity is changed to deprecated.

An entity that has been deprecated SHOULD have an Entity Change Activity with the type Deprecate. The two scenarios are implemented as follows:

  • A single Deprecate activity when the entity that is replacing the deprecated entity already exists, or if the deprecated entity is not replaced.
  • A Deprecate activity for the deprecated entity, and one or more activities (e.g. Create, Update, Add) for the replacement entity or entities.

In all cases, it is expected that the consumer will dereference the deprecated entity URI to obtain the updated entity description, including whether it was replaced.

Note that the Entity Metadata Management context includes definition of the term Deprecate and thus MUST be included in the @context definition if Deprecate activities are used. See JSON-LD Representation for more details.

Example Entity Change Activity excerpt for Deprecate in the Scenario where a Replacement Entity Already Exists

{
  "summary": "Deprecate term cow milk",
  "published": "2021-08-02T16:59:57Z",
  "type": "Deprecate",
  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  },
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/cow_milk",
    "updated": "2021-08-02T16:59:57Z"
  }
}

Example Entity Change Activity excerpt for Deprecate in the Scenario where a Replacement Entity is Created

  {
    "type": "Create",
    "published": "2021-02-01T17:11:03Z",
    "object": {
      "id": "https://my.authority/term/bovine_milk",
      "type": "http://www.w3.org/2004/02/skos/core#Concept",
      "updated": "2021-02-01T17:11:03Z"
    }
  },
  {
    "type": "Deprecate",
    "published": "2021-02-01T17:11:03Z",
    "object": {
      "id": "https://my.authority/term/cow_milk",
      "type": "http://www.w3.org/2004/02/skos/core#Concept",
      "updated": "2021-02-01T17:11:03Z"
    }
  }

4.4. Delete Entity

It is RECOMMENDED that entities be marked as Deprecated in the source dataset instead of deleting the entity from the source dataset. If the entity is deprecated, follow the Entity Change Activity described in Deprecate Entity.

An entity that has been fully deleted from the source dataset, such that the entity URI is no longer dereferenceable, SHOULD have an Entity Change Activity with a type of Delete or Remove.

A deleted entity MUST be implemented as an Activity following the Delete type definition or the Remove type definition in the Activity Streams specification. The key points are repeated here with examples specific to Entity Metadata Management.

Example Entity Change Activity excerpt for Delete

{
  "summary": "Delete term cow_milk",
  "published": "2021-08-02T16:59:54Z",
  "type": "Delete",
  "partOf": {
    "type": "OrderedCollectionPage",
    "id": "https://data.my.authority/change_documents/2021/activity-stream/page/2"
  },
  "object": {
    "type": "http://www.w3.org/2004/02/skos/core#Concept",
    "id": "http://my_repo/entity/cow_milk",
    "updated": "2021-08-02T16:59:54Z"
  }
}
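
As a rough consumer-side illustration of the Deprecate and Delete sections above, the following Python sketch applies a single activity to a local cache. The cache dictionary, the fetch_entity callable, and the local "deprecated" flag are assumptions made for this example; none of them is defined by this specification.

```python
from typing import Callable, Dict

# Hypothetical local cache: entity URI -> entity description (a plain dict here).
Cache = Dict[str, dict]

REFRESH_TYPES = {"Create", "Add", "Update", "Deprecate"}
REMOVAL_TYPES = {"Delete", "Remove"}


def apply_activity(activity: dict, cache: Cache,
                   fetch_entity: Callable[[str], dict]) -> None:
    """Apply one Entity Change Activity to the local cache."""
    entity_uri = activity["object"]["id"]
    activity_type = activity["type"]

    if activity_type in REMOVAL_TYPES:
        # The entity URI is no longer dereferenceable, so drop the local copy.
        cache.pop(entity_uri, None)
    elif activity_type in REFRESH_TYPES:
        # Dereference the URI to obtain the current description. For Deprecate
        # the URI still resolves, and the description may name a replacement.
        description = dict(fetch_entity(entity_uri))
        if activity_type == "Deprecate":
            # Local bookkeeping flag (an assumption, not part of the spec).
            description["deprecated"] = True
        cache[entity_uri] = description
    else:
        raise ValueError(f"Unsupported activity type: {activity_type}")
```

Keeping the dereferencing step behind a callable means the same function can be exercised with an in-memory stub before it is pointed at a real authority.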

Appendices

A. Provider Workflows

This section describes how an Entity Metadata Provider can implement this specification to allow consumers to follow changes to entity descriptions in the provider’s data set.

A.1 Provider Decisions

The choice of how often to create new Change Sets will depend upon how frequently entities are updated, consumers’ expectations for timely updates, resource constraints, and likely other local considerations. Two common approaches are to create Change Sets at predetermined time intervals (e.g. hourly, nightly, weekly, monthly), or after a certain number of changes have occurred (e.g. 10, 20, 100, 500 changes).
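
The two approaches can also be combined. The Python sketch below is one possible policy check, not a recommendation of this specification; the function name and the default thresholds (100 changes, one day) are illustrative assumptions.

```python
from datetime import datetime, timedelta


def should_publish_change_set(pending_changes: int,
                              last_published: datetime,
                              now: datetime,
                              max_changes: int = 100,
                              max_interval: timedelta = timedelta(days=1)) -> bool:
    """Return True when either trigger is met and at least one change is pending."""
    if pending_changes == 0:
        return False  # nothing to publish, whatever the elapsed time
    return (pending_changes >= max_changes
            or now - last_published >= max_interval)
```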

The Local Cache of Labels and Local Cache of Full Dataset use cases require the consumer to be able to download a copy of all entities in the dataset before following changes. Coordination of snapshots of the full dataset with the production of Change Sets will make this easier.

A.2 Creating Full Downloads

When a full download of the dataset is created, the provider should:

  • If already creating Change Sets, write any unrecorded entity changes to a last Change Set before the snapshot.
  • Record the datetime when the snapshot for the download was taken.
  • On the human readable download page, include a link to the download file and indicate the datetime of creation.
  • Create or update the Entry Point to include the new download in the url property.

A.3 Creating Change Sets

The provider must record information about changes in the Entity Set as they occur, then at some point write a Change Set and make accompanying changes to the Entry Point.

Recording Changes Made to the Entity Set

For each change in an Entity Set, the provider must record all information necessary to write the Activity entry in a Change Set. This includes:

  • The dereferenceable URI for the entity
  • The type of entity (e.g. http://vocab.getty.edu/ontology#Subject)
  • The Activity type of change (e.g. Add, Update, Deprecate)
  • The datetime of the change to the entity
  • A recommended summary description of change (e.g. “Add term Science”)

Publishing a Change Set

After some time recording changes, a provider publishes a new Change Set linked from an Entry Point. Several URIs for new and existing objects will be referenced in the algorithm below:

  • entry_point_uri - the URI of the newly created or existing Entry Point
  • prev_change_set_uri - the previous Change Set URI in the Entry Point’s last property
  • change_set_uri - the new URI that will resolve to the new Change Set

and for each change recorded:

  • entity_uri - the URI of the entity changed
  • change_activity_uri - the new URI that will resolve to the new Entity Change Activity describing the change

With these URIs the new Change Set can be created as follows:

  • set the id property to change_set_uri
  • set the partOf property to use entry_point_uri for id
  • set the prev property to use prev_change_set_uri for id
  • set the totalItems property to the number of change activities that will be in this change set
  • for each change, from oldest to newest or newest to oldest, add an Activity to the orderedItems property array, and:
    • set the summary property to the human readable description of the change
    • set the published (or endTime) property to the datetime the activity is being published
    • set the type property to the change type (e.g. Add, Update)
    • set the id property to the change_activity_uri for this change
    • set the object property to be a JSON object with the following properties:
      • set the id property to the entity_uri
      • set the type property to the entity type
      • set the updated property to the datetime of the change to the entity
      • set the summary property to the human readable description of the change
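
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the RecordedChange class and its field names are hypothetical, datetimes are assumed to be UTC ISO 8601 strings in a uniform format (so that string order equals chronological order), and the @context and page-level type properties are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordedChange:
    """One change recorded by the provider (field names are illustrative)."""
    entity_uri: str
    entity_type: str
    activity_type: str        # e.g. "Add", "Update", "Deprecate"
    changed_at: str           # datetime of the change to the entity
    summary: str
    change_activity_uri: str


def build_change_set(change_set_uri: str, entry_point_uri: str,
                     prev_change_set_uri: Optional[str],
                     changes: List[RecordedChange],
                     published: str) -> dict:
    """Build a Change Set as a plain dict, listing activities oldest to newest."""
    ordered = sorted(changes, key=lambda c: c.changed_at)
    items = []
    for change in ordered:
        items.append({
            "summary": change.summary,
            "published": published,
            "type": change.activity_type,
            "id": change.change_activity_uri,
            "object": {
                "id": change.entity_uri,
                "type": change.entity_type,
                "updated": change.changed_at,
                "summary": change.summary,
            },
        })
    change_set = {
        "id": change_set_uri,
        "partOf": {"id": entry_point_uri},
        "totalItems": len(items),
        "orderedItems": items,
    }
    if prev_change_set_uri is not None:
        # The first Change Set ever published has no predecessor.
        change_set["prev"] = {"id": prev_change_set_uri}
    return change_set
```

The resulting dict can be serialized with json.dumps once the @context and type properties required elsewhere in this specification have been added.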

Update the previous Change Set:

  • add a next property that points to the new Change Set

Update the Entry Point:

  • if this is the first Change Set published, add the first property in the entry point with:
    • set the type property to OrderedCollectionPage
    • set the id property to the change_set_uri
    • set the published property to the datetime the Change Set is being published
  • add or update the last property in the Entry Point:
    • set the type property to OrderedCollectionPage
    • set the id property to the change_set_uri
    • set the published property to the datetime the Change Set is being published
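
The linking updates above might look like the following Python sketch, which mutates in-memory dicts standing in for the stored documents. The function name is hypothetical, and the exact shape of the next reference (an object with type and id) is an assumption for illustration.

```python
from typing import Optional


def link_change_set(entry_point: dict, prev_change_set: Optional[dict],
                    change_set_uri: str, published: str) -> None:
    """Point the previous Change Set and the Entry Point at a new Change Set."""
    page_ref = {
        "type": "OrderedCollectionPage",
        "id": change_set_uri,
        "published": published,
    }
    if prev_change_set is not None:
        # Let consumers walk forward from the previous page to the new one.
        prev_change_set["next"] = {"type": "OrderedCollectionPage",
                                   "id": change_set_uri}
    if "first" not in entry_point:
        # Only the very first Change Set published becomes "first".
        entry_point["first"] = dict(page_ref)
    # "last" always refers to the most recently published Change Set.
    entry_point["last"] = dict(page_ref)
```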

For each change create a separate Entity Change Activity document at the change_activity_uri with the same information used in the Change Set.

B. Consuming Entity Change Sets

Activity streams are inherently temporal constructs, and as such, the order of presentation in a stream may be forward (i.e. the starting point in the stream reflects its oldest elements and consuming the stream involves newer and newer elements) or it may be reverse (i.e. the starting point in the stream reflects its most recent elements and consuming the stream involves older and older elements). Likewise, the content of a given page in the stream may be immutable (i.e. once published the content of a given page never changes) or it may be mutable (i.e. the content of a given page can be updated and can differ from release to release). This specification expresses no preference for either approach. Rather, example approaches to each are presented below.

  • Mutable Forward: Pages can be updated, and the content of a given page can differ by release. Older pages appear earlier in the stream than newer pages.
  • Mutable Reverse: Pages can be updated, and the content of a given page can differ by release. Older pages appear later in the stream than newer pages.
  • Immutable Forward: Once published, pages never change. Older pages appear earlier in the stream than newer pages.
  • Immutable Reverse: Once published, pages never change. Older pages appear later in the stream than newer pages.

Of these four possibilities, we describe mutable reverse, of which the Library of Congress’s activity stream is an example, and immutable forward, of which Getty Vocabularies’ activity stream is an example. Regarding the remaining two possibilities, mutable forward, while feasible, requires the entire stream to be processed at each release, as there is no way of establishing where in the stream a change might occur. Immutable reverse is inherently infeasible, as it requires that new content appear first, but on a page that cannot be changed.

B.1 Consuming a mutable reverse stream (e.g. Library of Congress)

The Library of Congress provides an activity stream for several authorities (e.g. names, genre/forms, subjects). See: https://id.loc.gov/techcenter/.

Characteristics:

  • an entity will appear in the activity stream at most one time
  • the published date of the activity for an entity will be the date of the most recent change of the object of the activity
  • the first page of the stream has the most recent activities
  • activities on a page are listed from newest to oldest

Assumptions:

  • the consumer processes activities in descending date order, as presented in the stream
  • the consumer maintains a persistent reference to the last activity date processed in the stream (published)

Recommendations:

  • if maintaining a full cache, ingest latest full download before processing the related activity stream
  • each time the activity stream is processed, save the published date of the most recent activity processed

Pseudocode (to consume updated resources since a specific date):

# uri_of_first_activity_stream_page = Input URI of first Activity Stream page
# date_from = Date of the most recent activity processed in the previous processing run.
# last_update = Date of the most recent activity processed in the current processing run.

last_update = null

func process_as(date_from, as_uri)
    activity_stream_page = get as_uri
    for each activity in activity_stream_page
        if activity.published < date_from then
            return    # all remaining activities are older
        process activity by type
        if last_update == null then
            last_update = activity.published    # the first activity seen is the newest

    # every activity on this page was in range, so continue on the next page
    if activity_stream_page.next exists then
        process_as(date_from, activity_stream_page.next)
end func

process_as(date_from, uri_of_first_activity_stream_page)
# for next run: date_from = last_update (if not null)
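
For readers who prefer runnable code, the following Python sketch expresses the same idea iteratively. It is an illustration only: get_page and process are caller-supplied callables (assumptions of this example), pages are assumed to expose orderedItems and an optional next object with an id, and published dates are assumed to be UTC ISO 8601 strings in a uniform format so that they can be compared as strings.

```python
from typing import Callable, Optional


def consume_reverse_stream(first_page_uri: str, date_from: str,
                           get_page: Callable[[str], dict],
                           process: Callable[[dict], None]) -> Optional[str]:
    """Process activities published on or after date_from, newest first.

    Returns the published date of the newest activity processed, or None
    if nothing was in range. Persist that value as the next date_from.
    """
    newest = None
    page_uri: Optional[str] = first_page_uri
    while page_uri is not None:
        page = get_page(page_uri)
        for activity in page.get("orderedItems", []):
            if activity["published"] < date_from:
                return newest  # everything from here on is older
            process(activity)
            if newest is None:
                newest = activity["published"]  # first one seen is the newest
        next_ref = page.get("next")
        page_uri = next_ref["id"] if next_ref else None
    return newest
```

Because the comparison is inclusive, an activity published exactly at date_from is processed again on the next run; a consumer that cannot tolerate repeats would need its processing step to be idempotent.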

B.2 Consuming an immutable forward stream (e.g. Getty)

The Getty Research Institute provides activity streams for vocabularies (https://data.getty.edu/vocab/activity-stream) and museum objects (https://data.getty.edu/museum/collection/activity-stream).

Characteristics:

  • an entity will appear in the activity stream one or more times
  • the first page of the stream has the oldest activities
  • activities on a page are listed from oldest to newest
  • the date of an activity is the time of that given modification

Assumptions:

  • the consumer fully processes all activities appearing in a given page in the stream
  • the consumer maintains a persistent reference to the last page processed in the stream (last_page)

Pseudocode:

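# last_page = the last page fully processed in the previous run (null on the first run)

if last_page == null
  current = first page of the activity stream
else
  current = last_page.next
end

while (current != null)
  for each activity in current
    process activity by type
  end
  last_page = current
  current = current.next
end
# persist last_page for the next run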
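
A runnable Python sketch of forward consumption follows. It assumes, for illustration, that get_page and process are supplied by the caller, that a page links to its successor through a next object with an id, and that the consumer resumes on the page after the last one it fully processed; these names and shapes are not taken from Getty's implementation.

```python
from typing import Callable, Optional


def _next_uri(page: dict) -> Optional[str]:
    """Return the URI of the following page, or None at the end of the stream."""
    next_ref = page.get("next")
    return next_ref["id"] if next_ref else None


def consume_forward_stream(first_page_uri: str,
                           last_page_uri: Optional[str],
                           get_page: Callable[[str], dict],
                           process: Callable[[dict], None]) -> Optional[str]:
    """Process every page after last_page_uri, oldest first.

    Returns the URI of the last page fully processed, to be persisted
    by the caller for the next run.
    """
    if last_page_uri is None:
        page_uri: Optional[str] = first_page_uri  # first run: start at the beginning
    else:
        # Re-read the last processed page only to discover its successor.
        page_uri = _next_uri(get_page(last_page_uri))

    while page_uri is not None:
        page = get_page(page_uri)
        for activity in page.get("orderedItems", []):
            process(activity)
        last_page_uri = page_uri
        page_uri = _next_uri(page)
    return last_page_uri
```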

B.3 Discussion

The Library of Congress’s mutable reverse approach is inherently the most compact, as any given entity appears in the stream at most once, at its most recent point of modification. However, this is accomplished by regenerating the entire activity stream whenever new content is made available. Getty’s immutable forward approach yields pages that never change once issued, with new content appearing incrementally on new pages attached to the end of the page sequence comprising the stream. Any given entity may appear multiple times in the stream, reflecting the number of modifications it has undergone over its life, and each appearance need only update the entity rather than provide a complete representation.

Acknowledgements

We are grateful to all participants in the LD4 Best Practices for Authoritative Data Working Group, within which this specification was created. E. Lynette Rayle (formerly at Cornell University) led the initial development of this specification. Jim Hahn (University of Pennsylvania Libraries), Kirk Hess (OCLC R&D), Anna Lionetti (Casalini Libri), Tiziana Possemato (Casalini Libri), and Erik Radio (University of Colorado Boulder) also contributed to this work.

This specification was informed by prior Activity Streams implementations for Library of Congress entity sets and Getty Vocabularies.