TL;DR: How Islandora 8 uses and extends Drupal fields to manage descriptive metadata. Covers metadata types, migration, attribution and licensing, and pulling from external sources.
In Islandora 8, metadata is stored in Drupal, in fields attached to entities (nodes or media). This allows us to interact with metadata (add, edit, remove, display, index in a search engine…) almost entirely through standard Drupal processes. If this metadata is exported to Fedora and/or a triplestore, the values are serialized to RDF using mappings that can be set for each bundle. Islandora 8 digital objects are composed of Drupal nodes for descriptive metadata, Drupal media for technical metadata, and Drupal files for the binary objects. This section describes how Islandora 8 uses and extends Drupal fields to manage descriptive metadata. For more information, see the Islandora documentation on resource nodes.
7.x Migration Note: What about the XML?
In 7.x, metadata was usually stored using an XML schema such as MODS or DC, as datastreams attached to an object. In Islandora 8, metadata is stored as fields. This means individual elements are broken out of a hierarchical structure into independent values. Where some hierarchy or field grouping is necessary, this can be done in Drupal using Paragraphs, a widely used Drupal contrib module. At the moment (Nov 2019), we are working on the technical challenge of mapping data from Paragraphs into RDF in Fedora. The Metadata Interest Group has developed a default mapping (spreadsheet, guidance document) that provides a basic, yet customizable, transform between MODS metadata and Drupal fields in Islandora Defaults. Institutions are encouraged to customize the mapping to meet their unique needs.
That said, if keeping the “legacy” XML metadata from 7.x is important to you, it can be attached to an Islandora 8 resource node as a Media entity. However, there is no mechanism in Islandora 8 for editing XML in a user-friendly way.
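To make "breaking out individual elements" concrete, here is a minimal sketch of flattening a few MODS elements into flat, repeatable field values. It is not the Metadata Interest Group mapping; the Drupal field names (`title`, `field_linked_agent`, `field_edtf_date_issued`, `field_subject`) are assumptions loosely modelled on a Repository Item and should be replaced with whatever your own mapping specifies.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def _text(element: ET.Element) -> str:
    """Return an element's stripped text, or "" when it has none."""
    return (element.text or "").strip()


def mods_to_fields(mods_xml: str) -> dict[str, list]:
    """Break a few hierarchical MODS elements out into independent values.

    Field names are hypothetical; adjust them to your own mapping.
    """
    root = ET.fromstring(mods_xml)
    fields: dict[str, list] = {
        "title": [_text(t) for t in root.findall("mods:titleInfo/mods:title", MODS_NS)],
        "field_linked_agent": [],
        "field_edtf_date_issued": [
            _text(d) for d in root.findall("mods:originInfo/mods:dateIssued", MODS_NS)
        ],
        "field_subject": [_text(s) for s in root.findall("mods:subject/mods:topic", MODS_NS)],
    }
    for name in root.findall("mods:name", MODS_NS):
        # A name and its role belong together, so they are kept as a pair;
        # in Drupal this pairing is what a Typed Relation field holds.
        agent = name.findtext("mods:namePart", default="", namespaces=MODS_NS).strip()
        role = name.findtext("mods:role/mods:roleTerm", default="", namespaces=MODS_NS).strip()
        fields["field_linked_agent"].append((role, agent))
    return fields
```

Note that the name/role pair is the one place where some grouping survives; anything deeper (e.g. nested `relatedItem` records) is where Paragraphs would come in.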
Content Types
In Drupal, Nodes come in different subtypes called Content Types (e.g. Article, Basic page, Repository item). Content types contain fields, along with configurations for how those fields can be edited or displayed. Each content type is essentially a metadata profile that can be used for a piece of web content, or to describe a digital resource. For each field in a content type, an administrator can configure how data is entered, how it is displayed, how many values can be stored, and the maximum length of a value. Some configurations, such as data entry and display, can be changed at any time. Others, such as the maximum length of a value or the options available in a select list, cannot be changed once content has been created without first deleting all content of that type. However, fields can be added to existing content types without consequence.
Not all content types in your Drupal site need to be Islandora “resource nodes”. A “resource node” content type will likely have behaviours (such as syncing to Fedora or causing derivatives to be generated) associated with it. This configuration, and communicating to users which content types are and are not considered Islandora resource nodes, is left to the discretion of the site manager. In Islandora, a “resource node” is usually considered a descriptive record for “a thing”, and is conceptually similar to an “Islandora Object” in 7.x, i.e. a “Fedora Object” in Fedora 3.x and below.
Vocabularies
In Drupal, Taxonomy Vocabularies (or simply Vocabularies) are also entity subtypes that define a set of fields and their configurations. Whereas instances of content types are called nodes, items in a vocabulary are called taxonomy terms (or simply terms). Traditionally, taxonomy terms are used to classify content in Drupal. For instance, the Article content type includes a field field_tags that can refer to terms in the Tags vocabulary.
Islandora (through the Islandora Core Feature) creates the ‘Islandora Models’ vocabulary, which includes the terms ‘Audio’, ‘Binary’, ‘Collection’, ‘Digital Document’, ‘Image’, ‘Page’, ‘Paged Content’, ‘Publication Issue’, and ‘Video’. Islandora Defaults provides contexts that trigger certain actions (e.g. generating derivatives or displaying blocks) depending on which term is used.
The Controlled Access Terms module provides additional vocabularies:
- Corporate Body
- Country
- Family
- Form
- Genre
- Geographic Location
- Language
- Person
- Resource Types
- Subject
Each of these vocabularies has its own set of fields allowing repositories to further describe them. The Repository Item content type has fields that can reference terms in these vocabularies. See ‘Entity Reference fields’ in the ‘Field Types’ section below.
The vocabularies provided by default are a starting point, and a repository administrator can create whatever vocabularies are desired.
Vocabularies can also be used to display sets of content. When displaying nodes, both in teaser listings on the Drupal home page and in the full, single-node view, many Drupal themes display the categories applied to the node. If the user selects any category term, Drupal then displays a browsable listing of all nodes tagged with that term. It is also possible to display content tagged with both of two terms, or with either of them. Here are a few examples.
Field Types
Fields are where Drupal entities store their data. There are different types of fields including boolean, datetime, entity reference, integer, string, text, and text_with_summary. These field types also have widgets (controlling how data is entered) and formatters (controlling how data is displayed). The Drupal 8 documentation on FieldTypes, FieldWidgets, and FieldFormatters includes a list of the core field types. Modules can provide their own field types, formatters, and widgets. The Controlled Access Terms module provides two additional types for use with Islandora: EDTF and Typed Relation. These are described below.
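The difference between "tagged with both terms" and "tagged with either term" is plain set logic. In a Drupal site this is normally configured (for example with a Views filter on taxonomy terms) rather than written by hand; the sketch below only illustrates the two behaviours over a hypothetical in-memory map of node IDs to term names.

```python
def nodes_tagged_all(tags_by_node: dict[int, set[str]], terms: set[str]) -> list[int]:
    """Node IDs tagged with every one of `terms` (the "both terms" case)."""
    return sorted(nid for nid, tags in tags_by_node.items() if terms <= tags)


def nodes_tagged_any(tags_by_node: dict[int, set[str]], terms: set[str]) -> list[int]:
    """Node IDs tagged with at least one of `terms` (the "either term" case)."""
    return sorted(nid for nid, tags in tags_by_node.items() if tags & terms)


# Hypothetical data: node 1 has both tags, nodes 2 and 3 have one each.
tags = {1: {"Gardens", "Maps"}, 2: {"Gardens"}, 3: {"Maps"}}
print(nodes_tagged_all(tags, {"Gardens", "Maps"}))  # [1]
print(nodes_tagged_any(tags, {"Gardens", "Maps"}))  # [1, 2, 3]
```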
Entity Reference fields are a special type of field built into Drupal that creates a relationship between two entities. The field’s configuration options include which kind of entities can be referenced. The ‘Repository Item’ content type, provided by islandora_defaults, includes several entity reference fields that reference vocabularies defined by the islandora and controlled_access_terms modules.
The ‘Member Of’ field is an entity reference field, defined by Islandora, which is the Islandora way of imposing a hierarchical order on resource nodes. This can be used to show membership in a collection, for pages that are members of a paged item, and for members of a complex object.
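Because ‘Member Of’ points from a child to its parent(s), finding everything a page or item belongs to means following those references upward. A minimal sketch, assuming a hypothetical in-memory map from node ID to the (possibly multiple) IDs in its Member Of field; a real site would read these values from Drupal. The `seen` set guards against accidental cycles.

```python
from collections import deque


def ancestors(node_id: str, member_of: dict[str, list[str]]) -> list[str]:
    """Return every node reachable through Member Of, nearest parents first."""
    seen = {node_id}
    order: list[str] = []
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        for parent in member_of.get(current, []):
            if parent not in seen:  # skip repeats and cycles
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Hypothetical hierarchy: a page in a paged item, in a collection.
member_of = {"page-1": ["book"], "book": ["collection"], "collection": []}
print(ancestors("page-1", member_of))  # ['book', 'collection']
```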
EDTF
The EDTF field type is for recording dates in the Extended Date/Time Format (EDTF), which is based on the hyphenated form of ISO 8601 (e.g. 1991-02-03 or 1991-02-03T10:00:00) but also allows expressions of varying granularity and uncertainty. The default EDTF widget has a validator that only accepts strings conforming to the EDTF standard. The default EDTF formatter can display these date strings in a variety of human-readable ways, including big- or little-endian order, and presenting months as numbers, abbreviations, or full month names.
Figure: the same EDTF dates displayed using little-endian format.
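To show what "only allows strings that conform" means in practice, here is a sketch of a validator for a small subset of EDTF. It is not the Controlled Access Terms validator: it accepts only YYYY, YYYY-MM and YYYY-MM-DD, each optionally followed by one level-1 qualifier (`?` uncertain, `~` approximate, `%` both), plus closed intervals of two such dates. Times, seasons, unspecified digits (`X`) and open intervals are deliberately not handled, and interval order is not checked.

```python
import re

# Subset only: YYYY[-MM[-DD]] with an optional ?, ~ or % qualifier.
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?([?~%])?")


def _days_in_month(year: int, month: int) -> int:
    """Proleptic Gregorian month length (so year 0000 counts as a leap year)."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def _is_valid_date(text: str) -> bool:
    match = _DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day, _qualifier = match.groups()
    if month is not None and not 1 <= int(month) <= 12:
        return False
    # A day can only be present when a month is, so int(month) is safe here.
    if day is not None and not 1 <= int(day) <= _days_in_month(int(year), int(month)):
        return False
    return True


def is_valid_edtf_subset(value: str) -> bool:
    """Accept a single date or a closed interval written as "date/date"."""
    parts = value.split("/")
    return len(parts) in (1, 2) and all(_is_valid_date(p) for p in parts)
```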
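Little-endian display simply reorders the parts day, month, year and renders the month in the chosen style. The function below is a hypothetical sketch of that idea, not the Controlled Access Terms formatter; it handles only YYYY, YYYY-MM and YYYY-MM-DD with an optional trailing qualifier, which it keeps as-is. Month names come from Python's `calendar` module and therefore follow the current locale (English unless the locale has been changed).

```python
import calendar


def format_little_endian(edtf_date: str, month_style: str = "full", sep: str = " ") -> str:
    """Render YYYY, YYYY-MM or YYYY-MM-DD day-first.

    month_style: "full" (February), "abbr" (Feb) or "number" (2).
    """
    qualifier = ""
    if edtf_date[-1:] in ("?", "~", "%"):
        edtf_date, qualifier = edtf_date[:-1], edtf_date[-1]
    year, *rest = edtf_date.split("-")
    pieces = []
    if len(rest) == 2:
        pieces.append(str(int(rest[1])))  # day, without a leading zero
    if rest:
        month = int(rest[0])
        if month_style == "full":
            pieces.append(calendar.month_name[month])
        elif month_style == "abbr":
            pieces.append(calendar.month_abbr[month])
        else:
            pieces.append(str(month))
    pieces.append(year)
    return sep.join(pieces) + qualifier


print(format_little_endian("1991-02-03"))                # 3 February 1991
print(format_little_endian("1991-02-03", "number", "/"))  # 3/2/1991
```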
Typed Relation
A Typed Relation field is an extension of Drupal’s Entity Reference field that allows the user to qualify the relation. It was created for describing a resource’s contributors (modelled as taxonomy terms or other Drupal entities) as well as their roles with respect to the resource (such as ‘author’, ‘illustrator’, or ‘architect’). With only Drupal’s Entity Reference fields, we would need individual fields for ‘author’, ‘illustrator’, ‘architect’, and any other roles that may need to be made available. Using a Typed Relation field, we can have one field for “Contributors” and let the user pick the role from a dropdown list.
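Conceptually, each value of a Typed Relation field is an entity reference plus a role. The sketch below models that as a pair of a target ID and a role key, assuming MARC relator keys of the form `relators:aut`; the class name and label table are hypothetical. It shows how a single multi-valued "Contributors" field can still be presented role by role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedRelationValue:
    target_id: int  # id of the referenced entity, e.g. a Person term
    rel_type: str   # role qualifier, e.g. the MARC relator "relators:aut"


# Human-readable labels for a few MARC relator codes (hypothetical subset).
RELATOR_LABELS = {
    "relators:aut": "Author",
    "relators:ill": "Illustrator",
    "relators:arc": "Architect",
}


def contributors_by_role(values: list[TypedRelationValue]) -> dict[str, list[int]]:
    """Group one multi-valued Contributors field by role label."""
    grouped: dict[str, list[int]] = {}
    for value in values:
        label = RELATOR_LABELS.get(value.rel_type, value.rel_type)  # fall back to the raw key
        grouped.setdefault(label, []).append(value.target_id)
    return grouped
```

The alternative, one Entity Reference field per role, would need a schema change every time a new role is introduced; here a new role is just another key.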
Key features
RDF Mapping: “In Islandora, the JSON-LD Module transforms nodes (or media, or taxonomy terms) into the RDF that is synced into Fedora and the Triplestore. It uses RDF mappings, a concept defined by the RDF Module, and exposes them through the REST API” (RDF Generation, n.d.).
“The RDF Module is part of Drupal Core, but has no official documentation. The RDF Module embeds RDFa, a form of linked data, within the Drupal-generated HTML when you load the web page for a node, media, or taxonomy term. Official line is that this will allow Google to provide “rich snippets” such as star-ratings, contact info, and business hours.” [link]
A custom module rdfui exists, and is installed-but-not-enabled on boxes provisioned by the islandora-playbook. We don’t use it because it is very rudimentary and limited to the schema.org vocabulary.
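The idea behind an RDF mapping is a per-bundle table saying which RDF types an entity gets and which predicate each field is serialized under. The following is a simplified sketch of that idea, not the JSON-LD module's code: the mapping, the field names and the choice of `dcterms`/`pcdm` terms are illustrative assumptions. Fields without a mapping are simply left out of the output.

```python
# Hypothetical mapping for one bundle.
CONTEXT = {"dcterms": "http://purl.org/dc/terms/", "pcdm": "http://pcdm.org/models#"}
MAPPING = {
    "types": ["pcdm:Object"],
    "fields": {
        "title": "dcterms:title",
        "field_description": "dcterms:description",
        "field_edtf_date_issued": "dcterms:issued",
    },
}


def to_jsonld(entity_uri: str, field_values: dict[str, list[str]], mapping: dict = MAPPING) -> dict:
    """Serialize mapped field values as a small JSON-LD document."""
    doc = {"@context": CONTEXT, "@id": entity_uri, "@type": list(mapping["types"])}
    for field_name, values in field_values.items():
        predicate = mapping["fields"].get(field_name)
        if predicate is None:
            continue  # unmapped fields are not serialized
        doc[predicate] = [{"@value": v} for v in values]
    return doc
```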
URI Mapping with persistent identifiers: resources can be assigned persistent identifiers given access to an API for minting them (e.g., DataCite or EZID).[link]
The Drupal Schema.org Dataset module was recently updated to extend the Schema.org Metatag module to support the schema.org Dataset spec. “DOI minting is part of UPEI’s publication workflow, where the system only registers the dataset with DataCite when a curator user ‘publishes’ the dataset, making it public. At that point it sends an XML version of the node fields content to DataCite’s API” [paraphrased from Alexander O’Neill from UPEI via Slack chat]
Schema.org Dataset: this module extends the Schema.org Metatag module to display structured data representing datasets as JSON-LD in the head of web pages.
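The UPEI workflow itself is not reproduced in this document. As a rough illustration of "send an XML version of the node fields only when published", here is a hypothetical sketch that builds a minimal DataCite XML record from a dictionary of node values. The element set is what I understand to be the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type); check it against the current DataCite Metadata Schema before relying on it. The node keys and DOI are made up, and the actual call to DataCite's API is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def _add(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a DataCite-namespaced child element with optional text."""
    child = ET.SubElement(parent, f"{{{DATACITE_NS}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def datacite_xml(node: dict) -> Optional[str]:
    """Build minimal DataCite XML for a dataset node, or None if unpublished."""
    if not node.get("published"):
        return None  # only register once a curator publishes the dataset
    ET.register_namespace("", DATACITE_NS)
    resource = ET.Element(f"{{{DATACITE_NS}}}resource")
    _add(resource, "identifier", node["doi"], identifierType="DOI")
    creators = _add(resource, "creators")
    for name in node["creators"]:
        _add(_add(creators, "creator"), "creatorName", name)
    _add(_add(resource, "titles"), "title", node["title"])
    _add(resource, "publisher", node["publisher"])
    _add(resource, "publicationYear", str(node["year"]))
    _add(resource, "resourceType", "Dataset", resourceTypeGeneral="Dataset")
    return ET.tostring(resource, encoding="unicode")
```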
Migrating metadata, even complex metadata, is a straightforward process. The controlled_access_terms module defines a typed_relation, which is an entity reference coupled with a MARC relator. The migration script expects an associative array that looks like this:
This is an example of the JSON output for an imported value. To see any entity’s JSON representation, add ?_format=json to the end of the entity’s URL; ?_format=jsonld returns the JSON-LD representation instead. For example: http://localhost:8000/taxonomy/term/339?_format=json
Although not strictly essential, it is my understanding that the taxonomy vocabularies used to categorize metadata should be migrated as entities first, before the Islandora 7 objects and their metadata. Generating taxonomy terms from within a node’s migration is limited. For instance, a subject authority has a title, an associated authority, and a URI. Automatically generating the subject from within the node migration limits the import to the title only, whereas a separate migration allows all three fields to be imported and referenced. Here is the list of taxonomy vocabularies that act as “categories” of metadata:
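The `?_format=json` trick also makes it easy to inspect imported values from a script. A minimal sketch follows: `with_format` sets the query parameter, `fetch_entity` downloads the result (it needs a running site and is not exercised here), and `typed_relation_pairs` pulls out reference/role pairs. The field name `field_linked_agent` and the `target_id`/`rel_type` keys are assumptions about the JSON shape; compare them with your own site's output.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_format(entity_url: str, fmt: str = "json") -> str:
    """Return entity_url with ?_format=<fmt> set, replacing any existing one."""
    parts = urlsplit(entity_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "_format"]
    query.append(("_format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_entity(entity_url: str, fmt: str = "json") -> dict:
    """Download and decode an entity's serialized form (requires a live site)."""
    with urlopen(with_format(entity_url, fmt)) as response:
        return json.load(response)


def typed_relation_pairs(entity: dict, field_name: str) -> list[tuple[int, str]]:
    """Extract (target_id, rel_type) pairs from an assumed typed relation field."""
    return [(item["target_id"], item.get("rel_type", "")) for item in entity.get(field_name, [])]


print(with_format("http://localhost:8000/taxonomy/term/339"))
# http://localhost:8000/taxonomy/term/339?_format=json
```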
Corporate Body: an organization or group of persons that is identified by a particular name
Family: two or more persons who present themselves as a family
Genre: a category characterizing a particular style
Geographic Location: a geographic area
Islandora access: Terms used to limit, restrict or coordinate access
Person: an individual of the human species.
Subject: topics, events, people, and organizations.
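Drupal's Migrate API lets each migration declare `migration_dependencies`, which is how "vocabularies first, objects afterwards" is usually enforced. The ordering itself is a topological sort; the sketch below uses Python's `graphlib` with hypothetical migration IDs to show that every taxonomy migration lands before the object migration that references its terms.

```python
from graphlib import TopologicalSorter

# Hypothetical migration ids -> the migrations each one depends on.
DEPENDENCIES = {
    "i7_person": set(),
    "i7_corporate_body": set(),
    "i7_subject": set(),
    "i7_geographic_location": set(),
    "i7_objects": {"i7_person", "i7_corporate_body", "i7_subject", "i7_geographic_location"},
    "i7_media": {"i7_objects"},
}


def migration_order(deps: dict[str, set[str]] = DEPENDENCIES) -> list[str]:
    """Return an order in which every migration runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())
```

The relative order of independent migrations (e.g. persons vs. subjects) is not significant; only the dependency edges are guaranteed.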
Media Attribution: attaches attribution and license information to media images. If you want to add an image licensed under a Creative Commons license to an article, this module lets you add source, author, and license term links to a caption under the image, following the guidelines Creative Commons publishes in Best practices for attribution.
Another suggested solution is to create a rights statement taxonomy from https://rightsstatements.org and then create a formatter that uses the term’s field_external_authority to render a link on the node’s page with a custom PHP script. The expected behaviour is fairly simple to achieve this way.
Pulling Linked Data Authority Content from External Sources
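The source describes this as a custom PHP formatter; PHP is not used here, but the logic is small enough to sketch in Python. The sketch assumes the term is available in the `?_format=json` shape, with the label under `name` and the authority URI under `field_external_authority[0]["uri"]`; both are assumptions to check against your own data. Values are HTML-escaped before output, and a term without a URI falls back to plain text.

```python
from html import escape


def rights_link(term: dict) -> str:
    """Render a rights statement term as a link to its external authority URI."""
    name = escape(term["name"][0]["value"])
    authorities = term.get("field_external_authority") or []
    uri = authorities[0].get("uri") if authorities else None
    if not uri:
        return name  # no authority URI: show the label only
    return f'<a href="{escape(uri, quote=True)}">{name}</a>'


term = {
    "name": [{"value": "In Copyright"}],
    "field_external_authority": [{"uri": "http://rightsstatements.org/vocab/InC/1.0/"}],
}
print(rights_link(term))
# <a href="http://rightsstatements.org/vocab/InC/1.0/">In Copyright</a>
```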
Much like Samvera’s “Questioning Authority” scripts, Drupal provides several options for keeping its controlled vocabulary and authority terms in sync with third-party resources such as the Library of Congress, Getty, and GeoNames. Most of these are available as turnkey solutions that require no code development, dramatically reducing development time. The Library of Congress offers a Linked Data Service from which content can be pulled for offline use. Although a real-time option is available and shown in the IR section of this report, pulling content at regular intervals is ideal. Depending on the serialization chosen, the full LC Subject Headings (LCSH) export is up to 430 MB, a manageable size that can be imported or updated occasionally without affecting performance. Treating this as a simple migration or cron job is fairly painless to set up and maintain; the process is no different from any other content import or migration. Starting from a basic migration template, the fields can be imported and updated just like subjects. For an example of migrating taxonomy terms from a CSV, see migrate_islandora_csv / migrate_plus.migration.knoxgardens_subject.yml.
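As a sketch of the "pull at regular intervals, then migrate" approach, the functions below stream lines of an N-Triples export, keep SKOS preferred labels in one language, and write a CSV that a taxonomy migration could consume. The assumptions are loud ones: the export is taken to be the SKOS flavour using `skos:prefLabel`; the line parser is deliberately simplified (no blank nodes, typed literals, or `\uXXXX` escapes); and the CSV column names are hypothetical, not those of the knoxgardens template. Because `pref_labels` is a generator, a full 430 MB file can be processed line by line without loading it into memory.

```python
import csv
import io
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
# Simplified N-Triples line: <subject> <predicate> "literal"@lang .
_TRIPLE = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@([A-Za-z-]+))?\s*\.\s*')


def pref_labels(lines, lang: str = "en"):
    """Yield (uri, label) pairs for skos:prefLabel triples in the given language."""
    for line in lines:
        match = _TRIPLE.fullmatch(line.strip())
        if match is None:
            continue  # not a literal-valued triple this parser understands
        subject, predicate, literal, tag = match.groups()
        if predicate == SKOS_PREF_LABEL and tag in (None, lang):
            # Undo only the \" and \\ escapes; other escapes are left as-is.
            yield subject, re.sub(r'\\(["\\])', r"\1", literal)


def to_migration_csv(rows) -> str:
    """Write (uri, label) rows as CSV text with hypothetical column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["authority_uri", "name"])
    writer.writerows(rows)
    return buffer.getvalue()
```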
Examples of authorities that are included for management in Islandora 8 include (but are not restricted to):
- People
  - Faculty members
  - Students
  - Alumni
  - Historical persons
- Organizations
  - Universities
  - Departments
  - Corporations
  - Governmental Agencies
  - Publishers
  - Journals
- Events
  - Conferences
  - Symposia
  - Workshops
For more, see Islandora / Controlled Access Terms, a Drupal 8 module that creates vocabularies to represent common named entities in archival description (Corporate Bodies, Families, and Persons) as well as subject terms.