
Uploading, ingesting, and migration

Figure: Typical Drupal 8 migration (https://nuvole.org/sites/default/files/migration-middle-format-process.jpg)

TL;DR: Basically, the charter questions.

Ingesting can be done in the UI, but there are additional options for ingesting content with complicated setups and complex metadata. Whether by batch or through the UI, ingest offers a lot of flexibility, and there are a few ways to achieve an almost completely lossless migration. The “almost” matters: not all MODS/DC values can be represented identically in RDF.

In 7.x, metadata was usually stored using an XML schema such as MODS or DC, as datastreams attached to an object. In Islandora 8, metadata is stored as fields. Individual elements are broken out of a hierarchical structure into independent values, which allows for a potentially lossless data migration.

URIs associated with a person, place, event, etc. are migrated by creating a taxonomy term for the URI and associating it, through a relationship (for a person, Drupal uses a MARC relator), with a node/object. URI graphs support relationships in ways that weren’t available before. Content for DOIs or rights statements can also be fetched automatically; a few options are covered at OAI-PMH & DOI Minting.

If keeping the “legacy” XML metadata from 7.x is important, it can be attached to an Islandora 8 resource node as a Media entity and treated much like a binary file in 7.x. Either way, no metadata is lost. The only obvious gotcha is that there is no mechanism in place for migrating all previous versions of an object while maintaining that history. Custom methods can achieve this, but whether it is worth the effort still needs to be discussed.
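To make the move from hierarchical XML to independent field values more concrete, here is a minimal Python sketch. It is not Islandora code: the sample record, the `flatten_mods` helper, and the output field names (`title`, `subjects`, `agents`) are assumptions made up for illustration. It only shows the general idea of pulling values, URIs, and MARC relator codes out of a MODS tree.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record; real MODS datastreams are usually much larger.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Harbour at dusk</title></titleInfo>
  <name type="personal" valueURI="http://example.org/people/1">
    <namePart>Doe, Jane</namePart>
    <role><roleTerm type="code" authority="marcrelator">pht</roleTerm></role>
  </name>
  <subject><topic>Harbors</topic></subject>
  <subject><topic>Sunsets</topic></subject>
</mods>"""


def flatten_mods(xml_text):
    """Break a hierarchical MODS record into independent field values."""
    root = ET.fromstring(xml_text)
    fields = {
        # Field names are illustrative, not a fixed Islandora schema.
        "title": root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS),
        "subjects": [t.text for t in root.findall("mods:subject/mods:topic", MODS_NS)],
        "agents": [],
    }
    for name in root.findall("mods:name", MODS_NS):
        # Each agent keeps its URI and relator code as separate values.
        fields["agents"].append({
            "label": name.findtext("mods:namePart", namespaces=MODS_NS),
            "uri": name.get("valueURI"),
            "relator": name.findtext("mods:role/mods:roleTerm", namespaces=MODS_NS),
        })
    return fields


record = flatten_mods(SAMPLE_MODS)
```

Each entry in `record["agents"]` is a candidate for its own taxonomy term, with the relator code describing how it relates to the node.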

Overview

The Migrate API is the main way to ingest batches of data into Drupal (and because Islandora 8 is Drupal, into Islandora). The Migrate module only provides the framework; it’s up to you to create the rules that take data from a source, through a process (i.e. a mapping), to a destination. A set of these rules is called a “migration.” It has to be set up (as a Configuration Entity, either by importing a YAML file or by installing a Feature) and then run.

Once a migration has been run, it will have created (or updated) a bunch of Drupal entities of one type - whether that’s taxonomy terms, nodes, files, etc. Since an Object in Islandora 8 is made up of several different Drupal entities that refer to each other, it’s going to take multiple migrations to create an Islandora object, and it’s important to perform these migrations in a sensible order.

A basic Islandora object is at minimum:

- a file, which holds the actual binary contents of an item
- a node, which holds the descriptive metadata for an item and makes it discoverable
- a media, which holds technical metadata and references the file and the node, linking the two together.
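One way to think about a “sensible order” is as a dependency graph: a media references a file and a node, so both must exist first. The sketch below uses Python’s standard `graphlib` module to compute such an order. The migration names (`subjects`, `files`, `nodes`, `media`) and the `run_order` helper are hypothetical and exist only for this illustration; they are not names Islandora defines.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each hypothetical migration maps to the migrations it depends on.
DEPENDENCIES = {
    "subjects": set(),            # taxonomy terms referenced by nodes
    "files": set(),               # binary contents
    "nodes": {"subjects"},        # descriptive metadata, references terms
    "media": {"files", "nodes"},  # links each file to its node
}


def run_order(dependencies):
    """Return migration names so each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)
```

Several orders are valid here (files and subjects are independent of each other); what matters is that media always comes after both files and nodes.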

A typical migration migrates files and CSV metadata into Islandora using only YAML files. This process transforms data with pipelines of processing plugins and can handle numeric, text, and entity reference fields. It can handle multiple values for fields, and even more complicated things like typed_relation fields. This description only scratches the surface of what can be done with the Migrate API.
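The idea of a pipeline of processing steps per field can be sketched in a few lines of Python. This is a conceptual model only, not the Migrate API itself, where pipelines are declared in YAML. The CSV columns, the destination field names (`field_subject`, `field_year`), and the helper functions are all assumptions invented for the example.

```python
import csv
import io

# Hypothetical CSV source; column names are illustrative only.
CSV_TEXT = """title,subjects,year
Harbour at dusk,Harbors| Sunsets,1987
Untitled,,
"""


def explode(delimiter):
    """Build a step that splits one string into a list (multi-valued field)."""
    return lambda value: value.split(delimiter) if value else []


def trim(values):
    """Strip whitespace and drop empty entries."""
    return [v.strip() for v in values if v.strip()]


def to_int(value):
    """Convert a numeric string to int; empty cells become None."""
    return int(value) if value else None


# One (source column, pipeline of steps) pair per destination field.
PROCESS = {
    "title": ("title", []),
    "field_subject": ("subjects", [explode("|"), trim]),
    "field_year": ("year", [to_int]),
}


def process_row(row, process=PROCESS):
    """Run every destination field's pipeline against one source row."""
    out = {}
    for dest, (source, steps) in process.items():
        value = row[source]
        for step in steps:
            value = step(value)  # output of one step feeds the next
        out[dest] = value
    return out


rows = [process_row(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
```

The first row yields two subject values and an integer year; the second row, with empty cells, yields an empty list and `None` rather than failing.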

There’s certainly more you can do with Drupal 8’s Migrate API. There’s a plethora of source and processing plugins out there that can handle pretty much anything you throw at them; XML and JSON are fair game. You can also request sources using HTTP, so you can always point it at an existing system’s REST API and go from there. If the Migrate API’s existing plugins can’t make the necessary changes to your data, you can expand its capabilities by writing your own process plugins. A good place to start is the Drupal.org documentation on the Migrate API, or you can ask the community whether someone already has a solution for the challenge you’re facing.

Charter questions

  1. Supports migration of objects and their associated datastreams from the Fedora 3 instance to Islandora 8. There are a couple of prebuilt solutions, including a process that uses a Solr query and preconfigured mappings to achieve a full migration. Another technique uses the Drupal 8 module “Migrate Islandora CSV” (Lamb, 2019). As the name suggests, this module migrates from a CSV file, which can be generated with the Islandora 7.x module “Islandora Get CSV” (Jordan, 2020).
  2. Islandora has a few options for migrating from Islandora 7. Islandora directly supports two core migration options: Islandora CSV migration and migrate_7x_claw (a Fedora-to-Fedora option). Another option that was developed is Islandora Workbench.
    • Islandora CSV is the preferred option for its simplicity. All of the migration processes in this option are both iterable and reversible: if changes need to be made to an object or its metadata, they can be applied as an update or rolled back. This is an extremely important feature, because it allows collections of objects to be migrated with only minimal mappings (if any are needed); the RDF mapping doesn’t have to be complete before the entire collection is migrated. RDF is not a flat MODS file that references some URIs, and that flatness is a problem with MODS: if a URI changes, every MODS file must be updated to reflect the new destination. In RDF, the URI is stored and referenced as an entity rather than statically in a metadata file. It’s ideal to separate out any value that will be an entity: a subject, person, place, corporation, event, etc. Regardless of the entity type, the migration process has three basic steps: download the source, process it, and save it.
      ETL Process
    • Islandora’s migrate_7x_claw module is a Drupal 8 Migrate Plus migration that pulls content directly from an Islandora 7.x Solr/Fedora server. This module assumes all PIDs are indexed in Solr.
             - Currently, the following content models can be migrated over with full functionality:
                 * Collection
                 * Basic Image
                 * Large Image
                 * Audio
                 * Video
                 * PDF
                 * Binary
    • Islandora 8 supports batch ingest of objects through the Islandora Workbench repository (Jordan, 2020). This app uses a CSV file to import the content (objects) and the base fields, and it can generate new taxonomy terms when configured to do so.
  3. Supports ingesting objects via webforms. This is fundamental Drupal 8 functionality that Islandora leverages for all types of UI ingest and modification.
  4. Supports uploading objects via a form and having that object approved and published by another user. This mediated process is done through a standard Drupal concept called an editorial workflow. It suits an organization that uses several workflow steps, such as create, edit, review, and publish, and would like to restrict access to those steps based on roles. A full breakdown of how this is accomplished can be found on Drupal’s Editorial Workflow page (Lakatos & Hodgdon, 2019).
  5. Supports self and mediated deposit. This process is identical to the mediated workflow described in the previous item; the only difference is that for a self-published deposit, the user should be granted the roles needed to accomplish the task. Please see Step by Step Setup of Islandora 8 Scholar for more information on how to set it up.
    Islandora 8 Scholar Screenshots
  6. Supports uploading any binary file type. Islandora 8 treats all repository items as a binary content type. Binary files are treated the same as any other content type: a model tag is applied to an object/node, indicating to Drupal which context(s) to use, which in turn causes the corresponding actions to be fired.
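Returning to the Islandora CSV option under question 2: the combination of “download, process, save” with iterable and reversible runs can be modelled with a small Python sketch. This is a toy, not how Drupal implements it. The `Migration` class, its in-memory `store`, and the `id_map` dictionary are assumptions for illustration, loosely standing in for saved entities and for a record of which source row produced which destination.

```python
class Migration:
    """Toy read-process-save loop with an ID map, so runs can be repeated or undone."""

    def __init__(self, process):
        self.process = process  # function: source row -> destination values
        self.store = {}         # stand-in for saved entities
        self.id_map = {}        # source ID -> destination ID
        self._next_id = 1

    def run(self, source_rows):
        for row in source_rows:                  # 1. read the source
            values = self.process(row)           # 2. process it
            dest_id = self.id_map.get(row["id"])
            if dest_id is None:                  # first import: create a new ID
                dest_id = self._next_id
                self._next_id += 1
                self.id_map[row["id"]] = dest_id
            self.store[dest_id] = values         # 3. save (create or update)

    def rollback(self):
        """Remove everything this migration created and forget the mapping."""
        for dest_id in self.id_map.values():
            self.store.pop(dest_id, None)
        self.id_map.clear()


migration = Migration(lambda row: {"title": row["title"].strip()})
migration.run([{"id": "a:1", "title": " First "}, {"id": "a:2", "title": "Second"}])
migration.run([{"id": "a:1", "title": "First (corrected)"}])  # re-run updates in place
```

Because the ID map remembers which source row produced which destination, the second run updates the existing item instead of creating a duplicate, and `migration.rollback()` removes only what this migration created.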